Summary:
When using the serial keyword, a single failed host aborts the entire playbook with the error "FATAL: all hosts have already failed -- aborting".
With the serial keyword removed, all hosts are evaluated and a single failure does not abort the playbook run.
Steps To Reproduce:
Create an inventory with a "test" group with three hosts, A, B, and C. Save the following playbook.
# test_fail.yml
---
- hosts: host1:host2:host3
  gather_facts: no
  tasks:
    - ping:
    - fail:
      when: inventory_hostname == 'host3'
- hosts: host4:host5
  gather_facts: no
  tasks:
    - ping:
- hosts: host6
  gather_facts: no
  tasks:
    - ping:
- hosts: host7:host8
  gather_facts: no
  tasks:
    - ping:
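The playbook as posted does not set serial, and it names hosts that do not appear in the run output further down. For comparison, here is a minimal sketch of a playbook that would be consistent with that output. It is a reconstruction, not the original file: the task names and the failure message are taken from the output, while the modules, the commands, and the condition on host B are assumptions.

```yaml
# testserial.yml (hypothetical reconstruction, inferred from the run output)
---
- hosts: test
  serial: 1                            # process one host per batch
  tasks:
    - name: Who am I?
      debug:
        msg: "{{ inventory_hostname }}"

    - name: Test action
      command: /bin/true               # assumed; any task that reports "changed"

    - name: Test failure
      fail:
        msg: Bad host that fails
      when: inventory_hostname == 'B'  # assumed; B is the host that fails in the output

    - name: Next action
      command: /bin/true               # assumed
```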
Expected Results:
This playbook should iterate through all three hosts, executing each action in sequence before moving on to the next host. A failure on one host should abort any further actions on that one host and move on to the next.
Actual Results:
The first failure that occurs aborts the entire playbook.
Run output:
$ ansible-playbook -i testinv testserial.yml
PLAY [test] *******************************************************************
GATHERING FACTS ***************************************************************
ok: [A]
ok: [C]
ok: [B]
TASK: [Who am I?] *************************************************************
ok: [C] => {
"msg": "C"
}
TASK: [Test action] ***********************************************************
changed: [C]
TASK: [Test failure] **********************************************************
skipping: [C]
TASK: [Next action] ***********************************************************
changed: [C]
TASK: [Who am I?] *************************************************************
ok: [B] => {
"msg": "B"
}
TASK: [Test action] ***********************************************************
changed: [B]
TASK: [Test failure] **********************************************************
failed: [B] => {"failed": true}
msg: Bad host that fails
FATAL: all hosts have already failed -- aborting
PLAY RECAP ********************************************************************
to retry, use: --limit @/home/slack/testserial.retry
A : ok=1 changed=0 unreachable=0 failed=0
B : ok=3 changed=1 unreachable=0 failed=1
C : ok=4 changed=2 unreachable=0 failed=0
When using serial: 1, you're telling the play to operate on only 1 host at a time. The working group failure percentage is based on the serial count (see #4407), so when there is only 1 host in the group and it fails, the failure percentage will always be 100%. To move past this error, increase the serial group size or use ignore_errors: yes on tasks that are not fatal to the workflow of your playbook.
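As a rough illustration of the second suggestion, the sketch below marks one task as non-fatal with ignore_errors. The group name "test" comes from the report. The task names and commands are placeholders chosen for this example.

```yaml
---
- hosts: test
  serial: 1                    # each batch holds a single host
  tasks:
    - name: Non-fatal step     # placeholder task that always fails
      command: /bin/false
      ignore_errors: yes       # the failure is reported but not counted against the batch

    - name: Next action        # still runs on the same host
      command: /bin/true
```

With this in place, the play should move on to the next host even when the first task fails, at the cost of also running the remaining tasks on the host that failed.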
ignore_errors causes the playbook to continue executing tasks for that host. If we want to stop executing tasks for a failed host but continue the playbook with the next host, it doesn't work.
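One possible way to approximate that behaviour is to register the result of the task that may fail and guard the later tasks with a when condition, so that a failed host skips the rest of its tasks while the play moves on to the next host. This is only a sketch under assumptions: the variable name test_action and the commands are hypothetical, and every later task needs the same guard.

```yaml
---
- hosts: test
  serial: 1
  tasks:
    - name: Test action
      command: /bin/false         # placeholder for a step that may fail
      register: test_action       # hypothetical variable name
      ignore_errors: yes          # keep the batch from being counted as failed

    - name: Next action
      command: /bin/true
      when: test_action.rc == 0   # skipped on hosts where the step above failed
```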