Ansible: Maintain Consistent Production Deployments Using Bamboo and Ansible Tower
Before I start, I would like you to know the problem statement.
Problem Statement:
- By default, Ansible keeps deploying to a group of servers, and no failure is reported even if some hosts are unreachable.
- By default, if a task fails on one server in a group, Ansible does not report it as a deployment failure; it drops that server from the rest of the play and proceeds with the deployment on the remaining servers.
Current Toolsets Used:
- Bamboo: our CI server, used to trigger deployments on Ansible Tower (we created a custom Bamboo plugin to support Ansible Tower deployments).
- Ansible Tower: a web-based application that acts as a hub for all automation tasks.
In my scenario, we use Ansible to deploy to groups of Windows/Linux production servers, with a load balancer in front of each group to manage the traffic. Since Bamboo is our CI server, we start the deployment from Bamboo, which calls an API on Ansible Tower to launch the deployment.
If Ansible finds a server unreachable, it skips all tasks on that server, completes the deployment on the remaining servers, and at the end prints a summary (the PLAY RECAP) showing which servers it deployed to, which servers were unreachable and which tasks failed. In Bamboo, however, we still get a green status, which represents a successful deployment. This leaves the production servers running different application versions, which can cause serious business issues.
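The playbooks in this post target a group called “webservers”. As a rough sketch, such a group could be declared in a YAML inventory like the one below. The host names are hypothetical placeholders, and so is the file name hosts.yml: by default the YAML inventory plugin only picks up files ending in .yml, .yaml or .json.

```yaml
# hosts.yml -- hypothetical inventory; host names are placeholders
all:
  children:
    webservers:            # the group targeted by "- hosts: webservers"
      hosts:
        web01.example.com:
        web02.example.com:
        web03.example.com:
```

It would be passed to ansible-playbook with "-i hosts.yml". Windows servers would additionally need connection variables such as ansible_connection: winrm, which are omitted here.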
How can we solve this problem?
- The deployment should fail if any server is unreachable during the deployment.
- The deployment should fail if any task fails on any of the servers in the group.
How can we achieve this with Ansible?
Ansible provides a few configuration options for this. Let’s see how they help us achieve it.
MAX_FAIL_PERCENTAGE: By default, Ansible continues executing actions as long as there are hosts in the group that have not yet failed. In some situations, such as rolling updates, it may be desirable to abort the play once a certain threshold of failures has been reached.
```yaml
- hosts: webservers
  max_fail_percentage: 30
  serial: 10
```
In this example, if more than 3 of the 10 servers in a batch fail, the rest of the play is aborted (the percentage has to be exceeded, not merely reached).
- max_fail_percentage cannot detect unreachable servers.
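If you want an explicit check for dropped hosts that does not depend on max_fail_percentage, one option is an assert task that compares two of Ansible’s special variables. This is a sketch of an alternative, not the approach used in this post. ansible_play_hosts_all lists every host the play targeted, while ansible_play_hosts excludes hosts that have failed or become unreachable. Fact gathering is enabled by default and runs before the first task, so unreachable hosts are already known when the check runs. Placing the check first therefore stops the play before anything is deployed. The task name is illustrative.

```yaml
---
# Sketch of an alternative guard (not the approach used in this post):
# fail the run if any targeted host has dropped out of the play.
- hosts: webservers
  tasks:
    # Fact gathering (on by default) runs before this task, so unreachable
    # hosts have already been removed from ansible_play_hosts at this point.
    # The assert is evaluated on every remaining host; if it fails, all of
    # them fail together and no later task in the play runs.
    - name: Verify that every targeted host is still in the play
      assert:
        that:
          - ansible_play_hosts | length == ansible_play_hosts_all | length
        fail_msg: "Hosts dropped from the play: {{ ansible_play_hosts_all | difference(ansible_play_hosts) | join(', ') }}"

    - name: Copy File from one location to another
      copy:
        src: /tmp/file1
        dest: /tmp/file2
        remote_src: True
```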
ANY_ERRORS_FATAL: This aborts the playbook on any failure. It also detects unreachable servers and marks the playbook as failed.
Let’s see how we can use these options in our playbook. This is the default playbook, and below is its output.
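According to the Ansible documentation, any_errors_fatal can be set at the block level as well as at the play level. A failing task inside such a block then stops the play on all hosts once that task has finished on the current batch. The sketch below limits the fatal behaviour to the deployment steps. It reuses the example tasks from this post. The block name is hypothetical. It has not been checked whether the block-level setting treats unreachable hosts exactly like the play-level setting used later in this post.

```yaml
---
# Sketch: scope any_errors_fatal to the block of deployment steps
- hosts: webservers
  tasks:
    - name: Deployment steps that must succeed on every host
      block:
        - name: Copy File from one location to another
          copy:
            src: /tmp/file1
            dest: /tmp/file2
            remote_src: True
      any_errors_fatal: true

    - name: debug
      debug:
        msg: " Hey i still ran "
```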
```yaml
---
- hosts: webservers
  tasks:
    - name: Copy File from one location to another
      copy:
        src: /tmp/file1
        dest: /tmp/file2
        remote_src: True

    - name: debug
      debug:
        msg: " Hey i still ran "
```
Now let’s create a simple playbook to show how we can solve the problems mentioned above. We will start with one of them: the playbook should abort if any task fails on any of the servers in the group. Add the following content:
```yaml
---
- hosts: webservers
  max_fail_percentage: 0
  tasks:
    - name: Copy File from one location to another
      copy:
        src: /tmp/file1
        dest: /tmp/file2
        remote_src: True

    - name: debug
      debug:
        msg: " Hey i still ran "
```
Here we have added “max_fail_percentage” and set it to 0, which means any task failure aborts the play.
```shell
ansible-playbook -i hosts site.yml
```
If you look at the output, there is 1 unreachable server and Ansible still continues to run the playbook. However, it fails the play once it detects a task failure on one server, because “max_fail_percentage” is set to 0. This solves one of our problems, but we also want to abort the deployment if any server is unreachable. To accomplish this, let’s add one more option to the playbook: “any_errors_fatal” set to True.
```yaml
---
- hosts: webservers
  any_errors_fatal: True
  max_fail_percentage: 0
  tasks:
    - name: Copy File from one location to another
      copy:
        src: /tmp/file1
        dest: /tmp/file2
        remote_src: True

    - name: debug
      debug:
        msg: " Hey i still ran "
```
Here we set “any_errors_fatal” to True. Let’s run the play again:
```shell
ansible-playbook -i hosts site.yml
```
This time, the play aborts as soon as it identifies an unreachable server and marks the playbook as failed. This resolves both problems stated above.
Now, whenever we run a deployment from Bamboo, it triggers the Ansible Tower deployment, which fails if either of these conditions occurs. This way we maintain consistent deployments for all application teams.
I hope this helps you guys. If you have a better solution, please comment.