
Using Ansible to update images
This is a new approach to updating an in-place TripleO cloud with new images. We chose Ansible because it allows fine-grained control of the workflow without requiring anyone to write idempotent bash or python. Some components remain bash or python scripts; the goal is not to replace the whole of TripleO with Ansible, only the pieces that make updates more complicated than they need to be.
In general this update process works in the following manner:
- Gather inventory and facts about the deployed cloud from Heat and Nova
- Quiesce the cloud by shutting down all OpenStack services on appropriate nodes
- Nova-Rebuild nodes using requested image ids
- Disable os-collect-config polling of Heat
- Push Metadata from Heat to rebuilt nodes using Ansible and manually trigger os-collect-config
- Start OpenStack services
Installing Ansible
Please see the ansible element in tripleo-image-elements
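If you would rather not build Ansible into your images via that element, installing it directly on the machine that will run the playbooks (typically the seed or undercloud) is usually sufficient. A minimal sketch, assuming pip is available on that host:
sudo pip install ansible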
The following patches are required for operation:
- Add nova metadata for group (openstack/tripleo-heat-templates) - https://review.openstack.org/#/c/113358/2 - This heat template update labels instances so that the ansible tools can group them to facilitate the updates.
- Element to restore ssh keys from /mnt/state (openstack/tripleo-image-elements) - https://review.openstack.org/#/c/114360/ - This adds a new image element, named restore-ssh-host-keys, which is intended to restore host keys preserved by the ansible scripts after a reboot.
To make things simpler, you may want to add tripleo-ansible to /opt/stack on the seed and/or undercloud. The elements/tripleo-ansible element can be included in seed and undercloud image builds so that the tripleo-ansible tools are deployed automatically.
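One way to do this is sketched below with a placeholder repository URL, since the correct source may differ in your environment:
git clone <tripleo-ansible-repository-url> /opt/stack/tripleo-ansible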
Pre-flight check
A playbook exists that can be used to check the controllers before the main playbook is executed, in order to quickly identify any issues in advance.
ansible-playbook -vvvv -M library/cloud -i plugins/inventory/heat.py -u heat-admin playbooks/pre-flight_check.yml
Running the updates
You will want to set your environment variables to the appropriate values for the following: OS_AUTH_URL, OS_USERNAME, OS_PASSWORD, and OS_TENANT_NAME
source /root/stackrc
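If you do not have a stackrc file available, the variables can be exported directly; the values below are placeholders and must be replaced with ones matching your deployment:
export OS_AUTH_URL=http://192.0.2.1:5000/v2.0
export OS_USERNAME=admin
export OS_PASSWORD=<your-password>
export OS_TENANT_NAME=admin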
Your new images will need to be uploaded to glance, such that an instance can be booted from them, and the image ID will need to be provided to the playbook as an argument.
You can obtain the ID with the glance image-list command, and then set them to be passed into ansible as arguments.
glance image-list
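For convenience, you may wish to capture the IDs into shell variables and reference those when building the ansible-playbook command later; the IDs below are illustrative:
CONTROLLER_IMAGE_ID=2432dd37-a072-463d-ab86-0861bb5f36cc
NOVA_COMPUTE_IMAGE_ID=1ae9fe6e-c0cc-4f62-8e2b-1d382b20fdcb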
It may be possible to infer the image IDs using the script "populate_image_vars". It will try to determine the latest image for each image class and set it as a group variable in inventory.
scripts/populate_image_vars
After it runs, inspect plugins/inventory/group_vars and if the data is what you expect, you can omit the image ids from the ansible command line below.
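For example, a quick way to review what the script generated (the exact file layout under group_vars may differ):
cat plugins/inventory/group_vars/*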
You will now want to take the image ID values observed in the previous step and execute the ansible-playbook command with the appropriate values substituted into place. The variables currently used to pass the image IDs are nova_compute_rebuild_image_id and controller_rebuild_image_id, which are passed into the chained playbook.
ansible-playbook -vvvv -u heat-admin -i plugins/inventory/heat.py -e nova_compute_rebuild_image_id=1ae9fe6e-c0cc-4f62-8e2b-1d382b20fdcb -e controller_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc -e controllermgmt_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc -e swift_storage_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc -e vsa_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc playbooks/update_cloud.yml
If you have set the image ids in group vars:
ansible-playbook -vvvv -u heat-admin -i plugins/inventory/heat.py playbooks/update_cloud.yml
Below, we break down the above command so you can see what each part does:
- -vvvv - Make Ansible very verbose.
- -u heat-admin - Utilize the heat-admin user to connect to the remote machine.
- -i plugins/inventory/heat.py - Sets the inventory plugin.
- -e nova_compute_rebuild_image_id=1ae9fe6e-c0cc-4f62-8e2b-1d382b20fdcb - Sets the compute node image ID.
- -e controller_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc - Sets the controller node image ID.
- -e controllermgmt_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc - Sets the controllerMgmt node image ID.
- -e swift_storage_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc - Sets the swift storage node image ID.
- -e vsa_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc - Sets the vsa node image ID.
- playbooks/update_cloud.yml - The path and file name of the ansible playbook that will be executed.
Upon a successful completion, ansible will print a summary report:
PLAY RECAP ********************************************************************
192.0.2.24 : ok=18 changed=9 unreachable=0 failed=0
192.0.2.25 : ok=19 changed=9 unreachable=0 failed=0
192.0.2.26 : ok=18 changed=8 unreachable=0 failed=0
Additionally:
As ansible utilizes SSH, you may encounter ssh host key errors if an IP address has been re-used. The fact that SSH host keys are not preserved is a defect that is being addressed. To avoid problems while this defect is being fixed, set the environment variable ANSIBLE_HOST_KEY_CHECKING=False, as in the example below.
ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -vvvv -M library/cloud -i plugins/inventory/heat.py -e controller_rebuild_image_id=4bee1a0a-2670-48e4-a3a4-17da6be795cb -e nova_compute_rebuild_image_id=bd20e098-0753-4dc8-8dba-2f739c01ee65 -u heat-admin playbooks/update_cloud.yml
Python, the language that ansible is written in, buffers IO output by default. This can be observed as long pauses followed by sudden bursts of log entries, particularly when the playbook is executed by Jenkins. This behavior can be disabled by setting the environment variable PYTHONUNBUFFERED=1, as in the example below.
PYTHONUNBUFFERED=1 ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -vvvv -M library/cloud -i plugins/inventory/heat.py -e controller_rebuild_image_id=4bee1a0a-2670-48e4-a3a4-17da6be795cb -e nova_compute_rebuild_image_id=bd20e098-0753-4dc8-8dba-2f739c01ee65 -u heat-admin playbooks/update_cloud.yml
For more information about Ansible, please refer to the documentation at http://docs.ansible.com/
Failure Handling
Ansible has tunable options to abort the execution of a playbook upon encountering a failure.
The max_fail_percentage parameter allows users to define what percentage of nodes can fail before the playbook stops executing. This setting is pre-defined in the playbook file playbooks/update_cloud.yml. The default value is zero, which causes the playbook to abort execution if any node fails. You can read about this option at: http://docs.ansible.com/playbooks_delegation.html#maximum-failure-percentage
Additionally, the any_errors_fatal variable, when set to True, will cause ansible to abort upon encountering any failure. This variable can be set by adding '-e any_errors_fatal=True' to the command line.
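For example, assuming the image ids have already been set in group vars, a run that aborts on any failure could look like:
ansible-playbook -vvvv -u heat-admin -i plugins/inventory/heat.py -e any_errors_fatal=True playbooks/update_cloud.yml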
Additional Options
The plugins/inventory/group_vars/all file has the following options for tuning the behavior of the playbook execution. These options can be enabled by defining the variable that they represent on the ansible command line, or by uncommenting the appropriate line in the plugins/inventory/group_vars/all file.
- force_rebuild - This option overrides the logic that prevents an instance from being rebuilt if the pre-existing image id matches the id being deployed. This may be useful for testing purposes. Example command line addition: -e force_rebuild=True
- wait_for_hostkey - This option causes the playbook to wait for the SSH host keys to be restored. This option should only be used if the restore-ssh-host-keys element is built into the new image.
- single_controller - This option is for when a single controller node is receiving an upgrade. It alters the logic so that mysql checks operate as if the mysql database cluster is being maintained online by other controller nodes during the upgrade. If you are looking at this option due to an error indicating "Node appears to be the last node in a cluster", then consult Troubleshooting.rst.
- ssh_timeout - This value, which defaults to 900 seconds, is the maximum amount of time that the post-rebuild ssh connection test will wait before proceeding.
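As an illustration, several of these options can be combined on a single command line; the values shown are examples only and assume the image ids are already set in group vars:
ansible-playbook -vvvv -u heat-admin -i plugins/inventory/heat.py -e force_rebuild=True -e wait_for_hostkey=True -e ssh_timeout=1200 playbooks/update_cloud.yml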