
Using Ansible to update images
==============================

This is a new approach to updating an in-place TripleO cloud with new
images. We chose Ansible because it allows fine-grained control of the
workflow without requiring one to write any idempotent bash or python.
There are components that are bash or python scripts, and we are working
hard not to replace the whole of TripleO with Ansible, but just the
pieces that make updates more complicated than they need to be.

In general, this update process works in the following manner (a
simplified sketch follows the list):

* Gather inventory and facts about the deployed cloud from Heat and Nova
* Quiesce the cloud by shutting down all OpenStack services on
  appropriate nodes
* Nova-rebuild nodes using the requested image IDs
* Disable os-collect-config polling of Heat
* Push metadata from Heat to the rebuilt nodes using Ansible and manually
  trigger os-collect-config
* Start OpenStack services
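
To make the flow concrete, here is a heavily simplified, hypothetical
sketch of the quiesce step expressed as Ansible tasks. The group name,
service names, and user shown are illustrative assumptions; the real
logic lives in playbooks/update_cloud.yml and the modules under
library/cloud::

    # Illustrative sketch only -- not the actual playbook. Group and
    # service names are assumptions.
    - hosts: nova-compute
      user: heat-admin
      sudo: yes
      tasks:
        - name: quiesce - stop OpenStack services before the rebuild
          service: name=openstack-nova-compute state=stopped
        - name: disable os-collect-config polling of Heat
          service: name=os-collect-config state=stopped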

Installing Ansible
------------------

Please see the `ansible` element in `tripleo-image-elements`.

The following patches are required for operation:

* Add nova metadata for group (openstack/tripleo-heat-templates) -
  https://review.openstack.org/#/c/113358/2 - This heat template update
  labels instances so that the ansible tools can sort the instances
  into groups to facilitate the updates.
* Element to restore ssh keys from /mnt/state
  (openstack/tripleo-image-elements) -
  https://review.openstack.org/#/c/114360/ - This includes a new image
  element, named restore-ssh-host-keys, which is intended to restore host
  keys preserved by the ansible scripts after a reboot.

To make things simpler, you may want to add tripleo-ansible to /opt/stack
on the seed and/or undercloud. The elements/tripleo-ansible element can be
included in seed and undercloud image builds so that the tripleo-ansible
tools are automatically deployed for use.
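
For example, assuming a standard diskimage-builder setup (the checkout
path below is an illustrative assumption), expose the elements and then
add `tripleo-ansible` to the element list of your usual image build::

    # Illustrative: make the tripleo-ansible elements visible to
    # diskimage-builder, then include "tripleo-ansible" among the
    # elements of your normal seed/undercloud image build.
    export ELEMENTS_PATH=$ELEMENTS_PATH:/opt/stack/tripleo-ansible/elements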

Pre-flight check
----------------

A playbook exists that can be used to check the controllers prior to the
execution of the main playbook, in order to quickly identify any issues
in advance::

    ansible-playbook -vvvv -M library/cloud -i plugins/inventory/heat.py -u heat-admin playbooks/pre-flight_check.yml

Running the updates
-------------------

You will want to set your environment variables to the appropriate
values for the following: OS_AUTH_URL, OS_USERNAME, OS_PASSWORD, and
OS_TENANT_NAME. The simplest way is to source your stackrc file::

    source /root/stackrc
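
If you do not have a stackrc file available, the same variables can be
exported by hand (the values below are placeholders)::

    # Placeholder values -- substitute the credentials for your cloud.
    export OS_AUTH_URL=http://192.0.2.1:5000/v2.0
    export OS_USERNAME=admin
    export OS_PASSWORD=secret
    export OS_TENANT_NAME=admin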

Your new images will need to be uploaded to glance, such that an instance
can be booted from them, and the image ID will need to be provided to
the playbook as an argument.

You can obtain the IDs with the `glance image-list` command, and then
set them to be passed into ansible as arguments::

    glance image-list
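
The output resembles the following (the image names are illustrative;
the IDs shown match those used in the examples below)::

    +--------------------------------------+-------------------+
    | ID                                   | Name              |
    +--------------------------------------+-------------------+
    | 1ae9fe6e-c0cc-4f62-8e2b-1d382b20fdcb | overcloud-compute |
    | 2432dd37-a072-463d-ab86-0861bb5f36cc | overcloud-control |
    +--------------------------------------+-------------------+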

It may be possible to infer the image IDs using the script
"populate_image_vars". It will try to determine the latest image for
each image class and set it as a group variable in inventory::

    scripts/populate_image_vars

After it runs, inspect `plugins/inventory/group_vars` and, if the data
is what you expect, you can omit the image IDs from the ansible command
line below.
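
As an illustration, a populated group variable entry might look like the
following (the exact file name and group are assumptions)::

    # plugins/inventory/group_vars/nova-compute (illustrative)
    nova_compute_rebuild_image_id: 1ae9fe6e-c0cc-4f62-8e2b-1d382b20fdcb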

You will now want to take the image ID values observed in the previous
step and execute the ansible-playbook command with the appropriate values
substituted into place. The current variables for passing the image IDs
are nova_compute_rebuild_image_id and controller_rebuild_image_id,
which are passed into the chained playbook::

    ansible-playbook -vvvv -u heat-admin -i plugins/inventory/heat.py -e nova_compute_rebuild_image_id=1ae9fe6e-c0cc-4f62-8e2b-1d382b20fdcb -e controller_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc -e controllermgmt_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc -e swift_storage_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc -e vsa_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc playbooks/update_cloud.yml

If you have set the image IDs in group vars::

    ansible-playbook -vvvv -u heat-admin -i plugins/inventory/heat.py playbooks/update_cloud.yml

Below, we break down the above command so you can see what each part does:

* -vvvv - Make Ansible very verbose.
* -u heat-admin - Utilize the heat-admin user to connect to the remote
  machine.
* -i plugins/inventory/heat.py - Sets the inventory plugin.
* -e nova_compute_rebuild_image_id=1ae9fe6e-c0cc-4f62-8e2b-1d382b20fdcb -
  Sets the compute node image ID.
* -e controller_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc -
  Sets the controller node image ID.
* -e controllermgmt_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc -
  Sets the controllerMgmt node image ID.
* -e swift_storage_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc -
  Sets the swift storage node image ID.
* -e vsa_rebuild_image_id=2432dd37-a072-463d-ab86-0861bb5f36cc - Sets the
  vsa node image ID.
* playbooks/update_cloud.yml - The path and file name of the ansible
  playbook that will be utilized.

Upon successful completion, ansible will print a summary report::

    PLAY RECAP ********************************************************************
    192.0.2.24                 : ok=18   changed=9    unreachable=0    failed=0
    192.0.2.25                 : ok=19   changed=9    unreachable=0    failed=0
    192.0.2.26                 : ok=18   changed=8    unreachable=0    failed=0

Additionally:

As ansible utilizes SSH, you may encounter ssh key errors if the IP
address has been re-used. The fact that SSH keys aren't preserved is a
defect that is being addressed. In order to avoid problems while this
defect is being fixed, you will want to set the environment variable
ANSIBLE_HOST_KEY_CHECKING=False, as in the example below::

    ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -vvvv -M library/cloud -i plugins/inventory/heat.py -e controller_rebuild_image_id=4bee1a0a-2670-48e4-a3a4-17da6be795cb -e nova_compute_rebuild_image_id=bd20e098-0753-4dc8-8dba-2f739c01ee65 -u heat-admin playbooks/update_cloud.yml

Python, the language that ansible is written in, buffers IO output by
default. This can be observed as long pauses between sudden bursts of log
entries where multiple steps are observed, particularly when executed by
Jenkins. This behavior can be disabled by setting the environment variable
PYTHONUNBUFFERED=1, as in the example below::

    PYTHONUNBUFFERED=1 ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -vvvv -M library/cloud -i plugins/inventory/heat.py -e controller_rebuild_image_id=4bee1a0a-2670-48e4-a3a4-17da6be795cb -e nova_compute_rebuild_image_id=bd20e098-0753-4dc8-8dba-2f739c01ee65 -u heat-admin playbooks/update_cloud.yml

For more information about Ansible, please refer to the documentation at
http://docs.ansible.com/

Failure Handling
----------------

Ansible has tunable options to abort the execution of a playbook upon
encountering a failure.

The max_fail_percentage parameter allows users to define what percentage
of nodes can fail before the playbook stops executing. This setting is
pre-defined in the playbook file playbooks/update_cloud.yml. The default
value is zero, which causes the playbook to abort execution if any node
fails. You can read about this option at:
http://docs.ansible.com/playbooks_delegation.html#maximum-failure-percentage
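
For reference, this is how the setting appears within a play (a minimal
sketch, not the actual contents of playbooks/update_cloud.yml)::

    # Sketch: with max_fail_percentage set to 0, a single failed host
    # aborts the play.
    - hosts: controller
      max_fail_percentage: 0
      tasks:
        - name: example task
          command: /bin/true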

Additionally, the any_errors_fatal variable, when set to True, will cause
ansible to abort upon encountering any failure. This variable can be set
by adding '-e any_errors_fatal=True' to the command line.
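
For example, building on the invocation shown earlier::

    ansible-playbook -vvvv -u heat-admin -i plugins/inventory/heat.py -e any_errors_fatal=True playbooks/update_cloud.yml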

Additional Options
------------------

The plugins/inventory/group_vars/all file has the following options for
tuning the behavior of the playbook execution. These options can be
enabled by defining the variable they represent on the ansible command
line, or by uncommenting the appropriate line in the
plugins/inventory/group_vars/all file. A combined example follows the
list.

* force_rebuild - This option overrides the logic that prevents an
  instance from being rebuilt if the pre-existing image ID matches the ID
  being deployed. This may be useful for the purposes of testing.
  Example command line addition: -e force_rebuild=True
* wait_for_hostkey - This option causes the playbook to wait for the SSH
  host keys to be restored. This option should only be used if the
  restore-ssh-host-keys element is built into the new image.
* single_controller - This option is for when a single controller node is
  receiving an upgrade. It alters the logic so that mysql checks operate
  as if the mysql database cluster is being maintained online by other
  controller nodes during the upgrade. *IF* you are looking at this option
  due to an error indicating "Node appears to be the last node in a
  cluster", consult Troubleshooting.rst before using it.
* ssh_timeout - This value, which defaults to 900 [seconds], is the
  maximum amount of time that the post-rebuild ssh connection test will
  wait before proceeding.
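
As a combined example (the option values shown are illustrative)::

    ansible-playbook -vvvv -u heat-admin -i plugins/inventory/heat.py -e force_rebuild=True -e ssh_timeout=1800 playbooks/update_cloud.yml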