openstack-ansible-ops

Author	SHA1	Message	Date
Antony Messerli	ad1a4bc9ef	Force filesystem type on swift format It currently seems to think that /dev/vmvg00/disk1 is used for btrfs, so force this operation to ensure it's changed to xfs. Change-Id: I0bcc9723fb33b557315422c3259a7ba2b75ceff6	2018-09-05 13:13:22 -05:00
Jesse Pretorius	868a559840	MNAIO: Only run systemd daemon_reload when necessary When the VM's are Ubuntu Trusty, this task causes total failure. We should only try and do the daemon_reload if the system being used supports it. Change-Id: I557856045a7735c8f351df6350f777caae526b10	2018-09-04 19:23:42 +01:00
Jesse Pretorius	4e9c1c5fd8	MNAIO: Make the galera image prep more robust Unfortunately guestfish may error out silently (no return code of 1), which makes hunting down the error a bit obscure. To combat this, we add progress messages on stdout to the script and check for the final step's message to validate success. To make this work, we need to copy the script over and execute it with the command module, because the script module puts everything into stderr. Change-Id: I8e514ceb2462870721745c9445ec149864a45f4d	2018-09-04 19:19:50 +01:00
Jesse Pretorius	e7387a6baa	MNAIO: Make galera startup cleanly when using images In an ideal state, if the galera containers are shut down cleanly, they will leave behind a gvwstate.dat file on each node which provides the cluster member details so that it can automatically start up again without intervention. However, when imaging the MNAIO systems we only interact with the hosts, so the galera containers sometimes do not shut down cleanly. To cater for this, we inspect the disk images for the primary component, then build the gvwstate.dat file for the other galera containers. With those put back into the image, when the VM's start, the cluster forms immediately. References: http://galeracluster.com/documentation-webpages/pcrecovery.html http://galeracluster.com/documentation-webpages/restartingcluster.html Change-Id: Icfe067607baefd661147f3c22ce846f06fff7c60	2018-09-01 19:20:38 +00:00
Jesse Pretorius	0e8423d536	MNAIO: Make post-install storage provisioning idempotent The swift and cinder hosts do not use containers for services, so there is no need to do the current process of shrinking the volumes. Instead, we ensure that the lxc & machines mounts are removed, with their respective logical volumes. When setting up the swift logical volumes, we do not need to create the mount point directories, because the mount task will do that for us. As such, we remove that task. Change-Id: Ibbe6d0fede6b6965415e421161354e311708d113	2018-08-29 17:12:43 +01:00
Jesse Pretorius	7f39e408e3	MNAIO: Inject the host ssh public key into the image To allow a downloaded set of file-backed images to be used on another host, the new host's public ssh key needs to be injected into the VM disks so that ansible is able to connect to it and complete the rest of the preparation. Change-Id: I6b9b5efb88283417c15f74f40cfb91943bb8774d	2018-08-29 13:34:42 +01:00
Jesse Pretorius	934a3c2651	MNAIO: Default vm_use_snapshot in group_vars Rather than have to default it in tasks all over the place, we default it in group_vars. The default is to enable the feature if file-backed VM's are used. However, if there are no base images available, the set_fact task disables it. If a user wishes to force it not to be used, then an extra-var override is still usable. Change-Id: I5c916244a02a44da831d2a0fefd8e8aafae829b2	2018-08-29 13:23:40 +01:00
Jesse Pretorius	42189e272f	MNAIO: Use discard option for all mount points Using the discard option for all mount points ensures that the deletes actually release the blocks on the disk. This ensures that SSD performance is optimised and that file-backed images are kept as small as possible. Change-Id: I648cbaca56d75e355cf6c8af01e2e3ad20dfc398	2018-08-23 18:57:02 +01:00
Jesse Pretorius	5ce798b360	MNAIO: Use images subdirectory for VM images Instead of putting the images in the root of the disk, we use a subdirectory. This prevents silly mistakes from happening. Change-Id: I19d22b7e72de88736db410a771ec22664c641c94	2018-08-21 18:30:28 +01:00
Jesse Pretorius	9488e76bf1	MNAIO: Ensure vm_use_snapshot is defined When not using a file-backed backing store, the variable is not defined and results in an error to that effect. Change-Id: I3142a5960bc4521f79bbdfe32b0e7a0f71742b7d	2018-08-17 10:27:26 +01:00
Jesse Pretorius	993bac94f5	MNAIO: Extend image saving to include manifest In order to more successfully reproduce an environment using saved images, we include the VM XML definition files and the output from 'pip freeze'. We capture the list of files, their checksums and the SHA for the git repo into a json manifest file. Change-Id: Ia0bf74d509b4acb10b0dd832a4cfe1bb2afb2503	2018-08-15 19:25:12 +01:00
Jesse Pretorius	484059205a	MNAIO: Enable using a data disk for file-backed VM's In order to make use of a data disk, we enable the 'file' implementation of default_vm_disk_mode to use a data disk much like the 'lvm' implementation. To simplify changing from the default_vm_disk_mode of lvm to file and back again, the setup-host playbook will remove any previous implementation and replace it. This is useful when doing testing for these different modes because it does not require cleaning up by hand. This patch also fixes the implementation of the virt storage pool. Currently the tasks only execute if 'virt_data_volume.pools is not defined', but it is always defined so the tasks never execute. We now ensure that for both backing stores the 'default' storage pool is defined, started and set to auto start (as three tasks because the virt_pool module sucks really bad and can only do one thing at a time). The pool implementation for the 'file' backed VM's uses the largest data disk it can find and creates the /data mount for it. To cater for a different configuration, we ensure that all references to the disk files use the path that is configured in the pool, rather than assuming the path. Change-Id: If7e7e37df4d7c0ebe9d003e5b5b97811d41eff22	2018-08-15 16:58:50 +01:00
Jesse Pretorius	4a48a6874d	Optimise vm_disk_mode conditionals There is already a default in group_vars/all, so we do not need to provide a default in every conditional. Also, we move several LVM data volume tasks into a block given they have a common set of conditions. Change-Id: Iff0fafefda2bc5dc1596b7198b779f5da763086c	2018-08-15 08:49:23 +00:00
Antony Messerli	ef560373e3	Undefine existing VM configurations during rebuild Allows for configuration changes to be redeployed on a rebuild where previously it didn't attempt to update the VMs configuration. Change-Id: If14dbdfe7ba3e69a50127fa724ad3f2a8ed58c1a	2018-07-23 13:59:27 -05:00
Corey Wright	bdeeb39e42	mnaio: Correct LVM terminology in task names Change-Id: I23f1a245f30f45dc66b6f2ae0ff2ee5aab147dd8 Signed-off-by: Corey Wright <corey.wright@rackspace.com>	2018-07-16 21:15:39 -05:00
Corey Wright	f21bc66671	mnaio: Only resize Swift & Cinder machines00 LV when using nspawn Commit 875fa96f / change-id Ief0040f6 unintentionally tries to enlarge the "machines00" LV when LXC is the default container technology which fails due to the Debian automated installation having assigned all the space within the associated "vmvg00" VG. As the intention of the aforementioned commit was to apply when systemd-nspawn was used, codify that explicitly in a `when:` condition on the problematic Ansible task. Change-Id: I56ec1290d71d0d09db447e347d7d55432d9b81c6 Signed-off-by: Corey Wright <corey.wright@rackspace.com> Closes-Bug: #1781823	2018-07-15 15:46:56 -05:00
Antony Messerli	875fa96fb8	Make space for Cinder/Swift nodes if using nspawn Currently if CONTAINER_TECH=nspawn is used, Cinder and Swift are unable to create volumes as space is fully allocated for machines volume. This shrinks machine00 mount to 8192 to make space for Cinder and Swift volumes when using nspawn for the container tech. Change-Id: Ief0040f638f0d3570557ac76fd5e0a8aee80df8d	2018-07-11 15:54:09 -05:00
Dave Wilde	482e845d92	Improve multi-node AIO robustness In order to improve the readability and robustness of the mnaio feature, I have replaced the tasks that shell out to virsh with the virt module where available. I have also created a vm-status play that will hopefully help resolve SSH failures into the VMs. This play utilizes the block/rescue/handler pattern to attempt to restart the VM once if it fails the initial SSH check. Hopefully this will reduce the SSH failures due to a stuck VM. This adds a new variable called vm_ssh_timeout which allows the deployer an easy place to override the default timeout. The python-lxml package is needed for the virt module. Change-Id: I027556b71a8c26d08a56b4ffa56b2eeaf1cbabe9	2018-06-29 10:12:16 -05:00
Jesse Pretorius	0cd5c1704f	MNAIO: Check capabilities only once The capabilities check is done on the host, so it only needs to be executed once, not once for every VM on the host. This patch eliminates the duplicated checking. Change-Id: I2bc7ebbe699e5ace82c1bcbdfd8e917661054fef	2018-06-26 11:54:16 +01:00
Jesse Pretorius	329aa472f2	MNAIO: Enable saving and re-using file-based backing images Being able to save the images and re-use them on other hosts is extremely useful to cut down deployment time. This patch allows an MNAIO environment to be set up using a file-based backing store, then have those images saved and re-used on the same host or on other hosts. Change-Id: I491d04fb94352e37312891a9b9bd58093fdd00cf	2018-06-26 11:54:16 +01:00
Jesse Pretorius	bc2ced27c2	MNAIO: Ensure a consistent and readable style This patch implements the following style changes: 1. The 'environment' argument is placed in the same location for all plays, making sure it's easier to find. 2. The play tags are located in the same place, also making sure they're easier to find. 3. The line breaks between tasks and plays are set to be consistently 1 between tasks and 2 between plays. 4. Given that there are no roles being used, the use of pre/post tasks is converted to only using tasks. Change-Id: I2e22c8360d65256b8e44ca1e310e0668a651196d	2018-06-26 11:54:16 +01:00
Zuul	fade8fa287	Merge "Increase VM SSH timeouts"	2018-06-15 21:09:04 +00:00
Jesse Pretorius	80b4efab8a	MNAIO: Use file instead of command for disk image clean-up Rather than use the command module, we use the file module which is idempotent. We also remove the conditional for the clean up so that it's easier to switch between back-ends and still have the clean-up happen. Change-Id: I5d77bbd236d1cb7d45bc4dda4206475b5663b1c0	2018-06-15 16:48:22 +01:00
Dave Wilde	232be485cb	Increase VM SSH timeouts Occasionally the VM check times out, this relaxes the checks a bit to decrease failures. Change-Id: Ic327efb7a86b20bc3f97f3ced2fe3fc54b93d347	2018-06-15 10:35:14 -05:00
Bjoern Teipel	d625057b7a	Adding mnaio qemu file backends mnaio uses LVM as the default VM disk backend, but that requires a lot of available disk space. An alternative option, the qcow2 file-based backend, is added to take advantage of thin provisioning. This backend can be triggered by setting the override `default_vm_disk_mode` to `file` Change-Id: Iaf97fef601f656901b6913eaafb9a6c28d4b7ba6	2018-03-23 21:28:38 -05:00
Shannon Mitchell	4cbb0d8b98	Fix SSH connection issues on openstack-ansible-ops mnaio builds We normally see SSH connection issues during the LXC container setup portion of OSA builds. Most people end up working around it by tweaking the Ansible SSH pipelining and retry settings, or by nerfing the build via lowering the Ansible fork count. This is an old issue that we normally fix more permanently in our physical environments by setting the sshd MaxSessions and MaxStartups options. On the mnaio builds I have been working around this by stopping the build before deployment and making the changes in a script. Change-Id: I54c223e1fb9edf6947bc7f76ff689bad22456420 Closes-Bug: 1752914	2018-03-02 09:54:34 -06:00
Antony Messerli	33c3fbdfe7	Stop any running VMs when re-deploying Stops running VMs when doing a deploy to ensure the VMs start fresh and can reload their config during deploy. Also removes LVs to force a redeploy of the VMs. Change-Id: I7992e25f4e0e103ae66487f2e88a99ca962a9355	2018-02-07 13:19:04 -06:00
Antony Messerli	4536c0a9ef	Abort run if VMs aren't up before timeout Occasionally the VM install will exceed the timeout if it doesn't fire correctly. Instead of treating the host as down and continuing on with the others, fail early. Change-Id: I543d8e354a5357f7059fe82497edb9b7e3a22097	2017-11-02 11:49:20 -05:00
Matt Thompson	ce29ea23d1	Updates for Trusty VMs Currently, attempting to use Trusty (14.04) VMs causes VMs to not provision correctly due to a grub-install error. With respect to this specific issue, this commit updates vm.preseed.j2 by removing some grub-installer options which were not present before the Ansible rewrite. Secondly, with that change in place, VMs do not come online on their 10.0.236 addresses because something is overwriting /etc/network/interfaces, which wipes out the source of the /etc/network/interfaces.d directory. Bug [1] seems to indicate this is a known issue that has been resolved; however, attempts at using this preseed option (netcfg/target_network_config) were not successful. As a workaround, we simply chattr +i the interfaces file in vm-post-install-script.sh.j2, and then remove the attribute in deploy-vms.yml when the instance is up and accessible. [1] https://bugs.launchpad.net/ubuntu/+source/netcfg/+bug/1361902 Change-Id: I12d0c5108d1df0ab02b69d1b8cdb271a02999602	2017-09-26 08:52:38 -04:00
Matt Thompson	28684e6c6e	Add missing features to multi-node-aio The multi-node-aio update that moved the provisioning from bash to ansible dropped a few features that we use for gating purposes. This commit re-adds the following: 1. The ability to drop iptables rules to do port redirection from the host to private IPs. This is controlled by CONFIG_PREROUTING and the ansible variable mnaio_host_iptables_prerouting_ports. 2. /etc/hosts on the physical node is now updated w/ the hostname and IP of each VM so we can access VMs by name. NOTE: With #1, we redirect to the VM's DHCP address, and not its management address. The latter seemed to be the desired address but didn't work, which is why we've resorted to DHCP. If using this address is incorrect please note so we can investigate further. Change-Id: Ib194c314280f2474a2e4dac6d0feba44b1ee696f	2017-09-13 11:47:25 -04:00
Matt Thompson	815ac51249	Wait for guest capabilities to appear Deploying the multi-node-aio from master on a machine running Ubuntu 14.04 fails frequently as libvirt doesn't think it has the hvm OS type. I was able to manually run "virsh capabilities" shortly after libvirt was installed and sure enough it didn't list any guest capabilities. Subsequent runs of "virsh capabilities" then returned the <guest> XML element w/ <os_type>hvm</os_type> defined. This commit simply adds a task that checks "virsh capabilities", retrying up to 6 times if the <guest> element is not present. From my limited testing this seems sufficient to ensure that the domains are defined and created successfully. Lastly, we add a task to create /etc/libvirt/storage which is expected to exist, but doesn't on a 14.04 deployment. Change-Id: I158987270b71d3781e91d819fdcb02da736f3c1d	2017-09-06 15:42:51 -04:00
Kevin Carter	c678e83275	General improvements * added ip alias for interfaces * Update settings and improve vm performance * Change the VG name in VMs. The VG name was changed so that the volume used by VMs can be mounted on a physical host without conflicting with the standard volume group naming. This is useful when a VM is DOA and a deployer wants to dissect the instance. Change-Id: If4d10165fe08f82400772ca88f8490b01bad5cf8 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2017-08-14 10:35:12 -05:00
Kevin Carter	369f68832e	Add environment options and re-flow the README.rst Change-Id: I7a2640856045e36043de8508f9421fbd8a593591 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2017-08-01 09:13:15 -05:00
Kevin Carter	cfc76ded4a	Convert vars in files to host_vars This change allows the MNAIO to really be used as a stand alone kick system which has the potential to be developed into a stand alone project. At the very least this change improves playbook performance by scoping variables. The inventory has been converted into a typical Ansible inventory and the "servers" used in the MNAIO are now simply host_vars which will trigger specific VM builds when instructed to do so. This gives the MNAIO the ability to serve as a stand alone kick system which could be used for physical hosts as well as MNAIO testing all through the same basic set of playbooks. Should a deployer want to use this with physical servers, they'd need to do nothing more than define their basic inventory and the location of the infrastructure required to PXE boot their machines. Change-Id: I6c47e02ecfbe8ee7533e77b11041785db485a1a9 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2017-07-31 23:31:13 -05:00
Kevin Carter	9abaeba8c8	move deploy node to infra1 and make a LB node Sadly the log node does not have enough RAM to run a full Ansible run. Ansible 2.x requires more RAM than one would expect, especially when the inventory gets large. This change moves the deploy node to infra1 as it will already have the RAM needed to run the playbooks. Additionally, the container storage for infra nodes was too small, which causes builds to fail. The default storage for VMs has been set to 90GiB each and the preseed will create a logical volume for VMs mounted at /var/lib/lxc. While the limited RAM works well for the VMs and within a running deployment of OSA, ansible-playbook can crash like so: An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 12] Cannot allocate memory fatal: [infra1_cinder_api_container-b38b47ea]: FAILED! => {"failed": true, "msg": "Unexpected failure during module execution.", "stdout": ""} So infra nodes have had the memory constraint raised to 8GiB Change-Id: I7175ea92f663dfef5966532cfc0b4beaadb9eb03 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2017-07-29 15:52:36 -05:00
Kevin Carter	c3800224b0	move drive layout to deploy-vms and fix deploy-osa tags/tasks Change-Id: Iaee4c3683d798320099dec77286e6fac7a10bee8 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2017-07-28 14:49:27 -05:00
Kevin Carter	a94f0a9026	Combine our two multi-node-aio processes into one The original mnaio was built using a lot of bash and was tailored specifically for ubuntu 14.04. The new mnaio was built using a mix of bash and ansible and was tailored specifically for ubuntu 16.04. This patch takes the two code bases, combines the best parts of each, and wraps them up into a single code path written entirely as ansible playbooks and basic variables. While the underlying system has changed, the bash environment variable syntax for overrides remains the same. This allows users to continue with what has become their normal work-flow while leveraging the new structure and capabilities. High level overview: * The general performance of the VMs running within the MNAIO will now be a lot better. Previously the VMs were built on QCOW2 disk images; while this was flexible and portable, it was slower. The new capabilities use RAW logical volumes and native IO. * New repo management starts with preseeds and allows the user to pin to specific repositories without having to worry about flipping them post build. * CPU overhead will be a lot less. The old VM system used an unreasonable number of processors per VM, which directly translated to sockets. The new system uses cores and a single socket, allowing for generally better VM performance with a lot less overhead and resource contention on the host. * Memory consumption has been greatly reduced. Each VM now follows the memory restrictions we'd find in the gate, as a MAX. Most of the VMs use 1 - 2 GiB of RAM, which should be more than enough for our purposes. Overall the deployment process is simpler and more flexible and will work on both trusty and xenial out of the box, with the hope of bringing centos7 and suse into the fold some time in the future. Change-Id: Idc8924452c481b08fd3b9362efa32d10d1b8f707 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2017-07-28 15:35:23 +00:00

37 Commits
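Commit 815ac51249 retries "virsh capabilities" until a <guest> element with an hvm OS type appears. The Python sketch below illustrates that check. It assumes the command's XML output comes from a caller-supplied `fetch` function (hypothetical; a stand-in for actually running "virsh capabilities") and a 10-second pause between attempts (also an assumption). The retry count of 6 mirrors the commit message, although the real task is an Ansible retry loop, not Python.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable


def has_hvm_guest(capabilities_xml: str) -> bool:
    """Return True if the libvirt capabilities XML advertises an hvm guest."""
    root = ET.fromstring(capabilities_xml)
    # Each top-level <guest> element carries an <os_type> child such as "hvm".
    return any(
        (guest.findtext("os_type") or "").strip() == "hvm"
        for guest in root.findall("guest")
    )


def wait_for_hvm(fetch: Callable[[], str], retries: int = 6, delay: float = 10.0) -> bool:
    """Poll `fetch` (hypothetical stand-in for `virsh capabilities`) until hvm appears.

    Makes one initial attempt plus up to `retries` further attempts,
    sleeping `delay` seconds (an assumed value) between them.
    """
    for attempt in range(retries + 1):
        if has_hvm_guest(fetch()):
            return True
        if attempt < retries:
            time.sleep(delay)
    return False
```

Keeping the XML inspection separate from the polling loop means the parser can be exercised against captured output without a running libvirtd.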
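Commit 993bac94f5 records the saved files, their checksums and the git SHA of the repo in a JSON manifest. The Python sketch below shows one way to build such a manifest. The SHA-256 algorithm, the `manifest.json` file name and the `git_sha`/`files` field layout are all assumptions for illustration, not the format the playbooks actually write.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB disk images are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(image_dir: Path, git_sha: str, exclude: str = "manifest.json") -> dict:
    """Describe every regular file in image_dir, skipping the manifest itself."""
    files = sorted(p for p in image_dir.iterdir() if p.is_file() and p.name != exclude)
    return {
        "git_sha": git_sha,
        "files": [{"name": p.name, "sha256": sha256_of(p)} for p in files],
    }


def write_manifest(image_dir: Path, git_sha: str, name: str = "manifest.json") -> Path:
    """Write the manifest as pretty-printed JSON next to the images."""
    target = image_dir / name
    target.write_text(json.dumps(build_manifest(image_dir, git_sha, exclude=name), indent=2))
    return target
```

Excluding the manifest from its own file list keeps re-runs stable. The SHA could be obtained with `subprocess.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True).stdout.strip()`.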