It currently seems to think that /dev/vmvg00/disk1
is used for btrfs, so force this operation to
ensure it's changed to xfs.
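A minimal sketch of forcing the format with the Ansible filesystem
module (the exact task in the repo may differ):

  - name: Format the disk as XFS even if another filesystem is detected
    filesystem:
      fstype: xfs
      dev: /dev/vmvg00/disk1
      force: yes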
Change-Id: I0bcc9723fb33b557315422c3259a7ba2b75ceff6
When the VMs are Ubuntu Trusty, this task causes a total failure.
We should only attempt the daemon_reload if the system being
used supports it.
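A hedged sketch of the guard (the real task and surrounding play may
differ):

  - name: Reload systemd units only where systemd is in use
    systemd:
      daemon_reload: yes
    when: ansible_service_mgr == 'systemd'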
Change-Id: I557856045a7735c8f351df6350f777caae526b10
Unfortunately, guestfish may fail silently (without returning an
exit code of 1), which makes hunting down the error difficult. To
combat this we add a number of stdout messages to the script and
check for the output of that final step to validate success. To make
this work, we need to copy the script over and execute it with the
command module, because the script module sends all output to stderr.
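A rough sketch of the pattern, using a hypothetical script name and
completion marker:

  - name: Copy the guestfish script to the host
    copy:
      src: image-prep.sh          # hypothetical script name
      dest: /opt/image-prep.sh
      mode: "0755"

  - name: Run the guestfish script and validate its final output
    command: /opt/image-prep.sh
    register: image_prep
    failed_when: >-
      image_prep.rc != 0 or
      'IMAGE PREP COMPLETE' not in image_prep.stdout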
Change-Id: I8e514ceb2462870721745c9445ec149864a45f4d
In an ideal state, if the galera containers are shut down
cleanly, they will leave behind a gvwstate.dat file on each
node which provides the cluster member details so that the
cluster can automatically start up again without intervention.
However, when imaging the MNAIO systems we only interact
with the hosts, so the galera containers sometimes do not
shut down cleanly.
To cater for this, we inspect the disk images for the
primary component, then build the gvwstate.dat file for
the other galera containers. With those put back into the
image, the cluster forms immediately when the VMs start.
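As an illustration only (the UUIDs, paths and variable names here are
hypothetical; the real playbook derives them from the primary
component found in the images), the rebuilt file can be written with
something like:

  - name: Write a rebuilt gvwstate.dat for a non-primary node
    copy:
      # One "member:" line is needed per cluster node.
      dest: "{{ galera_rootfs }}/var/lib/mysql/gvwstate.dat"
      content: |
        my_uuid: {{ galera_node_uuid }}
        #vwbeg
        view_id: 3 {{ galera_view_uuid }} {{ galera_view_seq }}
        bootstrap: 0
        member: {{ galera_node_uuid }} 0
        member: {{ galera_peer_uuid }} 0
        #vwend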
References:
http://galeracluster.com/documentation-webpages/pcrecovery.html
http://galeracluster.com/documentation-webpages/restartingcluster.html
Change-Id: Icfe067607baefd661147f3c22ce846f06fff7c60
The swift and cinder hosts do not use containers for services,
so there is no need to do the current process of shrinking the
volumes. Instead, we ensure that the lxc & machines mounts are
removed, with their respective logical volumes.
When setting up the swift logical volumes, we do not need to
create the mount point directories, because the mount task will
do that for us. As such, we remove that task.
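A hedged sketch of the removal pattern (mount paths and LV names are
illustrative):

  - name: Remove the lxc and machines mounts
    mount:
      path: "{{ item }}"
      state: absent
    with_items:
      - /var/lib/lxc
      - /var/lib/machines

  - name: Remove the corresponding logical volumes
    lvol:
      vg: vmvg00
      lv: "{{ item }}"
      state: absent
      force: yes
    with_items:
      - lxc00
      - machines00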
Change-Id: Ibbe6d0fede6b6965415e421161354e311708d113
To allow a downloaded set of file-backed images to be used on
another host, the new host's public ssh key needs to be injected
into the VM disks so that ansible is able to connect to it and
complete the rest of the preparation.
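One way to do this with the libguestfs tooling, sketched as an
Ansible task (the image and key paths are illustrative, and the repo
may use guestfish instead):

  - name: Inject the new host's public key into a VM disk image
    command: >-
      virt-customize -a /var/lib/libvirt/images/infra1.img
      --ssh-inject root:file:/root/.ssh/id_rsa.pub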
Change-Id: I6b9b5efb88283417c15f74f40cfb91943bb8774d
Rather than have to default it in tasks all over the
place, we default it in group_vars. The default is to
enable the feature if file-backed VMs are used.
However, if there are no base images available, the
set_fact task disables it. If a user wishes to force
it not to be used, then an extra-var override is still
usable.
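A simplified sketch of the pattern, with hypothetical variable names,
paths and patterns:

  # group_vars/all
  vm_use_base_images: "{{ default_vm_disk_mode | default('lvm') == 'file' }}"

  # setup play
  - name: Check for saved base images
    find:
      paths: /var/lib/libvirt/images
      patterns: "*-base.img"
    register: base_image_files

  - name: Disable base image re-use when none are available
    set_fact:
      vm_use_base_images: false
    when: base_image_files.matched == 0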
Change-Id: I5c916244a02a44da831d2a0fefd8e8aafae829b2
Using the discard option for all mount points ensures
that the deletes actually release the blocks on the disk.
This ensures that SSD performance is optimised and that
file-backed images are kept as small as possible.
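For example (device, mount point and filesystem are illustrative),
each mount simply gains the extra option:

  - name: Mount the VM volume with discard enabled
    mount:
      path: /var/lib/lxc
      src: /dev/vmvg00/lxc00
      fstype: ext4
      opts: defaults,discard
      state: mounted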
Change-Id: I648cbaca56d75e355cf6c8af01e2e3ad20dfc398
Instead of putting the images in the root of the disk,
we use a subdirectory. This prevents silly mistakes
from happening.
Change-Id: I19d22b7e72de88736db410a771ec22664c641c94
When not using a file-backed backing store, the variable is not
defined, which results in an error to that effect.
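A typical guard for this kind of failure (the variable name is a
stand-in, not the one from the repo):

  - name: Reference the file-backed image path only when it applies
    debug:
      msg: "{{ vm_image_path | default('not set') }}"
    when: default_vm_disk_mode | default('lvm') == 'file'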
Change-Id: I3142a5960bc4521f79bbdfe32b0e7a0f71742b7d
In order to more successfully reproduce an environment using
saved images, we include the VM XML definition files and the
output from 'pip freeze'. We capture the list of files, their
checksums and the SHA for the git repo into a json manifest
file.
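A condensed sketch of the capture step (field names and paths are
illustrative, not the exact manifest schema):

  - name: Write the image manifest
    copy:
      dest: /var/lib/libvirt/images/manifest.json
      content: "{{ manifest_data | to_nice_json }}"
    vars:
      manifest_data:
        git_sha: "{{ mnaio_git_sha }}"          # from a git rev-parse task
        files: "{{ image_file_checksums }}"     # list of name/checksum pairs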
Change-Id: Ia0bf74d509b4acb10b0dd832a4cfe1bb2afb2503
In order to make use of a data disk, we enable the 'file'
implementation of default_vm_disk_mode to use a data disk
much like the 'lvm' implementation.
To simplify changing from the default_vm_disk_mode of lvm
to file and back again, the setup-host playbook will remove
any previous implementation and replace it. This is useful
when doing testing for these different modes because it
does not require cleaning up by hand.
This patch also fixes the implementation of the virt
storage pool. Currently the tasks only execute if
'virt_data_volume.pools is not defined', but it is always
defined so the tasks never execute. We now ensure that
for both backing stores the 'default' storage pool is
defined, started and set to autostart (as three tasks,
because the virt_pool module can only do one thing at a
time).
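The three-step pattern looks roughly like this (the pool XML template
name is hypothetical):

  - name: Define the default storage pool
    virt_pool:
      name: default
      command: define
      xml: "{{ lookup('template', 'default-pool.xml.j2') }}"

  - name: Activate the default storage pool
    virt_pool:
      name: default
      state: active

  - name: Set the default storage pool to autostart
    virt_pool:
      name: default
      autostart: yes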
The pool implementation for the file-backed VMs uses
the largest data disk it can find and creates the /data
mount for it. To cater for a different configuration, we
ensure that all references to the disk files use the path
that is configured in the pool, rather than assuming the
path.
Change-Id: If7e7e37df4d7c0ebe9d003e5b5b97811d41eff22
There is already a default in group_vars/all, so we do not need
to provide a default in every conditional.
Also, we move several LVM data volume tasks into a block given
they have a common set of conditions.
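Roughly, the grouping now looks like this (task bodies and device
names are illustrative):

  - name: LVM data volume tasks
    when: default_vm_disk_mode | default('lvm') == 'lvm'
    block:
      - name: Create the data volume group
        lvg:
          vg: vmvg00
          pvs: /dev/sdb1

      - name: Create the VM logical volume
        lvol:
          vg: vmvg00
          lv: infra1
          size: 90g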
Change-Id: Iff0fafefda2bc5dc1596b7198b779f5da763086c
Allows configuration changes to be redeployed on a
rebuild, where previously no attempt was made to update
the VMs' configuration.
Change-Id: If14dbdfe7ba3e69a50127fa724ad3f2a8ed58c1a
Commit 875fa96f / change-id Ief0040f6 unintentionally tries to enlarge
the "machines00" LV when LXC is the default container technology which
fails due to the Debian automated installation having assigned all the
space within the associated "vmvg00" VG.
As the intention of the aforementioned commit was to apply when
systemd-nspawn was used, codify that explicitly in a `when:` condition
on the problematic Ansible task.
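Sketched, with the variable name and task body as assumptions rather
than the repo's exact spelling:

  - name: Resize the machines00 logical volume for systemd-nspawn
    lvol:
      vg: vmvg00
      lv: machines00
      size: 8192m
      shrink: yes
      force: yes
    when:
      - container_tech | default('lxc') == 'nspawn'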
Change-Id: I56ec1290d71d0d09db447e347d7d55432d9b81c6
Signed-off-by: Corey Wright <corey.wright@rackspace.com>
Closes-Bug: #1781823
Currently, if CONTAINER_TECH=nspawn is used, Cinder and Swift
are unable to create volumes because the space is fully allocated
to the machines volume.
This shrinks the machines00 mount to 8192 to make space for the
Cinder and Swift volumes when using nspawn as the container tech.
Change-Id: Ief0040f638f0d3570557ac76fd5e0a8aee80df8d
In order to improve the readability and robustness of the mnaio feature,
I have replaced the tasks that shell out to virsh with the virt module
where available. I have also created a vm-status play that will
hopefully help resolve SSH failures into the VMs. This play utilizes
the block/rescue/handler pattern to attempt to restart a VM once if
it fails the initial SSH check. Hopefully this will reduce the SSH
failures due to a stuck VM. This adds a new variable called
vm_ssh_timeout which gives the deployer an easy place to override the
default timeout. The python-lxml package is needed for the virt module.
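The shape of the vm-status play is roughly this (the delegation
variable is hypothetical and the rescue body is simplified):

  - name: Check VM connectivity
    block:
      - name: Wait for a successful SSH connection
        wait_for_connection:
          timeout: "{{ vm_ssh_timeout | default(600) }}"
    rescue:
      - name: Stop the VM that failed the SSH check
        virt:
          name: "{{ inventory_hostname }}"
          command: destroy
        delegate_to: "{{ vm_host }}"

      - name: Start the VM again
        virt:
          name: "{{ inventory_hostname }}"
          command: start
        delegate_to: "{{ vm_host }}"

      - name: Wait again for SSH after the restart
        wait_for_connection:
          timeout: "{{ vm_ssh_timeout | default(600) }}"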
Change-Id: I027556b71a8c26d08a56b4ffa56b2eeaf1cbabe9
The capabilities check is done on the host, so it only needs
to be executed once, not once for every VM on the host. This
patch eliminates the duplicated checking.
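One way to scope the check, as a sketch (the actual change may
structure it differently):

  - name: Get the host virt capabilities
    command: virsh capabilities
    register: virsh_capabilities
    changed_when: false
    run_once: true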
Change-Id: I2bc7ebbe699e5ace82c1bcbdfd8e917661054fef
Being able to save the images and re-use them on other hosts
is extremely useful to cut down deployment time. This patch
allows an MNAIO environment to be set up using a file-based
backing store, then have those images saved and re-used on the
same host or on other hosts.
Change-Id: I491d04fb94352e37312891a9b9bd58093fdd00cf
This patch implements the following style changes:
1. The 'environment' argument is placed in the same
location for all plays, making sure it's easier
to find.
2. The play tags are located in the same place, also
making sure they're easier to find.
3. The line breaks are made consistent: one between tasks
and two between plays.
4. Given that there are no roles being used, the use
of pre/post tasks is converted to only using tasks.
Change-Id: I2e22c8360d65256b8e44ca1e310e0668a651196d
Rather than use the command module, we use the file module
which is idempotent. We also remove the conditional for the
clean up so that it's easier to switch between back-ends
and still have the clean-up happen.
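For example (the paths are illustrative):

  - name: Clean up any previous backing store artefacts
    file:
      path: "{{ item }}"
      state: absent
    with_items:
      - /var/lib/libvirt/images/old-data.img
      - /opt/vm-backing-files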
Change-Id: I5d77bbd236d1cb7d45bc4dda4206475b5663b1c0
The mnaio uses LVM as the default VM disk backend, but that
requires a lot of available disk space.
An alternative option, a qcow2 file-based backend, is added
to take advantage of thin provisioning.
This backend can be triggered by setting the override
`default_vm_disk_mode` to `file`.
Change-Id: Iaf97fef601f656901b6913eaafb9a6c28d4b7ba6
We normally see ssh connection issues during the lxc container setup
portion of OSA builds. Most people end up tweaking the Ansible ssh
pipelining and retry settings, or lowering the Ansible fork count, to
work around it. This is an old issue for which we normally apply a more
permanent fix in our physical environments by setting the sshd
MaxSessions and MaxStartups options. On the mnaio builds I have been
working around this by stopping the build before deployment and making
the changes in a script.
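The permanent fix amounts to a couple of sshd_config settings,
roughly as below (the values are illustrative, and a service restart
follows in the real play):

  - name: Raise the sshd session and startup limits
    lineinfile:
      dest: /etc/ssh/sshd_config
      regexp: "^#?{{ item.key }}"
      line: "{{ item.key }} {{ item.value }}"
    with_items:
      - { key: MaxSessions, value: 100 }
      - { key: MaxStartups, value: "100:30:200" }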
Change-Id: I54c223e1fb9edf6947bc7f76ff689bad22456420
Closes-Bug: 1752914
Stops running VMs when doing a deploy to ensure
the VMs start fresh and can reload their config
during deploy.
Also removes LVs to force a redeploy of the VMs.
Change-Id: I7992e25f4e0e103ae66487f2e88a99ca962a9355
Occasionally the VM install will exceed the timeout
if it doesn't fire correctly. Instead of treating
the host as down and continuing on with the others,
fail early.
Change-Id: I543d8e354a5357f7059fe82497edb9b7e3a22097
Currently, attempting to use Trusty (14.04) VMs causes VMs to not
provision correctly due to a grub-install error. With respect to this
specific issue, this commit updates vm.preseed.j2 by removing some
grub-installer options which were not present before the ansible
rewrite.
Secondly, with that change in place, VMs do not come online on their
10.0.236 addresses because something is overwriting
/etc/network/interfaces, which wipes out the sourcing of the
/etc/network/interfaces.d directory. Bug [1] seems to indicate this
is in fact a known issue that has been resolved, however attempts at
using this preseed option (netcfg/target_network_config) were not
successful.
As a workaround, we simply chattr +i the interfaces file in
vm-post-install-script.sh.j2, and then remove the attr in
deploy-vms.yml when the instance is up and accessible.
[1] https://bugs.launchpad.net/ubuntu/+source/netcfg/+bug/1361902
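A sketch of the deploy-vms.yml side of the workaround (the task shown
is illustrative):

  - name: Remove the immutable attribute from the interfaces file
    command: chattr -i /etc/network/interfaces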
Change-Id: I12d0c5108d1df0ab02b69d1b8cdb271a02999602
The multi-node-aio update that moved the provisioning from bash to
ansible dropped a few features that we use for gating purposes. This
commit re-adds the following:
1. The ability to drop iptables rules to do port redirection from the
host to private IPs. This is controlled by CONFIG_PREROUTING and
the ansible variable mnaio_host_iptables_prerouting_ports.
2. /etc/hosts on the physical node is now updated w/ the hostname and
IP of each VM so we can access VMs by name.
NOTE: With #1, we redirect to the VM's DHCP address, and not its
management address. The latter seemed to be the desired address
but didn't work, which is why we've resorted to DHCP. If using
this address is incorrect, please note so we can investigate
further.
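A rough example of one such prerouting redirection (the port and
target address are illustrative):

  - name: Redirect a host port to a VM's DHCP address
    iptables:
      table: nat
      chain: PREROUTING
      protocol: tcp
      destination_port: "2222"
      jump: DNAT
      to_destination: "10.0.2.101:22"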
Change-Id: Ib194c314280f2474a2e4dac6d0feba44b1ee696f
Deploying the multi-node-aio from master on a machine running Ubuntu
14.04 fails frequently as libvirt doesn't think it has the hvm
OS type. I was able to manually run "virsh capabilities" shortly after
libvirt was installed and sure enough it didn't list any guest
capabilities. Subsequent runs of "virsh capabilities" then returned
the <guest> XML element w/ <os_type>hvm</os_type> defined.
This commit simply adds a task that checks "virsh capabilities",
retrying up to 6 times if the <guest> element is not present. From my
limited testing this seems sufficient to ensure that the domains are
defined and created successfully.
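The retry looks roughly like this (the delay value is an assumption):

  - name: Wait for virsh to report guest capabilities
    command: virsh capabilities
    register: virsh_caps
    changed_when: false
    until: "'<guest>' in virsh_caps.stdout"
    retries: 6
    delay: 10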
Lastly, we add a task to create /etc/libvirt/storage which is expected
to exist, but doesn't on a 14.04 deployment.
Change-Id: I158987270b71d3781e91d819fdcb02da736f3c1d
* Added an IP alias for interfaces
* Updated settings and improved VM performance
* Changed the VG name in VMs. The VG name was changed so that the volume
which is being used by VMs can be mounted on a physical host and not
conflict with standard volume group naming. This is useful when a VM
is DOA and a deployer wants to dissect the instance.
Change-Id: If4d10165fe08f82400772ca88f8490b01bad5cf8
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change allows the MNAIO to really be used as a stand alone kick
system which has the potential to be developed into a stand alone
project. At the very least this change improves playbook performance
by scoping variables.
The inventory has been converted into a typical Ansible inventory and
the "servers" used in the MNAIO are now simply host_vars
which will trigger specific VM builds when instructed to do so. This
gives the MNAIO the ability to serve as a stand alone kick system which
could be used for physical hosts as well as MNAIO testing all through
the same basic set of playbooks. Should a deployer want to use this with
physical servers, they'd need to do nothing more than define their basic
inventory and the required pieces of infrastructure needed to PXE boot
their machines.
Change-Id: I6c47e02ecfbe8ee7533e77b11041785db485a1a9
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Sadly the log node does not have enough ram to run a full ansible run.
Ansible 2.x requires more ram than one would expect, especially when the
inventory gets large. This change moves the deploy node to infra1 as it
will already have the ram needed to run the playbooks. Additionally, the
container storage for the infra nodes was too small, which forced builds
into error. The default storage for VMs has been set to 90GiB each and the
preseed will create a logical volume for VMs mounted at /var/lib/lxc.
While the limited ram works well for the VMs and within a running
deployment of OSA, ansible-playbook is subject to crash like so:
An exception occurred during task execution. To see the full traceback,
use -vvv. The error was: OSError: [Errno 12] Cannot allocate memory
fatal: [infra1_cinder_api_container-b38b47ea]: FAILED! =>
{"failed": true, "msg": "Unexpected failure during module execution.", "stdout": ""}
So infra nodes have had the memory constraint raised to 8GiB
Change-Id: I7175ea92f663dfef5966532cfc0b4beaadb9eb03
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The original mnaio was built using a lot of bash and was tailored
specifically for ubuntu 14.04. The new mnaio was built using a mix of
bash and ansible and was tailored specifically for ubuntu 16.04. This
patch takes the two code bases and combines the best things from each
method and wraps it up into a single code path all written using ansible
playbooks and basic variables.
While the underlying system has changed, the bash environment variable
syntax for overrides remains the same. This allows users to continue with
what has become their normal work-flow while leveraging the new structure
and capabilities.
High level overview:
* The general performance of the VMs running within the MNAIO will now
be a lot better. Previously the VMs were built within QCOW2 containers;
while this was flexible and portable, it was slower. The new
capabilities will use RAW logical volumes and native IO.
* New repo management starts with preseeds and allows the user to pin
to specific repositories without having to worry about flipping them
post build.
* CPU overhead will be a lot less. The old VM system used an
unreasonable number of processors per VM, which directly translated
to sockets. The new system will use cores and a single socket,
allowing for generally better VM performance with a lot less
overhead and resource contention on the host.
* Memory consumption has been greatly reduced. Each VM is now
following the memory restrictions we'd find in the gate, as a MAX.
Most of the VMs are using 1 - 2 GiB of RAM which should be more than
enough for our purposes.
Overall the deployment process is simpler and more flexible, and will
work on both trusty and xenial out of the box, with the hope of bringing
centos7 and suse into the fold some time in the future.
Change-Id: Idc8924452c481b08fd3b9362efa32d10d1b8f707
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>