This also enables Placement when Zun is enabled, as Kolla Ansible
already does with Nova.
Change-Id: Id2a09f702e8503b49d2b9e73e06b2ce9f4d168a9
Closes-bug: #1840573
This patch adds initial support for deploying multiple Nova cells.
Splitting a nova-cell role out from the Nova role allows a more granular
approach to deploying and configuring Nova services.
A new enable_cells flag has been added that enables support for
multiple cells via the introduction of a super conductor in addition to
cell-specific conductors. When this flag is not set (the default), nova
is configured in the same manner as before - with a single conductor.
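A minimal globals.yml sketch opting in to the multi-cell layout (the
flag name is from this patch; the quoted boolean style is assumed):

    # globals.yml
    enable_cells: "yes"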
The nova role now deploys the global services:
* nova-api
* nova-scheduler
* nova-super-conductor (if enable_cells is true)
The nova-cell role handles services specific to a cell:
* nova-compute
* nova-compute-ironic
* nova-conductor
* nova-libvirt
* nova-novncproxy
* nova-serialproxy
* nova-spicehtml5proxy
* nova-ssh
This patch does not support using a single cell controller for managing
more than one cell. Support for sharing a cell controller will be added
in a future patch.
This patch should be backwards compatible and is tested by existing CI
jobs. A new CI job has been added that tests a multi-cell environment.
ceph-mon has been removed from the play hosts list as it is not
necessary - delegate_to does not require the host to be in the play.
Documentation will be added in a separate patch.
Partially Implements: blueprint support-nova-cells
Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Change-Id: I810aad7d49db3f5a7fd9a2f0f746fd912fe03917
Introduce kolla_address filter.
Introduce put_address_in_context filter.
Add AF config to vars.
Address contexts:
- raw (default): <ADDR>
- memcache: inet6:[<ADDR>]
- url: [<ADDR>]
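A rough sketch of the intended template usage (filter signatures are
assumed and may differ in detail):

    api_address: "{{ 'api' | kolla_address }}"
    api_url_host: "{{ 'api' | kolla_address | put_address_in_context('url') }}"
    api_memcache: "{{ 'api' | kolla_address | put_address_in_context('memcache') }}"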
Other changes:
globals.yml - mention just IP in comment
prechecks/port_checks (api_intf) - kolla_address handles validation
3x interface conditional (swift configs: replication/storage)
2x interface variable definition with hostname
(haproxy listens; api intf)
1x interface variable definition with hostname with bifrost exclusion
(baremetal pre-install /etc/hosts; api intf)
neutron's ml2 'overlay_ip_version' set to 6 for IPv6 on tunnel network
basic multinode source CI job for IPv6
prechecks for rabbitmq and qdrouterd now use the proper NSS database
MariaDB Galera Cluster WSREP SST mariabackup workaround
(socat and IPv6)
Ceph naming workaround in CI
TODO: probably needs documenting
RabbitMQ IPv6-only proto_dist
Ceph ms switch to IPv6 mode
Remove neutron-server ml2_type_vxlan/vxlan_group setting
as it is not used (to avoid any confusion)
and, if it ever started working, could break setups without
proper multicast routing (it is also IPv4-only)
haproxy upgrade checks for slaves based on ipv6 addresses
TODO:
ovs-dpdk grabs the IPv4 network address (with prefix length / netmask);
not supported, invalid by default because neutron_external has no address
No idea whether ovs-dpdk works at all at the moment.
ml2 for xenapi
Xen is not supported very well.
This would require working with XenAPI facts.
rp_filter setting
This would require meddling with ip6tables (there is no sysctl param).
By default nothing is dropped.
Unlikely we really need it.
ironic dnsmasq is configured IPv4-only
dnsmasq needs DHCPv6 options and testing in vivo.
KNOWN ISSUES (beyond us):
One cannot use an IPv6 address to reference the image for docker like we
currently do, see: https://github.com/moby/moby/issues/39033
(docker_registry; docker API 400 - invalid reference format)
workaround: use hostname/FQDN
RabbitMQ may fail to bind to IPv6 if the hostname also resolves to IPv4.
This is due to old RabbitMQ versions available in images.
IPv4 is preferred by default and may fail in the IPv6-only scenario.
This should be no problem in real life as IPv6-only is indeed IPv6-only.
Also, when new RabbitMQ (3.7.16/3.8+) makes it into images, this will
no longer be relevant as we supply all the necessary config.
See: https://github.com/rabbitmq/rabbitmq-server/pull/1982
For reliable runs, at least Ansible 2.8 is required (2.8.5 confirmed
to work well). Older Ansible versions are known to miss IPv6 addresses
in interface facts. This may affect redeploys, reconfigures and
upgrades which run after VIP address is assigned.
See: https://github.com/ansible/ansible/issues/63227
Bifrost Train does not support IPv6 deployments.
See: https://storyboard.openstack.org/#!/story/2006689
Change-Id: Ia34e6916ea4f99e9522cd2ddde03a0a4776f7e2c
Implements: blueprint ipv6-control-plane
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
1. Fix yamllint errors in .yamllint file(!)
YAML lint is currently failing on its own configuration file,
.yamllint. This change fixes the issues.
2. Run bindep role in Zuul jobs
This fixes an issue where libffi is not available.
Change-Id: Ic08a8e53a6905a68f0fe26d4b28184e62a64324f
This ensures that failure of a single host fails the whole play at that
task. This can avoid confusing errors such as when the task
"Assert that the nodepool private IPv4 address is assigned" fails on one
host, causing subsequent errors on other hosts.
Note that this only affects the Zuul playbooks, not Kolla Ansible's
playbooks.
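A sketch of the play-level setting this applies (assuming Ansible's
any_errors_fatal keyword; the assert condition is illustrative):

    - hosts: all
      any_errors_fatal: true
      tasks:
        - name: Assert that the nodepool private IPv4 address is assigned
          assert:
            that: nodepool.private_ipv4 | default('') | length > 0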
Change-Id: I77a6534dd2ddd188f795e17d17a44be249d01f31
This has not been required since HAProxy was enabled over VXLAN [1].
[1] https://review.opendev.org/670690
Change-Id: I239a7c60d6ae0c80640ff10209a80c7a9ca74cd6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
After modernising the docker configuration
(I1215e04ec15b01c0b43bac8c0e81293f6724f278), we lost the
registry-mirrors configuration in CI that let us use a mirror of
Dockerhub.
This change uses the new docker_custom_config variable to configure the
registry mirror.
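For example (the mirror URL is hypothetical):

    docker_custom_config:
      registry-mirrors:
        - "http://mirror.example.org:4000"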
Change-Id: I1430413c12e9d0b59e4f216ff66372de0f3a4f21
This script has a few issues:
* It catches false positives, due to log levels in config options.
* It doesn't fail on CRITICAL logs, due to a variable reset issue.
This change fixes these.
Change-Id: I50c859eb2991e498eeb64bca45daf1e6f237761f
This patch adds configs relevant to name resolution.
Change-Id: I7ebc2409e9ec0bd875abf0bf4e452bc89efe940d
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
VXLAN is necessary to run HA in CI (due to the floating VIP
address handled by keepalived).
It also turned out to be required for private
IPv6 address assignment.
This patch is based on linux bridge rather than OVS
to avoid problems with OVS deployed in containers.
This patch enables haproxy in multinode jobs.
Includes saving of linux networking details.
Makes DASHBOARD_URL agree with OS_AUTH_URL - properly uses the
pre-upgrade value for testing.
Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Depends-on: https://review.opendev.org/683068
Depends-on: https://review.opendev.org/682957
Change-Id: I66888712da80c3d6f84ee4949762961664d3adea
This lets us control the upgrade process entirely from the
current branch.
Change-Id: Ic8c39e415846596c23dae93c2839375a24e8b888
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
This commit follows up the work in Kolla, adding support to deploy and
configure the Prometheus blackbox exporter.
An example blackbox-exporter module called os_endpoint has been added
(disabled by default). It allows probing endpoints over HTTP and HTTPS,
and can be used to verify that OpenStack endpoints return a status code
of either 200 or 300 and the word 'versions' in the payload.
This change introduces a new variable `prometheus_blackbox_exporter_endpoints`.
Currently no defaults are specified because the configuration is heavily
dependent on the deployment.
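A hypothetical globals.yml entry (both the 'name:module:target' string
format and the endpoint URL are assumptions):

    prometheus_blackbox_exporter_endpoints:
      - "keystone:os_endpoint:https://keystone.example.org:5000/v3"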
Co-authored-by: Jack Heskett <Jack.Heskett@gresearch.co.uk>
Change-Id: I36ad4961078d90e2fd70c9a3368f5157d6fd89cd
The kolla_toolbox Ansible module executes ad-hoc ansible commands in the
kolla_toolbox container, and parses the output to make it look as if
ansible-playbook executed the command. Currently, however, this module
sometimes fails to catch failures of the underlying command, and also
sometimes reports tasks as 'ok' when the underlying command actually
made changes.
This has been tested both before and after the upgrade to ansible 2.8.
This change fixes this issue by configuring ansible to emit output in
JSON format, to make parsing simpler. We can now pick up errors and
changes, and signal them to the caller.
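For reference, a typical playbook invocation of the module looks roughly
like this (the module arguments are illustrative):

    - name: Create a database user via the kolla_toolbox container
      kolla_toolbox:
        module_name: mysql_user
        module_args:
          name: example_user
          host: "%"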
This change also adds an ansible playbook, tests/test-kolla-toolbox.yml,
that can be executed to test the module. It's not currently integrated
with any CI jobs.
Note that this change cannot be backported as the JSON output callback
plugin was added in Ansible 2.5.
Change-Id: I8236dd4165f760c819ca972b75cbebc62015fada
Closes-Bug: #1844114
These filters can be used to capture a lot of the logic that we
currently have in 'when' statements, about which services are enabled
for a particular host.
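A hypothetical before/after, assuming a filter named
service_enabled_and_mapped_to_host:

    # Before: inline 'when' logic.
    when:
      - nova_services['nova-api'].enabled | bool
      - inventory_hostname in groups[nova_services['nova-api'].group]

    # After: the same logic captured by a filter.
    when: nova_services['nova-api'] | service_enabled_and_mapped_to_host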
In order to use these filters, it is necessary to install the
kolla_ansible python module, and not just the dependencies listed in
requirements.txt. The CI test and quickstart install from source
documentation has been updated accordingly.
Ansible is not currently in OpenStack global requirements, so for unit
tests we avoid a direct dependency on Ansible and provide fakes where
necessary.
Change-Id: Ib91cac3c28e2b5a834c9746b1d2236a309529556
After the integration with placement [1], we need to configure how
zun-compute is going to work with nova-compute.
* If zun-compute and nova-compute run on the same compute node,
we need to set 'host_shared_with_nova' to true so that Zun
will use the resource provider (compute node) created by nova.
In this mode, containers and VMs can claim allocations against
the same resource provider (see the sketch below).
* If zun-compute runs on a node without nova-compute, no extra
configuration is needed. By default, each zun-compute will create
a resource provider in placement to represent the compute node
it manages.
[1] https://blueprints.launchpad.net/zun/+spec/use-placement-resource-management
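As a sketch of the shared-host case (the [compute] section name and the
ini_file task shape are assumptions; kolla-ansible generates zun.conf
itself):

    - name: Share the nova resource provider with Zun
      ini_file:
        path: /etc/zun/zun.conf
        section: compute
        option: host_shared_with_nova
        value: "true"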
Change-Id: I2d85911c4504e541d2994ce3d48e2fbb1090b813
Instead of changing the Docker daemon command line, let's change
Docker's configuration where it belongs: in the /etc/docker/daemon.json
file.
Custom Docker options can be set with the 'docker_custom_config'
variable. The old 'docker_custom_option' is still present but should be
avoided.
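For example, in globals.yml (the options shown are standard daemon.json
keys):

    docker_custom_config:
      debug: true
      log-opts:
        max-file: "5"
        max-size: "50m"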
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Change-Id: I1215e04ec15b01c0b43bac8c0e81293f6724f278
In order to orchestrate a smooth transition to fluentd 0.14.x (aka the
1.0 stable branch, aka td-agent 3 from the td-agent repository), use
image labels (fluentd_version and fluentd_binary).
Depends-On: https://review.opendev.org/676411
Change-Id: Iab8518c34ef876056c6abcdb5f2e9fc9f1f7dbdd
- Test Zun on CentOS too
- Make etcd changes also trigger Zun jobs (as kuryr and zun changes do)
- Test multinode Zun deployments instead of AIO
(more likely to break)
- In the Zun scenario, stop configuring docker for legacy swarm mode
(Zun does not use Swarm)
- Separate test-zun.sh testing script
- Show appcontainer to see which node it has been started on
Change-Id: I289b1009fe00aedb9b78cbd83298b14da5fd9670
Depends-On: https://review.opendev.org/676736
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
During the MariaDB testing we saw a number of cases where this IP
address was not assigned to one or more hosts, which caused various
issues later on.
Change-Id: I61b54483e4553b926e9ddc0a8848b2daa6bc49f1
1) ceph-nfs (ganesha-ceph) - use NFSv4 only
This is recommended upstream.
v3 and UDP require portmapper (aka rpcbind), which we
do not want, except where the Ubuntu ganesha version (2.6)
forces it by requiring UDP to be enabled; see [1].
The issue has been fixed in 2.8, which is included in CentOS.
Additionally disable v3 helper protocols and kerberos
to avoid meaningless warnings.
2) ceph-nfs (ganesha-ceph) - do not export host dbus
It is not in use. This avoids the temptation to try
to handle it on the host.
3) Properly handle ceph services deploy and upgrade
Upgrade runs deploy.
The order has been corrected - nfs goes after mds.
Additionally upgrade takes care of rgw for keystone
(for swift emulation).
4) Enhance ceph keyring module with error detection
Now it does not blindly try to create a keyring after
any failure. This used to hide the real issue.
5) Retry ceph admin keyring update until cluster works
Reordering the deployment caused an issue where the ceph cluster was
not fully operational before actions were taken on it.
6) CI: Remove osd df from collected logs as it may hang CI
Hangs are caused by a healthy MON with no healthy MGR.
A descriptive note is left in its place.
7) CI: Add 5s timeout to ceph informational commands
This decreases the timeout from the default 300s.
[1] https://review.opendev.org/669315
Change-Id: I1cf0ad10b80552f503898e723f0c4bd00a38f143
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
This actually replaces two ad-hoc fixes with a more unified
solution (with a comment for posterity).
Change-Id: I62f57cb489c900f68a0c7aeb3e20e4715c0e2661
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Multinode jobs did not run sanity checks for all the hosts,
only the primary. Now they check all hosts.
Additionally, upgrades are now checked using the proper
(pre-upgrade) scripts (not that it matters much, as they are
currently the same), and both kinds of checks are performed -
not only failures but also config.
Change-Id: I10552e256edbddd5b1f8a8a7f8805262e72ce8d8
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.
This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.
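A sketch of a kolla_docker task using a valid policy; note that 'no'
must be quoted so YAML does not parse it as a boolean:

    - kolla_docker:
        action: recreate_or_restart_container
        common_options: "{{ docker_common_options }}"
        name: "example_container"
        image: "example-image:latest"
        restart_policy: "no"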
I added some FIXMEs that are relevant to kolla-ansible's docker
integration. They are not fixed here so as not to alter behavior.
[1] https://review.opendev.org/667363
Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
We install kolla-ansible requirements in Zuul's Ansible playbooks.
This patch cleans up the installation in scripts so that they are
only concerned with auxiliary requirements:
- ansible (since we do not track it in requirements)
- ara (for log summaries)
- openstack clients (for first init and tests after deployment)
Additionally this patch installs openstack clients in a separate
virtualenv.
Note that all kolla-ansible requirements, ansible and ara are still
installed system-wide.
Change-Id: Iac04082ad39a9d823c515ba11c5db9af50ed225f
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>