In dev mode, the Python source is currently mounted under the
python2.7 site-packages directory. This change uses the
distro_python_version variable instead, to ensure dev mode works with
Python 3 images.
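For illustration, the dev-mode bind mount becomes something like this
hypothetical volume entry (kolla_dev_repos_directory and the venv path
are assumptions, not taken from this change):

    volumes:
      - "{{ kolla_dev_repos_directory }}/nova/nova:/var/lib/kolla/venv/lib/python{{ distro_python_version }}/site-packages/nova"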
Change-Id: Ieae3778a02f1b79023b4f1c20eff27b37f481077
Partially-Implements: blueprint python-3
When kolla_copy_ca_into_containers is set to "yes", the Certificate
Authority in /etc/kolla/certificates will be copied into service
containers to enable trust for that CA. This is especially useful when
the CA is self signed, and would not be trusted by default.
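For illustration, a minimal globals.yml sketch (flag name per this
change):

    kolla_copy_ca_into_containers: "yes"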
Partially-Implements: blueprint custom-cacerts
Change-Id: I4368f8994147580460ebe7533850cf63a419d0b4
Include a reference to the globally configured Certificate Authority to
all services. Services use the CA to verify HTTPs connections.
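As a hedged sketch, a templated service config could reference the CA
like this (the openstack_cacert variable name is an assumption here):

    [keystone_authtoken]
    cafile = {{ openstack_cacert }}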
Change-Id: I38da931cdd7ff46cce1994763b5c713652b096cc
Partially-Implements: blueprint support-trusted-ca-certificate-file
For the CentOS 7 to 8 transition, we will have a period where both
CentOS 7 and 8 images are available. We differentiate these images via a
tag - the CentOS 8 images will have a tag of train-centos8 (or
master-centos8 temporarily).
To achieve this, and maintain backwards compatibility for the
openstack_release variable, we introduce a new 'openstack_tag' variable.
This variable is based on openstack_release, but has a suffix of
'openstack_tag_suffix', which is empty except on CentOS 8 where it has a
value of '-centos8'.
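A sketch of the derivation, using the variable names from this change
(exact defaults may differ):

    openstack_tag_suffix: ""   # "-centos8" on CentOS 8
    openstack_tag: "{{ openstack_release }}{{ openstack_tag_suffix }}"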
Change-Id: I12ce4661afb3c255136cdc1aabe7cbd25560d625
Partially-Implements: blueprint centos-rhel-8
The [placement].os_interface option was replaced by
[placement].valid_interfaces in Queens and was removed in Rocky.
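For reference, the replacement option in nova.conf looks like this
(value illustrative):

    [placement]
    valid_interfaces = internal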
Change-Id: I306c57305b9088159dd18af4aa373bbc39a8b881
Closes-Bug: #1853621
As part of the effort to implement Ansible code linting in CI
(using ansible-lint) - we need to implement recommendations from
ansible-lint output [1].
One of them is to stop using local_action in favor of delegate_to -
to increase readability and match the style of typical Ansible
tasks.
[1]: https://review.opendev.org/694779/
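To illustrate the recommendation, a before/after sketch (module and
path are arbitrary examples, not from this change):

    # Before (flagged by ansible-lint):
    - name: Check a local file
      local_action:
        module: stat
        path: /etc/kolla/passwords.yml

    # After:
    - name: Check a local file
      stat:
        path: /etc/kolla/passwords.yml
      delegate_to: localhost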
Partially implements: blueprint ansible-lint
Change-Id: I46c259ddad5a6aaf9c7301e6c44cd8a1d5c457d3
This variable was removed in the Train cycle, and a precheck added for
its use. This precheck can now be removed.
Change-Id: I6d9f0b577631ff9443deecf8ef9d94ca217674c5
If "reclaim_instance_interval" has been set in nova conf,
attched volume may not be delete while instacne deleted.
Adding cinder auth in nova conf can solve the problem.
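A sketch of the [cinder] auth options added to nova.conf (option
names follow the usual keystoneauth pattern; values are placeholders,
not taken from this change):

    [cinder]
    auth_url = {{ keystone_admin_url }}
    auth_type = password
    project_name = service
    username = cinder
    password = {{ cinder_keystone_password }}
    os_region_name = {{ openstack_region_name }}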
Change-Id: I9eb3a74c2f6976043cc35a94915f1fcecb9ef601
Closes-Bug: 1850279
Due to a Docker bug [1] we cannot use Docker to send
SIGHUP to the container because it will mark it as
stopped.
This patch sends the signal directly to the process,
bypassing Docker.
'changed_when: false' is also removed from the
relevant task as it definitely changes the state.
In the future we could do the refresh only if
there really is a need for another one.
[1] https://github.com/moby/moby/issues/11065
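A hedged sketch of the direct approach (container and task names are
illustrative; assumes a kill binary is present in the image):

    - name: Refresh the service by signalling PID 1 inside the container
      command: docker exec -u root some_service kill -HUP 1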
Change-Id: Ief73bbd24568d6941384ea3330ab45f11aa42d37
Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Closes-Bug: #1845244
This affects configs with Blazar and fake Nova only; the default
filter list does not include RetryFilter.
Upstream docs:
RetryFilter - Deprecated since version 20.0.0 (Train)
Since the 17.0.0 (Queens) release, the scheduler has provided
alternate hosts for rescheduling so the scheduler does not need to
be called during a reschedule which makes the RetryFilter useless.
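For context, a nova.conf sketch after dropping the filter (the list
shown is illustrative, not the exact default):

    [filter_scheduler]
    enabled_filters = AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter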
Change-Id: I26bf45997005124e9166b5bf1d44cb276624430b
This patch adds initial support for deploying multiple Nova cells.
Splitting a nova-cell role out from the Nova role allows a more granular
approach to deploying and configuring Nova services.
A new enable_cells flag has been added that enables the support of
multiple cells via the introduction of a super conductor in addition to
cell-specific conductors. When this flag is not set (the default), nova
is configured in the same manner as before - with a single conductor.
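Opting in is then a one-line change in globals.yml (flag name per
this patch):

    enable_cells: "yes"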
The nova role now deploys the global services:
* nova-api
* nova-scheduler
* nova-super-conductor (if enable_cells is true)
The nova-cell role handles services specific to a cell:
* nova-compute
* nova-compute-ironic
* nova-conductor
* nova-libvirt
* nova-novncproxy
* nova-serialproxy
* nova-spicehtml5proxy
* nova-ssh
This patch does not support using a single cell controller for managing
more than one cell. Support for sharing a cell controller will be added
in a future patch.
This patch should be backwards compatible and is tested by existing CI
jobs. A new CI job has been added that tests a multi-cell environment.
ceph-mon has been removed from the play hosts list as it is not
necessary - delegate_to does not require the host to be in the play.
Documentation will be added in a separate patch.
Partially Implements: blueprint support-nova-cells
Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Change-Id: I810aad7d49db3f5a7fd9a2f0f746fd912fe03917
Introduce kolla_address filter.
Introduce put_address_in_context filter.
Add AF config to vars.
Address contexts:
- raw (default): <ADDR>
- memcache: inet6:[<ADDR>]
- url: [<ADDR>]
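Hypothetical usage, to illustrate the three contexts (the exact
filter signatures may differ from this sketch):

    api_addr: "{{ 'api' | kolla_address }}"                               # fd00::10
    memcache_addr: "{{ api_addr | put_address_in_context('memcache') }}"  # inet6:[fd00::10]
    url_addr: "{{ api_addr | put_address_in_context('url') }}"            # [fd00::10]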
Other changes:
- globals.yml - mention just IP in comment
- prechecks/port_checks (api_intf) - kolla_address handles validation
- 3x interface conditional (swift configs: replication/storage)
- 2x interface variable definition with hostname
  (haproxy listens; api intf)
- 1x interface variable definition with hostname with bifrost exclusion
  (baremetal pre-install /etc/hosts; api intf)
- neutron's ml2 'overlay_ip_version' set to 6 for IPv6 on tunnel network
- basic multinode source CI job for IPv6
- prechecks for rabbitmq and qdrouterd use proper NSS database now
- MariaDB Galera Cluster WSREP SST mariabackup workaround
  (socat and IPv6)
- Ceph naming workaround in CI
  (TODO: probably needs documenting)
- RabbitMQ IPv6-only proto_dist
- Ceph ms switch to IPv6 mode
- Remove neutron-server ml2_type_vxlan/vxlan_group setting as it is
  not used (let's avoid any confusion) and could break setups without
  proper multicast routing if it started working (also IPv4-only)
- haproxy upgrade checks for slaves based on ipv6 addresses
TODO:
- ovs-dpdk grabs ipv4 network address (w/ prefix len / submask);
  not supported, invalid by default because neutron_external has no
  address. No idea whether ovs-dpdk works at all atm.
- ml2 for xenapi: Xen is not supported too well; this would require
  working with XenAPI facts.
- rp_filter setting: this would require meddling with ip6tables
  (there is no sysctl param). By default nothing is dropped. Unlikely
  we really need it.
- ironic dnsmasq is configured IPv4-only; dnsmasq needs DHCPv6
  options and testing in vivo.
KNOWN ISSUES (beyond us):
- One cannot use an IPv6 address to reference the image for docker
  like we currently do, see:
  https://github.com/moby/moby/issues/39033
  (docker_registry; docker API 400 - invalid reference format).
  Workaround: use hostname/FQDN.
- RabbitMQ may fail to bind to IPv6 if the hostname also resolves to
  IPv4. This is due to old RabbitMQ versions available in images;
  IPv4 is preferred by default and may fail in the IPv6-only
  scenario. This should be no problem in real life as IPv6-only is
  indeed IPv6-only. Also, when new RabbitMQ (3.7.16/3.8+) makes it
  into images, this will no longer be relevant as we supply all the
  necessary config.
  See: https://github.com/rabbitmq/rabbitmq-server/pull/1982
- For reliable runs, at least Ansible 2.8 is required (2.8.5
  confirmed to work well). Older Ansible versions are known to miss
  IPv6 addresses in interface facts. This may affect redeploys,
  reconfigures and upgrades which run after the VIP address is
  assigned.
  See: https://github.com/ansible/ansible/issues/63227
- Bifrost Train does not support IPv6 deployments.
  See: https://storyboard.openstack.org/#!/story/2006689
Change-Id: Ia34e6916ea4f99e9522cd2ddde03a0a4776f7e2c
Implements: blueprint ipv6-control-plane
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Using profiles in cephx has been the recommended way since Mimic.
This also adds support for blacklist ops.
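A sketch of the resulting capabilities (client and pool names are
illustrative):

    ceph auth caps client.nova mon 'profile rbd' osd 'profile rbd pool=vms'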
Change-Id: Ib9f65644637a5761c6cd7ca8925afc6bb2b8d5f5
Closes-Bug: #1760065
The rolling upgrade has been the default since Stein. The legacy
upgrade has been removed because it doesn't follow the upgrade
guide [1].
[1] https://docs.openstack.org/nova/latest/user/upgrade.html
Change-Id: I2aa879699cb4e9955bf5c38053eada5a53fb6211
Sometimes, as cloud admins, we want to update only the code that is
running in a cloud, without needing to do anything else. Make an
action in kolla-ansible that allows us to do that.
Change-Id: I904f595c69f7276e71692696471e32fd1f88e6e8
Implements: blueprint deploy-containers-action
To securely support live migration between compute nodes we should
enable TLS with certificate authentication, instead of TCP with no
authentication.
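In globals.yml this is meant to be a simple switch (assuming the
blueprint names the flag libvirt_tls):

    libvirt_tls: "yes"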
Implements: blueprint libvirt-tls
Change-Id: I22ea6233933c840b853fdcc8e03400b2bf577271
Use upstream Ansible modules for registration of services, endpoints,
users, projects, roles, and role grants.
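An illustrative task using one of the upstream modules (auth details
elided; not the literal task from this change):

    - name: Register the nova service
      os_keystone_service:
        name: nova
        service_type: compute
        description: OpenStack Compute
      delegate_to: localhost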
Change-Id: I7c9138d422cc91c177fd8992347176bb54156b5a
Also fixes similar issues introduced by the same recent change.
Added a FIXME note about possible TLS malfunction regarding horizon.
Change-Id: I5f46a9306139eb550d3849757c8bdf0767537c78
Closes-Bug: #1844016
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
When the nova-api group has no hosts, we don't have to run
create_cells and discover_computes. Add conditional blocks to prevent
them from running.
Change-Id: Ia1ba058c1b74b06b678f45544883e567e2b4eb55
Closes-Bug: #1843235
The output from `nova-manage cell_v2 list_cells --verbose` contains
an extra column, stating whether the cell is enabled or not. This means
that the regex never matches, so existing_cells is always empty.
This fix updates the regex by adding a match group for this field which
may be used in a later change.
Unfortunately the CLI doesn't output in JSON format, which would make
this a lot less messy.
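Purely to illustrate the shape of the fix, a hypothetical pattern
with the extra trailing group for the Enabled column (not the literal
regex used):

    \|\s+([^|]+)\s+\|\s+([0-9a-f\-]+)\s+\|\s+([^|]+)\s+\|\s+([^|]+)\s+\|\s+(True|False)\s+\|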
Closes-Bug: #1842460
Change-Id: Ib6400b33785f3ef674bffc9329feb3e33bd3f9a3
nova.conf currently uses the [neutron] "url" parameter which has been
deprecated since 17.0.0. In multi-region environments this can
cause Nova to look up the Neutron endpoint for a different region.
Remove this parameter and set region_name and
valid_interfaces to allow the correct lookup to be performed.
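The resulting nova.conf section, sketched (region value is a
placeholder):

    [neutron]
    region_name = {{ openstack_region_name }}
    valid_interfaces = internal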
Change-Id: I1bbc73728439a460447bc8edd264f9f2d3c814e0
Closes-Bug: #1836952
This resolves an issue where the web browser would complain that it
was trying to connect to an insecure websocket when using HTTPS with
horizon.
Change-Id: Ib75cc2bc1b3811bc31badd5fda3db3ed0c59b119
Closes-Bug: #1841914
This is the first in a series of patches; it introduces optional
encryption for internal OpenStack endpoints, implementing part of the
add-ssl-internal-network spec.
Change-Id: I6589751626486279bf24725f22e71da8cd7f0a43
Nova-consoleauth support was removed in
I099080979f5497537e390f531005a517ab12aa7a, but these variables were
left.
Change-Id: I1ce1631119bba991225835e8e409f11d53276550
This commit adds the functionality for an operator to specify
their own trusted CA certificate file for interacting with the
Keystone API.
Implements: blueprint support-trusted-ca-certificate-file
Change-Id: I84f9897cc8e107658701fb309ec318c0f805883b
After all of the discussions we had on
"https://review.opendev.org/#/c/670626/2", I studied all projects
that have an "oslo_messaging" section. Afterwards, I applied the same
method that is already used in the "oslo_messaging" section in Nova,
Cinder, and others. This guarantees that we have a consistent method
to enable/disable notifications across projects, based on components
(e.g. Ceilometer) being enabled or disabled. Here follows the list of
components and the respective changes I made.
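First, the common pattern itself, sketched as a Jinja-templated
config fragment (variable names are illustrative):

    {% if enable_ceilometer | bool %}
    [oslo_messaging_notifications]
    transport_url = {{ notify_transport_url }}
    driver = messagingv2
    topics = notifications
    {% endif %}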
* Aodh:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Congress:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Cinder:
It was already properly configured.
* Octavia:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Heat:
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
* Ceilometer:
Ceilometer publishes some messages to RabbitMQ. However, the default
driver is "messagingv2", and not '' (empty) as defined in Oslo; these
configurations are defined in ceilometer/publisher/messaging.py.
Therefore, we do not need to do anything for the
"oslo_messaging_notifications" section in Ceilometer.
* Tacker:
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
* Neutron:
It was already properly configured.
* Nova
It was already properly configured. However, we found another issue
with its configuration. Kolla-ansible does not configure nova
notifications as it should. If 'searchlight' is not installed
(enabled), the 'notification_format' should be 'unversioned'. The
default is 'both', so nova will send notifications to the
versioned_notifications queue, but that queue has no consumer when
'searchlight' is disabled. In our case, the queue got 511k messages.
The huge amount of "stuck" messages made the RabbitMQ cluster
unstable.
https://bugzilla.redhat.com/show_bug.cgi?id=1478274
https://bugs.launchpad.net/ceilometer/+bug/1665449
* Nova_hyperv:
I added the same configurations as in Nova project.
* Vitrage
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
* Searchlight
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.
* Ironic
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.
* Glance
It was already properly configured.
* Trove
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
* Blazar
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
* Sahara
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
* Watcher
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.
* Barbican
I created a mechanism similar to what we have in Cinder, Nova,
and others. I also added a configuration to the
'keystone_notifications' section. Barbican needs its own queue to
capture events from Keystone.
Otherwise, it has an impact on Ceilometer and other systems that are
connected to the "notifications" default queue.
* Keystone
Keystone is the system that triggered this work with the discussions
that followed on https://review.opendev.org/#/c/670626/2. After a long
discussion, we agreed to apply the same approach that we have in Nova,
Cinder, and other systems in Keystone. That is what we did. Moreover,
we introduced a new topic, "barbican_notifications", which is used
when barbican is enabled. We also removed the variable
enable_cadf_notifications, as it is obsolete; the default in Keystone
is CADF.
* Mistral:
The driver was hardcoded to "noop". However, that does not seem to be
good practice. Instead, I applied the same standard of using the
driver and pushing to the "notifications" queue if Ceilometer is
enabled.
* Cyborg:
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.
* Murano
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
* Senlin
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
* Manila
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
* Zun
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Designate
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
* Magnum
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components.
Closes-Bug: #1838985
Change-Id: I88bdb004814f37c81c9a9c4e5e491fac69f6f202
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.
This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.
I added some FIXMEs in the relevant places regarding kolla-ansible
docker integration. They are not fixed here so as not to alter
behavior.
[1] https://review.opendev.org/667363
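A sketch of the corrected usage with the kolla_docker module (task
shape illustrative):

    - name: Run a bootstrap container without restarts
      kolla_docker:
        action: start_container
        name: bootstrap_example
        restart_policy: "no"   # valid: no, on-failure, always, unless-stopped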
Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
A common class of problems goes like this:
* kolla-ansible deploy
* Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml
* Re-run kolla-ansible deploy
* Service fails to start
This happens because the DB is created during the first run, but for some
reason we fail before performing the DB sync. This means that on the second run
we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB
already exists, and therefore still don't perform the DB sync. However this
time, the command may complete without apparent error.
We should be less careful about when we perform the DB sync, and do it whenever
it is necessary. There is an argument for not doing the sync during a
'reconfigure' command, although we will not change that here.
This change always performs the DB sync during the 'deploy' and
'reconfigure' commands.
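A sketch of the new flow (kolla_action is the usual kolla-ansible
mode variable; the task shape is illustrative):

    - include_tasks: bootstrap_service.yml
      when: kolla_action in ['deploy', 'reconfigure']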
Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc
Closes-Bug: #1823766
Closes-Bug: #1797814
Controllers lacking compute should not be required to provide a valid
migration_interface, as it is not used there (and prechecks do not
check it either).
Inclusion of the libvirt conf section is now conditional on the
service type. The libvirt conf section has been moved to a separate
included file to avoid evaluation of the undefined variable (a
conditional block did not prevent it, and using the 'default' filter
may hide future issues).
See https://github.com/ansible/ansible/issues/58835
Additionally, this fixes the improper nesting of 'if' blocks for
libvirt.
Change-Id: I77af534fbe824cfbe95782ab97838b358c17b928
Closes-Bug: #1835713
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Due to a bug in ansible, kolla-ansible deploy currently fails in nova
with the following error when used with ansible earlier than 2.8:
TASK [nova : Waiting for nova-compute services to register themselves] *********
task path: /home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml:30
fatal: [primary]: FAILED! => {
"failed": true,
"msg": "The field 'vars' has an invalid value, which
includes an undefined variable. The error was:
'nova_compute_services' is undefined\n\nThe error
appears to have been in
'/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml':
line 30, column 3, but may\nbe elsewhere in the file
depending on the exact syntax problem.\n\nThe
offending line appears to be:\n\n\n- name: Waiting
for nova-compute services to register themselves\n ^
here\n"
}
Example:
http://logs.openstack.org/00/669700/1/check/kolla-ansible-centos-source/81b65b9/primary/logs/ansible/deploy
This was caused by
https://review.opendev.org/#/q/I2915e2610e5c0b8d67412e7ec77f7575b8fe9921,
which hits upon an ansible bug described here:
https://github.com/markgoddard/ansible-experiments/tree/master/05-referencing-registered-var-do-until.
We can work around this by not using an intermediary variable.
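An illustration of the workaround (task content is hypothetical):

    # Broken on Ansible < 2.8: 'until' references the register through
    # an intermediary variable defined in 'vars'.
    - command: openstack compute service list
      register: nova_compute_services
      until: services | length > 0
      vars:
        services: "{{ nova_compute_services.stdout }}"

    # Workaround: reference the registered variable directly.
    - command: openstack compute service list
      register: nova_compute_services
      until: nova_compute_services.stdout | length > 0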
Change-Id: I58f8fd0a6e82cb614e02fef6e5b271af1d1ce9af
Closes-Bug: #1835817
In a single controller scenario, the "Upgrade status check result"
task does nothing, because the previous task can only succeed when
`nova-status upgrade check` returns code 0. This change allows the
command to fail, so that the return code stored in
`nova_upgrade_check_stdout` can then be analysed.
This change also allows warnings (rc 1) to pass.
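A sketch of the adjusted tasks (the register name is from this
change; the rest is illustrative):

    - name: Run nova upgrade status check
      command: nova-status upgrade check
      register: nova_upgrade_check_stdout
      failed_when: false

    - name: Upgrade status check result
      fail:
        msg: nova-status upgrade check reported errors
      when: nova_upgrade_check_stdout.rc not in [0, 1]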
Closes-Bug: 1834647
Change-Id: I6f5e37832f43f23604920b9d890cc505ca924ff9