Reintroduce kolla-ansible check

This allows operators to quickly diagnose all containers across
all hosts by running kolla-ansible check. It returns a list
of containers that are missing, not running, or in an unhealthy
state for each OpenStack service.

Change-Id: I36119ccdeb264aa3de928ec2254d6ff4cc955bfb
Implements: blueprint check-containers
Co-Authored-By: Roman Krček <roman.krcek@tietoevry.com>
Mark Goddard 2018-09-04 16:09:41 +01:00 committed by Bartosz Bezak
parent 13e98d3db0
commit fa6535890c
58 changed files with 238 additions and 4 deletions
ansible/roles
aodh/tasks
barbican/tasks
blazar/tasks
ceilometer/tasks
cinder/tasks
cloudkitty/tasks
collectd/tasks
common/tasks
cyborg/tasks
designate/tasks
etcd/tasks
glance/tasks
gnocchi/tasks
grafana/tasks
hacluster/tasks
heat/tasks
horizon/tasks
influxdb/tasks
ironic/tasks
iscsi/tasks
keystone/tasks
kuryr/tasks
letsencrypt/tasks
loadbalancer/tasks
magnum/tasks
manila/tasks
mariadb/tasks
masakari/tasks
memcached/tasks
mistral/tasks
multipathd/tasks
neutron/tasks
nova-cell/tasks
nova/tasks
octavia/tasks
opensearch/tasks
openvswitch/tasks
ovn-controller/tasks
ovn-db/tasks
ovs-dpdk/tasks
placement/tasks
prometheus/tasks
rabbitmq/tasks
redis/tasks
service-check
defaults
tasks
vars
skyline/tasks
tacker/tasks
telegraf/tasks
trove/tasks
venus/tasks
watcher/tasks
zun/tasks
kolla_ansible/cli
releasenotes/notes
setup.cfg
tests

@ -1 +1,4 @@
---
- name: Checking Aodh containers
  import_role:
    role: service-check

@ -0,0 +1,4 @@
---
- name: Checking Barbican containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Blazar containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Ceilometer containers
  import_role:
    role: service-check

@ -0,0 +1,4 @@
---
- name: Checking Cinder containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Cloudkitty containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Collectd containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Common containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Cyborg containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Designate containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Etcd containers
  import_role:
    role: service-check

@ -0,0 +1,4 @@
---
- name: Checking Glance containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Gnocchi containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Grafana containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Hacluster containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Heat containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Horizon containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Influxdb containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Ironic containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking iSCSI containers
  import_role:
    role: service-check

@ -0,0 +1,4 @@
---
- name: Checking Keystone containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Kuryr containers
  import_role:
    role: service-check

@ -0,0 +1,4 @@
---
- name: Checking LetsEncrypt containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Loadbalancer containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Magnum containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Manila containers
  import_role:
    role: service-check

@ -1,7 +1,8 @@
---
# Explicitly wait for the database to be accessible via the load balancer.
# Sometimes it can reject connections even when all database services are up,
# due to the health check polling in HAProxy.
- name: Checking Mariadb containers
  import_role:
    role: service-check
- name: Wait for MariaDB service to be ready through VIP
  become: true
  command: >

@ -1 +1,4 @@
---
- name: Checking Masakari containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Memcached containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Mistral containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking multipathd containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Neutron containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Nova-cell containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Nova containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Octavia containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking OpenSearch containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Openvswitch containers
  import_role:
    role: service-check

@ -0,0 +1,4 @@
---
- name: Checking OVN-controller containers
  import_role:
    role: service-check

@ -0,0 +1,4 @@
---
- name: Checking OVN-DB containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking OVS-DPDK containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Placement containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Prometheus containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Rabbitmq containers
  import_role:
    role: service-check

@ -1,4 +1,8 @@
---
- name: Checking Redis containers
  import_role:
    role: service-check
- name: Redis ping pong check
  become: true
  command: "{{ kolla_container_engine }} exec redis redis-cli -h {{ api_interface_address }} -a {{ redis_master_password }} ping"

@ -0,0 +1,3 @@
---
# Dict mapping service names to container configuration.
service_check_services: {}

@ -0,0 +1,38 @@
---
# Check whether all necessary containers are running.
- name: "{{ kolla_role_name | default(project_name) }} | Get container facts"
  become: true
  kolla_container_facts:
    action: get_containers
    container_engine: "{{ kolla_container_engine }}"
    name: "{{ service_check_enabled_container_names }}"
  register: container_facts
  when: service_check_enabled_container_names | length > 0

- name: "{{ kolla_role_name | default(project_name) }} | Fail if containers are missing or not running"
  vars:
    missing_containers: >-
      {{ service_check_enabled_container_names | difference(container_facts) | list }}
  fail:
    msg: >
      The following {{ kolla_role_name | default(project_name) }} containers are missing or not running:
      {{ missing_containers | join(', ') }}
  when:
    - container_facts is defined
    - missing_containers | length > 0

- name: "{{ kolla_role_name | default(project_name) }} | Fail if containers are unhealthy"
  vars:
    unhealthy_containers: >-
      {{ container_facts |
         dict2items |
         selectattr("value.Status", "defined") |
         selectattr("value.Status", "search", "unhealthy") |
         map(attribute='key') | list }}
  fail:
    msg: >
      The following {{ kolla_role_name | default(project_name) }} containers are unhealthy:
      {{ unhealthy_containers | join(', ') }}
  when:
    - container_facts is defined
    - unhealthy_containers | length > 0
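
The two failure conditions in the tasks above can be sketched in plain Python. This is only an illustration of the filtering logic, not code from the commit: it assumes the gathered facts behave like a dict mapping container name to an info dict with a `Status` string, and the helper names and sample data are hypothetical.

```python
import re


def find_missing(expected_names, container_facts):
    """Return expected container names that have no entry in the facts."""
    return [name for name in expected_names if name not in container_facts]


def find_unhealthy(container_facts):
    """Return names whose Status is present and mentions 'unhealthy'."""
    return [
        name
        for name, info in container_facts.items()
        if info.get("Status") is not None
        and re.search("unhealthy", info["Status"])
    ]


# Hypothetical sample data, shaped per the assumption stated above.
facts = {
    "aodh_api": {"Status": "Up 2 hours (healthy)"},
    "aodh_evaluator": {"Status": "Up 2 hours (unhealthy)"},
}
expected = ["aodh_api", "aodh_evaluator", "aodh_notifier"]

missing = find_missing(expected, facts)
unhealthy = find_unhealthy(facts)
```

The regular-expression match mirrors the `search` test used in the `selectattr` call, and the membership check mirrors the `difference` filter.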

@ -0,0 +1,9 @@
---
# List of names of containers to check that are enabled and mapped to this
# host.
service_check_enabled_container_names: >-
  {{ lookup('vars', (kolla_role_name | default(project_name)) + '_services') |
     select_services_enabled_and_mapped_to_host |
     dict2items |
     map(attribute='value.container_name') |
     list }}
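
As a rough Python sketch of what this expression computes: take the role's `<name>_services` dict, keep the services that are enabled and mapped to the current host, and collect their `container_name` values. The selection rule below (an `enabled` flag plus a `group` that the host belongs to) is a simplified assumption about `select_services_enabled_and_mapped_to_host`, and the function name and sample data are hypothetical.

```python
def enabled_container_names(services, host_groups):
    """Collect container names of services enabled and mapped to this host.

    Simplified assumption: a service is selected when its 'enabled' flag is
    truthy and its 'group' is one of the groups the host belongs to.
    """
    return [
        service["container_name"]
        for service in services.values()
        if service.get("enabled") and service.get("group") in host_groups
    ]


# Hypothetical stand-in for a role's '<project>_services' variable.
aodh_services = {
    "aodh-api": {
        "container_name": "aodh_api",
        "group": "aodh-api",
        "enabled": True,
    },
    "aodh-notifier": {
        "container_name": "aodh_notifier",
        "group": "aodh-notifier",
        "enabled": False,
    },
    "aodh-listener": {
        "container_name": "aodh_listener",
        "group": "aodh-listener",
        "enabled": True,
    },
}

names = enabled_container_names(aodh_services, host_groups={"aodh-api"})
```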

@ -1 +1,4 @@
---
- name: Checking Skyline containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Tacker containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Telegraf containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Trove containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Venus containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Watcher containers
  import_role:
    role: service-check

@ -1 +1,4 @@
---
- name: Checking Zun containers
  import_role:
    role: service-check

@ -469,3 +469,17 @@ class NovaLibvirtCleanup(KollaAnsibleMixin, Command):
        playbooks = _choose_playbooks(parsed_args, "nova-libvirt-cleanup")

        self.run_playbooks(parsed_args, playbooks)


class Check(KollaAnsibleMixin, Command):
    """Check container status"""

    def take_action(self, parsed_args):
        self.app.LOG.info("Checking container status")

        extra_vars = {}
        extra_vars["kolla_action"] = "check"

        playbooks = _choose_playbooks(parsed_args)

        self.run_playbooks(parsed_args, playbooks, extra_vars=extra_vars)
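
The command above only sets `kolla_action` to `check` and hands off to the playbook runner. A minimal sketch of how such extra variables might end up on an `ansible-playbook` command line follows. It is an assumption about the general mechanism, not the body of `run_playbooks`; the function `build_playbook_command` and the playbook and inventory names are hypothetical.

```python
import shlex


def build_playbook_command(playbook, inventory, extra_vars):
    """Build an ansible-playbook argument list with one -e per extra var."""
    cmd = ["ansible-playbook", "-i", inventory]
    # Sort the keys so the generated command line is deterministic.
    for key, value in sorted(extra_vars.items()):
        cmd += ["-e", f"{key}={value}"]
    cmd.append(playbook)
    return cmd


command = build_playbook_command(
    "site.yml", "multinode", {"kolla_action": "check"}
)
# Quote each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

With `kolla_action=check` set, each role can run its check tasks, which in this commit import the shared `service-check` role.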

@ -0,0 +1,9 @@
---
features:
  - |
    Reintroduce kolla-ansible check.
    This allows operators to quickly diagnose all containers across
    all hosts by running kolla-ansible check. It returns a list
    of containers that are missing, not running or in unhealthy state
    for each OpenStack service.
    `Blueprint check-containers <https://blueprints.launchpad.net/kolla-ansible/+spec/check-containers>`__

@ -72,4 +72,4 @@ kolla-ansible.cli =
    mariadb-backup = kolla_ansible.cli.commands:MariaDBBackup
    mariadb-recovery = kolla_ansible.cli.commands:MariaDBRecovery
    nova-libvirt-cleanup = kolla_ansible.cli.commands:NovaLibvirtCleanup
    check = kolla_ansible.cli.commands:Check

@ -71,6 +71,8 @@ function deploy {
    if [[ $HAS_UPGRADE == 'no' ]]; then
        kolla-ansible validate-config -i ${RAW_INVENTORY} -vvv &> /tmp/logs/ansible/validate-config
        #TODO(r-krcek) check can be moved out of the if statement in the flamingo cycle
        kolla-ansible check -i ${RAW_INVENTORY} -vvv &> /tmp/logs/ansible/check
    fi
}