system-config

Author	SHA1	Message	Date
Ian Wienand	c1aff2ed38	kerberos-kdc: role to manage Kerberos KDC servers This adds a role and related testing to manage our Kerberos KDC servers, intended to replace the puppet modules currently performing this task. This role automates realm creation, initial setup, key material distribution and replica host configuration. None of this is intended to run on the production servers which are already set up with an active database, and the role should be effectively idempotent in production. Note that this does not yet switch the production servers into the new groups; this can be done in a separate step under controlled conditions and with related upgrades of the host OS to Focal. Change-Id: I60b40897486b29beafc76025790c501b5055313d	2021-03-17 08:30:52 +11:00
Ian Wienand	028d655375	Add borg-backup roles This adds roles to implement backup with borg [1]. Our current tool "bup" has no Python 3 support and is not packaged for Ubuntu Focal. This means it is effectively end-of-life. borg fits our model of servers backing themselves up to a central location, is well documented and seems well supported. It also has the clarkb seal of approval :) As mentioned, borg works in the same manner as bup by doing an efficient backup over ssh to a remote server. The core of these roles is the same as the bup based ones; in terms of creating a separate user for each host and deploying keys and ssh config. This chooses to install borg in a virtualenv on /opt. This was chosen for a number of reasons; firstly reading the history of borg there have been incompatible updates (although they provide a tool to update repository formats); it seems important that we both pin the version we are using and keep clients and server in sync. Since we have a heterogeneous distribution collection we don't want to rely on the packaged tools which may differ. I don't feel like this is a great application for a container; we actually don't want it that isolated from the base system because its goal is to read and copy it offsite with as little chance of things going wrong as possible. Borg has a lot of support for encrypting the data at rest in various ways. However, that introduces the possibility we could lose both the key and the backup data. Really the only thing stopping this is key management, and if we want to go down this path we can do it as a follow-on. The remote end server is configured via ssh command rules to run in append-only mode. This means a misbehaving client can't delete its old backups. In theory we can prune backups on the server side -- something we could not do with bup. The documentation has been updated but is vague on this part; I think we should get some hosts in operation, see how the de-duplication is working out and then decide how we want to manage things long term. Testing is added; a focal and bionic host both run a full backup of themselves to the backup server. Pretty cool, the logs are in /var/log/borg-backup-<host>.log. No hosts are currently in the borg groups, so this can be applied without affecting production. I'd suggest the next steps are to bring up a borg-based backup server and put a few hosts into this. After running for a while, we can add all hosts, and then deprecate the current bup-based backup server in vexxhost and replace that with a borg-based one; giving us dual offsite backups. [1] https://borgbackup.readthedocs.io/en/stable/ Change-Id: I2a125f2fac11d8e3a3279eb7fa7adb33a3acaa4e	2020-07-21 17:36:50 +10:00
Ian Wienand	c9215801f0	Generate ssl check list directly from letsencrypt variables This autogenerates the list of ssl domains for the ssl-cert-check tool directly from the letsencrypt list. The first step is the install-certcheck role that replaces the puppet-ssl_cert_check module that does the same. The reason for this is so that during gate testing we can test this on the test bridge.openstack.org server, and avoid adding another node as a requirement for this test. letsencrypt-request-certs is updated to set a fact letsencrypt_certcheck_domains for each host that is generating a certificate. As described in the comments, this defaults to the first host specified for the certificate and the listening port can be indicated (if set, this new port value is stripped when generating certs as it is not necessary for certificate generation). The new letsencrypt-config-certcheck role runs and iterates all letsencrypt hosts to build the final list of domains that should be checked. This is then extended with the letsencrypt_certcheck_additional_domains value that covers any hosts using certificates not provisioned by letsencrypt using this mechanism. These additional domains are pre-populated from the openstack.org domains in the extant check file, minus those openstack.org domain certificates we are generating via letsencrypt (see letsencrypt-create-certs/handlers/main.yaml). Additionally, we update some of the certificate variables in host_vars that are listening on port !443. As mentioned, bridge.openstack.org is placed in the new certcheck group for gate testing, so the tool and config file will be deployed to it. For production, cacti is added to the group, which is where the tool currently runs. The extant puppet installation is disabled, pending removal in a follow-on change. Change-Id: Idbe084f13f3684021e8efd9ac69b63fe31484606	2020-05-20 14:27:14 +10:00
Ian Wienand	7b8b788ce2	Add focal testing for mirror nodes Change-Id: I64de9a61c5044b93f6ce7e2d31cf51d78fd4ec16	2020-05-13 05:32:54 +10:00
Ian Wienand	814e4be128	Ansible roles for backup This introduces two new roles for managing the backup-server and hosts that we wish to back up. Firstly the "backup" role runs on hosts we wish to back up. This generates and configures a separate ssh key for running bup and installs the appropriate cron job to run the backup daily. The "backup-server" job runs on the backup server (or, indeed servers). It creates users for each backup host, accepts the remote keys mentioned above and initialises bup. It is then ready to receive backups from the remote hosts. This eliminates a fairly long-standing requirement for manual setup of the backup server users and keys; this section is removed from the documentation. testinfra coverage is added. Change-Id: I9bf74df351e056791ed817180436617048224d2c	2019-08-05 16:59:57 +10:00
Ian Wienand	d33105535a	Separate openafs CI mirror This is an intermediate step to having both kafs and openafs testing in the gate; this just makes it clear which host is which. Change-Id: I8cd006227ed47ad5f2c5eec664083477dd7ba397	2019-06-17 15:56:09 +10:00
Ian Wienand	670107045a	Create opendev mirrors This implements mirrors to live in the opendev.org namespace. The implementation is Ansible native for deployment on a Bionic node. The hostname prefix remains the same (mirrorXX.region.provider.) but the groups.yaml splits the opendev.org mirrors into a separate group. The matches in the puppet group are also updated so as not to run puppet on the hosts. The kerberos and openafs client parts do not need any updating and work on the Bionic host. The hosts are set up to provision certificates for themselves from letsencrypt. Note we've added a new handler for mirror nodes to use that restarts apache on certificate issue/renewal. The new "mirror" role is a port of the existing puppet mirror.pp. It installs apache, sets up some modules, makes some symlinks, sets up a cleanup cron job and installs the apache vhost configuration. The vhost configuration is also ported from the extant puppet. It is simplified somewhat; but the biggest change is that we have extracted the main port 80 configuration into a macro which is applied to both port 80 and 443; i.e. the host will have SSL support. The other ports are left alone for now, but can be updated in due course. Thus we should be able to CNAME the existing mirrors to new nodes, and any existing http access can continue. We can update our mirror setup scripts to point to https resources as appropriate. Change-Id: Iec576d631dd5b02f6b9fb445ee600be060f9cf1e	2019-05-21 11:08:25 +10:00
James E. Blair	8ad300927e	Split the base playbook into services This is a first step toward making smaller playbooks which can be run by Zuul in CD. Zuul should be able to handle missing projects now, so remove it from the puppet_git playbook and into puppet. Make the base playbook be merely the base roles. Make service playbooks for each service. Remove the run-docker job because it's covered by service jobs. Stop testing that puppet is installed in testinfra. It's accidentally working due to the selection of non-puppeted hosts only being on bionic nodes and not installing puppet on bionic. Instead, we can now rely on actually running puppet when it's important, such as in the eavesdrop job. Also remove the installation of puppet on the nodes in the base job, since it's only useful to test that a synthetic test of installing puppet on nodes we don't use works. Don't run remote_puppet_git on gitea for now - it's too slow. A followup patch will rework gitea project creation to not take hours. Change-Id: Ibb78341c2c6be28005cea73542e829d8f7cfab08	2019-05-19 07:31:00 -05:00
Ian Wienand	afd907c16d	letsencrypt support This change contains the roles and testing for deploying certificates on hosts using letsencrypt with domain authentication. From a top level, the process is implemented in the roles as follows: 1) letsencrypt-acme-sh-install This role installs the acme.sh tool on hosts in the letsencrypt group, along with a small custom driver script to help parse output that is used by later roles. 2) letsencrypt-request-certs This role runs on each host, and reads a host variable describing the certificates required. It uses the acme.sh tool (via the driver) to request the certificates from letsencrypt. It populates a global Ansible variable with the authentication TXT records required. If the certificate exists on the host and is not within the renewal period, it should do nothing. 3) letsencrypt-install-txt-record This role runs on the adns server. It installs the TXT records generated in step 2 to the acme.opendev.org domain and then refreshes the server. Hosts wanting certificates will have pre-provisioned CNAME records for _acme-challenge.host.opendev.org pointing to acme.opendev.org. 4) letsencrypt-create-certs This role runs on each host, reading the same variable as in step 2. However this time the acme.sh tool is run to authenticate and create the certificates, which should now work correctly via the TXT records from step 3. After this, the host will have the full certificate material. Testing is added via testinfra. For testing purposes requests are made to the staging letsencrypt servers and a self-signed certificate is provisioned in step 4 (as the authentication is not available during CI). We test that the DNS TXT records are created locally on the CI adns server, however. Related-Spec: https://review.openstack.org/587283 Change-Id: I1f66da614751a29cc565b37cdc9ff34d70fdfd3f	2019-04-02 15:31:41 +11:00
Ian Wienand	f07bf2a507	Import install-docker role This is a role for installing docker on our control-plane servers. It is based on install-docker from zuul-jobs. Basic testinfra tests are added; because docker fiddles the iptables rules in magic ways, the firewall testing is moved out of the base tests and modified to partially match our base firewall configuration. Change-Id: Ia4de5032789ff0f2b07d4f93c0c52cf94aa9c25c	2018-12-14 11:30:47 -08:00
Monty Taylor	e998db36f2	Add yamlgroup inventory plugin The constructed inventory plugin allows expressing additional groups, but it's too heavyweight for our needs. Additionally, it is a full inventory plugin that will add hosts to the inventory if they don't exist. What we want instead is something that will associate existing hosts (that would have come from another source) with groups. This also switches to using emergency.yaml instead of emergency, which uses the same format. We add an extra groups file for gate testing to ensure the CI nodes get puppet installed. Change-Id: Iea8b2eb2e9c723aca06f75d3d3307893e320cced	2018-11-02 08:19:53 +11:00

11 Commits