76 Commits

Author SHA1 Message Date
Ian Wienand
c1aff2ed38 kerberos-kdc: role to manage Kerberos KDC servers
This adds a role and related testing to manage our Kerberos KDC
servers, intended to replace the puppet modules currently performing
this task.

This role automates realm creation, initial setup, key material
distribution and replica host configuration.  None of this is intended
to run on the production servers which are already set up with an
active database, and the role should be effectively idempotent in
production.

Note that this does not yet switch the production servers into the new
groups; this can be done in a separate step under controlled
conditions and with related upgrades of the host OS to Focal.

Change-Id: I60b40897486b29beafc76025790c501b5055313d
2021-03-17 08:30:52 +11:00
Zuul
4d85fc521a Merge "Use dstat to record performance of system-config-run hosts" 2021-02-23 00:13:59 +00:00
Clark Boylan
1560b01f7e Use dstat to record performance of system-config-run hosts
We have seen some poor performance from gitea which may be related to
manage project updates. Start a dstat service which logs to a csv file
on our system-config-run job hosts in order to collect performance info
from our services in pre merge testing. This will include gitea and
should help us evaluate service upgrades and other changes from a
performance perspective before they hit production.

Change-Id: I7bdaab0a0aeb9e1c00fcfcca3d114ae13a76ccc9
2021-02-16 14:31:30 -08:00
Ian Wienand
39ffc685d6 backups: remove all bup
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.

For reference, the servers being backed up at this time are:

 borg-ask01
 borg-ethercalc02
 borg-etherpad01
 borg-gitea01
 borg-lists
 borg-review-dev01
 borg-review01
 borg-storyboard01
 borg-translate01
 borg-wiki-update-test
 borg-zuul01

This removes the old bup backup hosts, the no-longer used ansible
roles for the bup backup server and client roles, and any remaining
bup related configuration.

For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.

Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
2021-02-16 16:00:28 +11:00
James E. Blair
e58a18d8a1 Stop running ansible-lint on this repo
It is buggy (throwing exceptions for undefined variables which are
actually defined via set_fact), and we frequently run into problems
using it in this repo.  It was designed to lint roles for Galaxy,
not the way we write ansible.  As of the 5.0.0 release it's
generating >4.5K lines of complaints about files in this repository.

Change-Id: If9d8c19b5e663bdd6b6f35ffed88db3cff3d79f8
2021-02-09 22:08:38 +00:00
Clark Boylan
a4604ae0b3 Deploy refstack with ansible docker
This adds a dockerfile to build an opendevorg/refstack image as well as
the jobs to build and publish it.

Change-Id: Icade6c713fa9bf6ab508fd4d8d65debada2ddb30
2021-02-05 19:23:34 +00:00
Ian Wienand
be085e564e run-selenium: run selenium on a node
This runs selenium from a container on a node, and exposes port 4444
so you can issue commands to it.  This is used in the follow-on
I56cda99790d3c172e10b664e57abeca10efc5566 to take some screenshots of
gerrit.

Change-Id: Idcbcd9a8f33bd86b5f3e546dd563792212e0751b
2021-01-18 07:58:23 -08:00
Ian Wienand
595dfd1166 system-config-run-review: remove review-dev server
We don't need to test two servers in this test; remove review-dev.
Consensus seems to be this was for testing plans that have now been
superseded.

Change-Id: Ia4db5e0748e1c82838000c9b655808c3d8b74461
2020-12-15 11:09:17 +11:00
Ian Wienand
368466730c Migrate codesearch site to container
The hound project has undergone a small re-birth and moved to

 https://github.com/hound-search/hound

which has broken our deployment.  We've talked about leaving
codesearch up to gitea, but it's not quite there yet.  There seems to
be no point working on the puppet now.

This builds a container that runs houndd.  It's an opendev specific
container; the config is pulled from project-config directly.

There are some custom scripts that drive things.  Some points for
reviewers:

 - update-hound-config.sh uses "create-hound-config" (which is in
   jeepyb for historical reasons) to generate the config file.  It
   grabs the latest projects.yaml from project-config and exits with a
   return code to indicate if things changed.

 - when the container starts, it runs update-hound-config.sh to
   populate the initial config.  There is a testing environment flag
   and small config so it doesn't have to clone the entire opendev for
   functional testing.

 - it runs under supervisord so we can restart the daemon when
   projects are updated.  Unlike earlier versions that didn't start
   listening till indexing was done, this version now puts up a "Hound
   is not ready yet" message while it is working; so we can drop
   all the magic we were doing to probe if hound is listening via
   netstat and making Apache redirect to a status page.

 - resync-hound.sh is run from an external cron job daily, and does
   this update and restart check.  Since it only reloads if changes
   are made, this should be relatively rare anyway.

 - There is a PR to monitor the config file
   (https://github.com/hound-search/hound/pull/357) which would mean
   the restart is unnecessary.  This would be good in the near future and we
   could remove the cron job.

 - playbooks/roles/codesearch is unexciting and deploys the container,
   certificates and an apache proxy back to localhost:6080 where hound
   is listening.
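
The update-and-restart check described in the points above can be
sketched roughly as follows.  This is a minimal illustration in Python,
not the actual shell scripts; the function names, the config file name
and the restart callback are all hypothetical, and a boolean stands in
for the script's return code.

```python
import os
import tempfile


def update_config(path, new_content):
    """Write new_content to path only if it differs; return True when changed."""
    try:
        with open(path) as f:
            if f.read() == new_content:
                return False
    except FileNotFoundError:
        pass  # No config yet, so this counts as a change.
    with open(path, "w") as f:
        f.write(new_content)
    return True


def resync(path, new_content, restart):
    """Regenerate the config and restart the daemon only when it changed."""
    if update_config(path, new_content):
        restart()
        return True
    return False


# Hypothetical usage: the second run sees no change and skips the restart.
restarts = []
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    first = resync(config_path, '{"repos": {}}', lambda: restarts.append("hound"))
    second = resync(config_path, '{"repos": {}}', lambda: restarts.append("hound"))
```

Because the restart only happens on a real change, running the check
daily from cron stays cheap.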

I've combined removal of the old puppet bits here as the "-codesearch"
namespace was already being used.

Change-Id: I8c773b5ea6b87e8f7dfd8db2556626f7b2500473
2020-11-20 07:41:12 +11:00
Ian Wienand
ba45f251d1 Fix junit error, add HTML report
Specifying the family stops a deprecation warning being output.

Add a HTML report and report it as an artifact as well; this is easier
to read.

Change-Id: I2bd6505c19cee2d51e9af27e9344cfe2e1110572
2020-07-15 07:03:22 +10:00
Ian Wienand
c697f22413 run-base : don't strip root ssh private key
Builds running on the new container-based executors started failing to
connect to remote hosts with

 Load key "/root/.ssh/id_rsa": invalid format

It turns out the new executor is writing keys in OpenSSH format,
rather than the older PEM format.  And it seems that the OpenSSH
format is more picky about having a trailing space after the

 -----END OPENSSH PRIVATE KEY-----

bit of the id_rsa file.  By default, the file lookup runs an rstrip on
the incoming file to remove the trailing space.  Turn that off so we
generate a valid key.
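
As a rough illustration of the failure (not the role's actual code),
stripping trailing whitespace from the key contents drops the final
newline after the END marker.  The key body below is a placeholder,
read_key is a hypothetical stand-in for the lookup, and the fix on the
Ansible side is assumed to be passing rstrip=False to the file lookup.

```python
# Placeholder key text; the body is not a real key.
KEY_TEXT = (
    "-----BEGIN OPENSSH PRIVATE KEY-----\n"
    "bm90LWEtcmVhbC1rZXk=\n"
    "-----END OPENSSH PRIVATE KEY-----\n"
)


def read_key(text, rstrip=True):
    """Mimic a lookup that optionally strips trailing whitespace."""
    return text.rstrip() if rstrip else text


stripped = read_key(KEY_TEXT)              # default behaviour, newline lost
intact = read_key(KEY_TEXT, rstrip=False)  # behaviour after this change

print(repr(stripped[-6:]))
print(repr(intact[-6:]))
```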

Change-Id: I49bb255f359bd595e1b88eda890d04cb18205b6e
2020-07-14 13:13:13 +10:00
Ian Wienand
b146181174 Grafana container deployment
This uses the Grafana container created with
Iddfafe852166fe95b3e433420e2e2a4a6380fc64 to run the
grafana.opendev.org service.

We retain the old model of an Apache reverse-proxy; it's well tested
and understood, it's much easier than trying to map all the SSL
termination/renewal/etc. into the Grafana container and we don't have
to convince ourselves the container is safe to be directly web-facing.

Otherwise this is a fairly straightforward deployment of the
container.  As before, it uses the graph configuration kept in
project-config which is loaded in with grafyaml, which is included in
the container.

One nice advantage is that it makes it quite easy to develop graphs
locally, using the container which can talk to the public graphite
instance.  The documentation has been updated with a reference on how
to do this.

Change-Id: I0cc76d29b6911aecfebc71e5fdfe7cf4fcd071a4
2020-07-03 07:17:22 +10:00
Clark Boylan
f7e92ee669 Improve ansible yaml output for humans
We use ansible's to_nice_yaml output filter when writing ansible
datastructures to yaml. This has a default indent of 4, but we humans
usually write yaml with an indent of 2. Make the generated yaml more
similar to what us humans write and set the indent to 2.
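
To show the difference the indent makes, here is a small sketch.  It
uses a tiny hand-rolled emitter for nested dicts, not Ansible's
to_nice_yaml filter, and the sample data is made up.  In a template the
change is assumed to amount to passing indent=2 to the filter.

```python
def dump_yaml(data, indent=2, level=0):
    """Tiny emitter for nested dicts of scalars; not a full YAML dumper."""
    lines = []
    pad = " " * (indent * level)
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(dump_yaml(value, indent, level + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Hypothetical sample data.
config = {"service": {"listen": {"port": 8080}}}

wide = "\n".join(dump_yaml(config, indent=4))    # old default indent
narrow = "\n".join(dump_yaml(config, indent=2))  # what humans usually write

print(wide)
print(narrow)
```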

Change-Id: I3dc41b54e1b6480d7085261bc37c419009ef5ba7
2020-06-18 10:02:11 -07:00
James E. Blair
3d6cefe9dd Stop using backend hostname in zuul testinfra tests
Tests that call host.backend.get_hostname() to switch on test
assertions are likely to fail open.  Stop using this in zuul tests
and instead add new files for each of the types of zuul hosts
where we want to do additional verification.
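
A sketch of why switching on the hostname fails open, using plain
Python functions rather than testinfra.  The hostnames, the port number
and the helper names are hypothetical.

```python
def check_host_fail_open(hostname, listening_ports):
    """Assertions only run when the hostname matches exactly."""
    if hostname == "zm01.example.org":
        assert 9999 not in listening_ports
    # Any other hostname falls through, so nothing is verified.


def check_merger(listening_ports):
    """Dedicated check, meant to run only against merger hosts."""
    assert 9999 not in listening_ports


def passes(check, *args):
    """Return True if the check raises no AssertionError."""
    try:
        check(*args)
    except AssertionError:
        return False
    return True


# The host is misconfigured (port 9999 is open), but its name does not
# match the switch, so the first style of test still passes.
silent_pass = passes(check_host_fail_open, "zm02.example.org", {9999})
caught = passes(check_merger, {9999})
```

With a separate file per host type, the test runner selects the hosts
and the assertions themselves are unconditional.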

Share the iptables related code between all the tests that perform
iptables checks.

Also, some extra merger tests and some negative assertions are added.

Move multi-node-hosts-file to after set-hostname. multi-node-hosts-file
is designed to append, and set-hostname is designed to write.

When we write the gate version of the inventory, map the nodepool
private_ipv4 address as the public_v4 address of the inventory host
since that's what is written to /etc/hosts, and is therefore, in the
context of a gate job, the "public" address.
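
The address mapping can be pictured like this.  It is only a sketch:
the shape of the node records and of the resulting inventory is
assumed for illustration, and the addresses are documentation
examples.

```python
def gate_inventory(nodes):
    """Build a gate inventory that uses each node's private address as public_v4."""
    hosts = {}
    for node in nodes:
        hosts[node["name"]] = {
            "ansible_host": node["public_ipv4"],
            # /etc/hosts is written with the private address, so in the
            # gate that is the address other hosts will actually see.
            "public_v4": node["private_ipv4"],
        }
    return {"all": {"hosts": hosts}}


# Hypothetical node data.
nodes = [
    {"name": "bridge", "public_ipv4": "203.0.113.5", "private_ipv4": "10.0.0.5"},
    {"name": "zm01", "public_ipv4": "203.0.113.6", "private_ipv4": "10.0.0.6"},
]
inventory = gate_inventory(nodes)
```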

Change-Id: Id2dad08176865169272a8c135d232c2b58a7a2c1
2020-06-10 14:48:40 -07:00
Monty Taylor
6b89b0b5a7 Override bridge hostvars directly
Now that we're not using playbook adjacent hostvars, just set
overrides in the test hostvars.

Change-Id: I4598c61dbe28eb965fce9a4578ddf44d785c5ff1
2020-06-04 09:20:13 -07:00
Monty Taylor
83ced7f6e6 Split inventory into multiple dirs and move hostvars
Make inventory/service for service-specific things, including the
groups.yaml group definitions, and inventory/base for hostvars
related to the base system, including the list of hosts.

Move the existing host_vars into inventory/service, since most of
them are likely service-specific. Move group_vars/all.yaml into
base/group_vars as almost all of it is related to base things,
with the exception of the gerrit public key.

A followup patch will move host-specific values into equivalent
files in inventory/base.

This should let us override hostvars in gate jobs. It should also
allow us to do better file matchers - and to be able to organize
our playbooks more if we want to.

Depends-On: https://review.opendev.org/731583
Change-Id: Iddf57b5be47c2e9de16b83a1bc83bee25db995cf
2020-06-04 07:44:36 -05:00
Zuul
fc39f87f1e Merge "testinfra: pass inventory and zuul data" 2020-05-28 23:18:46 +00:00
Clark Boylan
eb22e01f31 Add support for multiple jvbs behind meetpad
The jitsi video bridge (jvb) appears to be the main component we'll need
to scale up to handle more users on meetpad. Start preliminary
ansiblification of scale out jvb hosts.

Note this requires each new jvb to run on a separate host as the jvb
docker images seem to rely on $HOSTNAME to uniquely identify each jvb.

Change-Id: If6d055b6ec163d4a9d912bee9a9912f5a7b58125
2020-05-20 13:41:30 -07:00
James E. Blair
085856e318 Add iptables_extra_allowed_groups
This adds a new variable for the iptables role that allows us to
indicate all members of an ansible inventory group should have
iptables rules added.
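
Roughly, the variable is expanded into one rule per group member.  The
sketch below is an assumption about the shape of the data (group,
protocol and port keys, a public_v4 host variable) and uses the plain
INPUT chain; the real role's template may differ.

```python
def rules_for_groups(allowed_groups, groups, hostvars):
    """Expand group-based entries into one iptables rule per member host."""
    rules = []
    for entry in allowed_groups:
        for host in groups.get(entry["group"], []):
            address = hostvars[host]["public_v4"]
            rules.append(
                f"-A INPUT -p {entry['protocol']} -s {address} "
                f"--dport {entry['port']} -j ACCEPT"
            )
    return rules


# Hypothetical inventory data.
groups = {"zuul-merger": ["zm01", "zm02"]}
hostvars = {
    "zm01": {"public_v4": "203.0.113.10"},
    "zm02": {"public_v4": "203.0.113.11"},
}
allowed = [{"group": "zuul-merger", "protocol": "tcp", "port": 4730}]

rules = rules_for_groups(allowed, groups, hostvars)
```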

It also removes the unused zuul-executor-opendev group, and some
unused variables related to the snmp rule.

Also, collect the generated iptables rules for debugging.

Change-Id: I48746a6527848a45a4debf62fd833527cc392398
Depends-On: https://review.opendev.org/728952
2020-05-20 13:18:29 -07:00
Ian Wienand
0d004ea73d testinfra: pass inventory and zuul data
Create a zuul_data fixture for testinfra.

The fixture directly loads the inventory from the inventory YAML file
written out.  This lets you get easy access to the IP addresses of the
hosts.

We pass in the "zuul" variable by writing it out to a YAML file on
disk, and then passing an environment variable to this.  This is
useful for things like determining which job is running.  Additional
arbitrary data could be added to this if required.

Change-Id: I8adb7601f7eec6d48509f8f1a42840beca70120c
2020-05-20 13:41:04 +10:00
Ian Wienand
7b8b788ce2 Add focal testing for mirror nodes
Change-Id: I64de9a61c5044b93f6ce7e2d31cf51d78fd4ec16
2020-05-13 05:32:54 +10:00
Zuul
e56cbdcee3 Merge "Run nodepool launchers with ansible and containers" 2020-05-06 14:21:53 +00:00
Zuul
9b1161e051 Merge "Set up robots.txt on lists servers" 2020-04-30 18:42:55 +00:00
Monty Taylor
e0619f17f1 Run nodepool launchers with ansible and containers
We don't run start in prod normally but we do need to run
it in the gate.

Change-Id: Iec50684280409eb978bf5638bf74ae16fad8aa26
2020-04-30 17:37:22 +00:00
Zuul
fdfbc3d0b9 Merge "Run zookeeper cluster in nodepool jobs" 2020-04-29 22:22:37 +00:00
Monty Taylor
8d7075b02f Run zookeeper cluster in nodepool jobs
Rather than running a local zookeeper, just run a real zookeeper.
Also, get rid of nb01-test and just use nb04 - what could possibly
go wrong?

Dynamically write zookeeper host information to nodepool.yaml

So that we can run an actual zk using the new zk role on hosts in
ansible inventory, we need to write out the ip addresses of the
hosts that we build in zuul. This means having the info baked in
to the file in project-config isn't going to work.

We can do this in prod too, it shouldn't hurt anything.
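
A sketch of generating the zookeeper section from the addresses of the
hosts built for the job.  The zookeeper-servers layout with host and
port keys and the 2181 port are assumptions for illustration; the
addresses are documentation examples.

```python
def zookeeper_servers_block(addresses, port=2181):
    """Render a zookeeper-servers section for the given host addresses."""
    lines = ["zookeeper-servers:"]
    for address in addresses:
        lines.append(f"  - host: {address}")
        lines.append(f"    port: {port}")
    return "\n".join(lines) + "\n"


# Addresses would come from the inventory at run time, not from a
# value baked into project-config.
block = zookeeper_servers_block(["203.0.113.21", "203.0.113.22"])
print(block)
```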

Increase timeout for run-service-nodepool

We need to fix the playbook, but we'll do that after we get the
puppet gone.

Change-Id: Ib01d461ae2c5cec3c31ec5105a41b1a99ff9d84a
2020-04-29 16:18:25 -05:00
Clark Boylan
eeac5467c3 Set up robots.txt on lists servers
This sets up a robots.txt on our lists servers. To start, this file
prevents SEMrush bot from indexing our lists as that has been causing
lists.openstack.org to OOM with many listinfo processes started by
Apache.
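
The effect of such a file can be checked with Python's standard
urllib.robotparser.  This is a sketch: the exact user-agent token
("SemrushBot"), the second bot name and the example path are
assumptions, not copied from the deployed file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content blocking a single crawler everywhere.
ROBOTS_TXT = """\
User-agent: SemrushBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named bot is denied; a bot with no matching entry is allowed.
blocked = parser.can_fetch("SemrushBot", "/cgi-bin/mailman/listinfo")
allowed = parser.can_fetch("ExampleBot", "/cgi-bin/mailman/listinfo")
```

Note that robots.txt only helps with crawlers that choose to honour it.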

We've avoided this OOM by manually configuring this robots.txt. Other
things we have ruled out are bup and input email causing qrunners to
grow unexpectedly large. We are fairly confident this bot is the trigger.

Note this fixes testing by adding 'hieradata' to set listpassword var.

Depends-On: https://review.opendev.org/724389
Change-Id: Id4f6739a8cf6a01f9796fa54c86ba1af3e31fecf
2020-04-29 17:48:13 +00:00
Monty Taylor
767e001cd6 Run test playbooks with more forks
As we add jobs that have more nodes in them, we need to make
sure we're running ansible with enough forks that the jobs
don't take forever.

Change-Id: I2b5bf55bd65eaf0fc2671f5379bd0cb5c3696f87
2020-04-29 12:04:22 -05:00
Monty Taylor
05b0587871 Add nodepool node key
Change-Id: I28ccb83fc984190b1ce8e3e18c5945209fcb2387
2020-04-24 17:54:50 -05:00
Zuul
b21a8e58cf Merge "Run Zuul using Ansible and Containers" 2020-04-24 16:31:42 +00:00
Zuul
1b2d113c0f Merge "Split eavesdrop into its own playbook" 2020-04-24 15:02:34 +00:00
Monty Taylor
f0b77485ec Run Zuul using Ansible and Containers
Zuul is publishing lovely container images, so we should
go ahead and start using them.

We can't use containers for zuul-executor because of the
docker->bubblewrap->AFS issue, so install from pip there.

Don't start any of the containers by default, which should
let us safely roll this out and then do a rolling restart.
For things (like web or mergers) where it's safe to do so,
a followup change will swap the flag.

Change-Id: I37dcce3a67477ad3b2c36f2fd3657af18bc25c40
2020-04-24 09:18:44 -05:00
Monty Taylor
9fd2135a46 Split eavesdrop into its own playbook
Extract eavesdrop into its own service playbook and
puppet manifest. While doing that, stop using jenkinsuser
on eavesdrop in favor of zuul-user.

Add the ability to override the keys for the zuul user.

Remove openstack_project::server, it doesn't do anything.

Containerize and ansiblize accessbot. The structure of
how we're doing it in puppet makes it hard to actually
run the puppet in the gate. Run the script in its own
playbook so that we can avoid running it in the gate.

Change-Id: I53cb63ffa4ae50575d4fa37b24323ad13ec1bac3
2020-04-23 14:34:28 -05:00
Zuul
0b46f403ec Merge "Rearrange set-hostnames and cloud-init removal" 2020-04-21 20:18:55 +00:00
Monty Taylor
68b50ca05b Rearrange set-hostnames and cloud-init removal
In launch-node, we run two playbooks that aren't part of base.
One sets the system's hostname and removes cloud-init, the other
runs unattended update.

We need to run the hostname setting in our functional tests so
that the hosts behave as expected, but running the cloud-init
removal is a little weird, since our test nodes already don't
have it.

Make it so that set-hostname actually just sets the hostname,
and then run it in run-base. For running puppet, we need the
host to have the correct hostname.

Move cloud-init removal to the base-server role. Also move
the autoremove into base-server, since it's probably a nice
way to get rid of excess things.

Change-Id: I53cb8c515444a7d73b839e799c5794b067429daa
2020-04-21 13:18:24 -05:00
James E. Blair
f7bf07a03d Use real passwords for meetpad
The docker containers expect this now and refuse to start with
fake passwords.

Change-Id: I4c4bd243c9684e3987eeb99e4c66d31a882336a0
2020-04-20 09:05:51 -07:00
Monty Taylor
ebae022d07 Use project-config from zuul instead of direct clones
We use project-config for gerrit, gitea and nodepool config. That's
cool, because we can clone that from zuul too and make sure that each
prod run we're doing runs with the contents of the patch in question.

Introduce a flag file that can be touched in /home/zuulcd that will
block zuul from running prod playbooks. By default, if the file is
there, zuul will wait for an hour before giving up.
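
The flag-file behaviour can be sketched as a simple polling loop.  This
is an illustration in Python, not the playbook itself; the function
name, the flag file name and the poll interval are hypothetical, and
only the one hour default comes from the description above.

```python
import os
import tempfile
import time


def wait_for_flag_clear(flag_path, timeout=3600.0, poll=10.0):
    """Wait until flag_path is gone; return False if it is still there at timeout."""
    deadline = time.monotonic() + timeout
    while os.path.exists(flag_path):
        if time.monotonic() >= deadline:
            return False  # Give up; production playbooks stay blocked.
        time.sleep(poll)
    return True


# Hypothetical usage with short timeouts so the demo finishes quickly.
with tempfile.TemporaryDirectory() as tmp:
    flag = os.path.join(tmp, "example-flag")
    clear_result = wait_for_flag_clear(flag, timeout=0.05, poll=0.01)
    open(flag, "w").close()  # equivalent of touching the flag file
    blocked_result = wait_for_flag_clear(flag, timeout=0.05, poll=0.01)
```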

Rename zuulcd to zuul

To better align prod and test, name the zuul user zuul.

Change-Id: I83c38c9c430218059579f3763e02d6b9f40c7b89
2020-04-15 12:29:33 -05:00
Monty Taylor
c117c1106d Update install-ansible away from /opt/system-config
So that we can start running things from the zuul source rather
than update-system-config and /opt/system-config, we need to
install a few things onto the host in install-ansible so that the
ansible env is standalone.

This introduces a split execution path. The ansible config is
now all installed globally onto the machine by install-ansible
and does not reference a git checkout.

For running ad-hoc commands, an ansible.cfg is introduced inside
the root of the system-config dir. So if ansible-playbook is
executed with PWD==/opt/system-config it will find that ansible.cfg,
it will take precedence, and any content from system-config
will take precedence.
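
The precedence works because of the order in which Ansible looks for
its config: the ANSIBLE_CONFIG environment variable, then ansible.cfg
in the current directory, then ~/.ansible.cfg, then
/etc/ansible/ansible.cfg.  A minimal sketch of that lookup follows;
the function is hypothetical, and the global config is assumed to live
at /etc/ansible/ansible.cfg.

```python
import posixpath


def find_ansible_cfg(cwd, home, env, existing_files):
    """Return the first config file that exists, in Ansible's search order."""
    candidates = []
    if env.get("ANSIBLE_CONFIG"):
        candidates.append(env["ANSIBLE_CONFIG"])
    candidates.append(posixpath.join(cwd, "ansible.cfg"))
    candidates.append(posixpath.join(home, ".ansible.cfg"))
    candidates.append("/etc/ansible/ansible.cfg")
    for candidate in candidates:
        if candidate in existing_files:
            return candidate
    return None


# Assumed file layout after install-ansible has run.
files = {"/opt/system-config/ansible.cfg", "/etc/ansible/ansible.cfg"}

from_checkout = find_ansible_cfg("/opt/system-config", "/root", {}, files)
from_elsewhere = find_ansible_cfg("/tmp", "/root", {}, files)
```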

As a followup we'll make /opt/system-config/ansible.cfg written
out by install-ansible from the same template, and we'll update
the split to make ansible only work when executed from one of
the two configured locations, so that it's clear where we're
operating from.

Change-Id: I097694244e95751d96e67304aaae53ad19d8b873
2020-04-14 14:54:23 -05:00
Monty Taylor
b23515c623 Make a new dockerized etherpad.opendev.org
Upstream likes building the settings file into the image, but that's
less exciting, let's bind-mount ours in.

Depends-On: https://review.opendev.org/717491/
Change-Id: Ia1894d884ef2a84e1282345b77fe07bf8898f367
2020-04-07 11:10:57 -05:00
Monty Taylor
2e6cf25e5d Rename bridge.yaml to install-ansible.yaml
We have a bridge.yaml and a service-bridge.yaml and it keeps
being confusing. Rename bridge.yaml to install-ansible.yaml to make
it clear what it is that it actually does.

Add a soft-depend on it for manage-projects, because if
something updates with the ansible config, we want it to
happen before running manage-projects.

Change-Id: Ia7c8dd0e32b2c4aaa674061037be5ab66d9a3581
2020-04-01 14:14:55 -05:00
Ian Wienand
e7f1062d51 Add install zookeeper role; use for nodepool-builder testing
This adds a simple role to install Zookeeper.

Add an option to nodepool-base to use this role to install Zookeeper.

Use this in the nodepool-builder gate testing where we are just
validating that the nodepool-builder container starts and is ready to
accept connections.  It needs a zookeeper to talk to, even though it
is not going to do anything.

Change-Id: I4ae89a51e454be4ee53ad4e04407162aaa8d9f9a
2020-03-06 14:02:52 +11:00
Monty Taylor
bbe8086726 Use LE certs for Apache
We're getting LE certs for the hosts now, use them in the apache
config. Also add the redirects.

Change-Id: I67d33b4c542182a2474ac0d2416357541b1c3a47
2020-02-13 10:31:59 -06:00
Monty Taylor
4de5f79599 Add Apache to Ansible for Gerrit
When we run gerrit, we also need to run Apache.

Change-Id: Ia2f1494808bd29d83e041e224cb2eb5fc406a93b
2020-02-03 07:57:36 -06:00
Ian Wienand
f49fc87f95 afs-client: move reduced cache to group variable
For gate testing we need the smaller AFS cache size applied to
everything that might install openafs, not just the mirror nodes.
Move the definition to the afs-client group.

Change-Id: Id27efd2f12f5ac3f351f65fa1ae513624a53df90
2019-12-16 15:34:12 +11:00
Zuul
29019411eb Merge "Run a gerrit container on review-dev01" 2019-12-15 19:00:21 +00:00
Clark Boylan
5392f8a27c Manage opendev.org cert with LE
This is the first step in managing the opendev.org cert with LE. We
modify gitea01.opendev.org only to request the cert so that if this
breaks the other 7 giteas can continue to serve opendev.org. When we are
happy with the results we can merge the followup change to update the
other 7 giteas.

Depends-On: https://review.opendev.org/694182
Change-Id: I9587b8c2896975aa0148cc3d9b37f325a0be8970
2019-11-18 12:07:10 -08:00
James E. Blair
4f9720e76e Run a gerrit container on review-dev01
This runs gerrit in a container on review-dev01 using podman.

Remove an unused web_server.py file that we found while copying it
from puppet to ansible.

Change-Id: I399d3cf8471bc8063022b0db0ff81718b2ee2941
2019-10-29 08:29:17 +09:00
Ian Wienand
912dff49e7 Set zuul_work_dir for tox testing
Setting this to system-config allows us to run the base tests as 3rd
party ci for projects like testinfra.

Change-Id: I2d15df154dcdc7c5da6c3326fbecec2146201164
2019-09-09 09:44:43 +10:00
Ian Wienand
814e4be128 Ansible roles for backup
This introduces two new roles for managing the backup-server and hosts
that we wish to back up.

Firstly the "backup" role runs on hosts we wish to backup.  This
generates and configures a separate ssh key for running bup and
installs the appropriate cron job to run the backup daily.

The "backup-server" job runs on the backup server (or, indeed
servers).  It creates users for each backup host, accepts the remote
keys mentioned above and initialises bup.  It is then ready to receive
backups from the remote hosts.

This eliminates a fairly long-standing requirement for manual setup of
the backup server users and keys; this section is removed from the
documentation.

testinfra coverage is added.

Change-Id: I9bf74df351e056791ed817180436617048224d2c
2019-08-05 16:59:57 +10:00
Ian Wienand
814b42f616 Set openafs cache sizes for mirror/mirror-update
Set the openafs cache values to the same as the puppet set values for
openafs-client role users.

Change-Id: I5a58673cad8df2a1e8dddb592c322e751d7f2ac5
2019-07-19 12:04:26 -07:00