168 Commits

Author SHA1 Message Date
Ian Wienand
9f4cbcfbc2 Expand gerrit testing to multiple changes
This reworks the gerrit testing slightly to give some broader
coverage.

It sets up ssh keys for the user; not really necessary but can be
helpful when interacting on a held host.

It sets up groups and verification labels just so Zuul can comment
with -2/+2; again this is not really necessary, but makes things a
little closer to production reality.

We make multiple changes, so we can better test navigating between
them.  The change comments are updated to have some randomness in them
so they don't all look the same.  We take screen shots of two change
pages to validate the navigation between them.

Change-Id: I60b869e4fdcf8849de836e33db643743128f8a70
2021-02-01 14:06:08 +11:00
Ian Wienand
738b4ba739 gerrit: Install zuul-summary-results plugin
This installs the zuul-summary-results plugin into our gerrit
container.  testinfra is updated to take a screenshot of the plugin in
action.

Change-Id: Ie0a165cc6ffc765c03457691901a1dd41ce99d5a
2021-01-18 07:58:23 -08:00
Ian Wienand
d1694d4c98 gerrit: Initialize in testing
By setting the auth type to DEVELOPMENT_BECOME_ANY_ACCOUNT and passing
--dev to the init process, gerrit will create an initial admin user
for us.  We leverage this user to create a sample project, change,
Zuul user and sample CI result comment.

We also update testinfra to take some screenshots of gerrit and report
them back.

Change-Id: I56cda99790d3c172e10b664e57abeca10efc5566
2021-01-18 07:58:23 -08:00
Clark Boylan
4d41c1002c Fix review01's fqdn in infratesting
This server is canonically named review01.openstack.org in inventory.
We need to use that inventory name in our testing.

Change-Id: I1d16469f5abb764978945b5209e01a4e7d2ccb3d
2021-01-18 07:58:23 -08:00
Ian Wienand
595dfd1166 system-config-run-review: remove review-dev server
We don't need to test two servers in this test; remove review-dev.
Consensus seems to be this was for testing plans that have now been
superseded.

Change-Id: Ia4db5e0748e1c82838000c9b655808c3d8b74461
2020-12-15 11:09:17 +11:00
Ian Wienand
927046f18a bup: Remove from hosts
To complete our transition to borg backups, remove bup-related bits
from backup hosts.  All hosts have been backing up with borg since
Ic3adfd162fa9bedd84402e3c25b5c1bebb21f3cb.

Change-Id: Ie99f8cee9befee28bcf74bff9f9994c4b17b87ff
2020-12-11 09:09:53 +11:00
Zuul
e48ac000e3 Merge "codesearch: Add robots.txt" 2020-11-23 05:41:33 +00:00
Zuul
03edbd8b14 Merge "docker: install rsyslog to capture container output" 2020-11-20 09:12:23 +00:00
Ian Wienand
1288de67aa codesearch: Add robots.txt
We don't want anything on the codesearch page indexed.
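
A minimal sketch of such a rule as an Ansible task (the destination
path and task layout are assumptions, not the actual role contents):

  # Illustrative only: serve a deny-all robots.txt from the proxy docroot.
  - name: Install robots.txt disallowing all crawlers
    copy:
      content: |
        User-agent: *
        Disallow: /
      dest: /var/www/robots.txt
      owner: root
      group: root
      mode: '0644'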

Change-Id: I556b77013cf1b7ff2c03426fea92a6d445131f6d
2020-11-20 19:13:32 +11:00
Ian Wienand
368466730c Migrate codesearch site to container
The hound project has undergone a small re-birth and moved to

 https://github.com/hound-search/hound

which has broken our deployment.  We've talked about leaving
codesearch up to gitea, but it's not quite there yet.  There seems to
be no point working on the puppet now.

This builds a container that runs houndd.  It's an opendev-specific
container; the config is pulled from project-config directly.

There are some custom scripts that drive things.  Some points for
reviewers:

 - update-hound-config.sh uses "create-hound-config" (which is in
   jeepyb for historical reasons) to generate the config file.  It
   grabs the latest projects.yaml from project-config and exits with a
   return code to indicate if things changed.

 - when the container starts, it runs update-hound-config.sh to
   populate the initial config.  There is a testing environment flag
   and small config so it doesn't have to clone the entire opendev for
   functional testing.

 - it runs under supervisord so we can restart the daemon when
   projects are updated.  Unlike earlier versions that didn't start
   listening till indexing was done, this version now puts up a "Hound
   is not ready yet" message while it is working, so we can drop
   all the magic we were doing to probe if hound is listening via
   netstat and making Apache redirect to a status page.

 - resync-hound.sh is run from an external cron job daily, and does
   this update and restart check.  Since it only reloads if changes
   are made, this should be relatively rare anyway.

 - There is a PR to monitor the config file
   (https://github.com/hound-search/hound/pull/357) which would mean
   the restart is unnecessary.  This would be good in the near future,
   and we could then remove the cron job.

 - playbooks/roles/codesearch is unexciting and deploys the container,
   certificates and an apache proxy back to localhost:6080 where hound
   is listening.
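
As a hedged sketch of the deployment shape only (the image name,
volumes and network mode are assumptions, not the production
docker-compose file):

  version: '2'
  services:
    hound:
      image: docker.io/opendevorg/hound:latest
      # houndd answers on localhost:6080, which Apache proxies to
      network_mode: host
      volumes:
        - /var/lib/hound:/data
      restart: always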

I've combined removal of the old puppet bits here as the "-codesearch"
namespace was already being used.

Change-Id: I8c773b5ea6b87e8f7dfd8db2556626f7b2500473
2020-11-20 07:41:12 +11:00
Zuul
77c930c2bb Merge "grafana: fix typo in test name" 2020-11-05 22:38:02 +00:00
Ian Wienand
a529cdc221 grafana: fix typo in test name
Change-Id: I1365432255dce16e3ad3294d78300a8f72f5f689
2020-11-05 13:57:04 +11:00
Ian Wienand
eb07ab3613 borg-backup: add fuse
Add the FUSE dependencies for our hosts backed up with borg, along
with a small script to make mounting the backups easier.  This is the
best way to recover something quickly in what is sure to be a
stressful situation.
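
A sketch of the kind of tasks involved (package names and the
virtualenv path are assumptions, not the exact role contents):

  # Illustrative only: borg mount needs FUSE plus the llfuse Python binding.
  - name: Install FUSE packages
    package:
      name:
        - fuse
        - libfuse-dev
        - pkg-config
      state: present

  - name: Install llfuse into the borg virtualenv
    pip:
      name: llfuse
      virtualenv: /opt/borg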

Documentation and testing is updated.

Change-Id: I1f409b2df952281deedff2ff8f09e3132a2aff08
2020-11-05 11:56:46 +11:00
Ian Wienand
77eb5dfb66 reprepro: install keytab
In converting this to ansible I forgot to install the reprepro keytab.
The encoded secret has been added for production.

Change-Id: I39d586e375ad96136cc151a7aed6f4cd5365f3c7
2020-10-27 15:14:47 +11:00
Ian Wienand
5596d57be7 reprepro: fixup script name
Everything expects this to be called 'reprepro-mirror-update' (no
.sh); rename the file.

Change-Id: I8ec6ff4ed2afe6487959ef56dc0603f9d316d1a3
2020-10-27 15:09:46 +11:00
Ian Wienand
694241ad77 docker: install rsyslog to capture container output
This started with me wondering why gerritbot was putting all its
output into /var/log/syslog -- it turns out Xenial docker is
configured to use the journald driver (which forwards to syslog) and Bionic
onwards uses json-file.

Both are sub-optimal, but particularly the json-file, because we lose
the logs when the container dies.  This proposes moving to a more
standard model of having the containers log to syslog and redirecting
that to files on disk.

Install a rsyslog configuration to capture "docker-*" program names
and put them into logfiles in /var/log/containers.  Also install
rotation for these files.

In an initial group of docker-compose files, setup logging to syslog
which should then be captured into these files.  Add some basic
testing.
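
For illustration, the per-service change is roughly a logging stanza
like the following (service and image names are placeholders):

  # Illustrative docker-compose fragment; the syslog tag is what the
  # rsyslog "docker-*" programname match keys on.
  services:
    gerritbot:
      image: docker.io/opendevorg/gerritbot:latest
      logging:
        driver: syslog
        options:
          tag: docker-gerritbot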

If this works OK, I think we can standardise our docker-compose files
like this to capture the logs the same way everywhere.

Change-Id: I940a5b05057e832e2efad79d9a2ed5325020ed0c
2020-10-19 16:06:03 +11:00
Ian Wienand
3eceba5749 reprepro: convert to Ansible
This converts the reprepro configuration from our existing puppet to
Ansible.

This takes a more direct approach; the templating done by the puppet
version started simple but over the years grew several different
options to handle various use-cases.  This means you not only had to
understand the rather obscure reprepro configuration, but then *also*
figure out how to translate that from our puppet template layers.

Here the configuration files are kept directly (they were copied from
the existing mirror-update.openstack.org) and deployed with some light
wrapper tasks in reprepro/tasks/utils which avoids most duplication.
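
A hedged sketch of what such a wrapper could look like (variable and
file names here are illustrative, not the actual role layout):

  # Hypothetical wrapper: apply the same deployment steps to each mirror.
  - name: Configure each reprepro mirror
    include_tasks: utils/mirror.yaml
    loop: "{{ reprepro_mirrors }}"
    loop_control:
      loop_var: mirror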

Note the initial cron jobs are left disabled so we can run some manual
testing before letting it go automatically.

Change-Id: I96a9ff1efbf51c4164621028b7a3a1e2e1077d5c
2020-10-19 14:06:57 +11:00
Clark Boylan
9b6398394d Remove docker v1 registry proxy from our mirrors
Docker has long planned to turn this off and it appears that they have
done so. Planning details can be found at:
https://www.docker.com/blog/registry-v1-api-deprecation/

Removing this simplifies our configs as well as testing. Do this as part
of good hygiene.

Change-Id: I11281167a87ba30b4ebaa88792032aec1af046c1
2020-10-16 12:35:37 -07:00
Ian Wienand
a86ba4590b install-borg: bump to latest version
Since we haven't used this anywhere yet, let's start with the latest
version.

Fix role matching for job too.

Change-Id: I22620fc7ade8fbdb664100ef6b6ab98c93d6104f
2020-10-12 15:07:38 +11:00
Ian Wienand
03727e4941 tarballs.opendev.org: better redirects
This matches the file, which got lost in my original script because I
didn't quote a $.  Also add some quotes for better grouping.

Change-Id: I335e89616f093bdd2f0599b1ea1125ec642515ba
2020-10-02 12:22:28 +10:00
Zuul
083e8b43ea Merge "Add borg-backup roles" 2020-10-01 07:36:47 +00:00
Clark Boylan
9fdbd56d16 Remove nb04
This was a host used in the transition to running nodepool builders in docker. That
transition has been completed for nb01.opendev.org and nb02.opendev.org
and we don't need the third x86 builder.

Change-Id: I93c7fc9b24476527b451415e7c138cd17f3fdf9f
2020-09-18 11:12:04 -07:00
Ian Wienand
139dd374ec letsencrypt test: fix email match
It seems acme.sh might have been rewriting this with quotes, and has
now stopped doing that.  Fix the match.

Change-Id: I3c363c498580b79a1a9ed07da6ed3ac72807383b
2020-08-25 14:42:54 +10:00
Clark Boylan
506a11f9d2 Add ansible role to manage gerritbot
This new ansible role deploys gerritbot with docker-compose on
eavesdrop.openstack.org. This way we can run it where the other bots
live.

Testing is rudimentary for now as we don't really want to connect to a
production gerrit and freenode. We check things the best we can.

We will want to coordinate deployment of this change with disabling the
running service on the gerrit server.

Depends-On: https://review.opendev.org/745240
Change-Id: I008992978791ff0a38f92fb4bc529ff643f01dd6
2020-08-07 13:20:18 -07:00
Ian Wienand
028d655375 Add borg-backup roles
This adds roles to implement backup with borg [1].

Our current tool "bup" has no Python 3 support and is not packaged for
Ubuntu Focal.  This means it is effectively end-of-life.  borg fits
our model of servers backing themselves up to a central location, is
well documented and seems well supported.  It also has the clarkb seal
of approval :)

As mentioned, borg works in the same manner as bup by doing an
efficient backup over ssh to a remote server.  The core of these
roles is the same as the bup-based ones, in terms of creating a
separate user for each host and deploying keys and ssh config.

This chooses to install borg in a virtualenv on /opt.  This was chosen
for a number of reasons; firstly, reading the history of borg, there
have been incompatible updates (although they provide a tool to update
repository formats); it seems important that we both pin the version
we are using and keep clients and server in sync.  Since we have a
heterogeneous distribution collection we don't want to rely on the
packaged tools, which may differ.  I don't feel like this is a great
application for a container; we actually don't want it that isolated
from the base system, because its goal is to read and copy it offsite
with as little chance of things going wrong as possible.
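
As a sketch of that pinning approach (the virtualenv path and the
version shown are placeholders, not the values actually deployed):

  # Illustrative only: pin a specific borg release into /opt.
  - name: Install borgbackup into a dedicated virtualenv
    pip:
      name: borgbackup==1.1.13
      virtualenv: /opt/borg
      virtualenv_command: python3 -m venv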

Borg has a lot of support for encrypting the data at rest in various
ways.  However, that introduces the possibility we could lose both the
key and the backup data.  Really the only thing stopping this is key
management, and if we want to go down this path we can do it as a
follow-on.

The remote end server is configured via ssh command rules to run in
append-only mode.  This means a misbehaving client can't delete its
old backups.  In theory we can prune backups on the server side --
something we could not do with bup.  The documentation has been
updated but is vague on this part; I think we should get some hosts in
operation, see how the de-duplication is working out and then decide
how we want to manage things long term.
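
A rough sketch of that restriction as an authorized_key entry on the
backup server (the per-host user naming and repository path are
assumptions based on the description above):

  # Illustrative only: the client's key may only run borg serve,
  # append-only, against its own repository path.
  - name: Restrict backup client key to append-only borg serve
    authorized_key:
      user: "borg-{{ backup_client }}"
      key: "{{ backup_client_pubkey }}"
      key_options: >-
        command="borg serve --append-only
        --restrict-to-path /opt/backups/{{ backup_client }}",restrict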

Testing is added; a focal and bionic host both run a full backup of
themselves to the backup server.  Pretty cool, the logs are in
/var/log/borg-backup-<host>.log.

No hosts are currently in the borg groups, so this can be applied
without affecting production.  I'd suggest the next steps are to bring
up a borg-based backup server and put a few hosts into this.  After
running for a while, we can add all hosts, and then deprecate the
current bup-based backup server in vexxhost and replace that with a
borg-based one; giving us dual offsite backups.

[1] https://borgbackup.readthedocs.io/en/stable/

Change-Id: I2a125f2fac11d8e3a3279eb7fa7adb33a3acaa4e
2020-07-21 17:36:50 +10:00
Zuul
fdb446f0e3 Merge "testinfra: silence yaml.load() warnings" 2020-07-16 23:45:51 +00:00
James E. Blair
7a32463f9d Revert "Revert "Add Zookeeper TLS support""
This reverts commit 05021f11a29a0213c5aecddf8e7b907b7834214a.

This switches Zuul and Nodepool to use Zookeeper TLS.  The ZK
cluster is already listening on both ports.

Change-Id: I03d28fb75610fbf5221eeee28699e4bd6f1157ea
2020-07-15 15:45:48 -07:00
Ian Wienand
711b2493a9 testinfra: silence yaml.load() warnings
Switch to safe_load to silence warnings in output

Change-Id: If91f79a4648920999de8e6bf6e0c9fec82fde233
2020-07-15 07:03:22 +10:00
Zuul
623c93d632 Merge "gitea: crawler UA reject rules" 2020-07-07 21:10:54 +00:00
Zuul
466e14b5f7 Merge "gitea: Add reverse proxy option" 2020-07-07 21:07:57 +00:00
Ian Wienand
185797a0e5 Graphite container deployment
This deploys graphite from the upstream container.

We override the statsd configuration to have it listen on ipv6.
Similarly we override the nginx config to listen on ipv6, enable ssl,
forward port 80 to 443, and block the /admin page (we don't use it).

For production we will just want to put some cinder storage in
/opt/graphite/storage on the production host and figure out how to
migrate the old stats.  There is also a bit of cleanup that will follow,
because we half-converted grafana01.opendev.org -- so everything can't
be in the same group till that is gone.

Testing has been added to push some stats and ensure they are seen.

Change-Id: Ie843b3d90a72564ef90805f820c8abc61a71017d
2020-07-03 07:17:28 +10:00
Ian Wienand
b146181174 Grafana container deployment
This uses the Grafana container created with
Iddfafe852166fe95b3e433420e2e2a4a6380fc64 to run the
grafana.opendev.org service.

We retain the old model of an Apache reverse-proxy; it's well tested
and understood, it's much easier than trying to map all the SSL
termination/renewal/etc. into the Grafana container and we don't have
to convince ourselves the container is safe to be directly web-facing.

Otherwise this is a fairly straightforward deployment of the
container.  As before, it uses the graph configuration kept in
project-config which is loaded in with grafyaml, which is included in
the container.

One nice advantage is that it makes it quite easy to develop graphs
locally, using the container which can talk to the public graphite
instance.  The documentation has been updated with a reference on how
to do this.

Change-Id: I0cc76d29b6911aecfebc71e5fdfe7cf4fcd071a4
2020-07-03 07:17:22 +10:00
Ian Wienand
8d0d6155ed gitea: crawler UA reject rules
As described inline, this crawler is causing us problems as it hits
the backends indiscriminately.  Block it via the known UA strings,
which luckily are old so should not cause real client issues.

Change-Id: I0d78a8b625b69f600e00e8b3ea64576e0fdb84d9
2020-07-01 16:15:59 +10:00
Ian Wienand
870f664648 gitea: Add reverse proxy option
This adds an option to have an Apache based reverse proxy on port 3081
forwarding to 3000.  The idea is that we can use some of the Apache
filtering rules to reject certain traffic if/when required.

It is off by default, but tested in the gate.

Change-Id: Ie34772878d9fb239a5f69f2d7b993cc1f2142930
2020-07-01 15:33:05 +10:00
Ian Wienand
eb3e58da91 gitea-image: add a robots.txt
This looks like a very sane default robots.txt.  We can modify it as
required.

Change-Id: I8b9d3aa63538388e319f0216535f7a1d977f4885
2020-07-01 06:38:18 +10:00
James E. Blair
a514aa0f98 Zookeeper: listen on plain and TLS ports
To prepare for switching to TLS, set up TLS certs for Zookeeper and
all of Nodepool and Zuul, but do not have them connect over TLS yet.
We have observed problems with Kazoo using TLS in production.  This
will let us run the ZK quorum using TLS internally, and have Zuul
and Nodepool connect over plaintext while also exposing the TLS
client port so that we can perform some more production tests.

Change-Id: If93b27f5b55be42be1cf6ee23258127fab5ce9ea
2020-06-17 10:38:59 -07:00
James E. Blair
05021f11a2 Revert "Add Zookeeper TLS support"
This reverts commit 29825ac18b58145f007f64b2998357445b8fdd91.

We observed this issue in production:
https://github.com/python-zk/kazoo/issues/587

Revert until we find a fix.

Change-Id: Ib7b8e3b06770a83b39458d09d2b1e655bd94bd22
2020-06-16 11:15:48 -07:00
James E. Blair
ada91cdad9 Merge "Add Zookeeper TLS support" 2020-06-15 21:48:14 +00:00
James E. Blair
e989281e02 Merge "Stop using backend hostname in zuul testinfra tests" 2020-06-15 21:47:43 +00:00
James E. Blair
29825ac18b Add Zookeeper TLS support
This creates TLS certs for Zookeeper, uses them inside the ZK
quorum, and configures Nodepool and Zuul to use them as well.

A full system restart of all ZK-related components will be required
after merging this patch.

Change-Id: I0cb96a989f3d2c7e0563ce8899f2a5945ea225b3
2020-06-15 11:19:47 -07:00
Ian Wienand
ccd3ac2344 Add tool to export Rackspace DNS domains to bind format
This exports Rackspace DNS domains to bind format for backup and
migration purposes.

This installs a small tool to query and export all the domains we can
see via the Rackspace DNS API.

Because we don't want to publish the backups (it's the equivalent of a
zone xfer) it is run on, and logs output to, bridge.openstack.org from
cron once a day.

Change-Id: I50fd33f5f3d6440a8f20d6fec63507cb883f2d56
2020-06-12 16:49:23 +10:00
Zuul
7c913ab48b Merge "Test etherpad with testinfra" 2020-06-12 00:03:54 +00:00
Clark Boylan
7caf3a6c6d Test etherpad with testinfra
This adds simple testing of the etherpad service to testinfra.

Change-Id: I3c89a0a92a41cf69d075d6cef99fa12db68b17c6
2020-06-11 10:24:39 -07:00
James E. Blair
3d6cefe9dd Stop using backend hostname in zuul testinfra tests
Tests that call host.backend.get_hostname() to switch on test
assertions are likely to fail open.  Stop using this in zuul tests
and instead add new files for each of the types of zuul hosts
where we want to do additional verification.

Share the iptables related code between all the tests that perform
iptables checks.

Also, some extra merger test and some negative assertions are added.

Move multi-node-hosts-file to after set-hostname. multi-node-hosts-file
is designed to append, and set-hostname is designed to write.

When we write the gate version of the inventory, map the nodepool
private_ipv4 address as the public_v4 address of the inventory host
since that's what is written to /etc/hosts, and is therefore, in the
context of a gate job, the "public" address.

Change-Id: Id2dad08176865169272a8c135d232c2b58a7a2c1
2020-06-10 14:48:40 -07:00
Monty Taylor
83ced7f6e6 Split inventory into multiple dirs and move hostvars
Make inventory/service for service-specific things, including the
groups.yaml group definitions, and inventory/base for hostvars
related to the base system, including the list of hosts.

Move the existing host_vars into inventory/service, since most of
them are likely service-specific. Move group_vars/all.yaml into
base/group_vars as almost all of it is related to base things,
with the exception of the gerrit public key.

A followup patch will move host-specific values into equivalent
files in inventory/base.

This should let us override hostvars in gate jobs. It should also
allow us to do better file matchers - and to be able to organize
our playbooks more if we want to.

Depends-On: https://review.opendev.org/731583
Change-Id: Iddf57b5be47c2e9de16b83a1bc83bee25db995cf
2020-06-04 07:44:36 -05:00
Monty Taylor
d93a661ae4 Run iptables in service playbooks instead of base
It's the only part of base that's important to run when we run a
service. Run it in the service playbooks and get rid of the
dependency on infra-prod-base.
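
The resulting pattern is roughly the following (group and role names
here are illustrative):

  # Sketch of a service playbook applying iptables itself instead of
  # depending on infra-prod-base.
  - hosts: mirror
    roles:
      - iptables
      - mirror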

Continue running it in base so that new nodes are brought up
with iptables in place.

Bump the timeout for the mirror job, because the iptables addition
seems to have just bumped it over the edge.

Change-Id: I4608216f7a59cfa96d3bdb191edd9bc7bb9cca39
2020-06-04 07:44:22 -05:00
Zuul
3f61433c59 Merge "Generate ssl check list directly from letsencrypt variables" 2020-05-28 23:31:11 +00:00
Zuul
fc39f87f1e Merge "testinfra: pass inventory and zuul data" 2020-05-28 23:18:46 +00:00
Ian Wienand
c9215801f0 Generate ssl check list directly from letsencrypt variables
This autogenerates the list of ssl domains for the ssl-cert-check tool
directly from the letsencrypt list.

The first step is the install-certcheck role that replaces the
puppet-ssl_cert_check module that does the same.  The reason for this
is so that during gate testing we can test this on the test
bridge.openstack.org server, and avoid adding another node as a
requirement for this test.

letsencrypt-request-certs is updated to set a fact
letsencrypt_certcheck_domains for each host that is generating a
certificate.  As described in the comments, this defaults to the first
host specified for the certificate and the listening port can be
indicated (if set, this new port value is stripped when generating
certs, as it is not necessary for certificate generation).
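
For illustration, assuming certificates are described in a host
variable along these lines (the exact layout is an assumption based
on the description above), the ":3000" suffix marks the port the
check should probe and is stripped before certificate generation:

  letsencrypt_certs:
    example-host-main:
      - example01.opendev.org:3000
      - example.opendev.org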

The new letsencrypt-config-certcheck role runs and iterates all
letsencrypt hosts to build the final list of domains that should be
checked.  This is then extended with the
letsencrypt_certcheck_additional_domains value that covers any hosts
using certificates not provisioned by letsencrypt using this
mechanism.

These additional domains are pre-populated from the openstack.org
domains in the extant check file, minus those openstack.org domain
certificates we are generating via letsencrypt (see
letsencrypt-create-certs/handlers/main.yaml).  Additionally, we
update some of the certificate variables in host_vars that are
listening on port .

As mentioned, bridge.openstack.org is placed in the new certcheck
group for gate testing, so the tool and config file will be deployed
to it.  For production, cacti is added to the group, which is where
the tool currently runs.  The extant puppet installation is disabled,
pending removal in a follow-on change.

Change-Id: Idbe084f13f3684021e8efd9ac69b63fe31484606
2020-05-20 14:27:14 +10:00
Ian Wienand
0d004ea73d testinfra: pass inventory and zuul data
Create a zuul_data fixture for testinfra.

The fixture directly loads the inventory from the inventory YAML file
written out.  This lets you get easy access to the IP addresses of the
hosts.

We pass in the "zuul" variable by writing it out to a YAML file on
disk, and then passing an environment variable pointing to this file.  This is
useful for things like determining which job is running.  Additional
arbitrary data could be added to this if required.

Change-Id: I8adb7601f7eec6d48509f8f1a42840beca70120c
2020-05-20 13:41:04 +10:00