161 Commits

Author SHA1 Message Date
James E. Blair
b9f7f5506f Use infra-prod-base in infra-prod jobs
This uses a new base job which handles pushing the git repos on to
bridge since that must now happen in a trusted playbook.

Depends-On: https://review.opendev.org/742934
Change-Id: Ie6d0668f83af801c0c0e920b676f2f49e19c59f6
2020-07-24 09:04:50 -07:00
Ian Wienand
028d655375 Add borg-backup roles
This adds roles to implement backup with borg [1].

Our current tool "bup" has no Python 3 support and is not packaged for
Ubuntu Focal.  This means it is effectively end-of-life.  borg fits
our model of servers backing themselves up to a central location, is
well documented and seems well supported.  It also has the clarkb seal
of approval :)

As mentioned, borg works in the same manner as bup by doing an
efficient backup over ssh to a remote server.  The core of these
roles is the same as the bup-based ones, in terms of creating a
separate user for each host and deploying keys and ssh config.

This chooses to install borg in a virtualenv on /opt.  This was chosen
for a number of reasons; firstly reading the history of borg there
have been incompatible updates (although they provide a tool to update
repository formats); it seems important that we both pin the version
we are using and keep clients and server in sync.  Since we have a
heterogeneous distribution collection we don't want to rely on the
packaged tools which may differ.  I don't feel like this is a great
application for a container; we actually don't want it that isolated
from the base system because its goal is to read and copy it offsite
with as little chance of things going wrong as possible.

Borg has a lot of support for encrypting the data at rest in various
ways.  However, that introduces the possibility we could lose both the
key and the backup data.  Really the only thing stopping this is key
management, and if we want to go down this path we can do it as a
follow-on.

The remote end server is configured via ssh command rules to run in
append-only mode.  This means a misbehaving client can't delete its
old backups.  In theory we can prune backups on the server side --
something we could not do with bup.  The documentation has been
updated but is vague on this part; I think we should get some hosts in
operation, see how the de-duplication is working out and then decide
how we want to manage things long term.

Testing is added; a focal and bionic host both run a full backup of
themselves to the backup server.  Pretty cool, the logs are in
/var/log/borg-backup-<host>.log.

No hosts are currently in the borg groups, so this can be applied
without affecting production.  I'd suggest the next steps are to bring
up a borg-based backup server and put a few hosts into this.  After
running for a while, we can add all hosts, and then deprecate the
current bup-based backup server in vexxhost and replace that with a
borg-based one; giving us dual offsite backups.

[1] https://borgbackup.readthedocs.io/en/stable/

Change-Id: I2a125f2fac11d8e3a3279eb7fa7adb33a3acaa4e
2020-07-21 17:36:50 +10:00
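
The append-only restriction described in 028d655375 is typically expressed as a forced command in the backup user's authorized_keys file. The following is a minimal sketch of that idea, not the role's actual template: the function name, key, repository path and binary location are all assumptions made for illustration. `borg serve --append-only` and `--restrict-to-path` are borg options, and `restrict` is an OpenSSH authorized_keys option.

```python
def borg_authorized_keys_line(public_key: str, repo_path: str,
                              borg_bin: str = "borg") -> str:
    """Build one authorized_keys entry that forces append-only borg access.

    borg_bin and repo_path are illustrative; the commit installs borg into
    a virtualenv under /opt, so the real binary path will differ.
    """
    forced = f"{borg_bin} serve --append-only --restrict-to-path {repo_path}"
    # "restrict" turns off forwarding, pty allocation and similar features.
    return f'command="{forced}",restrict {public_key}'


# Hypothetical per-host backup user and key, for illustration only.
line = borg_authorized_keys_line(
    "ssh-ed25519 AAAAexamplekey borg-host01",
    "/opt/backups/borg-host01/backup",
)
print(line)
```

Because the command is forced on the server side, whatever the client asks to run over ssh is ignored, which is why a misbehaving client cannot delete its old backups.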
Clark Boylan
4ebff6f9b2 Run our etherpad prod deploy job when docker updates
We want to pick up changes to our docker setup in production. Without
this we don't get the infra-prod-service-etherpad job running when we
update the etherpad docker image.

Change-Id: I25aee457b7c0547fc11439301054bb5aef799476
2020-07-17 13:20:48 -07:00
Zuul
ae440c4fcc Merge "Fix junit error, add HTML report" 2020-07-16 23:45:53 +00:00
Zuul
4b9180acfa Merge "Copy generated inventory to bridge logs" 2020-07-16 23:45:49 +00:00
Monty Taylor
2302879244 Build multi-arch python-base/python-builder
In order to build multi-arch python images, we need
multi-arch python base and builder images.

Change-Id: Ifc0d6f7c16876bf55db8e1ee459a3eaa07744547
2020-07-15 09:09:35 -07:00
Ian Wienand
ba45f251d1 Fix junit error, add HTML report
Specifying the family stops a deprecation warning being output.

Add an HTML report and report it as an artifact as well; this is easier
to read.

Change-Id: I2bd6505c19cee2d51e9af27e9344cfe2e1110572
2020-07-15 07:03:22 +10:00
Ian Wienand
a020568ee5 Copy generated inventory to bridge logs
This is the inventory generated and used by bridge, copy it into the
logs as well.

Change-Id: I15d0ddc4c8340735c0332139ddedc06fc05b8269
2020-07-15 07:03:22 +10:00
Zuul
c2b2efdf5b Merge "Graphite container deployment" 2020-07-07 00:41:10 +00:00
Zuul
1d610297f3 Merge "Grafana container deployment" 2020-07-06 05:56:02 +00:00
Ian Wienand
3cf11d298e Update grafana-container
There is a new release, update base container.  Add promote job that
was forgotten with the original commit
Iddfafe852166fe95b3e433420e2e2a4a6380fc64.

Change-Id: Ie0d7febd2686d267903b29dfeda54e7cd6ad77a3
2020-07-06 10:48:25 +10:00
Ian Wienand
185797a0e5 Graphite container deployment
This deploys graphite from the upstream container.

We override the statsd configuration to have it listen on ipv6.
Similarly we override the nginx config to listen on ipv6, enable ssl,
forward port 80 to 443, block the /admin page (we don't use it).

For production we will just want to put some cinder storage in
/opt/graphite/storage on the production host and figure out how to
migrate the old stats.  There is also a bit of cleanup that will follow,
because we half-converted grafana01.opendev.org -- so everything can't
be in the same group till that is gone.

Testing has been added to push some stats and ensure they are seen.

Change-Id: Ie843b3d90a72564ef90805f820c8abc61a71017d
2020-07-03 07:17:28 +10:00
Ian Wienand
b146181174 Grafana container deployment
This uses the Grafana container created with
Iddfafe852166fe95b3e433420e2e2a4a6380fc64 to run the
grafana.opendev.org service.

We retain the old model of an Apache reverse-proxy; it's well tested
and understood, it's much easier than trying to map all the SSL
termination/renewal/etc. into the Grafana container and we don't have
to convince ourselves the container is safe to be directly web-facing.

Otherwise this is a fairly straightforward deployment of the
container.  As before, it uses the graph configuration kept in
project-config which is loaded in with grafyaml, which is included in
the container.

One nice advantage is that it makes it quite easy to develop graphs
locally, using the container which can talk to the public graphite
instance.  The documentation has been updated with a reference on how
to do this.

Change-Id: I0cc76d29b6911aecfebc71e5fdfe7cf4fcd071a4
2020-07-03 07:17:22 +10:00
Zuul
d673281a75 Merge "Deal with gitea pagination of repo lists" 2020-06-26 15:30:38 +00:00
Clark Boylan
9b5e5d3c57 Deal with gitea pagination of repo lists
We list gitea repos to determine if we need to create a repo. If the
repo isn't listed by gitea we create it. New gitea paginates these
listings so we were only getting 30 repos listed when we had far more.
This resulted in us trying to create repos which already exist which is
a gitea http 409 error.

Fix this by paging through the listings until we've seen all the
repos. This should give us a complete listing.

To test this we run our manage-projects playbook twice in the
system-config-run-gitea job. The first pass creates all the new
projects. Then the second pass should noop cleanly.

Change-Id: I73b77b9ddaa0106d4dc0a49c4d4b7751a39a16f9
Co-Authored-By: Jeremy Stanley <fungi@yuggoth.org>
2020-06-25 13:51:27 -07:00
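
The pagination fix described in 9b5e5d3c57 amounts to requesting pages until the listing is exhausted, then comparing against the complete result. The sketch below shows that loop under stated assumptions: `fetch_page` is a hypothetical callable standing in for the real Gitea request, and `make_fake_gitea` is an in-memory stand-in so the example runs without a server. Only the page size of 30 comes from the commit message.

```python
from typing import Callable, List


def list_all_repos(fetch_page: Callable[[int], List[str]]) -> List[str]:
    """Collect repo names from a paginated listing.

    fetch_page is a hypothetical callable: given a 1-based page number it
    returns the repo names on that page, or an empty list past the end.
    """
    repos: List[str] = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            # An empty page means we have walked off the end of the listing.
            break
        repos.extend(batch)
        page += 1
    return repos


def make_fake_gitea(names: List[str],
                    page_size: int = 30) -> Callable[[int], List[str]]:
    """In-memory stand-in for the API so the sketch runs without a server."""
    def fetch_page(page: int) -> List[str]:
        start = (page - 1) * page_size
        return names[start:start + page_size]
    return fetch_page


existing = [f"org/repo-{i}" for i in range(75)]
fetch = make_fake_gitea(existing)

first_page_only = fetch(1)          # what the unfixed code effectively saw
everything = list_all_repos(fetch)  # what the fixed code sees

# Only repos absent from the complete listing need creating.
seen = set(everything)
wanted = existing + ["org/new-repo"]
to_create = [name for name in wanted if name not in seen]
```

Stopping on an empty page rather than on a short page avoids depending on the server honouring a particular page size. With only the first page in hand, 45 of the 75 existing repos would look missing, which matches the HTTP 409 symptom in the commit message.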
Ian Wienand
330e297318 Add a grafana/grafyaml image
This is a docker image based on the latest upstream Grafana with
grafyaml also installed inside.  It includes a small script to run a
refresh of the dashboards.

Change-Id: Iddfafe852166fe95b3e433420e2e2a4a6380fc64
2020-06-24 08:21:26 +10:00
Ian Wienand
44fe656159 Actually run system-config arm64 test on an arm64 node
I forgot the -arm64

Change-Id: I8fbda232840ba22cd0886ebdf9657a05520741a7
2020-06-22 14:01:53 +10:00
Ian Wienand
8acd503692 mirror-update: update to focal
We want a more recent version of rsync, and upgrading to focal is one
easy way to get it, and to also have a base OS with a longer support
period.  Test it in the gate.

Change-Id: I1edf074e5fe788ef75693d2cd172370c05bf4732
2020-06-18 14:24:27 +10:00
Monty Taylor
9b28b8864a Run restart playbooks to test they work
This runs our zuul and nodepool restart playbooks after the initial
service installs in the system-config-run jobs. This will help ensure
that they work consistently over time.

Fix nodepool restart playbook

Change-Id: I953e7222218c5555bb44fccd913eaa5e9374c669
2020-06-16 12:03:00 -05:00
James E. Blair
ac5fc652f4 Merge "Fake zuul_connections for gate" 2020-06-15 21:47:49 +00:00
James E. Blair
e989281e02 Merge "Stop using backend hostname in zuul testinfra tests" 2020-06-15 21:47:43 +00:00
James E. Blair
7f7c155555 Fake zuul_connections for gate
We can't establish Gerrit or Github connections in the gate, so
Zuul fails to start.  Reducing the set of connections in the gate
to just smtp should allow it to start (albeit with tenant loading
errors).  But that should let us test basic system setup and
internal connectivity.

Change-Id: I39d648ac5dd6ee3e9bfbc026cd6d7142461c418c
2020-06-15 09:57:39 -07:00
Zuul
7c913ab48b Merge "Test etherpad with testinfra" 2020-06-12 00:03:54 +00:00
Zuul
ee0c61f6d4 Merge "Add puppet3 tests to xenial arm64" 2020-06-11 23:12:49 +00:00
Clark Boylan
7caf3a6c6d Test etherpad with testinfra
This adds simple testing of the etherpad service to testinfra.

Change-Id: I3c89a0a92a41cf69d075d6cef99fa12db68b17c6
2020-06-11 10:24:39 -07:00
James E. Blair
3d6cefe9dd Stop using backend hostname in zuul testinfra tests
Tests that call host.backend.get_hostname() to switch on test
assertions are likely to fail open.  Stop using this in zuul tests
and instead add new files for each of the types of zuul hosts
where we want to do additional verification.

Share the iptables related code between all the tests that perform
iptables checks.

Also, some extra merger tests and some negative assertions are added.

Move multi-node-hosts-file to after set-hostname. multi-node-hosts-file
is designed to append, and set-hostname is designed to write.

When we write the gate version of the inventory, map the nodepool
private_ipv4 address as the public_v4 address of the inventory host
since that's what is written to /etc/hosts, and is therefore, in the
context of a gate job, the "public" address.

Change-Id: Id2dad08176865169272a8c135d232c2b58a7a2c1
2020-06-10 14:48:40 -07:00
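
The "fail open" concern in 3d6cefe9dd can be illustrated without testinfra. In the sketch below, the hostname, port number and function names are hypothetical. The first function mimics a test that switches on the hostname; the second mimics a check kept in a file that only ever runs against one type of host.

```python
EXECUTOR_PORT = 7900  # hypothetical port, for illustration only


def check_by_hostname(hostname: str, listening_ports: set) -> int:
    """Hostname-switched style. Returns how many assertions actually ran."""
    ran = 0
    if hostname == "ze01.example.org":  # hypothetical host name
        assert EXECUTOR_PORT in listening_ports
        ran += 1
    return ran


def check_executor(listening_ports: set) -> None:
    """Per-host-type style: the assertion always runs for this host type."""
    assert EXECUTOR_PORT in listening_ports, "executor port not listening"


# If the reported hostname differs even slightly, the branch is skipped:
# no assertion runs and the test "passes" on a broken host.
assertions_run = check_by_hostname("ze01", set())

# The unconditional check catches the same broken host.
try:
    check_executor(set())
    caught = False
except AssertionError:
    caught = True
```

Splitting checks into one file per host type moves the selection out of the test body, so a hostname mismatch shows up as a test that was never selected rather than one that silently passed.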
Zuul
8b8ba03667 Merge "ARM64 openafs role tests" 2020-06-09 23:53:44 +00:00
Zuul
ab0be49488 Merge "Integration tests: update debian stable to Buster" 2020-06-09 20:59:25 +00:00
Zuul
07eea952cd Merge "Remove Puppet 5 testing" 2020-06-09 20:59:24 +00:00
Ian Wienand
52945f81ce Add puppet3 tests to xenial arm64
nb03.openstack.org still uses this, for now, so we should test
installation.

Change-Id: I305e796160927b1fb9954126a6631df87f6527db
2020-06-10 06:38:02 +10:00
Ian Wienand
3c04791656 ARM64 openafs role tests
This tests the openafs client installation on all the arm64 types that
build wheels, where we currently need the client to copy the binary
wheel output.

Depends-On: https://review.opendev.org/733755
Change-Id: I278db0b6c8fad04ebf2f971bc7b0c007ee92ac31
2020-06-09 10:37:00 +10:00
Ian Wienand
d33e620cc0 Integration tests: update debian stable to Buster
Run the integration tests on buster, which is our current stable
distro.

Change-Id: Ie41a4c83a1411c5883190b522ce130ea72439c50
2020-06-09 10:19:33 +10:00
Ian Wienand
4fa8e22ffc Remove Puppet 5 testing
We have pivoted to Ansible and we don't use puppet5 anywhere.  Stop
testing on Bionic as we're not really interested in maintaining it, and
remove the puppet-install installation path there so we don't have
code that isn't being tested.

Change-Id: Ia2d05f7c75e46bc01717d11457b832e42522fa95
2020-06-09 10:15:05 +10:00
Monty Taylor
8c9b4af143 Stop cloning more puppet modules
Previous review pointed out some additional modules we probably
aren't using any longer.

Remove the openafs::client section from openstack_project::server
because we're doing this with ansible now.

Depends-On: https://review.opendev.org/733890
Change-Id: Ib5104da9cf7d53b77191f48ec185f5d667d51944
2020-06-05 12:09:30 -05:00
Monty Taylor
96364a11d9 Stop cloning a bunch of puppet modules we don't use
We've stopped using many of these, but we never got around to
removing them from lists.

Also, we should probably retire the repos.

Depends-On: https://review.opendev.org/717620
Depends-On: https://review.opendev.org/720527
Change-Id: I8e012c5bfa48d274dbd7f5484a9e75fee080cb5e
2020-06-05 08:42:47 -05:00
Zuul
30ff2de191 Merge "Split inventory into multiple dirs and move hostvars" 2020-06-04 22:33:37 +00:00
Zuul
5d3c8af17e Merge "Stop cloning drupal puppet modules" 2020-06-04 21:19:29 +00:00
Zuul
b25804d4cb Merge "Rename service-letsencrypt to just letsencrypt" 2020-06-04 21:19:27 +00:00
Zuul
075c4035b3 Merge "Run iptables in service playbooks instead of base" 2020-06-04 21:05:04 +00:00
Monty Taylor
83ced7f6e6 Split inventory into multiple dirs and move hostvars
Make inventory/service for service-specific things, including the
groups.yaml group definitions, and inventory/base for hostvars
related to the base system, including the list of hosts.

Move the existing host_vars into inventory/service, since most of
them are likely service-specific. Move group_vars/all.yaml into
base/group_vars as almost all of it is related to base things,
with the exception of the gerrit public key.

A followup patch will move host-specific values into equivalent
files in inventory/base.

This should let us override hostvars in gate jobs. It should also
allow us to do better file matchers - and to be able to organize
our playbooks more if we want to.

Depends-On: https://review.opendev.org/731583
Change-Id: Iddf57b5be47c2e9de16b83a1bc83bee25db995cf
2020-06-04 07:44:36 -05:00
Monty Taylor
9abec21f8f Stop cloning drupal puppet modules
These were for groups.openstack.org which is no longer a thing.
We can retire puppet-drupal too.

Change-Id: I4a9ef3bf37545429ae7e1371be5806e26cef953e
2020-06-04 07:44:36 -05:00
Monty Taylor
f27c170d01 Rename service-letsencrypt to just letsencrypt
This isn't a service, it's a meta thing that we run for different
hosts at different times.

Change-Id: Ib65665c98afb3ddb94b15346931be88a4b1757d8
2020-06-04 07:44:36 -05:00
Monty Taylor
d93a661ae4 Run iptables in service playbooks instead of base
It's the only part of base that's important to run when we run a
service. Run it in the service playbooks and get rid of the
dependency on infra-prod-base.

Continue running it in base so that new nodes are brought up
with iptables in place.

Bump the timeout for the mirror job, because the iptables addition
seems to have just bumped it over the edge.

Change-Id: I4608216f7a59cfa96d3bdb191edd9bc7bb9cca39
2020-06-04 07:44:22 -05:00
James E. Blair
1210ef366c Move run-eavesdrop from periodic-hourly to periodic
This is required by the accessbot job which is in periodic.  We
moved it to hourly so that ptgbot could be updated more often, but
without it being in periodic, no periodic jobs are running, and that
seems more critical at the moment.

Change-Id: I0c7dbc0db77f295820302441e495fe4e9ea7d726
2020-06-02 09:33:15 -07:00
Jeremy Stanley
918dd0ed91 Deploy eavesdrop hourly
Since changes to some services on eavesdrop, for example ptgbot, may
need to take effect fairly quickly, run the playbook hourly rather
than daily. We can't easily trigger on changes merging to the ptgbot
repo in the future when it's in a different Zuul tenant from
system-config.

Change-Id: I90ddc555ded0ac1d3134fd075d816155a475c6d2
2020-05-29 16:21:35 +00:00
Zuul
3f61433c59 Merge "Generate ssl check list directly from letsencrypt variables" 2020-05-28 23:31:11 +00:00
Monty Taylor
e8716e742e Move base roles into a base subdir
If we move these into a subdir, it cleans up the number of things
we have to match files on.

Stop running disable-puppet-agent in base. We run it in run-puppet
which should be fine.

Change-Id: Ia16adb96b11d25a097490882c4c59a50a0b7b23d
2020-05-27 16:28:37 -05:00
Clark Boylan
eb22e01f31 Add support for multiple jvbs behind meetpad
The jitsi video bridge (jvb) appears to be the main component we'll need
to scale up to handle more users on meetpad. Start preliminary
ansiblification of scale out jvb hosts.

Note this requires each new jvb to run on a separate host as the jvb
docker images seem to rely on $HOSTNAME to uniquely identify each jvb.

Change-Id: If6d055b6ec163d4a9d912bee9a9912f5a7b58125
2020-05-20 13:41:30 -07:00
James E. Blair
085856e318 Add iptables_extra_allowed_groups
This adds a new variable for the iptables role that allows us to
indicate all members of an ansible inventory group should have
iptables rules added.

It also removes the unused zuul-executor-opendev group, and some
unused variables related to the snmp rule.

Also, collect the generated iptables rules for debugging.

Change-Id: I48746a6527848a45a4debf62fd833527cc392398
Depends-On: https://review.opendev.org/728952
2020-05-20 13:18:29 -07:00
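
The group-based rule expansion described in 085856e318 can be sketched as follows. Only the variable name `iptables_extra_allowed_groups` comes from the commit message. The entry shape (`group`, `protocol`, `port`), the chain name, the hosts and the addresses are assumptions made for illustration, and the real role renders its rules from a template rather than from Python.

```python
from typing import Dict, List


def rules_for_groups(extra_allowed_groups: List[dict],
                     inventory_groups: Dict[str, List[str]],
                     host_addresses: Dict[str, str]) -> List[str]:
    """Expand each allowed group into one ACCEPT rule per member host."""
    rules = []
    for entry in extra_allowed_groups:
        # A group missing from the inventory simply contributes no rules.
        for host in inventory_groups.get(entry["group"], []):
            rules.append(
                f"-A INPUT -s {host_addresses[host]} -p {entry['protocol']} "
                f"--dport {entry['port']} -j ACCEPT"
            )
    return rules


# Hypothetical inventory data using documentation-range addresses.
inventory_groups = {"zuul-executor": ["ze01.example.org", "ze02.example.org"]}
host_addresses = {
    "ze01.example.org": "203.0.113.11",
    "ze02.example.org": "203.0.113.12",
}
iptables_extra_allowed_groups = [
    {"group": "zuul-executor", "protocol": "tcp", "port": 4730},
]

rules = rules_for_groups(
    iptables_extra_allowed_groups, inventory_groups, host_addresses
)
for rule in rules:
    print(rule)
```

Keying the rules on group membership means that adding a host to the inventory group is enough to open the port for it, with no per-host firewall entry to maintain.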
James E. Blair
b173fcb1d9 Vendor the apt repo gpg keys used for Zuul
We use several PPAs on the Zuul servers, and today the Ubuntu keyring
servers are frequently failing.  Rather than rely on them, store the
GPG keys in this repo and install the files "manually" rather than
using the apt_repo module.

Change-Id: I009a1a38d3a5864a8d5b0d8f8be24a83d1924292
2020-05-20 13:17:09 -07:00