161 Commits

Author SHA1 Message Date
Zuul
a7be740183 Merge "Fix up openafs-client job matching" 2021-04-12 22:43:13 +00:00
Zuul
b5f3f7ef49 Merge "zuul-summary-status : handle SKIPPED and ERROR jobs" 2021-04-09 02:08:58 +00:00
Zuul
3180086559 Merge "Rename refstack group variables" 2021-03-29 21:33:02 +00:00
Ian Wienand
9f11fc5c75 Remove references to review-dev
With our increased ability to test in the gate, there's not much use
for review-dev any more.  Remove references.

Change-Id: I97e9865e0b655cd157acf9ffa7d067b150e6fc72
2021-03-24 11:40:31 +11:00
Zuul
acf0e00478 Merge "Set up gitea image provides and requires for gating" 2021-03-23 18:29:35 +00:00
Clark Boylan
c2d46f4247 Set up gitea image provides and requires for gating
This should ensure that, if we have a parent job that updates the gitea
version and a do-not-merge child job that induces an artificial failure
for zuul hold purposes, we test the correct image in the child job's
changes.

Prior to this we were testing the existing published images, but
provides + requires will give the correct signaling to make the desired
"test new proposed image" behavior happen in the child change builds.

Change-Id: Ie6b827b650e0f32606dc5ec7f4aa0adfeebdeb5e
2021-03-19 10:33:09 -07:00
Ian Wienand
aa94f2d831 Rename refstack group variables
When we cleaned up the puppet in
I6b6dfd0f8ef89a5362f64cfbc8016ba5b1a346b3 we renamed the group
s/refstack-docker/refstack/ but didn't move the variables and some
other references too.

Change-Id: Ib07d1e9ede628c43b4d5d94b64ec35c101e11be8
2021-03-19 16:01:46 +11:00
Zuul
3bb0573f41 Merge "system-config-run-kerberos: run twice" 2021-03-19 00:07:09 +00:00
James E. Blair
96bac7b486 Add zookeeper-statsd
This adds a program, zookeeper-statsd, which monitors zookeeper
metrics and reports them to statsd.  It also adds a container to
run that program.  And it runs the container on each of the
ZooKeeper quorum members.  And it updates the graphite host to
allow statsd traffic from quorum members.  And it updates the
4-letter-word whitelist to allow the mntr command (which is used
to gather metrics) to be issued.

Change-Id: I298f0b13a05cc615d8496edd4622438507fc5423
2021-03-17 14:52:31 -07:00
Zuul
b2b1a9062d Merge "Add new opendev.org nodepool launchers" 2021-03-17 18:13:07 +00:00
Zuul
77b1c14a9a Merge "Use upstream jitsi-meet web image" 2021-03-17 00:22:50 +00:00
Zuul
4524a92caf Merge "kerberos-kdc: role to manage Kerberos KDC servers" 2021-03-16 22:28:46 +00:00
Clark Boylan
680ed17ecd Add new opendev.org nodepool launchers
This adds the new focal nodepool launcher replacements for nl02-04 to
our inventory. This will configure them with an idle configuration. We
then confirm they are happy running in an idle state, then switch over
the config from the old to the new servers.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/780982
Change-Id: Iea645925caaeee6f498aa690c4f2c848f6899317
2021-03-16 15:21:58 -07:00
Zuul
a5d0329cf7 Merge "Don't always update gitea project descriptions" 2021-03-16 22:20:55 +00:00
Ian Wienand
bf886a5ab6 system-config-run-kerberos: run twice
Run the playbook twice to ensure the role doesn't change anything

Change-Id: I1c0c45ece37035a18eb9468a5d7f4f34cfec4edc
2021-03-17 08:31:55 +11:00
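
The "run twice" check above can be illustrated with a minimal Python sketch. This is not the actual job or role: `ensure_line`, `run_role`, and the example realm and host names are hypothetical stand-ins. The idea is only that an idempotent role reports changes on the first run and none on the second.

```python
# Hypothetical stand-in for a role task: it returns True only when it
# actually had to change the (simulated) host state.
def ensure_line(config_lines, line):
    if line in config_lines:
        return False
    config_lines.append(line)
    return True


def run_role(config_lines):
    """Apply every task once and return how many reported a change."""
    # Assumed example settings, not taken from the real role.
    desired = ["realm = EXAMPLE.ORG", "kdc = kdc01.example.org"]
    return sum(ensure_line(config_lines, line) for line in desired)


host_state = []
first_run_changes = run_role(host_state)   # tasks modify the empty state
second_run_changes = run_role(host_state)  # nothing left to change
```

If the second run reported any changes, the role would not be idempotent, which is the condition the second playbook run is meant to catch.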
Ian Wienand
c1aff2ed38 kerberos-kdc: role to manage Kerberos KDC servers
This adds a role and related testing to manage our Kerberos KDC
servers, intended to replace the puppet modules currently performing
this task.

This role automates realm creation, initial setup, key material
distribution and replica host configuration.  None of this is intended
to run on the production servers which are already set up with an
active database, and the role should be effectively idempotent in
production.

Note that this does not yet switch the production servers into the new
groups; this can be done in a separate step under controlled
conditions and with related upgrades of the host OS to Focal.

Change-Id: I60b40897486b29beafc76025790c501b5055313d
2021-03-17 08:30:52 +11:00
Clark Boylan
16a4bdce02 Don't always update gitea project descriptions
There is some correlation that running the manage-projects playbook
gives our gitea fits. The bulk of the work done here is in trying to
update the descriptions of all projects. There isn't a good way to see
if the description is already set first so we just try and ignore
errors. This creates potentially thousands of operations all at once and
could be why things are sad.

We move these operations under the always update flag which is not set
on normal runs. If we really need to converge to a good updated state we
can manually run the playbook/role with always update set.

We also don't set a limit on the number of ThreadPoolExecutor workers
which will default to 5 * NumProcs. Could be that tuning this down would
make gitea happier.

One other thought is that we may not be using request sessions properly
for connection reuse. In particular requests notes that you need to set
stream to False or read request content to return a connection back to
the pool for reuse. We might look into this for further improvements.

Change-Id: I6e6fb1eb08303e9da7e38cf493d1871364340000
2021-03-16 13:06:16 -07:00
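
The worker-limit idea mentioned above can be sketched as follows. This is a minimal illustration, not the code from the change: `run_bounded`, `MAX_WORKERS`, and the placeholder task are hypothetical. Note that the unbounded default depends on the Python version: `5 * cpu_count` applies to Python 3.5 through 3.7, while Python 3.8 and later default to `min(32, cpu_count + 4)`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # hypothetical cap; the right value would need tuning


def run_bounded(projects, max_workers=MAX_WORKERS):
    """Run one placeholder update per project with a capped pool.

    Returns the results and the peak number of tasks that ran
    concurrently, so the cap can be observed.
    """
    active = 0
    peak = 0
    lock = threading.Lock()

    def update_description(project):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stands in for an HTTP request to the server
        with lock:
            active -= 1
        return f"updated {project}"

    # Passing max_workers explicitly bounds the concurrent requests.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(update_description, projects))
    return results, peak
```

Calling `run_bounded(["project-a", "project-b"], max_workers=2)` would never have more than two placeholder requests in flight at once.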
Clark Boylan
ed61423b6b Add nl01.opendev.org to our inventory
This is a new focal replacement for nl01.openstack.org. We keep
nl01.openstack.org in our inventory for now because we want ansible to
update the nodepool.yaml configs for these two hosts to coordinate a
hand off of responsibilities once we are happy with the new deployment.

We also switch the testing hostname to nl04.openstack.org as this will
be the last nodepool launcher to be removed. When we swap it out the
testing will be updated to use focal hosts.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/779863
Change-Id: Ib3ea6586fe0567c1edf6255ee9be50164d35db62
2021-03-15 09:48:22 -07:00
Ian Wienand
d33ce951c0 refstack: use CNAME for production server
The production server is trying to send itself to
refstack01.openstack.org, causing cross-site scripting issues.  In
production, use the CNAME, but use the FQDN for testing.

Fix up job file matchers while here.

Change-Id: I18a5067ee25c59c5eaa17b7c2d9bd5a942a9173d
2021-03-12 10:24:06 +11:00
Zuul
d8cfde1e22 Merge "refstack: Edit URL of public RefStackAPI" 2021-03-11 03:43:17 +00:00
Martin Kopec
834e39fc7e refstack: Edit URL of public RefStackAPI
The previous refstack server had 'api' in the endpoint
addresses of API calls. Let's try to set it in the new
instance as well to keep the same interface.

Also, fix the typo in the testinfra host match and in
the test name.

Change-Id: I7319990144396b3a753678975a09b0add3ac4465
2021-03-10 14:09:20 +11:00
James E. Blair
b768325480 Use upstream jitsi-meet web image
This has our change to open etherpad on join, so we should no longer need
to run a fork of the web server.  Switch to the upstream container image
and stop building our own.

Change-Id: I3e8da211c78b6486a3dcbd362ae7eb03cc9f5a48
2021-03-09 12:35:46 -08:00
Clark Boylan
a2fd912511 Replace ze09-12.openstack.org with ze09-12.opendev.org
These are new focal replacement servers. Because this is the last set of
replacements for the executors we also clean up the testing of the old
servers in the system-config-run-zuul job and the inventory group
checker job.

Change-Id: I111d42c9dfd6488ef69ff1a7f76062a73d1f37bf
2021-03-08 10:13:29 -08:00
Ian Wienand
c0144eab68 Fix up openafs-client job matching
The jobs should have file matchers for "roles/openafs-client" (not
playbooks/).  Fix this.

Add the openafs/kerberos role matchers to Zuul as well, as it uses
them on the executors.

Change-Id: I66fd7792d6b533362606291e1bfc01dfa2a2e05b
2021-03-03 13:41:56 +11:00
Clark Boylan
a42c0b704a Remove ze01.openstack.org
This server has been replaced by ze01.opendev.org running Focal. Let's
remove the old ze01.openstack.org from inventory so that we can delete
the server. We will follow this up with a rotation of new focal servers
being put in place.

This also renames the xenial executor in testing to ze12.openstack.org
as that will be the last one to be rotated out in production. We will
remove it from testing at that point as well.

We also remove a completely unused zuul-executor-opendev.yaml group_vars
file to avoid confusion.

Change-Id: Ida9c9a5a11578d32a6de2434a41b5d3c54fb7e0c
2021-03-02 10:21:59 -08:00
Ian Wienand
cc37e1f7b4 zuul-summary-status : handle SKIPPED and ERROR jobs
Zuul reports SKIPPED jobs with a slightly different format, with a
fake URL and no run-time.

ERROR results include a message before the timestamp.

Update our test comments to reflect this, and the dependent change has
the updates for the zuul-summary-status plugin.

Depends-On: https://gerrit-review.googlesource.com/c/plugins/zuul-results-summary/+/298422
Change-Id: Idf2f12a8105c08f34a1600c66a3d7a26671728d2
2021-02-26 10:34:18 +11:00
Clark Boylan
de4c369c27 Replace all the zuul mergers with new focal nodes
These servers are all up and running and should be ready to go after the
zm01 canary. Note there will be a followup change to remove
zm02-zm08.openstack.org from the inventory. We split this up so that we
can keep those servers around until we're happy with the replacements.

Change-Id: Ic2671da104df2b01986d1b65c8d13507d6792c40
2021-02-23 11:09:16 -08:00
Zuul
4d85fc521a Merge "Use dstat to record performance of system-config-run hosts" 2021-02-23 00:13:59 +00:00
Zuul
1b2435c349 Merge "backups: remove all bup" 2021-02-21 22:41:41 +00:00
Ian Wienand
5a1b8ac179 grafana: take some screenshots during testing
Take some simple screenshots for basic validation of any new releases.

Change-Id: I52770032a6cc91d76da23194f58474f5ceeaed38
2021-02-17 10:43:26 +11:00
Clark Boylan
1560b01f7e Use dstat to record performance of system-config-run hosts
We have seen some poor performance from gitea which may be related to
manage project updates. Start a dstat service which logs to a csv file
on our system-config-run job hosts in order to collect performance info
from our services in pre merge testing. This will include gitea and
should help us evaluate service upgrades and other changes from a
performance perspective before they hit production.

Change-Id: I7bdaab0a0aeb9e1c00fcfcca3d114ae13a76ccc9
2021-02-16 14:31:30 -08:00
Ian Wienand
39ffc685d6 backups: remove all bup
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.

For reference, the servers being backed up at this time are:

 borg-ask01
 borg-ethercalc02
 borg-etherpad01
 borg-gitea01
 borg-lists
 borg-review-dev01
 borg-review01
 borg-storyboard01
 borg-translate01
 borg-wiki-update-test
 borg-zuul01

This removes the old bup backup hosts, the no-longer-used ansible
roles for the bup backup server and client roles, and any remaining
bup related configuration.

For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.

Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
2021-02-16 16:00:28 +11:00
Zuul
7a5041140b Merge "Cleanup refstack job dependencies" 2021-02-16 04:27:53 +00:00
Zuul
8360a7ceab Merge "Run gerrit 3.2 and 3.3 functional tests" 2021-02-16 04:27:46 +00:00
Zuul
d9f59d8728 Merge "Build Gerrit 3.3 images" 2021-02-11 14:55:32 +00:00
Zuul
03f5e8e0de Merge "borg-backup-server: run a weekly backup verification" 2021-02-11 05:53:16 +00:00
Ian Wienand
0d01d941b1 borg-backup-server: run a weekly backup verification
This checks the backup archives and alerts us if anything seems wrong.
This will take a few hours, so we run once a week.

Change-Id: I832c0d29a37df94d4bf2704c59bb3f8d855c3cc8
2021-02-11 00:43:16 +00:00
Ian Wienand
533e6d43fa refstack: fix typo in role matcher
Change-Id: I61929708be87a28669606ac38abf478afd70fc51
2021-02-11 10:37:31 +11:00
Clark Boylan
2bb3dd797b Cleanup refstack job dependencies
We need to depend on the buildset registry as we are building this image
in a separate job. We also don't need to depend on the build job in
gate, we only need the upload job.

Change-Id: Ie7c2ed29c028f8c23d67ad38edbe04b12e22d026
2021-02-10 15:11:54 -08:00
Clark Boylan
9b90e192b1 Run gerrit 3.2 and 3.3 functional tests
This change splits our existing system-config-run-review job into two
jobs, one for gerrit 3.2 and another for 3.3. The biggest change is that
we use a var called zuul_test_gerrit_version to select which version we
want and that ends up in the fake group file written out by Zuul for the
nested ansible run. The nested ansible run will then populate the
docker-compose file with the appropriate version for us.

Change-Id: I00b52c0f4aa8df3ecface964007fcf5724887e5e
2021-02-10 15:10:46 -08:00
Clark Boylan
7320c8e6ed Build Gerrit 3.3 images
Gerrit 3.3 has been released. Let's start building images for it so that we
can do testing when ready to start that.

We also add testing files to the list of things that trigger the 3.3
builds. Strictly this isn't necessary since the test will continue to
use 3.2 images until we upgrade to 3.3, but this helps us avoid
forgetting to do this when we do upgrade. A few extra jobs run today to
ensure we continue to run the right jobs tomorrow.

Change-Id: Ib7e7d7313e0827a40009df840119444611d74ca2
2021-02-10 15:07:19 -08:00
Zuul
1d79574d82 Merge "borg-backup-server: add script for pruning borg backups" 2021-02-10 01:28:33 +00:00
Ian Wienand
78167396bf refstack: add production image and deployment jobs
Change-Id: I017a32ee374f0473525c9941c41b26c2a43bf2c8
2021-02-10 07:11:22 +11:00
Zuul
f526060e39 Merge "Deploy refstack with ansible docker" 2021-02-09 03:58:22 +00:00
Ian Wienand
4f0bfa6d9d borg-backup-server: add script for pruning borg backups
This adds a script that performs a manual pruning of backup
directories.

Change-Id: I9559bb8aeeef06b95fb9e172a2c5bfb5be5b480e
2021-02-09 11:29:46 +11:00
Clark Boylan
a4604ae0b3 Deploy refstack with ansible docker
This adds a dockerfile to build an opendevorg/refstack image as well as
the jobs to build and publish it.

Change-Id: Icade6c713fa9bf6ab508fd4d8d65debada2ddb30
2021-02-05 19:23:34 +00:00
Zuul
89cd6972f2 Merge "borg-backup: implement saving a stream, use for database backups" 2021-02-03 03:11:11 +00:00
Ian Wienand
51733e5623 borg-backup: implement saving a stream, use for database backups
Add facility to borg-backup role to run a command and save the output
of it to a separate archive file during the backup process.

This is mostly useful for database backups.  Compressed on-disk logs
are terrible for differential backups because revisions have
essentially no common data.  By saving the uncompressed stream
directly from mysqldump, we allow borg the chance to de-duplicate,
saving considerable space on the backup servers.

This is implemented for our ansible-managed servers currently doing
dumps.  We also add it to the testinfra.

This also separates the archive names for the filesystem and stream
backup with unique prefixes so they can be pruned separately.
Otherwise we end up keeping only one of the stream or filesystem
backups which isn't the intention.  However, due to issues with
--append-only mode we are not issuing prune commands at this time.

Note the dump commands are updated slightly, particularly with
"--skip-extended-insert" which was suggested by mordred and
significantly improves incremental diff-ability by being slightly more
verbose but keeping much more of the output stable across dumps.

Change-Id: I500062c1c52c74a567621df9aaa716de804ffae7
2021-02-03 11:43:12 +11:00
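
The de-duplication argument above can be illustrated with a rough Python sketch. It rests on several assumptions: fixed-size chunks stand in for borg's content-defined chunking, `zlib` stands in for whatever compressed the on-disk dumps, and `make_dump` produces a made-up one-row-per-INSERT dump similar in shape to "--skip-extended-insert" output. All names here are hypothetical.

```python
import hashlib
import random
import zlib

CHUNK_SIZE = 4096  # simplification: borg itself uses variable-size chunks


def chunk_hashes(data, size=CHUNK_SIZE):
    """Hash each fixed-size chunk of data."""
    return {
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    }


def shared_fraction(old, new):
    """Fraction of the new version's chunks already present in the old one."""
    new_hashes = chunk_hashes(new)
    return len(chunk_hashes(old) & new_hashes) / len(new_hashes)


def make_dump(values):
    """Build a fake dump with one INSERT statement per row."""
    return "".join(
        f"INSERT INTO t VALUES ({i},'{v}');\n" for i, v in enumerate(values)
    ).encode()


rng = random.Random(0)
rows = [f"{rng.getrandbits(64):016x}" for _ in range(20000)]
changed = list(rows)
for index in range(100, 20000, 4000):
    changed[index] = "changed-value-00"  # same length as the original value

dump_old, dump_new = make_dump(rows), make_dump(changed)

plain_shared = shared_fraction(dump_old, dump_new)
compressed_shared = shared_fraction(
    zlib.compress(dump_old, 9), zlib.compress(dump_new, 9)
)
```

In this sketch, almost all chunks of the uncompressed dump are shared between the two versions, while a small edit early in the compressed stream changes the bytes that follow it, leaving few chunks in common.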
Zuul
137d518bf8 Merge "Update Gerrit 3.2 plugin versions on image builds" 2021-02-01 09:21:23 +00:00
Ian Wienand
9f4cbcfbc2 Expand gerrit testing to multiple changes
This reworks the gerrit testing slightly to give some broader
coverage.

It sets up ssh keys for the user; not really necessary but can be
helpful when interacting on a held host.

It sets up groups and verification labels just so Zuul can comment
with -2/+2; again this is not really necessary, but makes things a
little closer to production reality.

We make multiple changes, so we can better test navigating between
them.  The change comments are updated to have some randomness in them
so they don't all look the same.  We take screen shots of two change
pages to validate the navigation between them.

Change-Id: I60b869e4fdcf8849de836e33db643743128f8a70
2021-02-01 14:06:08 +11:00