388 Commits

Author SHA1 Message Date
Zuul
a8a19abf2c Merge "system-config-run-borg-backup: add to gate" 2022-08-12 07:11:38 +00:00
Zuul
74389454ce Merge "system-config-run-borg-backup: rename hosts to distro" 2022-08-11 23:57:30 +00:00
Zuul
00df4d06c0 Merge "system-config-run-borg-backup: add jammy test host" 2022-08-11 05:32:30 +00:00
Ian Wienand
46bb73d947 system-config-run-borg-backup: add to gate
We must have missed this; I noticed it when the job didn't run in the
gate for I949c40e9046008d4f442b322a267ce0c967a99dc

Change-Id: I62c5c0f262d9bd53580367dc9f1ad00fe7b6f6f2
2022-08-11 13:54:52 +10:00
Ian Wienand
55654851bc system-config-run-borg-backup: rename hosts to distro
Rename the testing hosts to be clearer that they are different
distros.

Change-Id: Ic4b2b4a1b1fa8bc9a9eb62dc2ccba529958f19cd
2022-08-11 13:32:49 +10:00
Zuul
4ee5be00d9 Merge "Also pin pip/setuptools when creating Xenial venvs" 2022-08-11 00:19:46 +00:00
Jeremy Stanley
2d9d24d07d Also pin pip/setuptools when creating Xenial venvs
We still have some Ubuntu Xenial servers, so cap the max usable pip
and setuptools versions in their venvs like we already do for
Bionic, in order to avoid broken installations. Switch our
conditionals from release name comparisons to version numbers in
order to more cleanly support ranges. Also make sure the borg run
test is triggered by changes to the create-venv role.

Change-Id: I5dd064c37786c47099bf2da66b907facb517c92a
2022-08-10 19:35:10 +00:00
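
For reference, a minimal sketch of the version-number conditional
described above, assuming Ansible's built-in "version" test; the task
and variable names and the exact pip/setuptools caps are illustrative,
not the real create-venv role contents:

  # Illustrative only: cap pip/setuptools in venvs on older Ubuntu by
  # comparing version numbers rather than release names, so a range
  # like "Bionic and older" is easy to express.
  - name: Pick pinned venv tooling on Ubuntu 18.04 and older
    set_fact:
      venv_pip: 'pip<21'
      venv_setuptools: 'setuptools<60'
    when:
      - ansible_distribution == 'Ubuntu'
      - ansible_distribution_version is version('18.04', '<=')

  - name: Create the venv
    command: "python3 -m venv {{ venv_path }}"
    args:
      creates: "{{ venv_path }}/bin/pip"
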
Ian Wienand
a36ee527c8 system-config-run-borg-backup: add jammy test host
With Jammy production nodes coming, add testing of the backup roles on
this distro.

Change-Id: I7d7733c7a52918b1faa65c3d0dcfd2cf94e66066
2022-08-10 10:14:56 +10:00
Ian Wienand
57939b40d9 system-config-run: bump base timeout to 3600
Many of our tests are actually running with a timeout of 3600; I think
this came about through a combination of bumping timeouts after
failures and copy-pasting jobs.

We are seeing frequent timeouts of other jobs without this,
particularly on OVH GRA1.  Let's bump the base timeout to 3600 to
account for this.  The only job that overrides this now is gitea,
which runs for 4800 due to its long import process.

Change-Id: I762f0f7c7a53a456d9269530c9ae5a9c85903c9c
2022-08-10 10:14:56 +10:00
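
For reference, a short sketch of the Zuul job configuration this
implies; the job names follow the commit subjects and the timeout
values are the ones mentioned above, but the rest is illustrative:

  # Illustrative Zuul config: the shared parent job carries the 3600s
  # timeout, and only the gitea job overrides it with a longer one.
  - job:
      name: system-config-run
      timeout: 3600

  - job:
      name: system-config-run-gitea
      parent: system-config-run
      timeout: 4800
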
Ian Wienand
08644ae925 mirror-update: move testing to mirror-update99
Keeping the testing nodes at the other end of the namespace separates
them from production hosts.  This one doesn't really reference itself
in testing like many others do, but move it anyway.

Change-Id: I2130829a5f913f8c7ecd8b8dfd0a11da3ce245a9
2022-08-05 08:18:55 +10:00
Ian Wienand
5ba37ced60 paste: move certificate to group variable
Similar to Id98768e29a06cebaf645eb75b39e4dc5adb8830d, move the
certificate variables to the group definition file, so that we don't
have to duplicate handlers or definitions for the testing host.

Change-Id: I6650f5621a4969582f40700232a596d84e2b4a06
2022-08-05 08:18:55 +10:00
Ian Wienand
e70c1e581c static: move certs to group, update testing name to static99
Currently we define the letsencrypt certs for each host in its
individual host variables.

With recent work we have a trusted CA and SAN names set up in our
testing environment, introducing the possibility that we could
accidentally reference the production host during testing (both have
valid certs, as far as the testing hosts are concerned).

To avoid this, we can use our naming scheme to move our testing hosts
to "99" and avoid collision with the production hosts.  As a bonus,
this really makes you think more about your group/host split to get
things right and keep the environment as abstract as possible.

One example of this is that with letsencrypt certificates defined in
host vars, testing and production need to use the same hostname to get
the right certificates created.  Really, this should be group-level
information so it applies equally to host01 and host99.  To cover
"hostXX.opendev.org" as a SAN we can include the inventory_hostname in
the group variables.

This updates one of the more tricky hosts, static, as a proof of
concept.  We rename the handlers to be generic, and update the testing
targets.

Change-Id: Id98768e29a06cebaf645eb75b39e4dc5adb8830d
2022-08-05 08:18:55 +10:00
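
As a concrete illustration of the group-variable pattern described
above (the key names here are hypothetical, not the actual
system-config variables), the certificate is defined once for the
group and the per-host "hostXX.opendev.org" SAN comes from
inventory_hostname:

  # Illustrative group_vars entry: one certificate definition serves
  # both static01 in production and static99 in testing, because the
  # per-host name is added as a SAN via inventory_hostname.
  letsencrypt_certs:
    static-main:
      - static.opendev.org
      - '{{ inventory_hostname }}'
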
Zuul
11494a31a4 Merge "system-config-run-gitea: increase timeout" 2022-08-04 17:06:06 +00:00
Zuul
13d65b07a1 Merge "Run our base playbook on jammy" 2022-08-04 11:34:10 +00:00
Ian Wienand
53da4a3fb2 system-config-run-gitea: increase timeout
I've seen a couple of jobs time out on this for no apparent reason.
Loading all the repos just seems to take a long time.  Looking at the
logs [1], runs of 55m - 1h are not terribly uncommon depending on the
cloud.  Increase the timeout on this by 20 minutes to give it enough
headroom over an hour.

[1] https://zuul.opendev.org/t/openstack/builds?job_name=system-config-run-gitea&project=opendev%2Fsystem-config

Change-Id: I51080820bae35ac615a3b8b7ee1b8890e0df8410
2022-08-04 20:38:08 +10:00
Zuul
187e4307a1 Merge "paste : move testing host to paste99, remove https hacks" 2022-08-04 07:19:05 +00:00
Clark Boylan
d5cef7827e Run our base playbook on jammy
This is the first step in running our servers on jammy. This will help
us boot new servers on jammy, as well as jammy replacements for
existing bionic servers.

Change-Id: If2e8a683c32eca639c35768acecf4f72ce470d7d
2022-08-04 13:40:28 +10:00
Zuul
c5bce86dfa Merge "haproxy: redirect logs to a separate file" 2022-07-20 00:22:55 +00:00
Ian Wienand
376648bfdc Revert "Force ansible 2.9 on infra-prod jobs"
This reverts commit 21c6dc02b5b3069e4c9410416aeae804b2afbb5c.

Everything appears to be working with Ansible 2.9, which does seem to
suggest that reverting this (and moving back to the Ansible 5 default)
may result in jobs timing out again.  We will monitor this, and
I76ba278d1ffecbd00886531b4554d7aed21c43df is a potential fix for this.

Change-Id: Id741d037040bde050abefa4ad7888ea508b484f6
2022-07-17 09:07:20 +10:00
Clark Boylan
21c6dc02b5 Force ansible 2.9 on infra-prod jobs
We've been seeing ansible post-run playbook timeouts in our infra-prod
jobs. The only major thing that has changed recently is the default
update to ansible 5 for these jobs. Force them back to 2.9 to see if the
problem goes away.

Albin Vass has noted that there are possibly glibc + debian bullseye +
ansible 5 problems that may be causing this. If we determine 2.9 is
happy then this is the likely cause.

Change-Id: Ibd40e15756077d1c64dba933ec0dff6dc0aac374
2022-07-15 09:06:11 -07:00
Zuul
ee077085e9 Merge "production-playbook logs : move to post-run step" 2022-07-15 01:54:31 +00:00
Ian Wienand
21efe11eed production-playbook logs : move to post-run step
If the production playbook times out, we don't get any logs collected
with the run.  By moving the log collection into a post-run step, we
should always get something copied to help us diagnose what is going
wrong.

Change-Id: I3e99b80e442db0cc87f8e8c9728b7697a5e4d1d3
2022-07-15 07:58:23 +10:00
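
A minimal sketch of the job-level change described above, assuming
standard Zuul job attributes; the job name and playbook paths are
illustrative:

  # Illustrative: collecting logs in a post-run playbook means it still
  # runs even when the main run playbook hits the job timeout.
  - job:
      name: infra-prod-playbook
      run: playbooks/zuul/run-production-playbook.yaml
      post-run: playbooks/zuul/collect-production-logs.yaml
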
Clark Boylan
e86bccf6ea Fix system-config-run-review file triggers
These files got moved around and refactored to better support testing of
the Gerrit 3.5 to 3.6 upgrade path. Make sure we trigger the test jobs
when these files are updated.

Change-Id: I5a520e8a8a7c794a761279d4fb98c23e5d25f0ad
2022-07-14 08:56:58 -07:00
Ian Wienand
f97b9b8b8b haproxy: redirect logs to a separate file
haproxy only logs to /dev/log; this means all our access logs get
mixed into syslog.  This makes it impossible to pick out anything in
syslog that might be interesting (and vice-versa, means you have to
filter out things if analysing just the haproxy logs).

It seems like the standard way to deal with this is to have rsyslogd
listen on a separate socket, and then point haproxy to that.  So this
configures rsyslogd to create /var/run/dev/log and maps that into the
container as /dev/log (i.e. we don't have to reconfigure the container
at all).

We then capture this socket's logs to /var/log/haproxy.log, and install
rotation for it.

Additionally we collect this log from our tests.

Change-Id: I32948793df7fd9b990c948730349b24361a8f307
2022-07-07 21:29:13 +10:00
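
A minimal sketch of the rsyslogd side of this, assuming the imuxsock
module is already loaded by the stock configuration; the file paths
follow the commit message, while the exact directives, file names and
handler used in production may differ:

  # Illustrative task: have rsyslogd listen on a second syslog socket
  # (bind-mounted into the container as /dev/log) and write everything
  # arriving there to its own file.
  - name: Configure dedicated haproxy syslog socket and log file
    copy:
      dest: /etc/rsyslog.d/49-haproxy.conf
      content: |
        input(type="imuxsock" Socket="/var/run/dev/log" CreatePath="on")
        if $programname startswith 'haproxy' then /var/log/haproxy.log
        & stop
    notify: Restart rsyslog
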
Ian Wienand
939233e4e4 paste : move testing host to paste99, remove https hacks
Move the paste testing server to paste99 to distinguish it in testing
from the actual production paste service.  Since we have certificates
set up now, we can directly test against "paste99.opendev.org",
removing the insecure flags from various calls.

Change-Id: Ifd5e270604102806736dffa86dff2bf8b23799c5
2022-07-07 10:02:46 +10:00
Zuul
195ff48d4a Merge "Add Gerrit 3.5 to 3.6 upgrade testing" 2022-06-29 01:28:59 +00:00
Zuul
65a4e3a8d4 Merge "Add Gerrit 3.6 jobs" 2022-06-29 01:28:57 +00:00
Zuul
7146f73f1c Merge "Remove Gerrit 3.4 jobs" 2022-06-29 00:20:52 +00:00
Ian Wienand
6cd7433086 graphite: fix xFilesFactor
When we migrated this to ansible I missed that we didn't bring across
the storage-aggregation.conf file.

This has had the unfortunate effect of regressing the xFilesFactor
setting for every newly created graphite stat since the migration.
This setting is a fraction (a 0-1 float) of how much of a "bucket"
needs to be non-null for the value to be kept when data is rolled up.
We want this to be zero due to the sporadic nature of our data (see the
original change I5f416e798e7abedfde776c9571b6fc8cea5f3a33).

This only affected newly created statistics, as graphite doesn't
modify this setting once it creates the whisper file.  This probably
helped us overlook this for so long, as longer-existing stats were
operating correctly, but newer ones were dropping data when zoomed out.

Restore this setting, and double-check it in testinfra for the future.
For simplicity and to get this back to the prior state I will manually
update the on-disk .wsp files to this when this change applies.

Change-Id: I57873403c4ca9783b1851ba83bfba038f4b90715
2022-06-28 18:41:17 +10:00
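
For context, a sketch of what restoring the aggregation settings could
look like; the destination path and rule name are illustrative, while
the xFilesFactor value of 0 is the one described above:

  # Illustrative: storage-aggregation.conf is only consulted when a
  # whisper file is first created, so existing .wsp files have to be
  # fixed up separately, as the commit notes.
  - name: Install graphite storage aggregation config
    copy:
      dest: /opt/graphite/conf/storage-aggregation.conf
      content: |
        [default]
        pattern = .*
        xFilesFactor = 0
        aggregationMethod = average
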
Clark Boylan
d40f5d3089 Add Gerrit 3.5 to 3.6 upgrade testing
This adds upgrade testing from our current Gerrit version (3.5) to the
likely future version of our next upgrade (3.6).

To do so we have to refactor the gerrit testing because the 3.5 to 3.6
upgrade requires we run a command against 3.5. The previous upgrade
system assumed the old version could be left alone and jumped straight
into the upgrade, finally testing the end state. Now we have split up
the gerrit bootstrapping and gerrit testing so that normal gerrit
testing and upgrade testing can run these different tasks at different
points in the gerrit deployment process.

Now the upgrade tests use the bootstrapping playbook to create users,
projects, and changes on the old version of gerrit before running the
copy-approvals command. Then after the upgrade we run the test assertion
portion of the job.

Change-Id: Id58b27e6f717f794a8ef7a048eec7fbb3bc52af6
2022-06-22 10:58:17 -07:00
Clark Boylan
1da5615477 Add Gerrit 3.6 jobs
This adds Gerrit 3.6 image build jobs as well as CI testing for this
version of Gerrit. Once we've got images that build and function
generally we'll reenable the upgrade job and work through that.

Change-Id: I494a21911a2279228e57ff8d2b731b06a1573438
2022-06-21 16:54:36 -07:00
Clark Boylan
063ec0f5a7 Remove Gerrit 3.4 jobs
This removes our Gerrit 3.4 image builds as well as testing. We should
land this once enough time has passed since the 3.5 upgrade that we are
unlikely to revert it.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/847057
Change-Id: Iefa7cc1157311f0239794b15bea7c93f0c625a93
2022-06-21 16:27:46 -07:00
Clark Boylan
cb905c9b4f Fix gerrit deployment dependencies
We've upgraded to Gerrit 3.5, so we now need to wait for the 3.5 image
to promote rather than the 3.4 image when deploying Gerrit.

Change-Id: Ic3a4d578aea955aeee51f4cac7f4c95de931a94b
2022-06-21 07:48:12 -07:00
Zuul
0e072f1399 Merge "Update Gerrit images to 3.4.5 and 3.5.2" 2022-06-01 02:04:47 +00:00
Clark Boylan
5cc6c14a6d Remove ethercalc config management
About a month ago we announced [0][1] that this server would be shutdown
and removed on May 31, 2022. Before we can shutdown the server we need
to remove it from config management to prevent Ansible errors. This
change is safe to land now, then on the 31st we can shutdown, snapshot,
and delete the server.

[0] https://lists.opendev.org/pipermail/service-announce/2022-May/000038.html
[1] https://lists.openstack.org/pipermail/openstack-discuss/2022-May/028408.html

Change-Id: Ic44bed01384845e5b6322eeed02dd0932501cdb3
2022-05-30 12:57:48 -07:00
Clark Boylan
819d3ce480 Update Gerrit images to 3.4.5 and 3.5.2
3.4.5 is a fairly minor update. Some bugs are fixed and jgit is updated.

3.4.5 release notes:
  https://www.gerritcodereview.com/3.4.html#345

3.5.2 is a bigger update and, importantly, adds support for upgrading
to 3.6.0 later. There is a new copy-approvals command that must be run
offline on 3.5.2 before upgrading to 3.6.0. This apparently copies
approvals in the notedb into a form that 3.6.0 can handle. The release
notes indicate this may take some time to run. We don't need to run it
now, but we should make note of it when we prepare for the 3.6.0
upgrade.

3.5.2 release notes:
  https://www.gerritcodereview.com/3.5.html#352

For now don't overthink things and instead just get up to date with our
images.

Change-Id: I837c2cbb09e9a4ff934973f6fc115142d459ae0f
2022-05-25 08:37:33 -07:00
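
The offline step mentioned above would look roughly like the
following; the gerrit.war and site paths are illustrative and depend
on how the Gerrit installation is laid out:

  # Illustrative: copy-approvals has to run against the 3.5.2 site
  # while Gerrit is stopped, before starting the 3.6.0 upgrade.
  - name: Run the offline copy-approvals migration
    command: >-
      java -jar /var/gerrit/bin/gerrit.war copy-approvals
      -d /var/gerrit
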
Ian Wienand
24e179d5de Add testing for jammy openafs
Change-Id: I733d10c9285d4ea0d86e97f6ed45f28376a8672b
2022-05-12 12:53:49 +10:00
Zuul
86d58f6ce9 Merge "Test openafs roles on CentOS 9-stream" 2022-05-03 11:59:49 +00:00
Ian Wienand
b42769b7ed Test openafs roles on CentOS 9-stream
We have labeled the 8/9 stream repos with -stream for clarity; add
this to the path for the repo.

Change-Id: I5c4c5365d763f8a3c03a4adef36235e7809c44d7
Depends-On: https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/839689
2022-05-03 09:09:44 +10:00
Zuul
69546bc70c Merge "Update Gerrit build checkouts" 2022-05-02 18:37:41 +00:00
Jeremy Stanley
d185aedd7d Decommission status.openstack.org and services
The status.openstack.org server is offline now that it no longer
hosts any working services. Remove all configuration for it in
preparation for retiring related Git repositories.

Also roll some related cleanup into this for the already retired
puppet-kibana module.

Change-Id: I3cfcc129983e3641dfbe55d5ecc208c554e97de4
2022-04-29 16:34:51 +00:00
Ian Wienand
9f4fa24025 Remove puppet-kibana
I think this was overlooked in the removal of the ELK stack with
I5f7f73affe7b97c74680d182e68eb4bfebbe23e1; the repo is now retired.

Change-Id: I87bfe7be61f20a7c05c500af4e82b787d9c37a8c
2022-04-29 17:35:44 +10:00
Clark Boylan
183f2a9c9c Update Gerrit build checkouts
This will ensure we rebuild our gerrit images on the latest content from
upstream.

Change-Id: Ibcb1d076d897d9e24d7351dae156b37142ad0f34
2022-04-25 11:14:40 -07:00
Zuul
7b09f7baab Merge "Remove configuration management for ELK stack" 2022-04-22 16:04:22 +00:00
Zuul
4de58bd423 Merge "Add Bullseye Python 3.10 base images" 2022-04-20 23:21:06 +00:00
Zuul
42d51e4d2d Merge "Remove python3.7-bullseye docker images" 2022-04-20 22:33:00 +00:00
Zuul
7efc08e7b4 Merge "Remove our buster python images" 2022-04-20 16:42:06 +00:00
Clark Boylan
90effa2af0 Add Bullseye Python 3.10 base images
Now that we've cleaned up the old unused images we can look forward to
new Python. Add Python 3.10 base images based on Bullseye.

As part of this process we update the default var values in our
Dockerfiles to set Bullseye and Python 3.10 as our defaults, as these
should be valid for some time. We also tidy up some yaml anchor names
so that copying and pasting them for new image versions is easier to do
with text replacement.

Change-Id: I4943a9178334c4bdf10ee5601e39004d6783b34c
2022-04-20 08:39:52 -07:00
Zuul
795a509138 Merge "Switch Refstack image over to python3.9" 2022-04-19 19:27:16 +00:00
Clark Boylan
2d43f9322b Remove python3.7-bullseye docker images
Everything is running on 3.8 or newer, which should allow us to remove
the 3.7 images. This reduces the total set before we add python3.10
images and acts as good cleanup.

Change-Id: I2cc02fd681485f35a1b0bf1c089a12a4c5438df3
2022-04-18 13:42:30 -07:00