17307 Commits

Author SHA1 Message Date
Jeremy Stanley
0991ec62be Stop updating Gerrit RDBMS for repo renames
The version of Gerrit we're running no longer keeps account and
project details in an SQL database, so drop the tasks from our
rename playbook related to this.

Change-Id: I4b11b627a1285617496ca7b53f9b7e3f8251630c
2021-06-24 17:01:43 +00:00
Ian Wienand
c9528f6d12 borg-backup: exclude /var/lib/snapd
Not necessary to back up.

Change-Id: I74d19b951934661a9dc22f707cc0a0db119ddc5a
2021-06-23 15:37:15 +10:00
Zuul
73ce0a7d7b Merge "Add eavesdrop01.opendev.org to backup group" 2021-06-23 03:28:58 +00:00
Zuul
d4c4d2f91b Merge "centos-mirror: exclude ppc64le" 2021-06-23 01:12:29 +00:00
Zuul
6c7fc990da Merge "centos-mirror: add dry run mode" 2021-06-23 01:12:17 +00:00
Ian Wienand
0e9b950086 Add eavesdrop01.opendev.org to backup group
This saves a copy of our channel/meeting logs.

Change-Id: I376d1426573416ff0c2e633fa40e4d93adc89483
2021-06-23 10:48:38 +10:00
Jeremy Stanley
b66cd3fdde Correct the meeting base URL for our meetbot
Configure the base URL to something which actually resolves in DNS
and serves the expected content.

Change-Id: I4cfb22c8ba573ea7b689d5131b8a977d0eef5518
2021-06-22 22:55:33 +00:00
Ian Wienand
74fcf2a338 centos-mirror: exclude ppc64le
Nothing uses this architecture and it takes up considerable space on
the mirror volumes.

Change-Id: I8808419372f69c9968928d4c9b34a40d0349dc66
2021-06-22 10:08:12 +10:00
Ian Wienand
917544546f centos-mirror: add dry run mode
Add a dry-run flag and abstract the rsync command to make for easier
testing if modifying the copy flags.
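
As a rough illustration of the idea (not the actual mirror script), the
rsync invocation can be built in one place and a dry-run switch threaded
through it. The function name, the flag set and the example paths below
are assumptions; only rsync's --dry-run option is taken as given.

```python
import shlex


def build_rsync_command(source, dest, dry_run=False):
    """Return the rsync argv for one mirror sync.

    The base flags here are illustrative, not the real mirror flags.
    """
    cmd = ["rsync", "-rlptDvz", "--delete"]
    if dry_run:
        # rsync reports what it would transfer without changing anything
        cmd.append("--dry-run")
    cmd.extend([source, dest])
    return cmd


if __name__ == "__main__":
    # Hypothetical source and destination, printed rather than executed
    argv = build_rsync_command(
        "rsync://mirror.example.org/centos/", "/tmp/centos/", dry_run=True
    )
    print(shlex.join(argv))
```

Keeping the command in one function means a change to the copy flags
can be checked with the dry-run switch before it touches the mirror.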

Change-Id: Ie658b60257b94436b1eda0cddf6deb639a87d659
2021-06-22 10:08:05 +10:00
Clark Boylan
3d5d2779d2 Be explicit about server used in acme.sh
Acme.sh is updating their defaults to use zerossl instead of
letsencrypt [0]. This has resulted in errors like:

  Can not resolve _eab_id

when our runs of acme.sh attempt to communicate with zerossl. While the
default change isn't supposed to happen until August 1, we hit it early
because we consume the dev branch of acme.sh.

We avoid this entirely by being explicit about the server to communicate
with in our acme.sh driver script. We explicitly set --server to
letsencrypt.

Note that a followup should likely update our use of --staging to set
--server letsencrypt_test as --staging enforces their defaults as well.
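
A minimal sketch of the explicit server selection described above,
assuming a hypothetical helper that assembles the acme.sh arguments (the
real driver script is a shell script and passes more options than this).
The server names letsencrypt and letsencrypt_test are the ones mentioned
in this message.

```python
def acme_issue_command(domain, staging=False):
    """Build an acme.sh argv that never relies on the default CA.

    Hypothetical helper for illustration only.
    """
    # Pick the CA explicitly instead of inheriting acme.sh's default
    server = "letsencrypt_test" if staging else "letsencrypt"
    return ["acme.sh", "--issue", "-d", domain, "--server", server]


if __name__ == "__main__":
    print(" ".join(acme_issue_command("example.org")))
    print(" ".join(acme_issue_command("example.org", staging=True)))
```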

[0] https://github.com/acmesh-official/acme.sh/wiki/Change-default-CA-to-ZeroSSL

Change-Id: Ia6a8da80869f1c4ff3240712bcd320bfc6f29e93
2021-06-18 08:48:35 -07:00
Zuul
2fd7960dec Merge "review02 : bump heap limit to 96gb" 2021-06-18 03:53:54 +00:00
Zuul
430ffb2e80 Merge "Add note about afs01's mirror-update vos releases to docs" 2021-06-18 03:39:36 +00:00
Ian Wienand
2791684d39 review02 : bump heap limit to 96gb
This host has 128gb RAM.  96gb still leaves a considerable amount for
cache.

Change-Id: I1245c03ae6fbfa77743296e28b52a6a62395fc36
2021-06-18 13:20:37 +10:00
Zuul
2a1505dd5b Merge "review02 : switch reviewdb to mariadb_container type" 2021-06-17 22:57:51 +00:00
Clark Boylan
b400dfcb90 Add note about afs01's mirror-update vos releases to docs
I tripped over this during recent afs fileserver reboots. Note it in the
docs so that we are aware of this in the future when doing maintenance.

Change-Id: Iac20fa6b9ec17f1eb69c50bc8f5736b34967fd83
2021-06-17 09:53:08 -07:00
Zuul
5bef87f4a4 Merge "bridge: upgrade to Ansible 4.0.0" 2021-06-17 06:20:10 +00:00
Zuul
1289e32f6d Merge "openafs-client: add service timeout override" 2021-06-17 00:22:24 +00:00
Zuul
9181d5198d Merge "gerrit: add mariadb_container option" 2021-06-16 23:14:48 +00:00
Ian Wienand
d1924491d6 review02 : switch reviewdb to mariadb_container type
This switches review02 to use a mariadb container for the change
review database.

Change-Id: Idc6183d63e22e7484a4127a3b71b29cb53c23c51
2021-06-16 13:57:19 +10:00
Ian Wienand
570ca85cd8 gerrit: add mariadb_container option
This adds a local mariadb container to the gerrit host to hold the
accountPatchReviewDb database.  This is inspired by a few things:

 - since migration to NoteDB, there is only one table left where
   Gerrit records what files have been reviewed for a change.  This
   logically scales with the number of reviews users are doing.
   Pulling the stats on this, we can see since the NoteDB upgrade this
   went from a very busy database (~300 queries/70 commits per second)
   to barely registering one hit per second :
   https://imgur.com/a/QGJV7Fw

   Thus separating the db to an external host for performance reasons
   is not a large concern any more.

 - empirically we've done a bad job in keeping the existing hosted db
   up-to-date; it's still running mysql 5.1 and we have been hit by
   bugs such as the one referenced in-line which silently drops
   backups.

 - The other gerrit option is to use an on-disk H2 database.  This is
   certainly an option, however you need special tools to interact
   with it for migration, etc. and it's not safe to back up from files
   on disk (as opposed to mysqldump).  Upstream advice is unclear, and
   ranges from H2 being a performance bottleneck to this being
   ephemeral data that users don't care about.  We know how to admin
   mariadb/mysql and this allows us to migrate and back up data, so
   it seems like the best choice.

 - we have a pressing need to update the server to a new operating
   system.  Running the db alongside the gerrit instance minimises the
   fiddling we have to do managing connections to, and migrating, the
   hosted db systems.

 - related to that, we are tending towards more provider independence
   for control-plane servers.  A hosted database product is not always
   provided, so this gives us more flexibility in moving things
   around.

 - the main concern here is memory usage.  "docker stats" reports, for
   a quiescent container freshly started on an 8GB host:

    gerrit-compose_mariadb_1  67.32MiB

   After loading a copy of the production table, and then dumping it
   back to a file the same container reports:

    gerrit-compose_mariadb_1  462.6MiB

The existing remote mysql configuration path remains mostly the same.
We move the gerrit startup into a script rather than a CMD so we can
call it after a "wait for db" script in the mariadb_container case
(this is the recommended way to enforce ordering [1]).
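
For illustration, a "wait for db" step can be as simple as polling the
database port until it accepts connections, then starting the service.
The sketch below is an assumption about the general shape of such a
check, not the script added by this change; the function name and
default timings are made up.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until host:port accepts a TCP connection.

    Returns True once a connection succeeds, False if the timeout
    passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A startup wrapper would call this against the mariadb container and
only launch gerrit once it returns True.  Note that an open port only
shows the server is listening, not that it is ready to answer queries.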

Backups of the local container need different dump commands; backups
are relocated to a new file and updated.

Testing is converted to use this rather than a local H2 database.

[1] https://docs.docker.com/compose/startup-order/

Change-Id: Iec981ef3c2e38889f91e9759e66295dbfb499c2e
2021-06-16 13:57:13 +10:00
Ian Wienand
65991cf92e openafs-client: add service timeout override
Starting the openafs-client service is an intensive operation as it
walks the cache registering various things.  We've seen on our
production ARM64 mirror this can take longer than the 1:30 default
timeout.  This is a fatal issue, as the module will try to unload
while afsd is still spinning and working, resulting in completely
corrupt kernel state.

The new timeout is about double the longest time we've seen, so should
give plenty of headroom.
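
To make the sizing concrete, here is a sketch that derives a systemd
drop-in from an observed start time.  The 240 second figure is a
hypothetical example, not the measured value, and the helper name is
made up; TimeoutStartSec is the standard systemd directive for this.

```python
def render_timeout_override(longest_observed_seconds, safety_factor=2):
    """Render a systemd drop-in that raises the service start timeout.

    The timeout is the longest observed start time multiplied by a
    safety factor (2 mirrors the "about double" reasoning above).
    """
    timeout = longest_observed_seconds * safety_factor
    return "[Service]\nTimeoutStartSec={}\n".format(timeout)


if __name__ == "__main__":
    # Hypothetical: the slowest start we observed took 240 seconds
    print(render_timeout_override(240), end="")
```

Such a drop-in conventionally lives under
/etc/systemd/system/openafs-client.service.d/ and takes effect after a
daemon-reload.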

Change-Id: I37186494b9afd72eab3a092279579f1a5fa5d22c
2021-06-16 11:50:53 +10:00
Jeremy Stanley
f2859c55de ircbot: flush channel logging continuously
We've seen evidence of channel logs not getting flushed for very
long timespans (as much as a day on some channels), so adjust the
configuration to flush channel log files immediately after each line
is written. We don't log *that* many channels and our discussion
volume isn't so high that this is likely to cause I/O performance
issues for the server.
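
The behaviour being configured amounts to flushing after every write.
The bot does this through its own configuration option; the Python
sketch below only illustrates the effect, with a made-up helper name.

```python
import os
import tempfile


def append_log_line(logfile, line):
    """Write one log line and push it out of the process buffer."""
    logfile.write(line + "\n")
    # flush() hands the data to the OS so other readers see it at once
    # (it does not force the data to disk)
    logfile.flush()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "channel.log")
    with open(path, "a", encoding="utf-8") as logfile:
        append_log_line(logfile, "<nick> hello")
        # The line is readable while the writer still has the file open
        with open(path, encoding="utf-8") as reader:
            print(reader.read(), end="")
```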

Change-Id: Ia56d3d8c21c48d8ed6ba2466c914e0c20a6192c3
2021-06-15 15:11:23 +00:00
Zuul
c4ce6d8546 Merge "statusbot: don't prefix with extra # for testing" 2021-06-15 04:38:22 +00:00
Zuul
dd6302672f Merge "nodepool-builder: add volume for /var/lib/containers" 2021-06-15 04:38:12 +00:00
Zuul
649285aab2 Merge "ircbot: update limnoria" 2021-06-15 00:08:44 +00:00
Zuul
4ae44282c6 Merge "Update eavesdrop deploy job" 2021-06-15 00:08:37 +00:00
Ian Wienand
4e559edbf5 nodepool-builder: add volume for /var/lib/containers
podman, used by the new containerfile element, requires a
non-overlayfs volume at /var/lib/containers to be able to start and
extract the container images for us to build from.  Add a separate
volume for this.

Change-Id: I6629034ad0b300d392d3d989dbbf17a1343c06e1
2021-06-15 09:24:08 +10:00
Zuul
d8e3e9b91a Merge "Use tmpfiles.d to create /var/run/reprepro" 2021-06-14 22:57:08 +00:00
Ian Wienand
9df7fd5880 ircbot: update limnoria
This branch now has the two fixes noted inline.  Pull this in until
everything is merged to master upstream.

Change-Id: I0aa4716ae26cf6fb8068665a1f21a7c66503bcff
2021-06-15 08:19:03 +10:00
Clark Boylan
d72012fceb Use tmpfiles.d to create /var/run/reprepro
The mirror-update server uses /var/run/reprepro to stash reprepro flock
files. We do that to ensure that we don't have stale locks after a
reboot because /var/run is cleaned on reboot. The problem is that we rely
on daily ansible runs to recreate this dir, which means that after a
reboot we can wait up to 24 hours before we get reprepro mirroring again.

Fix this via the use of tmpfiles.d which instructs systemd to create the
dir for us on boot. We specifically note (via the !) that this directory
should only be created on boot and we set the age value to - to prevent
systemd from deleting this directory.
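
For reference, a tmpfiles.d entry of the kind described would look
roughly like the line this sketch renders.  The mode and ownership are
assumptions for illustration; the "d" type, the "!" boot-only modifier
and the "-" age field are standard tmpfiles.d syntax.

```python
def render_tmpfiles_line(path, mode="0755", user="root", group="root"):
    """Render a tmpfiles.d line that creates a directory at boot only.

    Fields are: type, path, mode, user, group, age.
    """
    # "d" creates the directory; "!" restricts the line to boot time;
    # an age of "-" means the directory is never cleaned up by age
    return "d! {} {} {} {} -".format(path, mode, user, group)


if __name__ == "__main__":
    print(render_tmpfiles_line("/var/run/reprepro"))
```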

Change-Id: I68e49475c54e756ce5a6933390dbe13ace976c29
2021-06-11 15:35:56 -07:00
Clark Boylan
c8be6be1b8 Fix some hostnames in afs docs
Noticed this when doing some afs maintenance. We want the bos status of
fileservers when rebooting those servers, not the status of the db
servers.

Change-Id: I30f6a2320487c302fda2ffe300daa1d91c7dec45
2021-06-11 14:21:03 -07:00
Zuul
ca60545eb1 Merge "statusbot: don't use opendevstatus name in testing" 2021-06-11 14:02:00 +00:00
Ian Wienand
ef14d11eae statusbot: don't prefix with extra # for testing
statusbot doesn't need a prefix, let's not pollute another channel.

Change-Id: Ifcacad64286c281bf668870688af8dca35622551
2021-06-11 23:30:46 +10:00
Ian Wienand
f304c1a161 Update eavesdrop deploy job
This was missed when adding the statusbot/ircbot containers

Change-Id: I198da471b8a0dd648a8e9f1bfe41988561a745f8
2021-06-11 23:23:20 +10:00
Zuul
bb15850b8d Merge "limnoria: fix nicks syntax" 2021-06-11 13:17:10 +00:00
Ian Wienand
4ffcc89c8a statusbot: don't use opendevstatus name in testing
Currently when we run tests, this connects to OFTC and tries to use
the opendevstatus nick as it is the default.  Replace this with a
random username.  Also override the channels list, so it only joins
#opendev-sandbox.

Limnoria was already using a non-conflicting name, but switch it to a
random one for consistency and possible parallel running.  This also
already only joins #opendev-sandbox.
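
A random test nick can be generated along these lines.  This is a
sketch only; the prefix, length and helper name are assumptions and not
what the test configuration actually uses.

```python
import secrets
import string


def random_nick(prefix="test", length=8):
    """Return an IRC nick unlikely to collide with another test run."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(length))
    # The alphabetic prefix keeps the nick from starting with a digit
    return prefix + suffix


if __name__ == "__main__":
    print(random_nick())
```

Using a fresh nick per run avoids clashing with the production bot and
lets several test jobs connect at the same time.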

Change-Id: I860b0f1ed4f99140dda0f4d41025f0b5fb844115
2021-06-11 22:59:24 +10:00
Ian Wienand
f95b139be5 limnoria: fix nicks syntax
This was introduced with I9b5fdf484b6e5d8c9af60708ff02d3c60e427fbd

Change-Id: I4d8726ad519f0faf03c2de9d566529491edcfb8a
2021-06-11 21:15:05 +10:00
Zuul
533c9c6855 Merge "limnoria: disable flush (really this time) and don't join until nickserv" 2021-06-11 11:12:24 +00:00
Ian Wienand
ff899732b4 limnoria: disable flush (really this time) and don't join until nickserv
Change-Id: Ibb352ded6b913d73a1e8b5621c4f01e9d8548f6f
2021-06-11 20:09:44 +10:00
Ian Wienand
8a688b768b limnoria: don't log channel join/parts
Change-Id: Ic4401647391fd88077bbcee2ce61803f54595879
2021-06-11 19:47:02 +10:00
Ian Wienand
868a42a85a Move statusbot channels out of hiera
This makes I246b2723372594e65bcd1ba90215d6831d4c0c72 active

Change-Id: I5a9efa2edc2fe6fb70e21d4b58fd4283d2d5972d
2021-06-11 18:15:48 +10:00
Zuul
b9d885ff2d Merge "Run statusbot from eavesdrop01.opendev.org" 2021-06-11 07:45:55 +00:00
Zuul
fe6581f89f Merge "Cleanup eavesdrop puppet references" 2021-06-11 07:45:46 +00:00
Zuul
048aa39ef0 Merge "static: enable SSLProxyEngine for meetings" 2021-06-11 05:02:34 +00:00
Ian Wienand
438d5037af static: enable SSLProxyEngine for meetings
I4a422bb9589c8a8761191313a656f8377e93422f switched this to proxy via
SSL; however, SSLProxyEngine must be enabled for that to work.

Change-Id: I9b9150b7b1ed53a3e8f742156b686daf156a15b9
2021-06-11 13:42:10 +10:00
Ian Wienand
b2ca63c3b7 limnoria: production fixes
Don't flush the config.  We don't want limnoria to overwrite our
config, and we don't configure it manually via interaction.

Make sure the Services plugin is loaded to identify with nickserv.

Set the logs2html job to 15 minutes, same as the old puppet setting.

Set the logging level to INFO to avoid verbose logging.

Set the flush option to True so logs are written immediately.

Set up rotation on the logfile.

Change-Id: I9b5fdf484b6e5d8c9af60708ff02d3c60e427fbd
2021-06-11 13:14:42 +10:00
melanie witt
18df17d5ff Re-enable update_blueprint for patchset-created
The update_blueprint script has been updated to call the Gerrit REST API
instead of relying on the old Gerrit DB.

Depends-On: https://review.opendev.org/c/opendev/jeepyb/+/795912

Change-Id: Ie21ee33801429ef4398f70b22223ee1e9bea1301
2021-06-11 01:34:11 +00:00
Zuul
f80ab86043 Merge "Move meetbot config to eavesdrop01.opendev.org" 2021-06-11 00:10:56 +00:00
Zuul
a17e7a6b81 Merge "Add Fedora 34 mirrors" 2021-06-10 22:27:19 +00:00
Ian Wienand
23fac31c92 Run statusbot from eavesdrop01.opendev.org
This installs statusbot on eavesdrop01.opendev.org.

Otherwise it's just config translation and bringing up the daemon.

Change-Id: I246b2723372594e65bcd1ba90215d6831d4c0c72
2021-06-11 07:52:51 +10:00