The version of Gerrit we're running no longer keeps account and
project details in an SQL database, so drop the tasks from our
rename playbook related to this.
Change-Id: I4b11b627a1285617496ca7b53f9b7e3f8251630c
Configure the base URL to something which actually resolves in DNS
and serves the expected content.
Change-Id: I4cfb22c8ba573ea7b689d5131b8a977d0eef5518
Add a dry-run flag and abstract the rsync command to make testing
easier when modifying the copy flags.
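A minimal sketch of the idea, not the actual script: the function name,
paths and default copy flags below are hypothetical. It builds the
rsync argument list in one place so a dry run differs only by rsync's
--dry-run flag.

```python
import shlex


def build_rsync_command(source, dest, dry_run=False, copy_flags=("-a", "--delete")):
    """Assemble an rsync argument list.

    The default copy_flags are illustrative assumptions, not the flags
    used by the real script.
    """
    cmd = ["rsync", *copy_flags]
    if dry_run:
        # With --dry-run rsync reports what it would transfer without copying.
        cmd.append("--dry-run")
    cmd.extend([source, dest])
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it so the flags can be reviewed.
    print(shlex.join(build_rsync_command("/src/", "/dst/", dry_run=True)))
```

Keeping the command in a single function means a change to the copy
flags can be checked with a dry run before any data is moved.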
Change-Id: Ie658b60257b94436b1eda0cddf6deb639a87d659
Acme.sh is updating their defaults to use zerossl instead of
letsencrypt [0]. This has resulted in errors like:
Can not resolve _eab_id
when our runs of acme.sh attempt to communicate with zerossl. While the
default change isn't supposed to happen until August 1, we hit it early
because we consume the dev branch of acme.sh.
We avoid this entirely by being explicit in our acme.sh driver script
about which server to communicate with: we explicitly set --server to
letsencrypt.
Note that a followup should likely update our use of --staging to set
--server letsencrypt_test as --staging enforces their defaults as well.
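As a rough illustration only (the driver script itself is not shown
here, and the helper name is hypothetical), pinning the CA amounts to
always appending an explicit --server argument. The staging branch
reflects the followup suggested above rather than current behaviour.

```python
def with_explicit_server(acme_args, staging=False):
    """Return acme.sh arguments with the CA pinned explicitly.

    acme_args is whatever the caller already passes to acme.sh; this
    hypothetical helper only appends the --server selection.
    """
    # letsencrypt_test is the value proposed for replacing --staging.
    server = "letsencrypt_test" if staging else "letsencrypt"
    return [*acme_args, "--server", server]


if __name__ == "__main__":
    print(with_explicit_server(["acme.sh", "--issue", "-d", "example.org"]))
```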
[0] https://github.com/acmesh-official/acme.sh/wiki/Change-default-CA-to-ZeroSSL
Change-Id: Ia6a8da80869f1c4ff3240712bcd320bfc6f29e93
I tripped over this during recent afs fileserver reboots. Note it in the
docs so that we are aware of this in the future when doing maintenance.
Change-Id: Iac20fa6b9ec17f1eb69c50bc8f5736b34967fd83
This adds a local mariadb container to the gerrit host to hold the
accountPatchReviewDb database. This is inspired by a few things:
- since migration to NoteDB, there is only one table left where
Gerrit records what files have been reviewed for a change. This
logically scales with the number of reviews users are doing.
Pulling the stats on this, we can see since the NoteDB upgrade this
went from a very busy database (~300 queries/70 commits per second)
to barely registering one hit per second:
https://imgur.com/a/QGJV7Fw
Thus separating the db to an external host for performance reasons
is not a large concern any more.
- empirically we've done a bad job in keeping the existing hosted db
up-to-date; it's still running mysql 5.1 and we have been hit by
bugs such as the one referenced in-line which silently drops
backups.
- The other gerrit option is to use an on-disk H2 database. This is
certainly an option, however you need special tools to interact
with it for migration, etc. and it's not safe to backup from files
on disk (as opposed to mysqldump). Upstream advice is unclear, and
varies between H2 being a performance bottleneck to this being
ephemeral data that users don't care about. We know how to admin
mariadb/mysql and this allows us to migrate and backup data, so
seems like the best choice.
- we have a pressing need to update the server to a new operating
system. Running the db alongside the gerrit instance minimises
fiddling we have to do managing connections to and migrating the
hosted db systems.
- related to that, we are tending towards more provider independence
for control-plane servers. A hosted database product is not always
provided, so this gives us more flexibility in moving things
around.
- the main concern here is memory usage. "docker stats" reports a
quiescent container, freshly started on an 8GB host:
gerrit-compose_mariadb_1 67.32MiB
After loading a copy of the production table, and then dumping it
back to a file the same container reports:
gerrit-compose_mariadb_1 462.6MiB
The existing remote mysql configuration path remains mostly the same.
We move the gerrit startup into a script rather than a CMD so we can
call it after a "wait for db" script in the mariadb_container case
(this is the recommended way to enforce ordering [1]).
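For illustration, a "wait for db" step can be as simple as polling the
database port until it accepts connections. This is a sketch under
assumptions, not the script added here: the function name, host, port
3306 and timeouts are all hypothetical, and an open TCP port does not
guarantee the server is ready to answer queries.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once a connection is accepted, or False if the timeout
    expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Hypothetical values; the real startup script would exec gerrit on success.
    ready = wait_for_port("127.0.0.1", 3306, timeout=5.0)
    print("database reachable" if ready else "gave up waiting")
```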
Backups of the local container need different dump commands; backups
are relocated to a new file and updated.
Testing is converted to use this rather than a local H2 database.
[1] https://docs.docker.com/compose/startup-order/
Change-Id: Iec981ef3c2e38889f91e9759e66295dbfb499c2e
Starting the openafs-client service is an intensive operation as it
walks the cache registering various things. We've seen on our
production ARM64 mirror this can take longer than the 1:30 default
timeout. This is a fatal issue, as the module will try to unload
while afsd is still spinning and working resulting in completely
corrupt kernel state.
The new timeout is about double the longest time we've seen, so should
give plenty of headroom.
Change-Id: I37186494b9afd72eab3a092279579f1a5fa5d22c
We've seen evidence of channel logs not getting flushed for very
long timespans (as much as a day on some channels), so adjust the
configuration to flush channel log files immediately after each line
is written. We don't log *that* many channels and our discussion
volume isn't so high that this is likely to cause I/O performance
issues for the server.
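To illustrate the trade-off, flushing per line looks roughly like the
sketch below. This is not the bot's logging code; the helper name and
log line are hypothetical.

```python
def log_line(fh, line):
    """Write one log line and flush it to the OS straight away."""
    fh.write(line.rstrip("\n") + "\n")
    # Without this flush the line can sit in the process buffer until
    # the buffer fills or the file is closed.
    fh.flush()


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile("w", suffix=".log") as fh:
        log_line(fh, "<nick> hello channel")
```

Each line costs one extra write to the OS, which is cheap at low
message rates.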
Change-Id: Ia56d3d8c21c48d8ed6ba2466c914e0c20a6192c3
podman, used by the new containerfile element, requires a
non-overlayfs volume at /var/lib/containers to be able to start and
extract the container images for us to build from. Add a separate
volume for this.
Change-Id: I6629034ad0b300d392d3d989dbbf17a1343c06e1
This branch now has the two fixes noted inline. Pull this in until
everything is merged to master upstream.
Change-Id: I0aa4716ae26cf6fb8068665a1f21a7c66503bcff
The mirror-update server uses /var/run/reprepro to stash reprepro flock
files. We do that to ensure that we don't have stale locks after a
reboot, because /var/run is cleaned on reboot. The problem is that we
rely on daily ansible runs to recreate this dir, which means that after
a reboot we can wait up to 24 hours before we get reprepro mirroring again.
Fix this via the use of tmpfiles.d which instructs systemd to create the
dir for us on boot. We specifically note (via the !) that this directory
should only be created on boot and we set the age value to - to prevent
systemd from deleting this directory.
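As a sketch of what such an entry looks like, the helper below formats
a tmpfiles.d line in the usual "Type Path Mode User Group Age" field
order. The helper name, the mode and the ownership are assumptions made
for illustration; only the path, the ! modifier and the - age come from
the description above.

```python
def tmpfiles_dir_entry(path, mode="0755", user="root", group="root",
                       boot_only=True, age="-"):
    """Format a tmpfiles.d line that creates a directory.

    mode, user and group are assumed defaults, not the deployed values.
    """
    # "d" creates the directory; the "!" suffix restricts it to boot time.
    type_field = "d!" if boot_only else "d"
    # An age of "-" disables automatic cleanup of the directory.
    return " ".join([type_field, path, mode, user, group, age])


if __name__ == "__main__":
    print(tmpfiles_dir_entry("/var/run/reprepro"))
```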
Change-Id: I68e49475c54e756ce5a6933390dbe13ace976c29
Noticed this when doing some afs maintenance. When rebooting
fileservers we want the bos status of those servers, not the status of
the db servers.
Change-Id: I30f6a2320487c302fda2ffe300daa1d91c7dec45
Currently when we run tests, this connects to OFTC and tries to use
the opendevstatus nick as it is the default. Replace this with a
random username. Also override the channels list, so it only joins
Limnoria was already using a non-conflicting name, but switch it to a
random one for consistency and possible parallel running. This also
already only joins #opendev-sandbox.
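One way to generate such a name is sketched below. The function name,
prefix and length are hypothetical choices, not what the tests use. The
alphabetic prefix keeps the nick from starting with a digit.

```python
import secrets
import string


def random_nick(prefix="test", length=8):
    """Return a nick made of a fixed prefix plus a random suffix."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(length))
    return f"{prefix}{suffix}"


if __name__ == "__main__":
    print(random_nick())
```

A fresh name per run means parallel test runs are unlikely to collide
on the same nick.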
Change-Id: I860b0f1ed4f99140dda0f4d41025f0b5fb844115
I4a422bb9589c8a8761191313a656f8377e93422f switched this to proxy via
SSL, however this is required for that to work.
Change-Id: I9b9150b7b1ed53a3e8f742156b686daf156a15b9
Don't flush the config. We don't want limnoria to overwrite our
config, and we don't configure it manually via interaction.
Make sure the Services plugin is loaded to identify with nickserv.
Set the logs2html job to 15 minutes, same as the old puppet setting.
Set the logging level to INFO to avoid verbose logging.
Set the flush option to True so logs are written immediately.
Set up rotation on the logfile.
Change-Id: I9b5fdf484b6e5d8c9af60708ff02d3c60e427fbd
The update_blueprint script has been updated to call the Gerrit REST API
instead of relying on the old Gerrit DB.
Depends-On: https://review.opendev.org/c/opendev/jeepyb/+/795912
Change-Id: Ie21ee33801429ef4398f70b22223ee1e9bea1301
This installs statusbot on eavesdrop01.opendev.org.
Otherwise it's just config translation and bringing up the daemon.
Change-Id: I246b2723372594e65bcd1ba90215d6831d4c0c72