This is a focal replacement for ze01.openstack.org. Cleanup for
ze01.openstack.org will happen in a followup when we are happy with the
results of running zuul-executor on focal.
Change-Id: If1fef88e2f4778c6e6fbae6b4a5e7621694b64c5
This file is now removed (I0cbcd4694a4796573fe48383756be03597d2da0f);
get rid of this to avoid any confusion.
Change-Id: I837d1fccbfa2461eb1315eac54c2a017fcb86511
This syslog configuration sends any logs with a program-name of
"docker-<foo>" to /var/log/containers/foo.log. However, at the 98-
level its rules run after the default 50- rules, so the logs are
copied to both syslog and /var/log/containers. Since this
configuration contains a "stop" command, moving it ahead of the
default rules means the docker logs will no longer be duplicated.
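
As a rough illustration of why the ordering matters, here is a small
Python model of rule files being applied in lexical order, where a
matching rule with "stop" ends processing. This is only a sketch: the
file names (including the 10- prefix) and the route() helper are
hypothetical, and it does not reproduce real rsyslog syntax.

```python
def route(tag, rule_files):
    """Return the destinations a message with this tag is written to.

    rule_files maps a file name to (matcher, destination, stops).
    Files are applied in lexical order of their names; a matching
    rule with stops=True ends processing for that message.
    """
    destinations = []
    for name in sorted(rule_files):
        matches, destination, stops = rule_files[name]
        if matches(tag):
            destinations.append(destination)
            if stops:
                break
    return destinations


def is_docker(tag):
    return tag.startswith("docker-")


def anything(tag):
    return True


# Hypothetical file names; only the numeric prefixes matter here.
late = {
    "50-default.conf": (anything, "/var/log/syslog", False),
    "98-docker.conf": (is_docker, "/var/log/containers", True),
}
early = {
    "10-docker.conf": (is_docker, "/var/log/containers", True),
    "50-default.conf": (anything, "/var/log/syslog", False),
}

late_result = route("docker-foo", late)
early_result = route("docker-foo", early)
```

With the late ordering the message reaches both destinations; with the
early ordering the "stop" rule matches first and only
/var/log/containers receives it.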
Change-Id: I0cbcd4694a4796573fe48383756be03597d2da0f
As described inline, ensure that minimal facts for the backup servers
are loaded before running the backup roles on hosts, so they can read
the ansible_ssh_host_key_ed25519_public fact for each backup server
and ensure it is accepted.
Update the other comments slightly as well.
Change-Id: I1f207ca0770d58f61a89f9ade0bd26cebc982c62
I introduced this typo with I500062c1c52c74a567621df9aaa716de804ffae7.
Luckily Ibb63f19817782c25a5929781b0f6342fe4c82cf0 has alerted us to
this problem.
Change-Id: I02bf2f4fa1041642a719100e9591bf5cd1a0bf49
So we can stop/pull/start, move the pull tasks to their own files
and add a playbook that invokes them.
Change-Id: I4f351c1d28e5e4606e0a778e545a3a805525ac71
This includes a fix for I216528a76307189d8d87bd2fcfeff95c6ceb53cc.
Now that it's released we can be a bit more explicit about why we
added the workaround.
Change-Id: Ibaf1850549b5e7ec3622418b650bc5e59a289ab6
We have seen some poor performance from gitea which may be related to
manage project updates. Start a dstat service which logs to a csv file
on our system-config-run job hosts in order to collect performance info
from our services in pre merge testing. This will include gitea and
should help us evaluate service upgrades and other changes from a
performance perspective before they hit production.
Change-Id: I7bdaab0a0aeb9e1c00fcfcca3d114ae13a76ccc9
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.
For reference, the servers being backed up at this time are:
borg-ask01
borg-ethercalc02
borg-etherpad01
borg-gitea01
borg-lists
borg-review-dev01
borg-review01
borg-storyboard01
borg-translate01
borg-wiki-update-test
borg-zuul01
This removes the old bup backup hosts, the no-longer-used ansible
roles for the bup backup server and client, and any remaining
bup-related configuration.
For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.
Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
This sets a global BORG_UNDER_CRON=1 environment variable for
production hosts and makes the borg-backup script send an email if any
part of the backup job appears to fail (this avoids spamming ourselves
if we're testing backups, etc).
We should ideally never get this email, but if we do it's something we
want to investigate quickly. There's nothing worse than thinking
backups are working when they aren't.
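
The gating described above can be sketched roughly as follows. This is
a Python illustration only, not the borg-backup script itself; the
should_send_alert() helper and its arguments are hypothetical, and
only the BORG_UNDER_CRON=1 variable comes from this change.

```python
import os


def should_send_alert(exit_codes, environ=None):
    """Decide whether a failure email should be sent.

    exit_codes: exit statuses of each part of the backup job.
    environ: mapping to read BORG_UNDER_CRON from (defaults to
    os.environ). Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    under_cron = environ.get("BORG_UNDER_CRON") == "1"
    failed = any(code != 0 for code in exit_codes)
    # Only alert for real cron runs, never for manual test runs.
    return under_cron and failed
```

A manual test run without the variable set stays quiet even when a
step fails, while a production cron run with any non-zero exit status
triggers the email.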
Change-Id: Ibb63f19817782c25a5929781b0f6342fe4c82cf0
Both the fileservers and db servers have common key material deployed
by the openafs-server-config role. Put both types of server in a new
group "afs-server-common" so we can define this key material in just
one group file on bridge.
Then separate out the two into afs-<file|db>-server groups for
consistent naming.
Rename afs-admin for consistent naming.
The service file is updated to reflect the new groups.
Change-Id: Ifa5f251fdfb8de737ad2ed96491d45294ce23a0c
This checks the backup archives and alerts us if anything seems wrong.
This will take a few hours, so we run once a week.
Change-Id: I832c0d29a37df94d4bf2704c59bb3f8d855c3cc8
We have set up rsyslogd/logrotate so that anything with docker- tags
is persisted to disk in /var/log/containers. Set this up here so
we keep track of the mariadb and refstack logs.
Change-Id: I760cfeb7226f79986fbf9d7dbc5f899fc87a0cd1
This change splits our existing system-config-run-review job into two
jobs, one for gerrit 3.2 and another for 3.3. The biggest change is that
we use a var called zuul_test_gerrit_version to select which version we
want and that ends up in the fake group file written out by Zuul for the
nested ansible run. The nested ansible run will then populate the
docker-compose file with the appropriate version for us.
Change-Id: I00b52c0f4aa8df3ecface964007fcf5724887e5e
The mariadb container currently doesn't persist its database
anywhere. Map /var/lib/refstack/db to /var/lib/mysql in the
container.
We have /var/refstack and /var/lib/refstack with various things.
While we're here move everything under /var/lib/refstack.
Also use 127.0.0.1 to ensure mysql doesn't try to connect over a
socket, but tcp (I think pymysql does anyway, but it's a little
clearer).
Change-Id: I5605eac2848a6b2222698bf20c707baa4442fcd5
This slipped in with I4e80ad8ffe1d4992e405ea516b8762109758d7eb; it
should be openafs, not openstack.
Change-Id: Iefc41f9085d86e9fdaa13c6e5b90f1c99b7a2d83
It is buggy (throwing exceptions for undefined variables which are
actually defined via set_fact), and we frequently run into problems
using it in this repo. It was designed to lint roles for Galaxy,
not the way we write ansible. As of the 5.0.0 release it's
generating >4.5K lines of complaints about files in this repository.
Change-Id: If9d8c19b5e663bdd6b6f35ffed88db3cff3d79f8
It's not obvious, but the if statements can change the PIPESTATUS
meaning we're not matching what we think we're matching. Save the
pipestatus of the backup commands so we exit the backup script with
the right code.
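
To illustrate the pitfall, the following Python sketch runs a tiny
bash snippet (assuming bash is available on the PATH) in which the
test inside the "if" overwrites PIPESTATUS, while a copy saved
immediately after the pipeline keeps the original value. The snippet
is a made-up example, not the actual backup script.

```python
import subprocess

# Hypothetical bash snippet: "false | true" leaves PIPESTATUS=(1 0).
# The "[ ... ]" test run by "if" is itself a command, so PIPESTATUS
# is replaced with that test's status before the body executes.
SCRIPT = r'''
false | true
saved=("${PIPESTATUS[@]}")
if [ "${saved[0]}" -ne 0 ]; then
    echo "after-if:${PIPESTATUS[0]} saved:${saved[0]}"
fi
'''

result = subprocess.run(
    ["bash", "-c", SCRIPT],
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()
```

Here the value read inside the "if" body is the status of the test
command, whereas the saved array still holds the failing status of the
first pipeline stage.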
Change-Id: I83c7db45d3622067eb05107e26fbdc7a8aeecf63
Due to backups running in append-only mode, we do not have a way to
safely automatically prune backups. To reduce the likelihood we
forget about backups and end up with failing jobs, add a cron job to
send an email to infra-root if the backup partition goes over 90%
usage. At this point a manual prune should be run
(I9559bb8aeeef06b95fb9e172a2c5bfb5be5b480e).
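
The threshold check itself is simple; a minimal Python sketch of the
idea is below. The function names and the default path handling are
hypothetical, the real cron job may compute usage differently (for
example from df output), and sending the email is left out.

```python
import shutil


def usage_percent(used, total):
    """Percentage of the partition in use."""
    return 100.0 * used / total


def needs_prune_alert(path, threshold=90.0):
    """Return True if the filesystem holding path is over threshold.

    Hypothetical helper: the 90% default mirrors the limit described
    above; the path to check is supplied by the caller.
    """
    usage = shutil.disk_usage(path)
    return usage_percent(usage.used, usage.total) > threshold
```

Note the strict comparison: a partition at exactly 90% does not
trigger the alert in this sketch, matching "goes over 90%".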
Change-Id: I250d84c4a9f707e63fef6f70cfdcc1fb7807d3a7
Due to [1] --all-databases is no longer working with our version of
the database. Move to explicitly backing up the only database we care
about now, which is accountPatchReviewDb; everything else is in
notedb.
[1] https://bugs.launchpad.net/ubuntu/+source/mysql-5.7/+bug/1914695
Change-Id: Iab2a8ab612cc0a0f10c90123f2936c0abda9e76f
This adds a dockerfile to build an opendevorg/refstack image as well as
the jobs to build and publish it.
Change-Id: Icade6c713fa9bf6ab508fd4d8d65debada2ddb30
These were gleaned from looking at what files are taking up space in
the deltas of backups. Nothing major, but mlocate in particular is
taking up to a couple of hundred MB on some servers.
Change-Id: I4b08c4e2491fa7138045aabcb23017ff8cef7600