This should be called "_extra" ... currently it overrides the default
exclude list. This means /var/lxcfs gets incorrectly included in the
backup and makes it error out as it has sockets and weird stuff that
can't be backed up; this is why we are getting failure mail.
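
A minimal sketch of the difference between overriding and extending
the default list. The variable names and the default entries other
than /var/lxcfs are hypothetical, not the role's actual contents:

```python
# Hypothetical defaults; only /var/lxcfs is taken from the message above.
DEFAULT_EXCLUDES = ["/proc", "/sys", "/var/lxcfs"]


def effective_excludes(defaults, override=None, extra=None):
    """Return the exclude list a backup run would use.

    An override replaces the defaults entirely, while "extra" entries
    are appended to them.
    """
    if override is not None:
        return list(override)
    return list(defaults) + list(extra or [])


host_paths = ["/var/cache/example"]  # hypothetical per-host entry

# Current behaviour: the host list replaces the defaults.
overridden = effective_excludes(DEFAULT_EXCLUDES, override=host_paths)
# Intended behaviour: the host list is added to the defaults.
extended = effective_excludes(DEFAULT_EXCLUDES, extra=host_paths)

print("/var/lxcfs" in overridden, "/var/lxcfs" in extended)
```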
Change-Id: Idea70c32b2d42f77fee2b35487d88a8ee982c856
This is a Focal server that will replace zm01.openstack.org. Once this
is deployed and happy we can also move forward and do the remainder of
the mergers.
Change-Id: I139c52e26d17ac8d9b604366a3333556d23c5536
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.
For reference, the servers being backed up at this time are:
borg-ask01
borg-ethercalc02
borg-etherpad01
borg-gitea01
borg-lists
borg-review-dev01
borg-review01
borg-storyboard01
borg-translate01
borg-wiki-update-test
borg-zuul01
This removes the old bup backup hosts, the no-longer used ansible
roles for the bup backup server and client roles, and any remaining
bup related configuration.
For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.
Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
Ifa5f251fdfb8de737ad2ed96491d45294ce23a0c renamed the afs and afsdb
groups to afs-file-server and afs-db-server, but didn't update the
group files.
Previously the firewall rules were duplicated in the afs/afsdb group;
but now all afs servers are in the afs-server-common group. Rename
afs.yaml->afs-server-common.yaml and remove the now unnecessary
afsdb.yaml.
Remove one of the old group vars files and rename the other to
afs-server-common so we can restore the udp ports they open in our
firewall rules.
Change-Id: I17dd0596660addf061ade31b4450bf040c01ffe8
Both the fileservers and db servers have common key material deployed
by the openafs-server-config role. Put both types of server in a new
group "afs-server-common" so we can define this key material in just
one group file on bridge.
Then separate out the two into afs-<file|db>-server groups for
consistent naming.
Rename afs-admin for consistent naming.
The service file is updated to reflect the new groups.
Change-Id: Ifa5f251fdfb8de737ad2ed96491d45294ce23a0c
Currently this variable is setting several URLs used in the config to
internal http links (port 8000). This bubbles through to the UI which
then can't talk to the API. Empirically, changing these values in the
container config and restarting it makes things work. Update this
variable to make it talk to external https.
Change-Id: If61ec1e0383b98d34d092c55ca0095588487902a
This adds a Dockerfile to build an opendevorg/refstack image as well as
the jobs to build and publish it.
Change-Id: Icade6c713fa9bf6ab508fd4d8d65debada2ddb30
As noted inline, a recent mysql client update has broken the
"--all-databases" flag, at least for the client version and very old
server version we use.
Empirically, dumping individual databases still works with this
client. Switch this to stream the db directly into borg.
Ignore the old backups and remove the bup backup while we are here,
since this is all borg now.
Change-Id: I5fe762a003ce2c2ba4830367be87598f67f7e763
Despite being deprecated, the ask server is our 3rd biggest backup.
Even though the site is R/O we're still backing up the fresh rotations
of the gzipped backups every day.
To reduce the incremental space requirements, move to our plain-text
streaming for the db backup. This just needs a file dropped in /etc;
see the backup-borg role README documentation. We do this in puppet
to avoid the complexity of adding this deprecated service to ansible.
This then excludes the on-disk db backup dir.
Drop the bup backups while we are here.
Change-Id: Icfd81aca58b9a0dc3a3b74de04c1b00f03160327
The floating IP of this host was changed during a network issue;
matches I898dbf7417fb01f608eded85faaae5a417ad2e98
Change-Id: Icf1daa4a761403a3927bcadab08656cd1f42f1aa
Add facility to borg-backup role to run a command and save the output
of it to a separate archive file during the backup process.
This is mostly useful for database backups. Compressed on-disk logs
are terrible for differential backups because revisions have
essentially no common data. By saving the uncompressed stream
directly from mysqldump, we allow borg the chance to de-duplicate,
saving considerable space on the backup servers.
This is implemented for our ansible-managed servers currently doing
dumps. We also add it to the testinfra.
This also separates the archive names for the filesystem and stream
backup with unique prefixes so they can be pruned separately.
Otherwise we end up keeping only one of the stream or filesystem
backups which isn't the intention. However, due to issues with
--append-only mode we are not issuing prune commands at this time.
Note the dump commands are updated slightly, particularly with
"--skip-extended-insert" which was suggested by mordred and
significantly improves incremental diff-ability by being slightly more
verbose but keeping much more of the output stable across dumps.
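
As a rough illustration of the de-duplication argument above, the
sketch below compares how many chunks two nearly identical dumps share
when stored as plain text versus compressed. It is a simplification
under stated assumptions: fixed 64-byte chunks stand in for borg's
content-defined chunking, zlib (the same DEFLATE used by gzip) stands
in for the on-disk compression, and the table contents are made up.

```python
import hashlib
import zlib


def make_dump(changed_rows=frozenset(), rows=2000):
    """Build a fake one-row-per-line dump (hypothetical table and data)."""
    lines = []
    for i in range(rows):
        version = "v2" if i in changed_rows else "v1"
        value = hashlib.sha256(f"{i}-{version}".encode()).hexdigest()
        lines.append(f"INSERT INTO t VALUES ({i},'{value}');\n")
    return "".join(lines).encode()


def chunk_ids(data, size=64):
    """Hash fixed-size chunks; a stand-in for a real de-duplicating chunker."""
    return {
        hashlib.sha256(data[i:i + size]).digest()
        for i in range(0, len(data), size)
    }


def shared_fraction(old, new):
    """Fraction of the new data's chunks already present in the old data."""
    old_ids, new_ids = chunk_ids(old), chunk_ids(new)
    return len(old_ids & new_ids) / len(new_ids)


# Two dumps that differ in 20 of 2000 rows (same row lengths).
first = make_dump()
second = make_dump(changed_rows=frozenset(range(0, 2000, 100)))

plain_shared = shared_fraction(first, second)
compressed_shared = shared_fraction(zlib.compress(first), zlib.compress(second))

print(f"plain text chunks shared: {plain_shared:.1%}")
print(f"compressed chunks shared: {compressed_shared:.1%}")
```

The plain-text dumps should share almost all of their chunks, while
the compressed streams share few, since a small change early in the
input alters the compressed bytes that follow it.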
Change-Id: I500062c1c52c74a567621df9aaa716de804ffae7
Move common setup steps into an openafs-server-config role, and create
openafs-file-server and openafs-db-server roles to manage fileserver
and db servers respectively.
Modify the playbook to run these roles against the AFS servers.
Change-Id: I4e80ad8ffe1d4992e405ea516b8762109758d7eb
With all AFS file-servers upgraded to 1.8, we can move afs01.dfw back
and rename the group to just "afs".
Change-Id: Ib31bde124e01cd07d6ff7eb31679c55728b95222
This starts migrating OpenAFS server setup to Ansible.
Firstly we split up the groups and explicitly name hosts, as we will
be migrating each one step-by-step. We split out 1.8 hosts into a new
afs-1.8 group; the first host is afs01.ord.openstack.org which already
has openafs 1.8 installed manually.
An openafs-server role is introduced that does the same setup as the
extant puppet.
The AFS job is renamed to infra-prod-afs as the puppet component will
eventually disappear. Otherwise it runs in the same way, but also
runs the openafs-server role for the 1.8 servers.
Once this is merged, we can run it against afs01.ord.openstack.org to
ensure it works and is idempotent. We can then take on upgrading the
other file servers, and work further on the database servers.
Change-Id: I7998af43961999412f58a78214f4b5387713d30e
These servers were spun up to handle extra load to the openstackid
service during the virtual summit. The load is no longer present and we
have been asked to dial back to the normal setup for this service.
Clean these servers up to stop using unneeded resources. We will start
by removing them from inventory, then dns, and then shut them down. If
everything continues to look happy after that we will delete them.
Change-Id: I469d16f80dcc6c20891556272a94b1f7404b3620
The old ethercalc01 server has been deleted as have its DNS entries.
Belatedly update cacti to query the new server, and remove an old
unused reference which was at one time disabling the former server.
Change-Id: Ide70c7d03bfff5bd695272c696913dfb3decc525
This reverts commit 95d9b838140e44c9547ad1fa28bc88206823198c.
We've found that we run out of memory at 44g. Bump back up to 48g as
that should give us a bit more headroom.
Change-Id: I14a8f2b298aa1d3cb5c0829508ee137a6769675b
We had been setting this to 48GB on java 8, but recent gerrit service
issues indicate that this may be too large for our current system on
java 11. In particular it appears the non heap portions of the jvm may
be in the ~8GB range leaving only about 5-6GB of usable system memory
for other activities like web servers, backups, and garbage collection.
Reduce this to 44GB to increase headroom to see if that helps us. Java
11 is reported to be much more efficient at garbage collecting so
hopefully that makes up the difference between lower memory and where we
were on java 8. As a side note we could revert back to java 8 as another
option.
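
The headroom arithmetic above can be worked through as follows. Total
system memory is not stated in this message, so it is inferred here
from the quoted figures; treat it as an assumption rather than a
measured value.

```python
HEAP_OLD_GB = 48         # previous heap limit
HEAP_NEW_GB = 44         # proposed heap limit
NON_HEAP_GB = 8          # "~8GB" non-heap JVM usage, per the message
FREE_AT_OLD_GB = (5, 6)  # "about 5-6GB" left over at the old limit

# Total memory implied by the figures above (an inference, not a fact).
implied_total_gb = tuple(HEAP_OLD_GB + NON_HEAP_GB + free for free in FREE_AT_OLD_GB)

# Memory left for everything else once the heap limit is lowered.
free_at_new_gb = tuple(total - HEAP_NEW_GB - NON_HEAP_GB for total in implied_total_gb)

print("implied total (GB):", implied_total_gb)
print("free at new limit (GB):", free_at_new_gb)
```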
Change-Id: Ie326aad2a9895098b484924a26c9257cd009d89e
The hound project has undergone a small re-birth and moved to
https://github.com/hound-search/hound
which has broken our deployment. We've talked about leaving
codesearch up to gitea, but it's not quite there yet. There seems to
be no point working on the puppet now.
This builds a container that runs houndd. It's an opendev specific
container; the config is pulled from project-config directly.
There are some custom scripts that drive things. Some points for
reviewers:
- update-hound-config.sh uses "create-hound-config" (which is in
jeepyb for historical reasons) to generate the config file. It
grabs the latest projects.yaml from project-config and exits with a
return code to indicate if things changed.
- when the container starts, it runs update-hound-config.sh to
populate the initial config. There is a testing environment flag
and small config so it doesn't have to clone the entire opendev for
functional testing.
- it runs under supervisord so we can restart the daemon when
projects are updated. Unlike earlier versions that didn't start
listening till indexing was done, this version now puts up a "Hound
is not ready yet" message while it is working; so we can drop
all the magic we were doing to probe if hound is listening via
netstat and making Apache redirect to a status page.
- resync-hound.sh is run from an external cron job daily, and does
this update and restart check. Since it only reloads if changes
are made, this should be relatively rare anyway.
- There is a PR to monitor the config file
(https://github.com/hound-search/hound/pull/357) which would mean
the restart is unnecessary. This would be good in the near future
and we could remove the cron job.
- playbooks/roles/codesearch is unexciting and deploys the container,
certificates and an apache proxy back to localhost:6080 where hound
is listening.
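
The "update the config, restart only if it changed" flow described in
the points above could look roughly like this. It is a sketch of the
idea in Python, not the actual shell scripts; the function names and
the restart callback are hypothetical.

```python
import os
from pathlib import Path


def update_config(path: Path, new_content: str) -> bool:
    """Write new_content to path only if it differs.

    Returns True when the file was created or changed, mirroring a
    script that signals "config changed" through its return code.
    """
    if path.exists() and path.read_text() == new_content:
        return False
    # Write to a temporary file first so readers never see a partial config.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(new_content)
    os.replace(tmp, path)
    return True


def resync(path: Path, new_content: str, restart) -> bool:
    """Regenerate the config and call restart() only when it changed."""
    if update_config(path, new_content):
        restart()
        return True
    return False
```

A daily cron job calling something like resync() would then leave the
daemon untouched on days when the project list is unchanged.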
I've combined removal of the old puppet bits here as the "-codesearch"
namespace was already being used.
Change-Id: I8c773b5ea6b87e8f7dfd8db2556626f7b2500473
These changes are squashed together to simplify applying them to config
management, so that zuul and ansible never run one of them without the
others. We essentially need them all in place at the same time to
accurately reflect the post-upgrade state.
We stop blocking /p/ in gerrit's apache vhost. /p/ is used for
dashboards.
We add a few java options that new gerrit sets by default.
We update the gerrit image in docker compose to 3.2.
We update zuul to use basic auth instead of digest auth when talking to
Gerrit.
Change-Id: I6ea38313544ce1ecbc4cfd914b1f33e77d0d2d03
Follow-on to Ia9579c7b3204b47d453fc51388265bf1867af20c, this also
matches the web-debug* log files
Change-Id: Ibabbfa3b01317528a75eeec17ea28168da57123a
This cuts out the bulk of the storage expense, but leaves us with the
regular logs for enhanced audit trails.
Change-Id: Ia9579c7b3204b47d453fc51388265bf1867af20c
This should help reduce the bulk of the review site backups:
* launchpadlib cache has ~650,000 files which we don't need to track
* review_site/tmp has ~50,000 files
* review_site/cache is about 9gb
* review_site/index is optional to back up, but a) it's very unlikely
to be useful in a full restore situation; we'd have to re-create
them and b) things seem to come and go under this directory during
the backup, causing it to exit with an error status.
Change-Id: If7009cfcd5a3a07c07108149772cc8c1873bf277
This serverId value is used by notedb to identify the gerrit cluster
that notedb contents belong to. By default a random uuid is generated by
gerrit for this value. In order to avoid config management and gerrit
fighting over this value after we upgrade we set a value now.
This should be safe to land on 2.13 as old gerrit should ignore the
value.
Change-Id: I57c9b436a9d0d1dfe77eee907d50fc1dcda6ab12
bup is going crazy and filling the disk when making its backups. We
have moved this into the borg backup group and run some backups, so
rather than spending time debugging this, we are just going to disable
bup on the server.
Change-Id: I1daad4eb05f8222131dc84c12577dec924874466
Backups have been going well on ethercalc02, so add borg backup runs
to all backed-up servers. Port in some additional excludes for Zuul
and slightly modify the /var/ matching.
Change-Id: Ic3adfd162fa9bedd84402e3c25b5c1bebb21f3cb