419 Commits

Author SHA1 Message Date
Ian Wienand
c27915c3a7 translate: fix backup extras match
This should be called "_extra" ... currently it overrides the default
exclude list.  This means /var/lxcfs gets incorrectly included in the
backup and makes it error out as it has sockets and weird stuff that
can't be backed up; this is why we are getting failure mail.

Change-Id: Idea70c32b2d42f77fee2b35487d88a8ee982c856
2021-02-23 02:00:34 +00:00
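The naming mistake fixed in c27915c3a7 is easy to reproduce in miniature. The sketch below is illustrative only: the variable names `backup_excludes` and `backup_excludes_extra`, the default list, and the host-specific path are assumptions, not the role's actual settings. It shows how setting the base list replaces the defaults, while an "_extra" list is appended to them.

```python
# Hypothetical default exclude list; the real role's defaults may differ.
DEFAULT_EXCLUDES = ["/proc", "/sys", "/var/lxcfs"]


def effective_excludes(host_vars):
    """Return the exclude list a backup run would use for one host."""
    # A host that sets the base variable replaces the defaults entirely.
    base = host_vars.get("backup_excludes", DEFAULT_EXCLUDES)
    # The "_extra" variable is appended, so the defaults stay in force.
    return list(base) + list(host_vars.get("backup_excludes_extra", []))


# Hypothetical host-specific path, used only for illustration.
buggy = effective_excludes({"backup_excludes": ["/var/lib/old-db-dumps"]})
fixed = effective_excludes({"backup_excludes_extra": ["/var/lib/old-db-dumps"]})

print("/var/lxcfs" in buggy)  # False: the default exclusion was lost
print("/var/lxcfs" in fixed)  # True: the default exclusion survives
```

With the misnamed variable, /var/lxcfs falls out of the exclude list and the backup tries to read it, which matches the failure described above.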
Clark Boylan
1e18cd0163 Add new zm01.opendev.org server
This is a Focal server that will replace zm01.openstack.org. Once this
is deployed and happy we can also move forward and do the remainder of
the mergers.

Change-Id: I139c52e26d17ac8d9b604366a3333556d23c5536
2021-02-22 10:58:56 -08:00
Ian Wienand
39ffc685d6 backups: remove all bup
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.

For reference, the servers being backed up at this time are:

 borg-ask01
 borg-ethercalc02
 borg-etherpad01
 borg-gitea01
 borg-lists
 borg-review-dev01
 borg-review01
 borg-storyboard01
 borg-translate01
 borg-wiki-update-test
 borg-zuul01

This removes the old bup backup hosts, the no-longer used ansible
roles for the bup backup server and client roles, and any remaining
bup related configuration.

For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.

Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
2021-02-16 16:00:28 +11:00
Zuul
60b5f789ad Merge "Clean up ethercalc server replacement transition" 2021-02-15 22:20:10 +00:00
Jeremy Stanley
6d0c4b0b3b Update AFS group vars filenames
Ifa5f251fdfb8de737ad2ed96491d45294ce23a0c renamed the afs and afsdb
groups to afs-file-server and afs-db-server, but didn't update the
group files.

Previously the firewall rules were duplicated in the afs/afsdb group;
but now all afs servers are in the afs-server-common group.  Rename
afs.yaml->afs-server-common.yaml and remove the now unnecessary
afsdb.yaml.

Remove one of the old group vars files and rename the other to
afs-server-common so we can restore the udp ports they open in our
firewall rules.

Change-Id: I17dd0596660addf061ade31b4450bf040c01ffe8
2021-02-12 18:23:45 +11:00
Zuul
036ac31060 Merge "Refactor AFS groups" 2021-02-11 22:46:00 +00:00
Ian Wienand
312b9bec24 Refactor AFS groups
Both the fileservers and db servers have common key material deployed
by the openafs-server-config role.  Put both types of server in a new
group "afs-server-common" so we can define this key material in just
one group file on bridge.

Then separate out the two into afs-<file|db>-server groups for
consistent naming.

Rename afs-admin for consistent naming.

The service file is updated to reflect the new groups.

Change-Id: Ifa5f251fdfb8de737ad2ed96491d45294ce23a0c
2021-02-11 13:35:16 +11:00
Ian Wienand
32b48c81a2 refstack: use external https for API
Currently this variable is setting several URLs used in the config to
internal http links (port 8000).  This bubbles through to the UI which
then can't talk to the API.  Empirically, changing these values in the
container config and restarting it makes things work.  Update this
variable to make it talk to external https.

Change-Id: If61ec1e0383b98d34d092c55ca0095588487902a
2021-02-11 11:44:39 +11:00
Ian Wienand
5a7511f6a6 refstack: move non-private variables to public
These two variables can be deployed via system-config

Change-Id: If696945d7b01ee42eb822d2391405277eb6c23d3
2021-02-10 07:10:39 +11:00
Zuul
accfb8b0fd Merge "Add refstack01.openstack.org" 2021-02-09 04:24:10 +00:00
Zuul
f526060e39 Merge "Deploy refstack with ansible docker" 2021-02-09 03:58:22 +00:00
Ian Wienand
cf36af34c1 Add refstack01.openstack.org
See Icade6c713fa9bf6ab508fd4d8d65debada2ddb30

Change-Id: I96ba37a1c872d9f5c20224bbad48bc1d17bdc438
2021-02-09 14:39:12 +11:00
Clark Boylan
a4604ae0b3 Deploy refstack with ansible docker
This adds a dockerfile to build an opendevorg/refstack image as well as
the jobs to build and publish it.

Change-Id: Icade6c713fa9bf6ab508fd4d8d65debada2ddb30
2021-02-05 19:23:34 +00:00
Ian Wienand
56277bf70a ask: fix backup typo and ignore live postgresql
This was overriding the main list of ignores; also ignore the live db.

Change-Id: Idf5ae8e88805829ee44e7f4ba003ac086f5f1206
2021-02-05 17:40:02 +11:00
Ian Wienand
01990670c9 translate: backup zanata db directly to borg
As noted inline, a recent mysql client update has broken the
"--all-databases" flag, at least for the client version and very old
server version we use.

Empirically, dumping individual databases still works with this
client.  Switch this to stream the db directly into borg.

Ignore the old backups and remove the bup backup while we are here,
since this is all borg now.

Change-Id: I5fe762a003ce2c2ba4830367be87598f67f7e763
2021-02-05 14:05:24 +11:00
Ian Wienand
f9184ce323 ask: stream db backup
Despite being deprecated, the ask server is our 3rd biggest backup.  Even
though the site is R/O we're still backing up the fresh rotations of
the gzipped backups every day.

To reduce the incremental space requirements, move to our plain-text
streaming for the db backup.  This just needs a file dropped in /etc;
see the backup-borg role README documentation.  We do this in puppet
to avoid complexity adding this deprecated service to ansible.  This
then excludes the on-disk db backup dir.

Drop the bup backups while we are here.

Change-Id: Icfd81aca58b9a0dc3a3b74de04c1b00f03160327
2021-02-05 13:24:57 +11:00
Ian Wienand
16d26586cf Update airship mirror address
The floating IP of this host was changed during a network issue;
matches I898dbf7417fb01f608eded85faaae5a417ad2e98

Change-Id: Icf1daa4a761403a3927bcadab08656cd1f42f1aa
2021-02-04 11:11:37 +11:00
Zuul
89cd6972f2 Merge "borg-backup: implement saving a stream, use for database backups" 2021-02-03 03:11:11 +00:00
Zuul
70bd9166f7 Merge "Manage afsdb servers with Ansible" 2021-02-03 02:03:28 +00:00
Ian Wienand
51733e5623 borg-backup: implement saving a stream, use for database backups
Add facility to borg-backup role to run a command and save the output
of it to a separate archive file during the backup process.

This is mostly useful for database backups.  Compressed on-disk logs
are terrible for differential backups because revisions have
essentially no common data.  By saving the uncompressed stream
directly from mysqldump, we allow borg the chance to de-duplicate,
saving considerable space on the backup servers.

This is implemented for our ansible-managed servers currently doing
dumps.  We also add it to the testinfra.

This also separates the archive names for the filesystem and stream
backup with unique prefixes so they can be pruned separately.
Otherwise we end up keeping only one of the stream or filesystem
backups which isn't the intention.  However, due to issues with
--append-only mode we are not issuing prune commands at this time.

Note the dump commands are updated slightly, particularly with
"--skip-extended-insert" which was suggested by mordred and
significantly improves incremental diff-ability by being slightly more
verbose but keeping much more of the output stable across dumps.

Change-Id: I500062c1c52c74a567621df9aaa716de804ffae7
2021-02-03 11:43:12 +11:00
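The diff-ability argument for "--skip-extended-insert" in 51733e5623 can be illustrated with a small sketch. It does not run mysqldump or borg; the `dump` helper and the `pads` table are hypothetical stand-ins that only mimic the two output shapes (one multi-row INSERT versus one INSERT per row). It then measures how much of a second dump sits on lines that are unchanged from the first.

```python
def dump(rows, extended):
    """Render rows as SQL text: one multi-row INSERT, or one INSERT per row."""
    values = [f"({rid},'{val}')" for rid, val in rows]
    if extended:
        return "INSERT INTO pads VALUES " + ",".join(values) + ";\n"
    return "".join(f"INSERT INTO pads VALUES {v};\n" for v in values)


def stable_fraction(old, new):
    """Fraction of bytes in `new` that sit on lines also present in `old`."""
    old_lines = set(old.splitlines())
    # +1 accounts for the newline that terminates each line.
    kept = sum(len(line) + 1 for line in new.splitlines() if line in old_lines)
    return kept / len(new)


rows = [(i, f"content-{i}") for i in range(1000)]
changed = list(rows)
changed[500] = (500, "edited")  # a single row differs between the two dumps

extended_stable = stable_fraction(dump(rows, True), dump(changed, True))
per_row_stable = stable_fraction(dump(rows, False), dump(changed, False))

print(f"extended inserts:   {extended_stable:.1%} of the dump is unchanged")
print(f"one insert per row: {per_row_stable:.1%} of the dump is unchanged")
```

A single edited row rewrites the only line of the extended dump, whereas the per-row dump keeps every other line byte-identical. That stable output is what gives a de-duplicating backup tool something to match against.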
Zuul
e762fd3677 Merge "gitea backup: prune some large directories" 2021-01-21 00:22:06 +00:00
Ian Wienand
c98505c8f2 Manage afsdb servers with Ansible
Move common setup steps into an openafs-server-config role, and create
openafs-file-server and openafs-db-server roles to manage fileserver
and db servers respectively.

Modify the playbook to run these roles against the AFS servers.

Change-Id: I4e80ad8ffe1d4992e405ea516b8762109758d7eb
2021-01-21 07:08:37 +11:00
Ian Wienand
92250eca82 Remove afs-1.8 group
With all AFS file-servers upgraded to 1.8, we can move afs01.dfw back
and rename the group to just "afs".

Change-Id: Ib31bde124e01cd07d6ff7eb31679c55728b95222
2021-01-21 07:08:29 +11:00
Ian Wienand
99a36d790e gitea backup: prune some large directories
It's not necessary to capture the live db or git trees, so prune these
from the backups.

Change-Id: I7a27c49035eb0590d0157766eb3392a0f6331aea
2021-01-20 16:01:16 +11:00
Ian Wienand
60a7bfc5f6 Move afs02.dfw.openstack.org to afs-1.8 group
This host is now running OpenAFS 1.8 and should be Ansible managed
now.

Change-Id: Ia0cf0672f3e924a3b6d8e337d3355f6216796e92
2021-01-19 09:34:26 +11:00
Ian Wienand
7683fa11b3 openafs-server : add ansible roles for OpenAFS servers
This starts at migrating OpenAFS server setup to Ansible.

Firstly we split up the groups and explicitly name hosts, as we will
be migrating each one step-by-step.  We split out 1.8 hosts into a new
afs-1.8 group; the first host is afs01.ord.openstack.org which already
has openafs 1.8 installed manually.

An openafs-server role is introduced that does the same setup as the
extant puppet.

The AFS job is renamed to infra-prod-afs as the puppet component will
eventually disappear.  Otherwise it runs in the same way, but also
runs the openafs-server role for the 1.8 servers.

Once this is merged, we can run it against afs01.ord.openstack.org to
ensure it works and is idempotent.  We can then take on upgrading the
other file servers, and work further on the database servers.

Change-Id: I7998af43961999412f58a78214f4b5387713d30e
2021-01-19 08:08:33 +11:00
Clark Boylan
44a076998b Cleanup openstackid02 and openstackid03
These servers were spun up to handle extra load to the openstackid
service during the virtual summit. The load is no longer present and we
have been asked to dial back to the normal setup for this service.

Clean these servers up to stop using unneeded resources. We will start
by removing them from inventory, then dns, and then shut them down. If
everything continues to look happy after that we will delete them.

Change-Id: I469d16f80dcc6c20891556272a94b1f7404b3620
2021-01-11 10:20:20 -08:00
Jeremy Stanley
7d48d972b5 Clean up ethercalc server replacement transition
The old ethercalc01 server has been deleted as have its DNS entries.
Belatedly update cacti to query the new server, and remove an old
unused reference which was at one time disabling the former server.

Change-Id: Ide70c7d03bfff5bd695272c696913dfb3decc525
2021-01-05 16:27:09 +00:00
Clark Boylan
613810dba1 Revert "Reduce gerrit heap limit to 44g"
This reverts commit 95d9b838140e44c9547ad1fa28bc88206823198c.

We've found that we run out of memory at 44g. Bump back up to 48g as
that should give us a bit more headroom.

Change-Id: I14a8f2b298aa1d3cb5c0829508ee137a6769675b
2020-12-09 15:26:43 -08:00
Clark Boylan
95d9b83814 Reduce gerrit heap limit to 44g
We had been setting this to 48GB on java 8, but recent gerrit service
issues indicate that this may be too large for our current system on
java 11. In particular it appears the non heap portions of the jvm may
be in the ~8GB range leaving only about 5-6GB of usable system memory
for other activities like web servers, backups, and garbage collection.

Reduce this to 44GB to increase headroom to see if that helps us. Java
11 is reported to be much more efficient at garbage collecting so
hopefully that makes up the difference between lower memory and where we
were on java 8. As a side note we could revert back to java 8 as another
option.

Change-Id: Ie326aad2a9895098b484924a26c9257cd009d89e
2020-12-08 07:31:53 -08:00
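The memory budget in 95d9b83814 can be worked through using only the figures quoted in the message. The total below is inferred from those figures rather than stated in the source, so treat it as an estimate.

```python
heap_before_gb = 48      # heap limit carried over from java 8
non_heap_gb = 8          # "~8GB range" for non-heap JVM usage
free_before_gb = (5, 6)  # "about 5-6GB" left for everything else

# Inferred, not stated in the message: the total these numbers add up to.
implied_total_gb = tuple(heap_before_gb + non_heap_gb + f for f in free_before_gb)

heap_after_gb = 44
free_after_gb = tuple(t - heap_after_gb - non_heap_gb for t in implied_total_gb)

print("implied total:", implied_total_gb)      # (61, 62)
print("headroom at 44g heap:", free_after_gb)  # (9, 10)
```

Dropping the heap by 4GB therefore raises the headroom from roughly 5-6GB to roughly 9-10GB, assuming the non-heap usage stays near 8GB.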
fungi.admin
2197f11a0f Merge "Omnibus Gerrit 3.2 changes" 2020-11-21 17:19:58 +00:00
Zuul
ba27a1fda6 Merge "Add codesearch.opendev.org server" 2020-11-19 23:42:56 +00:00
Zuul
1b16dae681 Merge "Migrate codesearch site to container" 2020-11-19 22:26:12 +00:00
Ian Wienand
4ce223d83a Add codesearch.opendev.org server
Change-Id: I1e75ca551871999a654000f103aaf833679e804e
Depends-On: https://review.opendev.org/763297
2020-11-20 07:41:43 +11:00
Ian Wienand
368466730c Migrate codesearch site to container
The hound project has undergone a small re-birth and moved to

 https://github.com/hound-search/hound

which has broken our deployment.  We've talked about leaving
codesearch up to gitea, but it's not quite there yet.  There seems to
be no point working on the puppet now.

This builds a container that runs houndd.  It's an opendev specific
container; the config is pulled from project-config directly.

There are some custom scripts that drive things.  Some points for
reviewers:

 - update-hound-config.sh uses "create-hound-config" (which is in
   jeepyb for historical reasons) to generate the config file.  It
   grabs the latest projects.yaml from project-config and exits with a
   return code to indicate if things changed.

 - when the container starts, it runs update-hound-config.sh to
   populate the initial config.  There is a testing environment flag
   and small config so it doesn't have to clone the entire opendev for
   functional testing.

 - it runs under supervisord so we can restart the daemon when
   projects are updated.  Unlike earlier versions that didn't start
   listening till indexing was done, this version now puts up a "Hound
   is not ready yet" message while it is working; so we can drop
   all the magic we were doing to probe if hound is listening via
   netstat and making Apache redirect to a status page.

 - resync-hound.sh is run from an external cron job daily, and does
   this update and restart check.  Since it only reloads if changes
   are made, this should be relatively rare anyway.

 - There is a PR to monitor the config file
   (https://github.com/hound-search/hound/pull/357) which would mean
   the restart is unnecessary.  This would be good in the near future and we
   could remove the cron job.

 - playbooks/roles/codesearch is unexciting and deploys the container,
   certificates and an apache proxy back to localhost:6080 where hound
   is listening.

I've combined removal of the old puppet bits here as the "-codesearch"
namespace was already being used.

Change-Id: I8c773b5ea6b87e8f7dfd8db2556626f7b2500473
2020-11-20 07:41:12 +11:00
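The update-then-restart-only-if-changed flow described for update-hound-config.sh and resync-hound.sh in 368466730c can be sketched as follows. This is not the actual scripts, which are shell and signal a change through a return code. The function names, the file name and the JSON strings are hypothetical, and a boolean stands in for the exit status.

```python
import tempfile
from pathlib import Path


def update_config(path, new_text):
    """Write new_text to path only if it differs; return True when changed."""
    if path.exists() and path.read_text() == new_text:
        return False
    path.write_text(new_text)
    return True


def resync(path, new_text, restart):
    """Refresh the config and restart the daemon only when it changed."""
    changed = update_config(path, new_text)
    if changed:
        restart()
    return changed


restarts = []
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config.json"
    first = resync(config, '{"repos": ["a"]}', lambda: restarts.append("restart"))
    second = resync(config, '{"repos": ["a"]}', lambda: restarts.append("restart"))
    third = resync(config, '{"repos": ["a", "b"]}', lambda: restarts.append("restart"))

print(first, second, third, len(restarts))
```

Only the first and third calls trigger a restart, because the second call sees an identical config. This is why a daily resync should rarely restart the daemon.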
Clark Boylan
57f9e54ad8 Omnibus Gerrit 3.2 changes
These changes are squashed together to simplify applying them to config
management without zuul and ansible running one of these without the
others. We essentially need them all in place at the same time to
accurately reflect the post upgrade state.

We stop blocking /p/ in gerrit's apache vhost. /p/ is used for
dashboards.

We add a few java options that new gerrit sets by default.

We update the gerrit image in docker compose to 3.2.

We update zuul to use basic auth instead of digest auth when talking to
Gerrit.

Change-Id: I6ea38313544ce1ecbc4cfd914b1f33e77d0d2d03
2020-11-17 16:04:56 -08:00
Zuul
2c7591c318 Merge "Set gerrit.serverId in gerrit.config" 2020-11-17 21:22:53 +00:00
Ian Wienand
c16501af8a zuul backup : expand debug log match
Follow-on to Ia9579c7b3204b47d453fc51388265bf1867af20c, this also
matches the web-debug* log files

Change-Id: Ibabbfa3b01317528a75eeec17ea28168da57123a
2020-11-13 14:34:06 +11:00
Ian Wienand
dbff6071b1 backup: skip zuul debug logs for backup
This cuts out the bulk of the storage expense, but leaves us with the
regular logs for enhanced audit trails.

Change-Id: Ia9579c7b3204b47d453fc51388265bf1867af20c
2020-11-12 12:11:39 +11:00
Ian Wienand
6bcfe05742 review: trim backups
This should help reduce the bulk of the review site backups

 * launchpadlib cache has ~650,000 files which we don't need to track
 * review_site/tmp has ~50,000 files
 * review_site/cache is about 9gb
 * review_site/index is optional to backup, but a) it's very unlikely
   to be useful in a full restore situation; we'd have to re-create
   them and b) things seem to come and go under this directory during
   the backup, causing it to exit with an error status.

Change-Id: If7009cfcd5a3a07c07108149772cc8c1873bf277
2020-11-11 23:36:11 +00:00
Clark Boylan
b9b1cba959 Set gerrit.serverId in gerrit.config
This serverId value is used by notedb to identify the gerrit cluster
that notedb contents belong to. By default a random uuid is generated by
gerrit for this value. In order to avoid config management and gerrit
fighting over this value after we upgrade we set a value now.

This should be safe to land on 2.13 as old gerrit should ignore the
value.

Change-Id: I57c9b436a9d0d1dfe77eee907d50fc1dcda6ab12
2020-11-10 10:30:58 -08:00
Ian Wienand
b05a98440a Remove etherpad from bup backup
bup is going crazy and filling the disk when making its backups.  We
have moved this into the borg backup group and run some backups, so
rather than spending time debugging this, we are just going to disable
bup on the server.

Change-Id: I1daad4eb05f8222131dc84c12577dec924874466
2020-11-10 13:52:03 +11:00
Zuul
9ff95a5f00 Merge "etherpad: ignore live db for borg backups" 2020-11-10 00:11:22 +00:00
Zuul
d11949817d Merge "Add all backup hosts to borg backups" 2020-11-09 23:39:51 +00:00
Ian Wienand
b26622ad12 etherpad: ignore live db for borg backups
Change-Id: Ie7f7e189720e68ec0b07a727be0f5752da20566d
2020-11-10 10:11:24 +11:00
Zuul
d3a53e8ec0 Merge "Remove mirror-update server and related puppet" 2020-11-09 21:07:11 +00:00
Ian Wienand
d533e89089 Add all backup hosts to borg backups
Backups have been going well on ethercalc02, so add borg backup runs
to all backed-up servers.  Port in some additional excludes for Zuul
and slightly modify the /var/ matching.

Change-Id: Ic3adfd162fa9bedd84402e3c25b5c1bebb21f3cb
2020-11-09 17:23:22 +11:00
Ian Wienand
3568b76c3c Add * match to grafana.opendev.org
This wasn't matching grafana01

Change-Id: I930a6d1428d8becd29d15fdb53d26b0c186b79fd
2020-11-05 11:35:57 +11:00
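The matching problem fixed in 3568b76c3c can be shown with shell-style glob matching. The exact pattern used in the change is not given above, so the placement of the "*" here is an assumption.

```python
from fnmatch import fnmatch

hosts = ["grafana01.opendev.org", "grafana.opendev.org"]

exact = "grafana.opendev.org"
wildcard = "grafana*.opendev.org"  # assumed placement of the "*"

for host in hosts:
    print(host, fnmatch(host, exact), fnmatch(host, wildcard))
```

The exact name matches only itself, while the wildcard form also covers numbered hosts such as grafana01.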
Zuul
1dc940c74f Merge "RAX DFW/IAD : add internal mirror DNS to cert" 2020-11-04 03:28:57 +00:00
Ian Wienand
676c5dad44 Add borg backup server in RAX ORD
This is our second backup server for borg, hosted in RAX/ORD.

Change-Id: I2c896345e497067ce12863bdb1dda8ce467e2243
2020-10-30 16:39:25 +11:00