Having two groups here was confusing. We seem to use the review group
for most Ansible things, so we prefer that one. We move the contents of
the gerrit group_vars into the review group_vars and then clean up use
of the old group vars file.
Change-Id: I7fa7467f703f5cec075e8e60472868c60ac031f7
Previously we had a test-specific group vars file for the review Ansible
group. This provided junk secrets to our test installations of Gerrit;
we then relied on the review02.opendev.org production host vars file to
set values that are public.
Unfortunately, this meant we were using the production heapLimit value
which is far too large for our test instances, leading to the occasional
failure:
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 9596567552 bytes for committing reserved memory.
We cannot set the heapLimit in the group var file because the host var
file overrides those values. To fix this we need to replace the
test-specific group var contents with a test-specific host var file instead.
To avoid repeating ourselves we also create a new review.yaml group_vars
file to capture common settings between testing and prod. Note we should
look at combining this new file with the gerrit.yaml group_vars.
On the testing side of things we set the heapLimit to 6GB, we change the
serverid value to prevent any unexpected notedb confusion, and we remove
replication config.
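As a hedged sketch, the resulting test-only host vars look something
like this (the filename and variable names are hypothetical, not
necessarily what the repo uses):

  # playbooks/host_vars/review99.opendev.org.yaml (hypothetical path)
  gerrit_heap_limit: 6g    # host vars override group vars, so this wins
  gerrit_serverid: 00000000-0000-0000-0000-000000000000  # distinct from prod to avoid notedb confusion
  gerrit_replication: []   # no replication config for test nodes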
Change-Id: Id8ec5cae967cc38acf79ecf18d3a0faac3a9c4b3
This shifts our Gerrit upgrade testing ahead to cover 3.3 to 3.4
upgrades, as we have upgraded to 3.3 at this point.
Change-Id: Ibb45113dd50f294a2692c65f19f63f83c96a3c11
This bumps the gerrit image up to our 3.3 image. Followup changes will
shift upgrade testing to test 3.3 to 3.4 upgrades, clean up no longer
needed 3.2 images, and start building 3.4 images.
Change-Id: Id0f544846946d4c50737a54ceb909a0a686a594e
This uses the opendev assets bundle image created with
I3166679bde6d771276289b9d32e7e4407957b2f8.
The mount options require using BuildKit, hence the Dockerfile update.
Otherwise conceptually it's fairly simple; copy the files in from the
opendevorg/assets image rather than from the local filesystem.
Change-Id: I36bdc76471eec5380a676ebcdd885a88d3985976
Move some common assets into a top-level assets/ directory. Services
can reference these assets via
https://opendev.org/opendev/system-config/raw/branch/master/assets/<file>
in <img> tags, etc.
Some services want to embed these into their images, but we wish to
only keep one canonical copy. For this, add a Dockerfile and jobs
that create a simple bundle of assets in opendevorg/assets. This can
be referenced in other builds; the new BuildKit bind-mount is
particularly useful for this
(c.f. I36bdc76471eec5380a676ebcdd885a88d3985976).
Change-Id: I3931566eb86a0618705d276445fa0a5f659692ea
We create a (currently test-only) playbook that upgrades Gerrit. This
job then runs through project creation and renaming and testinfra
testing on the upgraded Gerrit version.
Future improvements should consider loading state on the old gerrit
install before we upgrade that can be asserted as well.
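A rough sketch of the shape of such an upgrade playbook (the paths,
image tags and task details are illustrative, not the real playbook):

  - hosts: review
    tasks:
      - name: Stop Gerrit on the old version
        shell: docker-compose down
        args:
          chdir: /etc/gerrit-compose
      - name: Point the compose file at the new image
        replace:
          path: /etc/gerrit-compose/docker-compose.yaml
          regexp: 'gerrit:3\.2'
          replace: 'gerrit:3.3'
      - name: Start Gerrit on the new version (init migrates the site)
        shell: docker-compose up -d
        args:
          chdir: /etc/gerrit-compose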
Change-Id: I364037232cf0e6f3fa150f4dbb736ef27d1be3f8
Update the file matchers to actually match the current set of puppet
things. This ensures the deploy job runs when we want it to, and we can
catch up daily instead of hourly.
Previously a number of the matchers didn't actually match the puppet
things because the path prefix was wrong or words were in a different
order in the dir names.
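To illustrate the kind of fix involved, a corrected matcher looks
roughly like this (the job name and paths are hypothetical):

  - job:
      name: infra-prod-run-puppet   # hypothetical job name
      files:
        # prefixes and dir name order must match the real tree
        - ^modules/openstack_project/.*
        - ^playbooks/roles/puppet/.*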
Change-Id: I3510da81d942cf6fb7da998b8a73b0a566ea7411
This is being done because we don't make many changes to the
zuul-preview service, but it runs in the hourly buildset, starving
deploy runs. Since this doesn't change much we can move it to the daily
run instead.
If we need to update it we can run the playbook manually or land a
change to trigger it.
Change-Id: I89d2c712fcfd18bd4f694b2c90067295253b8836
This is a job that takes quite a bit of time, but only rarely do we need
the updates encoded in this job. Move the job from our hourly deployment
to the daily deployment to make its impact less painful.
Change-Id: I724bcdd67f4c324f497a9d8239bcfd8d37528956
This runs the new matrix-eavesdrop bot on the eavesdrop server.
It will write logs out to the limnoria logs directory, which is mounted
inside the container.
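A hedged sketch of the bind mount involved (the image name and paths
are illustrative):

  services:
    matrix-eavesdrop:
      image: docker.io/opendevorg/matrix-eavesdrop:latest
      volumes:
        # the bot writes its channel logs alongside the limnoria logs
        - /var/lib/limnoria/logs:/logs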
Change-Id: I867eec692f63099b295a37a028ee096c24109a2e
The paste service needs an upgrade; since others have created a
lodgeit container it seems worth us keeping the service going if only
to maintain the historical corpus of pastes.
This adds the ansible to deploy lodgeit and a sibling mariadb
container. I have imported a dump of the old data as a test. The
dump is ~4GB and, once imported, takes up about double that; certainly
nothing we need to be too concerned about. The server will be more
than capable of running the db container alongside the lodgeit
instance.
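A hedged sketch of the compose topology this describes (image names,
versions and credentials are illustrative):

  services:
    mariadb:
      image: docker.io/library/mariadb:10.4
      environment:
        MYSQL_ROOT_PASSWORD: secret
        MYSQL_DATABASE: lodgeit
        MYSQL_USER: lodgeit
        MYSQL_PASSWORD: secret
    lodgeit:
      image: docker.io/opendevorg/lodgeit:latest
      depends_on:
        - mariadb
      ports:
        - "127.0.0.1:8080:8080"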
This should have no effect on production until we decide to switch
DNS.
Change-Id: I284864217aa49d664ddc3ebdc800383b2d7e00e3
This installs our Limnoria/meetbot container and configures it on
eavesdrop01.opendev.org. I have ported the configuration from the old
puppet as best I can (it is very verbose); my procedure was to use the
Limnoria wizard to start a new config file and then backport everything
from the old file. I felt this was best to not miss any new options.
This does channel logging (via built-in ChannelLogger plugin, along
with a cron job for logs2html) and runs our fork of meetbot.
It exports the channel logs via HTTP to /irclogs and meetings logs to
/meetings. meetings.opendev.org will proxy to these two locations
when the server is active.
Note this has not ported the channel list; so the bot will not be
listening in our channels.
Change-Id: I9f9a466c271e1a706f9f98f816de0e84047519f1
This container installs Limnoria, the supybot replacement, as the
generic ircbot container. We install the meetbot plugin as a sibling
project.
Previously we've conflated supybot with meetbot, which is a bit
confusing because meetbot is a plugin, but we also use other plugins
such as the channel logger. We also hope to convert some of our other
bots to Limnoria (ptgbot?) to consolidate everything. For this reason
I've called this the more generic "ircbot". The image installs
meetbot as a sibling project, with the idea being any other plugins
would also be installed as siblings.
The siblings install expects the work directory to be a relative
path. I'm not sure we run this from other projects, but this
will work the same if we do.
Depends-On: https://review.opendev.org/c/opendev/meetbot/+/793876
Change-Id: Icee4c6bbb5ea235ba69c10f800a14bbf5beef3d5
We're trying to phase out the ELK systems. While we have agreed not to
immediately turn anything off, we probably don't need to keep running
the system-config-legacy-logstash-filters job, as ELK should remain
fairly fixed unless someone rewrites its config management and
modernizes it. If that happens they will want new, modern testing too.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/792710
Change-Id: I9ac6f12ec3245e3c1be0471d5ed17caec976334f
This job is not added in the parent so that we can manually run
playbooks after the parent lands. Once we are happy with the results
from the new service-lists.yaml playbook we can land this change and
have zuul automatically apply it when necessary.
Change-Id: I38de8b98af9fb08fa5b9b8849d65470cbd7b3fdc
Python 3.9 is released, so let's build containers.
This splits the docker-images/ files up as they are becoming a bit
crowded.
Change-Id: Id68080575a30e4a08c99df0af603fbb65a0983bd
With our increased ability to test in the gate, there's not much use
for review-dev any more. Remove references.
Change-Id: I97e9865e0b655cd157acf9ffa7d067b150e6fc72
This adds a program, zookeeper-statsd, which monitors zookeeper
metrics and reports them to statsd. It also adds a container to run
that program, runs the container on each of the ZooKeeper quorum
members, updates the graphite host to allow statsd traffic from quorum
members, and updates the 4-letter-word whitelist to allow the mntr
command (which is used to gather metrics) to be issued.
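A hedged sketch of the container side of this (the image and
environment variable names are illustrative):

  services:
    zookeeper-statsd:
      image: docker.io/opendevorg/zookeeper-statsd:latest
      network_mode: host   # reach the local quorum member directly
      environment:
        ZK_HOST: localhost:2181          # polled with the mntr command
        STATSD_HOST: graphite.opendev.org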
Change-Id: I298f0b13a05cc615d8496edd4622438507fc5423
This adds a role and related testing to manage our Kerberos KDC
servers, intended to replace the puppet modules currently performing
this task.
This role automates realm creation, initial setup, key material
distribution and replica host configuration. None of this is intended
to run on the production servers, which are already set up with an
active database, and the role should be effectively idempotent in
production.
Note that this does not yet switch the production servers into the new
groups; this can be done in a separate step under controlled
conditions and with related upgrades of the host OS to Focal.
Change-Id: I60b40897486b29beafc76025790c501b5055313d
This has our change to open etherpad on join, so we should no longer need
to run a fork of the web server. Switch to the upstream container image
and stop building our own.
Change-Id: I3e8da211c78b6486a3dcbd362ae7eb03cc9f5a48
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.
For reference, the servers being backed up at this time are:
borg-ask01
borg-ethercalc02
borg-etherpad01
borg-gitea01
borg-lists
borg-review-dev01
borg-review01
borg-storyboard01
borg-translate01
borg-wiki-update-test
borg-zuul01
This removes the old bup backup hosts, the no-longer-used ansible
roles for the bup backup server and client, and any remaining
bup-related configuration.
For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.
Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
We need to depend on the buildset registry as we are building this image
in a separate job. We also don't need to depend on the build job in
gate; we only need the upload job.
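For illustration, the wiring looks roughly like this (job names are
hypothetical apart from opendev-buildset-registry):

  check:
    jobs:
      - system-config-build-image-foo
      - system-config-run-foo:
          dependencies:
            - opendev-buildset-registry
            - system-config-build-image-foo
  gate:
    jobs:
      - system-config-upload-image-foo
      - system-config-run-foo:
          dependencies:
            - opendev-buildset-registry
            - system-config-upload-image-foo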
Change-Id: Ie7c2ed29c028f8c23d67ad38edbe04b12e22d026
This change splits our existing system-config-run-review job into two
jobs, one for gerrit 3.2 and another for 3.3. The biggest change is that
we use a var called zuul_test_gerrit_version to select which version we
want and that ends up in the fake group file written out by Zuul for the
nested ansible run. The nested ansible run will then populate the
docker-compose file with the appropriate version for us.
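A hedged sketch of how the variable flows into the compose file (the
template path and default value are illustrative):

  # playbooks/roles/gerrit/templates/docker-compose.yaml.j2 (hypothetical)
  services:
    gerrit:
      image: docker.io/opendevorg/gerrit:{{ zuul_test_gerrit_version | default('3.2') }}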
Change-Id: I00b52c0f4aa8df3ecface964007fcf5724887e5e
This adds a dockerfile to build an opendevorg/refstack image as well as
the jobs to build and publish it.
Change-Id: Icade6c713fa9bf6ab508fd4d8d65debada2ddb30
This starts migrating OpenAFS server setup to Ansible.
Firstly we split up the groups and explicitly name hosts, as we will
be migrating each one step-by-step. We split out 1.8 hosts into a new
afs-1.8 group; the first host is afs01.ord.openstack.org which already
has openafs 1.8 installed manually.
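In inventory terms the split looks something like this sketch (only
afs01.ord.openstack.org is named above; the other members are
hypothetical):

  afs:
    hosts:
      afs01.dfw.openstack.org:
      afs02.dfw.openstack.org:
  afs-1.8:
    hosts:
      afs01.ord.openstack.org:   # already running openafs 1.8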
An openafs-server role is introduced that does the same setup as the
extant puppet.
The AFS job is renamed to infra-prod-afs as the puppet component will
eventually disappear. Otherwise it runs in the same way, but also
runs the openafs-server role for the 1.8 servers.
Once this is merged, we can run it against afs01.ord.openstack.org to
ensure it works and is idempotent. We can then take on upgrading the
other file servers, and work further on the database servers.
Change-Id: I7998af43961999412f58a78214f4b5387713d30e
Having upgraded to 3.2, we don't need these versions any more.
Change-Id: Ifc37a75aa62b2498e649a4c81b589a04c794184a
Depends-On: https://review.opendev.org/763617
The hound project has undergone a small re-birth and moved to
https://github.com/hound-search/hound
which has broken our deployment. We've talked about leaving
codesearch up to gitea, but it's not quite there yet. There seems to
be no point working on the puppet now.
This builds a container that runs houndd. It's an opendev-specific
container; the config is pulled from project-config directly.
There are some custom scripts that drive things. Some points for
reviewers:
- update-hound-config.sh uses "create-hound-config" (which is in
jeepyb for historical reasons) to generate the config file. It
grabs the latest projects.yaml from project-config and exits with a
return code to indicate if things changed.
- when the container starts, it runs update-hound-config.sh to
populate the initial config. There is a testing environment flag
and small config so it doesn't have to clone the entire opendev for
functional testing.
- it runs under supervisord so we can restart the daemon when
projects are updated. Unlike earlier versions that didn't start
listening till indexing was done, this version now puts up a "Hound
is not ready yet" message while it is working; so we can drop
all the magic we were doing to probe if hound is listening via
netstat and making Apache redirect to a status page.
- resync-hound.sh is run from an external cron job daily (see the
sketch after this list), and does
this update and restart check. Since it only reloads if changes
are made, this should be relatively rare anyway.
- There is a PR to monitor the config file
(https://github.com/hound-search/hound/pull/357) which would mean
the restart is unnecessary. This would be good in the near future, and
we could then remove the cron job.
- playbooks/roles/codesearch is unexciting and deploys the container,
certificates and an apache proxy back to localhost:6080 where hound
is listening.
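As a sketch of that cron wiring (the script path is illustrative), the
Ansible side can be as simple as:

  - name: Resync hound configuration daily
    cron:
      name: resync-hound
      special_time: daily
      user: root
      job: /usr/local/bin/resync-hound.sh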
I've combined removal of the old puppet bits here as the "-codesearch"
namespace was already being used.
Change-Id: I8c773b5ea6b87e8f7dfd8db2556626f7b2500473
This will scale up our meetpad install by 50% giving us more capacity
for PTG sessions.
We also increase the tox linters job timeout as it is slow to pip
install and then slow to run ansible-lint. Do this until we can sort out
why it is slow.
Change-Id: Ieceafefa27266f0bc0f427af790f920a8c44326c
Now that gerritbot is deployed from containers on eavesdrop we want to
run the infra-prod-service-eavesdrop job hourly to ensure that we keep
the docker image up to date there.
We haven't added the service-eavesdrop job to a deploy pipeline in
gerritbot because that would require us to add gerritbot's project ssh
key to bridge.
Change-Id: I5aba91f2ae5c018ee9b2d0481a53b630fc5d1ab7
This adds roles to implement backup with borg [1].
Our current tool "bup" has no Python 3 support and is not packaged for
Ubuntu Focal. This means it is effectively end-of-life. borg fits
our model of servers backing themselves up to a central location, is
well documented and seems well supported. It also has the clarkb seal
of approval :)
As mentioned, borg works in the same manner as bup by doing an
efficient backup over ssh to a remote server. The core of these
roles is the same as the bup-based ones, in terms of creating a
separate user for each host and deploying keys and ssh config.
This chooses to install borg in a virtualenv on /opt. This was chosen
for a number of reasons: firstly, reading the history of borg, there
have been incompatible updates (although they provide a tool to update
repository formats), so it seems important that we both pin the version
we are using and keep clients and the server in sync. Since we have a
heterogeneous distribution collection we don't want to rely on the
packaged tools, which may differ. I don't feel like this is a great
application for a container; we actually don't want it that isolated
from the base system, because its goal is to read the base system and
copy it offsite with as little chance of things going wrong as possible.
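A hedged sketch of the pinned virtualenv install (the exact version and
path are illustrative):

  - name: Install pinned borg into a virtualenv under /opt
    pip:
      name: borgbackup==1.1.15   # pin client and server to the same version
      virtualenv: /opt/borg
      virtualenv_command: python3 -m venv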
Borg has a lot of support for encrypting the data at rest in various
ways. However, that introduces the possibility we could lose both the
key and the backup data. Really the only thing stopping this is key
management, and if we want to go down this path we can do it as a
follow-on.
The remote end server is configured via ssh command rules to run in
append-only mode. This means a misbehaving client can't delete its
old backups. In theory we can prune backups on the server side --
something we could not do with bup. The documentation has been
updated but is vague on this part; I think we should get some hosts in
operation, see how the de-duplication is working out and then decide
how we want to manage things long term.
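For illustration, the append-only restriction is the standard borg
pattern of forcing the command on the client's authorized key; a sketch
with a hypothetical user, variable and path:

  - name: Restrict backup user to append-only borg serve
    authorized_key:
      user: borg-host01                  # hypothetical per-host backup user
      key: "{{ borg_client_pubkey }}"    # hypothetical variable
      key_options: 'command="borg serve --append-only --restrict-to-path /opt/backups/borg-host01",restrict'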
Testing is added; a focal and bionic host both run a full backup of
themselves to the backup server. Pretty cool, the logs are in
/var/log/borg-backup-<host>.log.
No hosts are currently in the borg groups, so this can be applied
without affecting production. I'd suggest the next steps are to bring
up a borg-based backup server and put a few hosts into this. After
running for a while, we can add all hosts, and then deprecate the
current bup-based backup server in vexxhost and replace that with a
borg-based one; giving us dual offsite backups.
[1] https://borgbackup.readthedocs.io/en/stable/
Change-Id: I2a125f2fac11d8e3a3279eb7fa7adb33a3acaa4e
There is a new release; update the base container. Add the promote job
that was forgotten with the original commit
Iddfafe852166fe95b3e433420e2e2a4a6380fc64.
Change-Id: Ie0d7febd2686d267903b29dfeda54e7cd6ad77a3