This adds a local mariadb container to the gerrit host to hold the
accountPatchReviewDb database. This is inspired by a few things
- since migration to NoteDB, there is only one table left where
Gerrit records what files have been reviewed for a change. This
logically scales with the number of reviews users are doing.
Pulling the stats on this, we can see that since the NoteDB upgrade this
went from a very busy database (~300 queries/70 commits per second)
to barely registering one hit per second:
https://imgur.com/a/QGJV7Fw
Thus separating the db out to an external host for performance reasons
is not a large concern any more.
- empirically we've done a bad job in keeping the existing hosted db
up-to-date; it's still running mysql 5.1 and we have been hit by
bugs such as the one referenced in-line which silently drops
backups.
- The other gerrit option is to use an on-disk H2 database. This is
certainly an option; however, you need special tools to interact
with it for migration, etc., and it's not safe to back up from files
on disk (as opposed to mysqldump). Upstream advice is unclear, and
varies from H2 being a performance bottleneck to this being
ephemeral data that users don't care about. We know how to admin
mariadb/mysql and this allows us to migrate and backup data, so
seems like the best choice.
- we have a pressing need to update the server to a new operating
system. Running the db alongside the gerrit instance minimises the
fiddling we have to do managing connections to, and migrating, the
hosted db systems.
- related to that, we are tending towards more provider independence
for control-plane servers. A hosted database product is not always
provided, so this gives us more flexibility in moving things
around.
- the main concern here is memory usage. "docker stats" reports a
quiescent container, freshly started on an 8GB host:
gerrit-compose_mariadb_1 67.32MiB
After loading a copy of the production table, and then dumping it
back to a file the same container reports:
gerrit-compose_mariadb_1 462.6MiB
The existing remote mysql configuration path remains mostly the same.
We move the gerrit startup into a script rather than a CMD so we can
call it after a "wait for db" script in the mariadb_container case
(this is the recommended way to enforce ordering [1]).
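As an aside, the "wait for db" idea is just a poll-and-retry loop run
before launching gerrit; a minimal sketch of that idea follows (the host,
port and timeout values are assumptions, not the actual script added by
this change):

  # Illustrative sketch only: poll the mariadb container's port until it
  # accepts connections, then let the caller go on to start gerrit.
  import socket
  import sys
  import time

  def wait_for_db(host="mariadb", port=3306, timeout=120):
      deadline = time.time() + timeout
      while time.time() < deadline:
          try:
              with socket.create_connection((host, port), timeout=5):
                  return True
          except OSError:
              time.sleep(2)
      return False

  if __name__ == "__main__":
      sys.exit(0 if wait_for_db() else 1)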
Backups of the local container need different dump commands; backups
are relocated to a new file and updated.
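For illustration of the shape of the new dump step (the container and
database names are taken from above, but credential handling and the
exact commands live in the relocated backup file, not here):

  # Hypothetical sketch of dumping the local container's database; the
  # real backup commands are in the relocated backup configuration.
  import subprocess

  def dump_local_db(outfile="accountPatchReviewDb.sql"):
      with open(outfile, "w") as f:
          subprocess.run(
              ["docker", "exec", "gerrit-compose_mariadb_1",
               "mysqldump", "accountPatchReviewDb"],
              stdout=f, check=True)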
Testing is converted to use this rather than a local H2 database.
[1] https://docs.docker.com/compose/startup-order/
Change-Id: Iec981ef3c2e38889f91e9759e66295dbfb499c2e
This installs statusbot on eavesdrop01.opendev.org.
Otherwise it's just config translation and bringing up the daemon.
Change-Id: I246b2723372594e65bcd1ba90215d6831d4c0c72
This installs our Limnoria/meetbot container and configures it on
eavesdrop01.opendev.org. I have ported the configuration from the old
puppet as best I can (it is very verbose); my procedure was to use the
Limnoria wizard to start a new config file, then backport everything
from the old file. I felt this was best to not miss any new options.
This does channel logging (via built-in ChannelLogger plugin, along
with a cron job for logs2html) and runs our fork of meetbot.
It exports the channel logs via HTTP to /irclogs and meeting logs to
/meetings. meetings.opendev.org will proxy to these two locations
when the server is active.
Note this has not ported the channel list, so the bot will not be
listening in our channels.
Change-Id: I9f9a466c271e1a706f9f98f816de0e84047519f1
This container installs Limnoria, the supybot replacement, as the
generic ircbot container. We install the meetbot plugin as a sibling
project.
Previously we've conflated supybot with meetbot, which is a bit
confusing because meetbot is a plugin, but we also use other plugins
such as the channel logger. We also hope to convert some of our other
bots to Limnoria (ptgbot?) to consolidate everything. For this reason
I've called this the more generic "ircbot". The image installs
meetbot as a sibling project, with the idea being any other plugins
would also be installed as siblings.
The siblings install expects the work directory to be a relative
directory. I'm not sure we run this from other projects, but this
will work the same if we do.
Depends-On: https://review.opendev.org/c/opendev/meetbot/+/793876
Change-Id: Icee4c6bbb5ea235ba69c10f800a14bbf5beef3d5
This updates our plugin versions to keep up with Gerrit releases. We
build from the stable branch, which means we need to periodically bump
tagged plugin versions to keep up when stable branches for the plugins
are not available.
Change-Id: I5bb50d8610ebdbdb9c6a70ad8dde732067cc368f
This moves these services to eavesdrop01.opendev.org, a new
Focal-based server to host IRC services.
We have stopped running puppet on eavesdrop01.openstack.org so there
is nothing left for it to do (note the server is still running
meetbot/ptgbot). Remove the commented out puppet run, and remove the
server from puppet groups. Update the host in the Zuul jobs to the
new node.
Change-Id: I809f9af3e78f566362142790f6c79654ef5b8959
When the testinfra tests for listservs change, we should run the
system-config-run-lists job. Make it so with file matchers.
Change-Id: I64402345ceaab2056d8ed1b3a579063de0b29c9b
ARA's master branch now has static site generation, so we can move
away from the stable branch and get the new reports.
In the meantime, ARA upstream has moved to GitHub, so this updates the
references for the -devel job.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/793530
Change-Id: I008b35562994f1205a4f66e53f93b9885a6b8754
Nodepool doesn't use any puppetry anymore. We don't need to test changes
to puppet against nodepool.
Change-Id: I0ec824e21c17b074dd82eb298de6d196802aac28
We're trying to phase out the ELK systems. While we have agreed not to
immediately turn anything off, we probably don't need to keep running the
system-config-legacy-logstash-filters job, as ELK should remain fairly
fixed unless someone rewrites config management for it and modernizes
it. And if that happens they will want new modern testing too.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/792710
Change-Id: I9ac6f12ec3245e3c1be0471d5ed17caec976334f
This reduces the scope of our puppet related testing to things that
continue to use puppet. This is probably not strictly necessary but
helps keep us up to date with our TODO list.
Change-Id: I52bfff09ad0ddeabe7ad151bcf88c912f86a76ec
We run these ansible jobs serially, which means we don't gain much by
forcing ansible to use a small number of forks. Double the default fork
count for our infra prod jobs from 5 to 10 to see if this speeds up our
deploy jobs.
Note some jobs override this value, using either more or fewer forks
when necessary. These are left as is.
Change-Id: I6fded724cb9c8654153bcc5937eae7203326076e
This cleans up zuul01 as it should no longer be used at this point. We
also make the inventory groups a bit clearer about the fact that all
zuul servers are under the opendev.org domain now.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/790483
Change-Id: I7885fe60028fbd87688f3ae920a24bce4d1a3acd
This zuul02 instance will replace zuul01. There are a few items to
coordinate when doing an actual switch so we haven't removed zuul01 from
inventory here. In particular we need to update gearman server config
values in the zuul cluster and we need to save queues, shutdown zuul01,
then start zuul02's scheduler and restore queues there.
I believe landing this change is safe as we don't appear to start zuul
on new instances by default. Reviewers should double check this.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/791039
Change-Id: I524b456e494124d8293fbe8e1468de40f3800772
This job is not added in the parent so that we can manually run
playbooks after the parent lands. Once we are happy with the results
from the new service-lists.yaml playbook we can land this change and
have zuul automatically apply it when necessary.
Change-Id: I38de8b98af9fb08fa5b9b8849d65470cbd7b3fdc
This converts our existing puppeted mailman configuration into a set of
ansible roles and a new playbook. We don't try to do anything new and
instead do our best to map from puppet to ansible as closely as
possible. This helps reduce churn and will help us find problems more
quickly if they happen.
Followups will further cleanup the puppetry.
Change-Id: If8cdb1164c9000438d1977d8965a92ca8eebe4df
These files are involved in creating gerrit docker images; make sure
we trigger jobs when they are modified.
Change-Id: I7c4436e066cfb0c2d0b2ca7adf54c99b09dac95f
It seems we have some debugging to do on the openafs roles. The other
roles, particularly the bazelisk one, aren't tested here, so
reduce the file matcher.
We can overhaul this more, but it seems like a post-puppet/xenial
thing to do.
Change-Id: I0a41ef48eab0560a23a4e29463435dfe0758d01e
Python 3.9 is released, so let's build containers.
This splits the docker-images/ files up as they are becoming a bit
crowded.
Change-Id: Id68080575a30e4a08c99df0af603fbb65a0983bd
Bump this timeout for a couple of reasons. First, we've seen the job
time out at least once in the last month. This seems to be due to gitea
portions of the job running slowly.
Second, we're planning some large-scale updates to the openstack acls,
and a longer timeout should help us land those in larger batches. We can
consider trimming this back again after these updates are done if gitea
doesn't continue to give us trouble.
Change-Id: Ib61849b4c73a1b3fa2a0bbe90ace29fb23849449
We will be rotating zk01-03.openstack.org out and replacing them with
zk04-06.opendev.org. This is the first change in that process which puts
zk04 into the rotation. This should only be landed when operators are
ready to manually stop zookeeper on zk03 (which is being replaced by
zk04 in this change).
Change-Id: Iea69130f6b3b2c8e54e3938c60e4a3295601c46f
With our increased ability to test in the gate, there's not much use
for review-dev any more. Remove references.
Change-Id: I97e9865e0b655cd157acf9ffa7d067b150e6fc72
This should ensure that if we have a parent job that updates the gitea
version and a do-not-merge child job that induces an artificial failure
for zuul hold purposes that we test the correct image in the child job's
changes.
Prior to this we were testing the existing published images, but
provides + requires will give the correct signaling to make the desired
"test new proposed image" behavior happen in the child change builds.
Change-Id: Ie6b827b650e0f32606dc5ec7f4aa0adfeebdeb5e
When we cleaned up the puppet in
I6b6dfd0f8ef89a5362f64cfbc8016ba5b1a346b3 we renamed the group
s/refstack-docker/refstack/ but didn't move the variables and some
other references along with it.
Change-Id: Ib07d1e9ede628c43b4d5d94b64ec35c101e11be8
This adds a program, zookeeper-statsd, which monitors zookeeper
metrics and reports them to statsd. It also adds a container to run
that program, runs the container on each of the ZooKeeper quorum
members, updates the graphite host to allow statsd traffic from the
quorum members, and updates the 4-letter-word whitelist to allow the
mntr command (which is used to gather metrics) to be issued.
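For context, mntr is issued over the client port and returns
tab-separated key/value metrics; a minimal sketch of that exchange
follows (the host and port are assumptions, and the actual collection
and statsd reporting is done by zookeeper-statsd):

  # Illustrative only: fetch and parse mntr output from a quorum member.
  import socket

  def fetch_mntr(host="zk04.opendev.org", port=2181):
      with socket.create_connection((host, port), timeout=5) as sock:
          sock.sendall(b"mntr\n")
          data = b""
          while True:
              chunk = sock.recv(4096)
              if not chunk:
                  break
              data += chunk
      metrics = {}
      for line in data.decode().splitlines():
          key, _, value = line.partition("\t")
          if key:
              metrics[key] = value
      return metrics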
Change-Id: I298f0b13a05cc615d8496edd4622438507fc5423
This adds the new focal nodepool launcher replacements for nl02-04 to
our inventory. This will configure them with an idle configuration. We
then confirm they are happy running in an idle state, then switch over
the config from the old servers to the new ones.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/780982
Change-Id: Iea645925caaeee6f498aa690c4f2c848f6899317
This adds a role and related testing to manage our Kerberos KDC
servers, intended to replace the puppet modules currently performing
this task.
This role automates realm creation, initial setup, key material
distribution and replica host configuration. None of this is intended
to run on the production servers which are already setup with an
active database, and the role should be effectively idempotent in
production.
Note that this does not yet switch the production servers into the new
groups; this can be done in a separate step under controlled
conditions and with related upgrades of the host OS to Focal.
Change-Id: I60b40897486b29beafc76025790c501b5055313d
There is some correlation between running the manage-projects playbook
and gitea having fits. The bulk of the work done here is in trying to
update the descriptions of all projects. There isn't a good way to
check first whether a description is already set, so we just try and
ignore errors. This creates potentially thousands of operations all at
once and could be why things are sad.
We move these operations under the always update flag which is not set
on normal runs. If we really need to converge to a good updated state we
can manually run the playbook/role with always update set.
We also don't set a limit on the number of ThreadPoolExecutor workers,
which will default to 5 * NumProcs. It could be that tuning this down
would make gitea happier.
One other thought is that we may not be using requests sessions properly
for connection reuse. In particular, the requests documentation notes that
you need to set stream to False or read the response content to return a
connection back to the pool for reuse. We might look into this for further
improvements.
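As a rough illustration of the session-reuse point (the URL and worker
count here are placeholders, not the manage-projects code):

  # Hedged sketch: a shared Session pools connections, and a response whose
  # body has been read (stream=False is the default) releases its connection
  # back to the pool for reuse by later requests.
  from concurrent.futures import ThreadPoolExecutor

  import requests

  session = requests.Session()

  def fetch_json(url):
      resp = session.get(url, stream=False)
      resp.raise_for_status()
      return resp.json()

  # Explicitly capping workers (rather than taking the 5 * NumProcs default
  # noted above) is one possible tuning knob.
  executor = ThreadPoolExecutor(max_workers=5)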
Change-Id: I6e6fb1eb08303e9da7e38cf493d1871364340000
This is a new focal replacement for nl01.openstack.org. We keep
nl01.openstack.org in our inventory for now because we want ansible to
update the nodepool.yaml configs for these two hosts to coordinate a
hand off of responsibilities once we are happy with the new deployment.
We also switch the testing hostname to nl04.openstack.org as this will
be the last nodepool launcher to be removed. When we swap it out, the
testing will be updated to use focal hosts.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/779863
Change-Id: Ib3ea6586fe0567c1edf6255ee9be50164d35db62