193 Commits

Author SHA1 Message Date
Zuul
9181d5198d Merge "gerrit: add mariadb_container option" 2021-06-16 23:14:48 +00:00
Ian Wienand
570ca85cd8 gerrit: add mariadb_container option
This adds a local mariadb container to the gerrit host to hold the
accountPatchReviewDb database.  This is inspired by a few things:

 - since the migration to NoteDB, only one database table is left: the
   one where Gerrit records which files have been reviewed for a change.
   Its load logically scales with the number of reviews users are doing.
   Pulling the stats on this, we can see since the NoteDB upgrade this
   went from a very busy database (~300 queries/70 commits per second)
   to barely registering one hit per second:
   https://imgur.com/a/QGJV7Fw

   Thus separating the db to an external host for performance reasons
   is not a large concern any more.

 - empirically we've done a bad job in keeping the existing hosted db
   up-to-date; it's still running mysql 5.1 and we have been hit by
   bugs such as the one referenced in-line which silently drops
   backups.

 - The other gerrit option is to use an on-disk H2 database.  This is
   certainly an option; however, you need special tools to interact
   with it for migration, etc., and it's not safe to back up from files
   on disk (as opposed to mysqldump).  Upstream advice is unclear, and
   ranges from H2 being a performance bottleneck to this being
   ephemeral data that users don't care about.  We know how to admin
   mariadb/mysql and this allows us to migrate and back up data, so it
   seems like the best choice.

 - we have a pressing need to update the server to a new operating
   system.  Running the db alongside the gerrit instance minimises the
   fiddling we have to do managing connections to, and migrating, the
   hosted db systems.

 - related to that, we are tending towards more provider independence
   for control-plane servers.  A hosted database product is not always
   provided, so this gives us more flexibility in moving things
   around.

 - the main concern here is memory usage.  "docker stats" reports, for a
   quiescent container freshly started on an 8GB host:

    gerrit-compose_mariadb_1  67.32MiB

   After loading a copy of the production table, and then dumping it
   back to a file the same container reports:

    gerrit-compose_mariadb_1  462.6MiB

The existing remote mysql configuration path remains mostly the same.
We move the gerrit startup into a script rather than a CMD so we can
call it after a "wait for db" script in the mariadb_container case
(this is the recommended way to enforce ordering [1]).
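The "wait for db" idea can be sketched in Python as follows. The script polls the database's TCP port until it accepts connections, then replaces itself with the Gerrit startup script. This is a minimal sketch under assumptions: the host name, port and script path you would pass are hypothetical placeholders, and the actual script in this change may use a different readiness check. A listening port does not guarantee the database has finished initialising.

```python
import os
import socket
import time
from typing import Sequence


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening; close it at once.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def start_after_db(host: str, port: int, argv: Sequence[str]) -> None:
    """Hypothetical helper: wait for the database, then exec the service."""
    if not wait_for_port(host, port):
        raise SystemExit(f"database at {host}:{port} never became reachable")
    # Replace this process so the service becomes the container's main process.
    os.execv(argv[0], list(argv))
```

A container entrypoint might call `start_after_db("mariadb", 3306, ["/usr/local/bin/run-gerrit"])`, where both the service name and the script path are assumptions. Because `os.execv` is used, Gerrit receives signals directly rather than through a wrapper process.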

Backups of the local container need different dump commands; backups
are relocated to a new file and updated.

Testing is converted to use this rather than a local H2 database.

[1] https://docs.docker.com/compose/startup-order/

Change-Id: Iec981ef3c2e38889f91e9759e66295dbfb499c2e
2021-06-16 13:57:13 +10:00
Ian Wienand
f304c1a161 Update eavesdrop deploy job
This was missed when adding the statusbot/ircbot containers

Change-Id: I198da471b8a0dd648a8e9f1bfe41988561a745f8
2021-06-11 23:23:20 +10:00
Zuul
b9d885ff2d Merge "Run statusbot from eavesdrop01.opendev.org" 2021-06-11 07:45:55 +00:00
Ian Wienand
23fac31c92 Run statusbot from eavesdrop01.opendev.org
This installs statusbot on eavesdrop01.opendev.org.

Otherwise it's just config translation and bringing up the daemon.

Change-Id: I246b2723372594e65bcd1ba90215d6831d4c0c72
2021-06-11 07:52:51 +10:00
Zuul
bfaa4713eb Merge "Remove system-config-legacy-logstash-filters job" 2021-06-10 17:29:17 +00:00
Zuul
084879c1fa Merge "limnoria/meetbot setup on eavesdrop01.opendev.org" 2021-06-10 02:04:53 +00:00
Zuul
c51b860620 Merge "Create ircbot container" 2021-06-10 01:04:32 +00:00
Ian Wienand
403773d55a limnoria/meetbot setup on eavesdrop01.opendev.org
This installs our Limnoria/meetbot container and configures it on
eavesdrop01.opendev.org.  I have ported the configuration from the old
puppet as best I can (it is very verbose); my procedure was to use the
Limnoria wizard to start a new config file, then backport everything
from the old file.  I felt this was the best way to avoid missing any new options.

This does channel logging (via built-in ChannelLogger plugin, along
with a cron job for logs2html) and runs our fork of meetbot.

It exports the channel logs via HTTP to /irclogs and meetings logs to
/meetings.  meetings.opendev.org will proxy to these two locations
when the server is active.

Note this has not ported the channel list, so the bot will not be
listening in our channels.

Change-Id: I9f9a466c271e1a706f9f98f816de0e84047519f1
2021-06-10 09:02:16 +10:00
Ian Wienand
0d00b28da8 Create ircbot container
This container installs Limnoria, the supybot replacement, as the
generic ircbot container.  We install the meetbot plugin as a sibling
project.

Previously we've conflated supybot with meetbot, which is a bit
confusing because meetbot is a plugin, but we also use other plugins
such as the channel logger.  We also hope to convert some of our other
bots to Limnoria (ptgbot?) to consolidate everything.  For this reason
I've called this the more generic "ircbot".  The image installs
meetbot as a sibling project, with the idea being any other plugins
would also be installed as siblings.

The siblings install expects the work directory to be a relative
directory.  I'm not sure we run this from other projects, but this
will work the same if we do.

Depends-On: https://review.opendev.org/c/opendev/meetbot/+/793876
Change-Id: Icee4c6bbb5ea235ba69c10f800a14bbf5beef3d5
2021-06-10 09:00:43 +10:00
Clark Boylan
b53914f812 Update Gerrit plugin versions for Gerrit 3.2 and 3.3
This updates our plugin versions to keep up with Gerrit releases. We
build from the stable branch which means we need to periodically bump
plugin tagged versions to keep up when stable branches for the plugins
are not available.

Change-Id: I5bb50d8610ebdbdb9c6a70ad8dde732067cc368f
2021-06-08 13:06:00 -07:00
Ian Wienand
fec8018581 Move gerritbot/accessbot to new eavesdrop server
This moves these services to eavesdrop01.opendev.org, a new
Focal-based server to host IRC services.

We have stopped running puppet on eavesdrop01.openstack.org so there
is nothing left for it to do (note the server is still running
meetbot/ptgbot).  Remove the commented out puppet run, and remove the
server from puppet groups.  Update the host in the Zuul jobs to the
new node.

Change-Id: I809f9af3e78f566362142790f6c79654ef5b8959
2021-06-08 08:16:56 +10:00
Zuul
d0eabc863c Merge "Run system-config-run-lists when testinfra changes" 2021-06-02 04:38:13 +00:00
Zuul
1953b693fc Merge "Cleanup puppet things from zuul where we don't puppet anymore" 2021-06-01 20:35:44 +00:00
Clark Boylan
ebaf6b436c Run system-config-run-lists when testinfra changes
When the testinfra tests for listservs change we should run the
system-config-run-lists job. Make it so with file matchers.
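Zuul file matchers are lists of regular expressions, set with the job's `files` attribute, that are matched against the paths a change touches. The Python model below is simplified. It assumes a job runs when any changed path matches any pattern from its start, as `re.match` does. It ignores `irrelevant-files` and other special cases, and the patterns are hypothetical examples rather than the ones added in this change.

```python
import re
from typing import Iterable, Sequence


def job_matches(changed_files: Iterable[str], file_patterns: Sequence[str]) -> bool:
    """Return True if any changed path matches any of the job's file regexes.

    Simplified model of a Zuul `files` matcher; not Zuul's actual code.
    """
    compiled = [re.compile(p) for p in file_patterns]
    # re.match anchors at the start of the path, like a leading '^'.
    return any(rx.match(path) for path in changed_files for rx in compiled)
```

With hypothetical patterns such as `^testinfra/test_lists\.py$`, editing the lists testinfra file triggers the job, while an unrelated test file does not.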

Change-Id: I64402345ceaab2056d8ed1b3a579063de0b29c9b
2021-06-01 12:57:13 -07:00
David Moreau Simard
fb8a5145df Update ARA
ARA's master branch now has static site generation, so we can move
away from the stable branch and get the new reports.

In the meantime ARA upstream has moved to github, so this updates the
references for the -devel job.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/793530
Change-Id: I008b35562994f1205a4f66e53f93b9885a6b8754
2021-06-01 09:38:32 +10:00
Clark Boylan
75356b4bc3 Stop testing nodepool when puppet changes
Nodepool doesn't use any puppetry anymore. We don't need to test changes
to puppet against nodepool.

Change-Id: I0ec824e21c17b074dd82eb298de6d196802aac28
2021-05-26 13:50:51 -07:00
Clark Boylan
6e04e500fd Remove system-config-legacy-logstash-filters job
We're trying to phase out the ELK systems. While we have agreed to not
immediately turn anything off, we probably don't need to keep running the
system-config-legacy-logstash-filters job as ELK should remain fairly
fixed unless someone rewrites config management for it and modernizes
it. And if that happens they will want new modern testing too.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/792710
Change-Id: I9ac6f12ec3245e3c1be0471d5ed17caec976334f
2021-05-21 17:03:32 -07:00
Clark Boylan
30a916ff94 Cleanup puppet things from zuul where we don't puppet anymore
This reduces the scope of our puppet related testing to things that
continue to use puppet. This is probably not strictly necessary but
helps keep us up to date with our TODO list.

Change-Id: I52bfff09ad0ddeabe7ad151bcf88c912f86a76ec
2021-05-21 17:03:08 -07:00
Zuul
be4f67f23e Merge "Add infra-prod-service-lists job" 2021-05-19 19:16:32 +00:00
Zuul
9fbd1ccf2c Merge "Ansible mailman configs" 2021-05-19 15:55:09 +00:00
Clark Boylan
8d9975be67 Double the default number of ansible forks
We run these ansible jobs serially which means we don't gain much by
forcing ansible to use a small number of forks. Double the default for
our infra prod job fork count from 5 to 10 to see if this speeds up our
deploy jobs.

Note some jobs override this value to use either more or fewer forks
when necessary. These are left as is.
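Here is a rough way to see why the fork count matters. With Ansible's default linear strategy, at most `forks` hosts work on a task at the same time. If we assume every host takes the same time per task, a task needs about ceil(hosts / forks) sequential waves. The Python sketch below encodes only that simplified model, and the host count used with it is hypothetical.

```python
import math


def batches_per_task(num_hosts: int, forks: int) -> int:
    """Approximate number of sequential waves needed to run one task on all hosts.

    Simplified model: at most `forks` hosts run a task concurrently and every
    host takes the same time, so hosts are processed in ceil(hosts / forks) waves.
    """
    if num_hosts < 0 or forks < 1:
        raise ValueError("need num_hosts >= 0 and forks >= 1")
    return math.ceil(num_hosts / forks)
```

For a hypothetical 23-host group, `batches_per_task(23, 5)` is 5 while `batches_per_task(23, 10)` is 3. Real speedups also depend on controller resources and per-host variance.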

Change-Id: I6fded724cb9c8654153bcc5937eae7203326076e
2021-05-14 12:14:15 -07:00
Clark Boylan
c743b7e484 Clean up zuul01 from inventory
This cleans up zuul01 as it should no longer be used at this point. We
also make the inventory groups a bit more clear that all zuul servers
are under the opendev.org domain now.

Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/790483
Change-Id: I7885fe60028fbd87688f3ae920a24bce4d1a3acd
2021-05-13 06:58:36 -07:00
Clark Boylan
533594d959 Add zuul02 to inventory
This zuul02 instance will replace zuul01. There are a few items to
coordinate when doing an actual switch so we haven't removed zuul01 from
inventory here. In particular we need to update gearman server config
values in the zuul cluster and we need to save queues, shutdown zuul01,
then start zuul02's scheduler and restore queues there.

I believe landing this change is safe as we don't appear to start zuul
on new instances by default. Reviewers should double check this.

Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/791039
Change-Id: I524b456e494124d8293fbe8e1468de40f3800772
2021-05-13 06:58:30 -07:00
Clark Boylan
caedb11d3d Add infra-prod-service-lists job
This job is not added in the parent so that we can manually run
playbooks after the parent lands. Once we are happy with the results
from the new service-lists.yaml playbook we can land this change and
have zuul automatically apply it when necessary.

Change-Id: I38de8b98af9fb08fa5b9b8849d65470cbd7b3fdc
2021-05-11 08:40:06 -07:00
Clark Boylan
4c4e27cb3a Ansible mailman configs
This converts our existing puppeted mailman configuration into a set of
ansible roles and a new playbook. We don't try to do anything new and
instead do our best to map from puppet to ansible as closely as
possible. This helps reduce churn and will help us find problems more
quickly if they happen.

Follow-ups will further clean up the puppetry.

Change-Id: If8cdb1164c9000438d1977d8965a92ca8eebe4df
2021-05-11 08:40:01 -07:00
Ian Wienand
5357b33e57 gerrit docker: match some more files
These files are involved in creating gerrit docker images; make sure
we trigger jobs when they are modified.

Change-Id: I7c4436e066cfb0c2d0b2ca7adf54c99b09dac95f
2021-05-07 11:06:13 +10:00
Ian Wienand
57e29c3680 system-config-roles: only match jobs on roles tested
It seems we have some debugging to do on the openafs roles.  The other
roles here, particularly the bazelisk one, aren't tested here, so
reduce the file matcher.

We can overhaul this more, but it seems like a post-puppet/xenial
thing to do.

Change-Id: I0a41ef48eab0560a23a4e29463435dfe0758d01e
2021-05-07 11:05:21 +10:00
Ian Wienand
629fdec768 Build Python 3.9 python-builder/base containers
Python 3.9 is released, so let's build containers.

This splits the docker-images/ files up as they are becoming a bit
crowded.

Change-Id: Id68080575a30e4a08c99df0af603fbb65a0983bd
2021-05-05 09:55:56 +10:00
Zuul
82435b279a Merge "Add zk04.opendev.org" 2021-04-27 16:33:02 +00:00
Clark Boylan
6e7c07411b Bump the infra-prod-manage-projects job timeout
Bump this timeout for a couple of reasons. First, we've seen the job
time out at least once in the last month. This seems to be due to gitea
portions of the job running slowly.

Second, we're planning some large-scale updates to the openstack acls, and
a longer timeout should help us land those in larger batches. We can
consider trimming this back again after these updates are done if gitea
doesn't continue to give us trouble.

Change-Id: Ib61849b4c73a1b3fa2a0bbe90ace29fb23849449
2021-04-16 14:10:34 -07:00
Clark Boylan
7502b87837 Add zk04.opendev.org
We will be rotating zk01-03.openstack.org out and replacing them with
zk04-06.opendev.org. This is the first change in that process which puts
zk04 into the rotation. This should only be landed when operators are
ready to manually stop zookeeper on zk03 (which is being replaced by
zk04 in this change).

Change-Id: Iea69130f6b3b2c8e54e3938c60e4a3295601c46f
2021-04-15 13:20:29 -07:00
Zuul
a7be740183 Merge "Fix up openafs-client job matching" 2021-04-12 22:43:13 +00:00
Zuul
b5f3f7ef49 Merge "zuul-summary-status : handle SKIPPED and ERROR jobs" 2021-04-09 02:08:58 +00:00
Zuul
3180086559 Merge "Rename refstack group variables" 2021-03-29 21:33:02 +00:00
Ian Wienand
9f11fc5c75 Remove references to review-dev
With our increased ability to test in the gate, there's not much use
for review-dev any more.  Remove references.

Change-Id: I97e9865e0b655cd157acf9ffa7d067b150e6fc72
2021-03-24 11:40:31 +11:00
Zuul
acf0e00478 Merge "Set up gitea image provides and requires for gating" 2021-03-23 18:29:35 +00:00
Clark Boylan
c2d46f4247 Set up gitea image provides and requires for gating
This should ensure that, if we have a parent change that updates the gitea
version and a do-not-merge child change that induces an artificial failure
for zuul hold purposes, we test the correct image in the child change's
builds.

Prior to this we were testing the existing published images, but
provides + requires will give the correct signaling to make the desired
"test new proposed image" behavior happen in the child change builds.

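The selection behaviour described above can be modelled in a few lines of Python. The child job `requires` a name. If a change ahead of it in the dependency chain has a job that `provides` that name, the child uses that freshly built artifact instead of the published image. This sketch only mimics the selection logic, and the class, artifact name and image references are hypothetical. In Zuul itself the artifacts are handed between builds by the jobs, for example via a buildset registry for container images.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Change:
    """A change in the dependency chain (hypothetical model, not Zuul's API)."""
    number: int
    # Artifacts built by this change's jobs, keyed by the name the job "provides".
    provided: Dict[str, str] = field(default_factory=dict)


def resolve_image(requires: str, changes_ahead: List[Change],
                  published: str) -> str:
    """Choose which image a job that `requires` an artifact name should test.

    `changes_ahead` is ordered oldest first; the nearest providing change wins,
    and the published image is used only when nothing ahead provides the name.
    """
    for change in reversed(changes_ahead):
        artifact = change.provided.get(requires)
        if artifact is not None:
            return artifact
    return published
```

In the scenario from this commit, the parent change's image build provides the artifact. As a result, the held do-not-merge child tests that image rather than the existing published one.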
Change-Id: Ie6b827b650e0f32606dc5ec7f4aa0adfeebdeb5e
2021-03-19 10:33:09 -07:00
Ian Wienand
aa94f2d831 Rename refstack group variables
When we cleaned up the puppet in
I6b6dfd0f8ef89a5362f64cfbc8016ba5b1a346b3 we renamed the group
s/refstack-docker/refstack/ but didn't also move the group variables and
some other references.

Change-Id: Ib07d1e9ede628c43b4d5d94b64ec35c101e11be8
2021-03-19 16:01:46 +11:00
Zuul
3bb0573f41 Merge "system-config-run-kerberos: run twice" 2021-03-19 00:07:09 +00:00
James E. Blair
96bac7b486 Add zookeeper-statsd
This adds a program, zookeeper-statsd, which monitors zookeeper
metrics and reports them to statsd.  It also adds a container to
run that program.  And it runs the container on each of the
ZooKeeper quorum members.  And it updates the graphite host to
allow statsd traffic from quorum members.  And it updates the
4-letter-word whitelist to allow the mntr command (which is used
to gather metrics) to be issued.
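ZooKeeper's `mntr` four-letter word returns one `key<TAB>value` pair per line. On ZooKeeper 3.5+ it must be listed in `4lw.commands.whitelist`, which is the whitelist mentioned above. Below is a minimal Python sketch of a collection pass such a program might run: fetch `mntr` over TCP, keep the numeric values, and emit them as statsd gauges over UDP. The metric prefix, host names and the choice to report everything as a gauge are assumptions. The actual zookeeper-statsd program may select and name metrics differently.

```python
import socket
from typing import Dict, Iterable, List, Union

Number = Union[int, float]


def fetch_mntr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'mntr' four-letter word and return ZooKeeper's raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text: str) -> Dict[str, Number]:
    """Parse 'key<TAB>value' lines, keeping only numeric values."""
    metrics: Dict[str, Number] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue
        value = value.strip()
        try:
            metrics[key.strip()] = int(value)
        except ValueError:
            try:
                metrics[key.strip()] = float(value)
            except ValueError:
                # Non-numeric entries such as zk_version or zk_server_state.
                continue
    return metrics


def to_statsd_gauges(metrics: Dict[str, Number], prefix: str) -> List[str]:
    """Format metrics as statsd gauge lines: '<prefix>.<key>:<value>|g'."""
    return [f"{prefix}.{key}:{value}|g" for key, value in sorted(metrics.items())]


def send_statsd(lines: Iterable[str], host: str, port: int = 8125) -> None:
    """Send each gauge line as its own UDP datagram to a statsd server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("utf-8"), (host, port))
```

A periodic loop could call `send_statsd(to_statsd_gauges(parse_mntr(fetch_mntr(zk_host)), prefix), statsd_host)`. Here `zk_host`, `prefix` and `statsd_host` are placeholders for whatever the real deployment uses.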

Change-Id: I298f0b13a05cc615d8496edd4622438507fc5423
2021-03-17 14:52:31 -07:00
Zuul
b2b1a9062d Merge "Add new opendev.org nodepool launchers" 2021-03-17 18:13:07 +00:00
Zuul
77b1c14a9a Merge "Use upstream jitsi-meet web image" 2021-03-17 00:22:50 +00:00
Zuul
4524a92caf Merge "kerberos-kdc: role to manage Kerberos KDC servers" 2021-03-16 22:28:46 +00:00
Clark Boylan
680ed17ecd Add new opendev.org nodepool launchers
This adds the new focal nodepool launcher replacements for nl02-04 to
our inventory. This will configure them with an idle configuration. We
will then confirm they are happy running in an idle state, then switch over
the config from the old to new servers.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/780982
Change-Id: Iea645925caaeee6f498aa690c4f2c848f6899317
2021-03-16 15:21:58 -07:00
Zuul
a5d0329cf7 Merge "Don't always update gitea project descriptions" 2021-03-16 22:20:55 +00:00
Ian Wienand
bf886a5ab6 system-config-run-kerberos: run twice
Run the playbook twice to ensure the role doesn't change anything on
the second run.
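One way to check the second run is to parse the `PLAY RECAP` that `ansible-playbook` prints and require `changed=0`, with no failures or unreachable hosts, for every host. The Python sketch below works under that assumption. The job itself may rely on the second run's exit status or other checks instead, and the host name in the example is hypothetical.

```python
import re
from typing import Dict

# Matches recap lines such as "kdc01 : ok=12   changed=0   unreachable=0 ...".
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counts>(?:\w+=\d+\s*)+)$")


def parse_recap(output: str) -> Dict[str, Dict[str, int]]:
    """Extract per-host counters from the PLAY RECAP section of playbook output."""
    results: Dict[str, Dict[str, int]] = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if not in_recap:
            continue
        match = RECAP_LINE.match(line.strip())
        if match:
            pairs = re.findall(r"(\w+)=(\d+)", match.group("counts"))
            results[match.group("host")] = {k: int(v) for k, v in pairs}
    return results


def is_idempotent(second_run_output: str) -> bool:
    """True if the recap shows no changes, failures or unreachable hosts."""
    recap = parse_recap(second_run_output)
    return bool(recap) and all(
        counts.get("changed", 0) == 0
        and counts.get("failed", 0) == 0
        and counts.get("unreachable", 0) == 0
        for counts in recap.values()
    )
```

An output without any recap is treated as a failure rather than a pass. That way a playbook that never ran cannot look idempotent.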

Change-Id: I1c0c45ece37035a18eb9468a5d7f4f34cfec4edc
2021-03-17 08:31:55 +11:00
Ian Wienand
c1aff2ed38 kerberos-kdc: role to manage Kerberos KDC servers
This adds a role and related testing to manage our Kerberos KDC
servers, intended to replace the puppet modules currently performing
this task.

This role automates realm creation, initial setup, key material
distribution and replica host configuration.  None of this is intended
to run on the production servers, which are already set up with an
active database, and the role should be effectively idempotent in
production.

Note that this does not yet switch the production servers into the new
groups; this can be done in a separate step under controlled
conditions and with related upgrades of the host OS to Focal.

Change-Id: I60b40897486b29beafc76025790c501b5055313d
2021-03-17 08:30:52 +11:00
Clark Boylan
16a4bdce02 Don't always update gitea project descriptions
There is some correlation between running the manage-projects playbook
and our gitea having fits. The bulk of the work done here is in trying to
update the descriptions of all projects. There isn't a good way to see
if the description is already set first, so we just try and ignore
errors. This creates potentially thousands of operations all at once and
could be why things are sad.

We move these operations under the always update flag, which is not set
on normal runs. If we really need to converge to a good updated state we
can manually run the playbook/role with always update set.

We also don't set a limit on the number of ThreadPoolExecutor workers,
which defaults to 5 * NumProcs on Python 3.5-3.7 (min(32, NumProcs + 4)
from Python 3.8). It could be that tuning this down would make gitea
happier.
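The two mitigations described above can be sketched in Python. The first is to skip the per-project description updates unless an always-update flag is set. The second is to pass an explicit `max_workers` rather than relying on `ThreadPoolExecutor`'s default. The function names, the `always_update` parameter and the worker count of 4 are hypothetical, and the real role code may be structured differently.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def default_max_workers() -> int:
    """Approximate ThreadPoolExecutor's default worker count on Python 3.8+."""
    # Python 3.5-3.7 used (os.cpu_count() or 1) * 5 instead.
    return min(32, (os.cpu_count() or 1) + 4)


def update_descriptions(projects: Iterable[str],
                        update: Callable[[str], None],
                        always_update: bool = False,
                        max_workers: int = 4) -> List[str]:
    """Update project descriptions only when always_update is set.

    Returns the names whose update call succeeded; failures are ignored,
    mirroring the "try and ignore errors" approach of the original code.
    """
    if not always_update:
        # Normal runs skip the expensive per-project description updates.
        return []
    updated: List[str] = []
    # An explicit, small max_workers bounds how many requests hit gitea at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(name, pool.submit(update, name)) for name in projects]
        for name, future in futures:
            try:
                future.result()
            except Exception:
                continue
            updated.append(name)
    return updated
```

Keeping the flag check outside the executor means normal runs never create the thread pool at all.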

One other thought is that we may not be using requests sessions properly
for connection reuse. In particular, the requests documentation notes that
you need to set stream to False or read the response content to return a
connection back to the pool for reuse. We might look into this for further
improvements.

Change-Id: I6e6fb1eb08303e9da7e38cf493d1871364340000
2021-03-16 13:06:16 -07:00
Clark Boylan
ed61423b6b Add nl01.opendev.org to our inventory
This is a new focal replacement for nl01.openstack.org. We keep
nl01.openstack.org in our inventory for now because we want ansible to
update the nodepool.yaml configs for these two hosts to coordinate a
hand off of responsibilities once we are happy with the new deployment.

We also switch the testing hostname to nl04.openstack.org as this will
be the last nodepool launcher to be removed. When we swap it out the
testing will be updated to use focal hosts.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/779863
Change-Id: Ib3ea6586fe0567c1edf6255ee9be50164d35db62
2021-03-15 09:48:22 -07:00