Previously we had set up the test gerrit instance to use the same
hostname as production: review02.opendev.org. This causes some confusion
as we have to override settings specifically for testing like a reduced
heap size, but then also copy settings from the prod host vars as we
override the host vars entirely. Using a new hostname allows us to use a
different set of host vars with unique values reducing confusion.
Change-Id: I4b95bbe1bde29228164a66f2d3b648062423e294
Previously we had a test specific group vars file for the review Ansible
group. This provided junk secrets to our test installations of Gerrit,
and we relied on the review02.opendev.org production host vars file to
set values that are public.
Unfortunately, this meant we were using the production heapLimit value
which is far too large for our test instances, leading to the occasional
failure:
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 9596567552 bytes for committing reserved memory.
We cannot set the heapLimit in the group var file because the hostvar
file overrides those values. To fix this we need to replace the test
specific group var contents with a test specific host var file instead.
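The precedence problem can be sketched in a few lines of Python. This is a simplified model, assuming only two variable layers (group vars, then host vars) and flat top-level keys. The variable names and values below are hypothetical and are not copied from the real vars files.

```python
def effective_vars(group_vars, host_vars):
    """Merge two variable layers; host vars win on any conflicting key."""
    merged = dict(group_vars)
    merged.update(host_vars)
    return merged


# Hypothetical values, for illustration only.
test_group_vars = {"gerrit_heap_limit": "6g", "gerrit_serverid": "test-id"}
prod_host_vars = {"gerrit_heap_limit": "96g"}

result = effective_vars(test_group_vars, prod_host_vars)
# The group-level heap limit is silently replaced by the host-level one.
print(result["gerrit_heap_limit"])
```

In this model the only way for the test value to win is to set it at the host level as well, which is what moving to a test specific host var file does.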
To avoid repeating ourselves we also create a new review.yaml group_vars
file to capture common settings between testing and prod. Note we should
look at combining this new file with the gerrit.yaml group_vars.
On the testing side of things we set the heapLimit to 6GB, we change the
serverid value to prevent any unexpected notedb confusion, and we remove
replication config.
Change-Id: Id8ec5cae967cc38acf79ecf18d3a0faac3a9c4b3
The default channel name in the ptgbot role defaults did not
correctly specify a starting hash which it requires, but also the
test jobs seem to need it set in the eavesdrop group vars specific
to testing.
Change-Id: I16cdeac4f7af50e2cac36c80d78f3a87f482e4aa
This shifts our Gerrit upgrade testing ahead to testing 3.3 to 3.4
upgrades as we have upgraded to 3.3 at this point.
Change-Id: Ibb45113dd50f294a2692c65f19f63f83c96a3c11
This bumps the gerrit image up to our 3.3 image. Followup changes will
shift upgrade testing to test 3.3 to 3.4 upgrades, clean up no longer
needed 3.2 images, and start building 3.4 images.
Change-Id: Id0f544846946d4c50737a54ceb909a0a686a594e
Currently we connect to the LE staging environment with acme.sh during
CI to get the DNS-01 tokens (but we never follow-through and actually
generate the certificate, as we have nowhere to publish the tokens).
We've known for a while that LE staging isn't really meant to be used
by CI like this, and recent instability has made the issue pronounced.
This modifies the driver script to generate fake tokens which work to
ensure all the DNS processing, etc. is happening correctly.
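As a rough illustration of the idea (not the driver script itself), fake tokens only need to have the same shape as real ones. A real DNS-01 TXT value is the unpadded base64url encoding of a SHA-256 digest, so 43 characters long. The function names below are hypothetical.

```python
import base64
import secrets


def fake_dns01_token():
    """Return a random string shaped like a DNS-01 TXT value.

    32 random bytes stand in for the SHA-256 digest a real ACME
    client would compute; the encoding matches the real format.
    """
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def fake_txt_records(domains):
    """Map each domain to its _acme-challenge record name and a fake token."""
    return {f"_acme-challenge.{d}": fake_dns01_token() for d in domains}
```

Records built this way exercise the same downstream DNS handling as real tokens without contacting any ACME server.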
I have put this behind a flag, however, so the letsencrypt job still
connects to LE staging. I think it is worth having this job actually call
acme.sh to validate this path; this shouldn't be required too often.
Change-Id: I7c0b471a0661aa311aaa861fd2a0d47b07e45a72
Instead of using the opendev.org/... logo file, host a copy from
gerrit's static location and use that. This isolates us from changes
to the way gitea serves its static assets.
Change-Id: I8ffb47e636a59e5ecc3919cc7a16d93de3eae08d
Copy static files directly into the container image instead of
managing them dynamically with Ansible.
Change-Id: I0ebe40ad2a97e87b00137af7c93a3ffa84929a2e
We now depend on the reverse proxy not only for abuse mitigation but
also for serving .well-known files with specific CORS headers. To
reduce complexity and avoid traps in the future, make it non-optional.
Change-Id: I54760cb0907483eee6dd9707bfda88b205fa0fed
We create a (currently test only) playbook that upgrades gerrit. This job
then runs through project creation and renaming and testinfra testing on
the upgraded gerrit version.
Future improvements should consider loading state on the old gerrit
install before we upgrade that can be asserted as well.
Change-Id: I364037232cf0e6f3fa150f4dbb736ef27d1be3f8
We are now using the mariadb jdbc connector in production and no longer
need to include the mysql legacy connector in our images. We also don't
need support for h2 or mysql as testing and prod are all using the
mariadb connector and local database.
Note this is a separate change to ensure everything is happy with the
mariadb connector before we remove the fallback mysql connector from our
images.
Change-Id: I982d3c3c026a5351bff567ce7fbb32798718ec1b
This tests that we can rename both the project and the org the project
lives in. Should just add a bit more robustness to our testing.
Change-Id: I0914e864c787b1dba175e0fabf6ab2648a554d16
Previously we were only managing root's known_hosts via ansible but even
then this wasn't happening because the gerrit_self_hostkey var wasn't
set anywhere. On top of that we need to manage multiple known_hosts
because gerrit must recognize itself and all of the gitea servers.
Update the code to take a dict of host key values and add each entry to
known_hosts for both the root and gerrit2 user.
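A minimal sketch of the dict-to-known_hosts expansion, written in Python rather than Ansible. It assumes each dict value is already a "keytype base64key" string; the function name and the sample keys used with it are placeholders, not real host keys.

```python
def render_known_hosts(host_keys, users=("root", "gerrit2")):
    """Build identical known_hosts content for each user.

    host_keys maps a hostname to its "keytype base64key" string.
    Returns a dict of user name to file content.
    """
    lines = [f"{host} {key}" for host, key in sorted(host_keys.items())]
    content = "\n".join(lines) + "\n"
    return {user: content for user in users}
```

Calling it with one entry for the gerrit host itself and one per gitea server yields the same file body for both users, so neither account needs a keyscan.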
We remove keyscans from tests to ensure that this update is actually
working.
Change-Id: If64c34322f64c1fb63bf2ebdcc04355fff6ebba2
This runs the new matrix-eavesdrop bot on the eavesdrop server.
It will write logs out to the limnoria logs directory, which is mounted
inside the container.
Change-Id: I867eec692f63099b295a37a028ee096c24109a2e
It would be useful to test our rename playbook against gitea and gerrit
when we make changes to these related playbooks, roles, and docker
images. To do this we need to converge our test and production setups
for gerrit a bit more. We create an openstack-project-creator account in
the test gerrit to match prod and we have rename_repos.yaml talk to
localhost for gerrit ssh commands.
With that done we can run the rename_repos.yaml playbook from
test-gitea.yaml and test-gerrit.yaml to help ensure the playbook
functions as expected against these services.
Co-Authored-By: Ian Wienand <iwienand@redhat.com>
Change-Id: I49ffaf86828e87705da303f40ad4a86be030c709
The extant variable name is never set so this never writes anything
out. Move it to a dictionary value. Use stub values for testing,
this way we don't need the "when:".
Additionally remove an unused old template file.
Change-Id: Id96fde79e28f309aa13e16bdda29f004c3c69c4b
This moves review02 out of the review-staging group and into the main
review group. At this point, review01.openstack.org is inactive so we
can remove all references to openstack.org from the groups. We update
the system-config job to run against a focal production server, and
remove the unneeded rsync setup used to move data.
This additionally enables replication; this should be a no-op when
applied, as part of the transition process is to manually apply this
so that DNS setup can pull zone changes from opendev.org.
It also switches to the mysql connector; as noted inline, we found some
issues with mariadb.
Note backups follow in a separate step to avoid doing too much at
once, hence dropping the backup group from the testing list.
Change-Id: I7ee3e3051ea8f3237fd5f6bf1dcc3e5996c16d10
The paste service needs an upgrade; since others have created a
lodgeit container it seems worth us keeping the service going if only
to maintain the historical corpus of pastes.
This adds the ansible to deploy lodgeit and a sibling mariadb
container. I have imported a dump of the old data as a test. The
dump is ~4GB and, once imported, takes up about double that; certainly
nothing we need to be too concerned over. The server will be more
than capable of running the db container alongside the lodgeit
instance.
This should have no effect on production until we decide to switch
DNS.
Change-Id: I284864217aa49d664ddc3ebdc800383b2d7e00e3
This adds a local mariadb container to the gerrit host to hold the
accountPatchReviewDb database. This is inspired by a few things:
- since migration to NoteDB, there is only one table left where
Gerrit records what files have been reviewed for a change. This
logically scales with the number of reviews users are doing.
Pulling the stats on this, we can see since the NoteDB upgrade this
went from a very busy database (~300 queries/70 commits per second)
to barely registering one hit per second:
https://imgur.com/a/QGJV7Fw
Thus separating the db to an external host for performance reasons
is not a large concern any more.
- empirically we've done a bad job in keeping the existing hosted db
up-to-date; it's still running mysql 5.1 and we have been hit by
bugs such as the one referenced in-line which silently drops
backups.
- The other gerrit option is to use an on-disk H2 database. This is
certainly an option, however you need special tools to interact
with it for migration, etc. and it's not safe to backup from files
on disk (as opposed to mysqldump). Upstream advice is unclear, and
varies between H2 being a performance bottleneck to this being
ephemeral data that users don't care about. We know how to admin
mariadb/mysql and this allows us to migrate and backup data, so
seems like the best choice.
- we have a pressing need to update the server to a new operating
system. Running the db alongside the gerrit instance minimises
fiddling we have to do managing connections to and migrating the
hosted db systems.
- related to that, we are tending towards more provider independence
for control-plane servers. A hosted database product is not always
provided, so this gives us more flexibility in moving things
around.
- the main concern here is memory usage. "docker stats" reports a
quiescent container, freshly started on an 8GB host:
gerrit-compose_mariadb_1 67.32MiB
After loading a copy of the production table, and then dumping it
back to a file the same container reports:
gerrit-compose_mariadb_1 462.6MiB
The existing remote mysql configuration path remains mostly the same.
We move the gerrit startup into a script rather than a CMD so we can
call it after a "wait for db" script in the mariadb_container case
(this is the recommended way to enforce ordering [1]).
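For illustration, the "wait for db" step amounts to polling the database port until it accepts connections. The sketch below is an assumption about the general shape of such a check, written in Python; it is not the script added by this change, and the function name is hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True once connected, or False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A startup wrapper would call this with the database host and port and only launch the main service when it returns True. Note that an open port only shows the server is listening, not that it is ready to serve queries.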
Backups of the local container need different dump commands; backups
are relocated to a new file and updated.
Testing is converted to use this rather than a local H2 database.
[1] https://docs.docker.com/compose/startup-order/
Change-Id: Iec981ef3c2e38889f91e9759e66295dbfb499c2e
Currently when we run tests, this connects to OFTC and tries to use
the opendevstatus nick as it is the default. Replace this with a
random username. Also override the channels list, so it only joins #opendev-sandbox.
Limnoria was already using a non-conflicting name, but switch it to a
random one for consistency and possible parallel running. This also
already only joins #opendev-sandbox.
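One way to produce such a random name, sketched in Python. The prefix, length, and function name are hypothetical choices; the fixed alphabetic prefix is there so the result never starts with a digit.

```python
import secrets
import string


def random_nick(prefix="test", length=8):
    """Return prefix plus a random lowercase alphanumeric suffix."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(length))
    return f"{prefix}{suffix}"
```

Generating a fresh name per test run makes collisions between parallel jobs unlikely.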
Change-Id: I860b0f1ed4f99140dda0f4d41025f0b5fb844115
This installs statusbot on eavesdrop01.opendev.org.
Otherwise it's just config translation and bringing up the daemon.
Change-Id: I246b2723372594e65bcd1ba90215d6831d4c0c72
This enables the new eavesdrop01.opendev.org server in all current
channels. Puppet has been disabled on the old server and we will
manually stop supybot/meetbot and migrate logs before this applies.
Change-Id: I4a422bb9589c8a8761191313a656f8377e93422f
The ara-report role used to add this but it hasn't been updated for
the latest ARA (I008b35562994f1205a4f66e53f93b9885a6b8754). Add it
back here.
Change-Id: I2d56e7cde32cd7adabb359a35ecdaa9f0880f7d5
ARA's master branch now has static site generation, so we can move
away from the stable branch and get the new reports.
In the meantime ARA upstream has moved to github, so this updates the
references for the -devel job.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/793530
Change-Id: I008b35562994f1205a4f66e53f93b9885a6b8754
We're trying to phase out the ELK systems. While we have agreed to not
immediately turn anything off we probably don't need to keep running the
system-config-legacy-logstash-filters job as ELK should remain fairly
fixed unless someone rewrites config management for it and modernizes
it. And if that happens they will want new modern testing too.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/792710
Change-Id: I9ac6f12ec3245e3c1be0471d5ed17caec976334f