We're going to want Mailman 3 served over HTTPS for security
reasons, so start by generating certificates for each of the sites
we have in v2. Also collect the acme.sh logs for verification.
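As a rough sketch, the certificates are defined for the letsencrypt
roles via host_vars along these lines (the hostname and domain list
below are illustrative assumptions, not copied from the change):

  # host_vars/<new-lists-host>.yaml (hypothetical)
  letsencrypt_certs:
    lists-main:
      - lists.opendev.org
      - lists.zuul-ci.org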
Change-Id: I261ae55c6bc0a414beb473abcb30f9a86c63db85
Having two groups here was confusing. We seem to use the review group
for most ansible stuff, so we prefer that one. We move the contents
of the gerrit group_vars into the review group_vars and then clean up
the use of the old group_vars file.
Change-Id: I7fa7467f703f5cec075e8e60472868c60ac031f7
Start backing up the new review server. Stop backing up the old
server. Fix the group matching test for the new server.
Change-Id: I8d84b80099d5c4ff7630aca9df312eb388665b86
This moves review02 out of the review-staging group and into the main
review group. At this point, review01.openstack.org is inactive so we
can remove all references to openstack.org from the groups. We update
the system-config job to run against a focal production server, and
remove the unneeded rsync setup used to move data.
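The group change itself amounts to something like this in the
inventory group definitions (a sketch; the surrounding groups.yaml
structure is abbreviated and assumed):

  groups:
    review:
      - review02.opendev.org
    # review-staging no longer matches review02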
This additionally enables replication; this should be a no-op when
applied, as part of the transition process is to manually apply this
change so that the DNS setup can pull zone changes from opendev.org.
It also switches to the mysql connector; as noted inline, we found
some issues with mariadb.
Note backups follow in a separate step to avoid doing too much at
once, hence dropping the backup group from the testing list.
Change-Id: I7ee3e3051ea8f3237fd5f6bf1dcc3e5996c16d10
ARA's master branch now has static site generation, so we can move
away from the stable branch and get the new reports.
In the meantime ARA upstream has moved to github, so this updates the
references for the -devel job.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/793530
Change-Id: I008b35562994f1205a4f66e53f93b9885a6b8754
This converts our existing puppeted mailman configuration into a set of
ansible roles and a new playbook. We don't try to do anything new and
instead do our best to map from puppet to ansible as closely as
possible. This helps reduce churn and will help us find problems more
quickly if they happen.
Followups will further cleanup the puppetry.
Change-Id: If8cdb1164c9000438d1977d8965a92ca8eebe4df
We will be rotating zk01-03.openstack.org out and replacing them with
zk04-06.opendev.org. This is the first change in that process which puts
zk04 into the rotation. This should only be landed when operators are
ready to manually stop zookeeper on zk03 (which is being replaced by
zk04 in this change).
Change-Id: Iea69130f6b3b2c8e54e3938c60e4a3295601c46f
Once we are satisfied that we have disabled the inputs to firehose we
can land this change to stop managing it in config management. Once that
is complete the server can be removed.
Change-Id: I7ebd54f566f8d6f940a921b38139b54a9c4569d8
We duplicate the KDC settings across all our kerberos clients. Add
clients to a "kerberos-client" group and set the variables in a group
file.
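For example, the shared settings can then live in a single group
file, roughly (the variable names here are assumptions for
illustration, not the literal contents):

  # group_vars/kerberos-client.yaml
  kerberos_realm: OPENSTACK.ORG
  kerberos_kdcs:
    - kdc01.openstack.org
    - kdc04.openstack.org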
Change-Id: I25ed5f8c68065060205dfbb634c6558488003a38
These are new focal replacement servers. Because this is the last set
of replacements for the executors we also clean up the testing of the
old
servers in the system-config-run-zuul job and the inventory group
checker job.
Change-Id: I111d42c9dfd6488ef69ff1a7f76062a73d1f37bf
We have identified an issue with stevedore < 3.3.0 where the
cloud-launcher, running under ansible, causes stevedore to hash a /tmp
path into the entry-point cache file it creates, resulting in a
never-ending expansion of cache files.
This appears to be fixed by [1] which is available in 3.3.0. Ensure
we install this on bridge. For good measure, add a ".disable" file as
we don't really need caches here.
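A minimal sketch of the disabling step, assuming the default
stevedore cache location under the deploy user's home directory (the
path and task layout are assumptions):

  - name: Ensure stevedore cache dir exists
    file:
      path: /root/.cache/python-entrypoints
      state: directory

  - name: Disable stevedore entry-point caching
    file:
      path: /root/.cache/python-entrypoints/.disable
      state: touch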
There are currently 491,089 leaked files, so I didn't think it wise
to delete these in an ansible loop as it would probably time out the
job.
We can do this manually once we stop creating them :)
[1] d7cfadbb7d
Change-Id: If5773613f953f64941a1d8cc779e893e0b2dd516
This server has been replaced by ze01.opendev.org running Focal. Let's
remove the old ze01.openstack.org from inventory so that we can delete
the server. We will follow this up with a rotation of new focal servers
being put in place.
This also renames the xenial executor in testing to ze12.openstack.org
as that will be the last one to be rotated out in production. We will
remove it from testing at that point as well.
We also remove a completely unused zuul-executor-opendev.yaml group_vars
file to avoid confusion.
Change-Id: Ida9c9a5a11578d32a6de2434a41b5d3c54fb7e0c
This is a focal replacement for ze01.openstack.org. Cleanup for
ze01.openstack.org will happen in a followup when we are happy with the
results of running zuul-executor on focal.
Change-Id: If1fef88e2f4778c6e6fbae6b4a5e7621694b64c5
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.
For reference, the servers being backed up at this time are:
borg-ask01
borg-ethercalc02
borg-etherpad01
borg-gitea01
borg-lists
borg-review-dev01
borg-review01
borg-storyboard01
borg-translate01
borg-wiki-update-test
borg-zuul01
This removes the old bup backup hosts, the no-longer-used ansible
roles for the bup backup server and client, and any remaining
bup-related configuration.
For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.
Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
Both the fileservers and db servers have common key material deployed
by the openafs-server-config role. Put both types of server in a new
group "afs-server-common" so we can define this key material in just
one group file on bridge.
Then separate out the two into afs-<file|db>-server groups for
consistent naming.
Rename afs-admin for consistent naming.
The service file is updated to reflect the new groups.
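Roughly, the resulting group layout looks like this (a sketch; the
host patterns are abbreviated and assumed):

  groups:
    afs-server-common:
      - afs[0-9]*.open*.org
      - afsdb[0-9]*.open*.org
    afs-file-server:
      - afs[0-9]*.open*.org
    afs-db-server:
      - afsdb[0-9]*.open*.org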
Change-Id: Ifa5f251fdfb8de737ad2ed96491d45294ce23a0c
With all AFS file-servers upgraded to 1.8, we can move afs01.dfw back
and rename the group to just "afs".
Change-Id: Ib31bde124e01cd07d6ff7eb31679c55728b95222
As described inline, installing ansible from source now installs the
"ansible-core" package, instead of "ansible-base". Since they can't
live together nicely, we have to do a manual override for the devel
job.
Change-Id: I1299ea330e6de048b661fc087f016491758631c7
Backups have been going well on ethercalc02, so add borg backup runs
to all backed-up servers. Port in some additional excludes for Zuul
and slightly modify the /var/ matching.
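As an illustrative sketch (the variable name and paths are assumed,
not verbatim from the change), the extra Zuul excludes look something
like:

  borg_backup_excludes_extra:
    - /var/lib/zuul/executor-git
    - /var/lib/zuul/builds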
Change-Id: Ic3adfd162fa9bedd84402e3c25b5c1bebb21f3cb
This wasn't quite fixed right when these were moved into
project-config. Get the projects and install them.
Change-Id: I0f854609fc9aebffc1fa2a2e14d5231cce9b71d0
Modules are collected on bridge and then synchronized to remote hosts
where puppet is run. This is done to ensure an atomic run of puppet
across affected hosts.
These modules are described in modules.env and cloned by
install_modules.sh. Currently this is done in install-ansible, but
after some recent refactoring
(I3b1cea5a25974f56ea9202e252af7b8420f4adc9) the best home for it
appears to now be in puppet-setup-ansible, just before the script is
run.
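In practice that means puppet-setup-ansible grows a step roughly like
the following (a sketch; the exact task and checkout path are
assumptions):

  - name: Clone puppet modules listed in modules.env
    command: ./install_modules.sh
    args:
      chdir: /opt/system-config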
Change-Id: I4b1d709d7037e2851d73be4bc7a202f52858ad4f
Allow speculative testing of ansible collections in the -devel test
job by linking in the git checkouts from the dependent change.
Depends-On: https://review.opendev.org/747596
Change-Id: I014701f41fb6870360004aa64990e16e278381ed
The Ansible devel branch has pulled in some major changes that have
broken our -devel testing job.
Firstly, installing from a source checkout now installs the package
"ansible-base"; this means when we install ARA, which has a dependency
on just "ansible", it pulls in the old 2.9 release (which is what the
-devel test is currently testing with -- the reason for this change).
We could remove ARA, but we quite like its reports for the nested
Ansible runs. So make a dummy "ansible" 2.9 package and install that
to satisfy the dependency.
Secondly, Ansible devel has split out a lot of things into "community
modules". To keep testing the -devel branch into the future, we need
to pull in the community modules for testing as well [1].
After some very useful discussion with jborean93 in #ansible I believe
the best way to do this is to clone the community projects into place
in the ansible configuration directory. Longer term, we should make
Zuul check these out and use that, then we can speculatively test
changes too -- but for now just KISS.
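Concretely, the cloning is something like the following task (the
destination uses one of Ansible's default collection paths; the exact
collection list and path in the change may differ):

  - name: Clone community.general for the -devel job
    git:
      repo: https://github.com/ansible-collections/community.general
      dest: /usr/share/ansible/collections/ansible_collections/community/general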
[1] For reference, upstream bundles all this into the "Ansible
Community Distribution" or ACD, which is what you will get when you
download "ansible" from PyPI or similar. But this job should be
pulling the bleeding edge of ansible and the community modules we use
-- that's what it's for.
Depends-On: https://review.opendev.org/747337
Change-Id: I781e275acb6af85f816ebcaf57a9825b50ca1196
This deploys graphite from the upstream container.
We override the statsd configuration to have it listen on ipv6.
Similarly we override the nginx config to listen on ipv6, enable ssl,
forward port 80 to 443, and block the /admin page (we don't use it).
For production we will just want to put some cinder storage in
/opt/graphite/storage on the production host and figure out how to
migrate the old stats. There is also a bit of cleanup that will follow,
because we half-converted grafana01.opendev.org -- so everything can't
be in the same group till that is gone.
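A minimal sketch of the deployment, assuming the upstream graphiteapp
image and the storage path mentioned above (the compose layout here
is an assumption):

  # docker-compose.yaml (sketch)
  services:
    graphite:
      image: docker.io/graphiteapp/graphite-statsd
      network_mode: host
      restart: always
      volumes:
        - /opt/graphite/storage:/opt/graphite/storage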
Testing has been added to push some stats and ensure they are seen.
Change-Id: Ie843b3d90a72564ef90805f820c8abc61a71017d
So that we don't end up in a position where we find a DISABLE-ANSIBLE
file in place and wonder what it is or how it got there, ask the user
for a comment to place in the file. Append to the file in case it
already exists. Cat the file at the end to show the user all of the
comments in case there was one previously. Include the date for even
more clues.
Change-Id: I9c22f94c5ea93452b2975d4aae3bf7fbd9c736d0
We are currently cloning all of the puppet modules in install-ansible,
but we only need them when we run run-puppet. Move the cloning there
so that we can stop wasting time in CI jobs that don't need them.
In prod, this should not have much impact.
Change-Id: I641ffc09e9e0801e0bc2469ceec97820ba354160
Make inventory/service for service-specific things, including the
groups.yaml group definitions, and inventory/base for hostvars
related to the base system, including the list of hosts.
Move the existing host_vars into inventory/service, since most of
them are likely service-specific. Move group_vars/all.yaml into
base/group_vars as almost all of it is related to base things,
with the exception of the gerrit public key.
A followup patch will move host-specific values into equivalent
files in inventory/base.
This should let us override hostvars in gate jobs. It should also
allow us to do better file matchers - and to be able to organize
our playbooks more if we want to.
Depends-On: https://review.opendev.org/731583
Change-Id: Iddf57b5be47c2e9de16b83a1bc83bee25db995cf
Remove the separate "mirror_opendev" group and rename it to just
"mirror". Update various parts to reflect that change.
We no longer deploy any mirror hosts with puppet, remove the various
configuration files.
Depends-On: https://review.opendev.org/728345
Change-Id: Ia982fe9cb4357447989664f033df976b528aaf84
We have two standalone roles, puppet and cloud-launcher, but we
currently install them with galaxy so depends-on patches don't
work. We also install them every time we run anything, even if
we don't need them for the playbook in question.
Add two roles, one to install a set of ansible roles needed by
the host in question, and the other to encapsulate the sequence
of running puppet, which now includes installing the puppet
role, installing puppet, disabling the puppet agent and then
running puppet.
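The intended usage is then roughly (the role names below are
placeholders for the two new roles described above):

  - hosts: puppet
    roles:
      - role: install-ansible-roles
      - role: run-puppet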
As a followup, we'll do the same thing with the puppet modules,
so that we aren't cloning and rsyncing ALL of the puppet modules
all the time no matter what.
Change-Id: I69a2e99e869ee39a3da573af421b18ad93056d5b
Zuul is publishing lovely container images, so we should
go ahead and start using them.
We can't use containers for zuul-executor because of the
docker->bubblewrap->AFS issue, so install from pip there.
Don't start any of the containers by default, which should
let us safely roll this out and then do a rolling restart.
For things (like web or mergers) where it's safe to do so,
a followup change will swap the flag.
Change-Id: I37dcce3a67477ad3b2c36f2fd3657af18bc25c40
Migration plan:
* add zk* to emergency
* copy data files on each node to a safe place for DR backup
* make a json data backup: zk-shell localhost:2181 --run-once 'mirror / json://!tmp!zookeeper-backup.json/'
* manually run a modified playbook to set up the docker infra without starting containers
* rolling restart; for each node:
* stop zk
* split data and log files and move them to new locations
* remove zk packages
* start zk containers
* remove from emergency; land this change.
Change-Id: Ic06c9cf9604402aa8eb4bb79238021c14c5d9563
So that we can start running things from the zuul source rather
than from update-system-config and /opt/system-config, we need to
install a few things onto the host in install-ansible so that the
ansible env is standalone.
This introduces a split execution path. The ansible config is
now all installed globally onto the machine by install-ansible
and does not reference a git checkout.
For running ad-hoc commands, an ansible.cfg is introduced inside
the root of the system-config dir. So if ansible-playbook is
executed with PWD==/opt/system-config it will find that ansible.cfg,
which will take precedence, along with any content from
system-config.
As a followup we'll make /opt/system-config/ansible.cfg written
out by install-ansible from the same template, and we'll update
the split to make ansible only work when executed from one of
the two configured locations, so that it's clear where we're
operating from.
Change-Id: I097694244e95751d96e67304aaae53ad19d8b873
The normal callback plugin is unreadable for stdout and stderr things.
Update to use the debug plugin which prints their output nicely in
the way we'd like.
Change-Id: I3a6b31af7d6132a1ee31a280f7f21f3132856273