We use synchronize to copy the OpenStack mailman templates, which
preserves the ownership, group, and permissions of the source files on
bridge. This isn't a major problem, but it is ugly, so we fix it by
setting rsync_opts on synchronize with a usermap and a groupmap that
translate the bridge-side ownership into what we want on the remote.
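A rough sketch of the approach (paths and mappings illustrative, not
the exact task):

  - name: Copy mailman templates with sane remote ownership
    synchronize:
      src: /opt/system-config/mailman/templates/
      dest: /var/lib/mailman/templates/
      rsync_opts:
        # Map bridge-side owners/groups to the accounts we want remotely
        - "--usermap=*:root"
        - "--groupmap=*:list"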
Change-Id: I209345cbe9e27beb18d1ba31e6715bf850bc022b
The upstream haproxy image switched to running as a dedicated user,
rather than as root [1]. This means it cannot bind to 80/443 and
instantly dies.
I've added a comment with some discussion, but for now, use the root
user.
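For reference, a hedged docker-compose sketch of the workaround
(service layout illustrative):

  services:
    haproxy:
      image: docker.io/library/haproxy:latest
      network_mode: host
      # The image now defaults to a non-root user, which cannot bind
      # the privileged ports 80/443; force root for now.
      user: root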
[1] 82ff028a25
Change-Id: Ic9b04cdd09f73d9df015bcb173871cff1ae58835
The haproxy 2.4 images aren't working for us; docker-compose
perpetually reports the container in a "restarting" state. Pin back
from latest to 2.3 until we can sort out what needs to change in how
we integrate this on the server.
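The pin itself is a one-line compose change (image path illustrative):

  services:
    haproxy:
      # Pinned back from :latest; 2.4 containers loop in "restarting"
      image: docker.io/library/haproxy:2.3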
Change-Id: I01ae11a31eb8eaeb9e570692d5ec268395f69a97
This removes the kata-containers tenant backup entry as that tenant no
longer exists. We also add status.json backups for the opendev,
vexxhost, zuul, pyca, and pypa tenants. This gets us in sync with the
current tenant list.
Change-Id: I8527676dda67915e6ebe0d1c5fde7a57a7ac2e5b
This fixes the zuul debug log's logrotate filename. We also increase the
rotation count to 30 daily logs for all zuul processes on the scheduler
(this matches the old server).
We also create a /var/lib/zuul/backup dir so that status.json backups
have a location they can write to. We do this in the base zuul role
which means all zuul servers will get this dir. It doesn't currently
conflict with any of the cluster members' /var/lib/zuul contents so
should be fine.
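A minimal sketch of the directory task in the base role (ownership
values assumed):

  - name: Ensure zuul status backup dir exists
    file:
      path: /var/lib/zuul/backup
      state: directory
      owner: zuul
      group: zuul
      mode: "0755"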
Change-Id: I4709e3c7e542781a65ae24c1f05a32444026fd26
This cleans up zuul01, as it should no longer be used at this point. We
also make the inventory groups clearer about all zuul servers now being
under the opendev.org domain.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/790483
Change-Id: I7885fe60028fbd87688f3ae920a24bce4d1a3acd
This zuul02 instance will replace zuul01. There are a few items to
coordinate when doing the actual switch, so we haven't removed zuul01
from the inventory here. In particular we need to update the gearman
server config values in the zuul cluster, and we need to save queues,
shut down zuul01, then start zuul02's scheduler and restore the queues
there.
I believe landing this change is safe as we don't appear to start zuul
on new instances by default. Reviewers should double check this.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/791039
Change-Id: I524b456e494124d8293fbe8e1468de40f3800772
This converts our existing puppeted mailman configuration into a set of
ansible roles and a new playbook. We don't try to do anything new and
instead do our best to map from puppet to ansible as closely as
possible. This helps reduce churn and will help us find problems more
quickly if they happen.
Followups will further cleanup the puppetry.
Change-Id: If8cdb1164c9000438d1977d8965a92ca8eebe4df
Several of these domains have migrated to being deployed via our
letsencrypt roles and thus no longer need special casing in the
certcheck list, as they are automatically added now.
Change-Id: Id417db6af09f3ba96bb6da09d8cbf28dd8ddf276
This will run the ua tool to attach a UA token and enable the
esm-infra repos. We also update unattended-upgrades to automatically
pull security updates from the ESM repos.
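Roughly, the new tasks amount to the following (variable name
illustrative; the real token comes from our secrets):

  - name: Attach the UA token
    command: ua attach {{ ua_token }}

  - name: Enable the ESM infra repos
    command: ua enable esm-infra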
Change-Id: Ifb487d12df7b799d5fd2973d56741e0757bc4d4f
With a pure javascript plugin, dropping a new file in the plugins/
directory and reloading the page is sufficient to see changes.
However, with .jar plugins (as zuul-summary-plugin now is) you need to
actually issue a reload, which requires the included permissions.
Enable it in dev mode, which is where you'll most likely be iterating
on changes to a plugin. I don't think it's really that dangerous for
production, but it has traditionally been off there, so let's leave it
like that.
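Assuming the toggle in question is gerrit.config's
plugins.allowRemoteAdmin, a hedged sketch of flipping it from Ansible:

  - name: Allow remote plugin administration (dev images only)
    ini_file:
      path: /var/gerrit/etc/gerrit.config
      section: plugins
      option: allowRemoteAdmin
      value: "true"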
While we're here, write out a little script to help quickly deploy a
new .jar of the plugin when testing.
Change-Id: I57fa18755f8a8168da12c48f1f38d272da1c6599
The Limesurvey service hosted at survey.openstack.org was a beta
which saw limited use. The platform it runs on, Xenial, is now EOL
from Ubuntu/Canonical and in order to upgrade to a newer
distribution release we would need to rewrite all the configuration
management (the version of Puppet supported by newer Ubuntu is not
backward-compatible with what we've been running).
If a similar service becomes interesting to users of our
collaboratory in the future, it will need to be reintroduced with
freshly written configuration management anyway. The old configs and
documentation remain in our Git history should anyone wish to use
them as inspiration.
Change-Id: I59b419cf112d32f20084ab93eb6f2417a7f93fdb
We were using a loop index, which meant that for our cluster size of
three we would always assign server.1 through server.3. Unfortunately,
as we replace servers we may add nodes with a myid value >3, which
breaks when we try to assign server ids this way.
Fix it by using the same myid calculation in the peer listing.
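A sketch of the template fix (hostname pattern assumed; the real
template may differ):

  {% for host in groups['zookeeper'] %}
  # Derive the server id from the digits in the hostname (zk04 -> 4),
  # i.e. the same calculation used to write each host's myid file,
  # instead of loop.index.
  server.{{ host | regex_replace('^zk0*(\d+).*$', '\1') }}={{ host }}:2888:3888
  {% endfor %}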
Change-Id: Icf770c75cf3a84420116f47ad691d9f06191fb65
For reasons explained in [1], Debian's lsb_release.py on bullseye falls
back to probing "apt-cache policy":
When (as currently), stretch is the testing release,
/etc/debian_version contains "stretch/sid", as shipped by
base-files. It is therefore impossible to rely on that file to
differentiate between a host running testing or unstable without
asking apt what is actually preferred when installing packages
(through parsing `apt-cache policy`). That's how `lsb-release --
codename` returns "sid" _xor_ "stretch".
The problem is that this parses the output of "apt-cache policy", which
fails for two reasons: firstly, we have cleared out all the cache
files, so our hosts don't return anything useful until "apt-get update"
is run; secondly, our mirrors do not have a "label" that matches in
this code at [2].
e.g. what we get out of "apt-cache policy" is:
500 https://mirror.dfw.rax.opendev.org/debian bullseye/main amd64 Packages
release o=Debian,n=bullseye,c=main,b=amd64
origin mirror.dfw.rax.opendev.org
which is missing an "l=" field, so this parsing does not recognise it
as a valid source.
The label is set by reprepro [3]:
Label
This optional field is simply copied into the Release files.
Add a label to make our mirrors look more like regular mirrors.
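For illustration, each conf/distributions stanza gains a line like the
following (the exact label value just needs to satisfy the
lsb_release.py parsing at [2]):

  Origin: Debian
  Label: Debian
  Codename: bullseye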
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=845651
[2] https://sources.debian.org/src/lsb/11.1.0/lsb_release.py/#L191
[3] https://manpages.debian.org/stretch/reprepro/reprepro.1.en.html
Change-Id: Id705acbb3a01f43ae635a24fa3c24d0a05bdaa16
We are doing this so that we can clean up the private network + floating
IP setup that the existing mirror uses. Once this new mirror is up and
happy we can CNAME to it, then clean up the old mirror and its
networking config. This saves an IP that the current private network
router is consuming.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/787628
Change-Id: I50c311087c6c28726e36913c7e081f3b3d0ee049
We have limited IPv4 address space in this cloud. Currently we can
allocate about 6 IP addresses for test nodes after accounting for
network infrastructure and the mirror. By switching these instances to
use the external network directly we can clean up some of the neutron
network infrastructure, which we think may free up 2 more IP addresses.
That should get us to our originally intended max-servers of 8.
Change-Id: I705ff082ff06ae1c97f4c229a22893e6d87d206d
This adds the new inmotion cloud to the clouds.yaml files and the cloud
launcher config. This cloud runs on an OpenStack-as-a-service platform,
so we have quite a bit of freedom to make changes within the resource
limitations if necessary.
Change-Id: I2aed6dffde4a1d6e3044c4bd8df4ca60065ae1ea
The current loop here uses the ansible_host value of the ZK servers,
which we have set to the IPv4 address in the inventory.
nb03 is constantly dropping out of ZK; for the record, the logs show:
2021-04-21 05:56:15,151 WARNING kazoo.client: Connection dropped: socket connection error: Connection reset by peer
2021-04-21 05:56:15,151 WARNING kazoo.client: Transition to CONNECTING
2021-04-21 05:56:15,151 INFO kazoo.client: Zookeeper connection lost
2021-04-21 05:56:15,152 INFO kazoo.client: Connecting to 23.253.90.246(23.253.90.246):2281, use_ssl: True
2021-04-21 05:56:15,176 INFO kazoo.client: Zookeeper connection established, state: CONNECTED
and this happens every few minutes. This cloud does IPv4 behind a NAT
and it seems very likely this is related.
So the primary motivation here is to see if using IPv6 clears this up,
giving us some data points. That said, our other nodepool hosts should
all be fine to use ZK over IPv6. In the gate, however, we may have
hosts without IPv6 addresses, so this looks for a v6 address and, if
none is found, falls back to the current ansible_host behaviour.
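A hedged sketch of the lookup (inventory variable names assumed):

  {% for host in groups['zookeeper'] %}
  # Prefer the host's IPv6 address; fall back to ansible_host (IPv4)
  {{ hostvars[host]['public_v6'] | default(hostvars[host]['ansible_host']) }}
  {% endfor %}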
Change-Id: Ifde86ddd632662f36bcbe2a0dc99660f06b01ac3
An upstream change has merged that adds a REST endpoint to
enable/disable the Zuul Summary tab on a per-project basis in results.
It defaults to enabled.
This happens via a .jar which is now copied in during the build.
Change-Id: If50f0fa3c5fb116bd0a5a78694de1e7067aa7f11
Depends-On: https://gerrit-review.googlesource.com/c/plugins/zuul-results-summary/+/298465/
We will be rotating zk01-03.openstack.org out and replacing them with
zk04-06.opendev.org. This is the first change in that process which puts
zk04 into the rotation. This should only be landed when operators are
ready to manually stop zookeeper on zk03 (which is being replaced by
zk04 in this change).
Change-Id: Iea69130f6b3b2c8e54e3938c60e4a3295601c46f
The public5 network has the most IP addresses available and is
recommended for use.
This cloud also has fixed public IPs, not floating ones.
Change-Id: I7ae1bb0081d3a86149225c3400b53a9561ccffe6
Add a variable to configure upload-workers for nodepool-builder
daemons.
Reduce the default for nb03 to see if we can get more reliable
uploads.
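A sketch of the plumbing (variable name illustrative):

  # group_vars: fewer concurrent uploads for nb03
  nodepool_builder_upload_workers: 4

  # which ends up on the daemon command line, e.g.:
  #   nodepool-builder --upload-workers {{ nodepool_builder_upload_workers }}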
Change-Id: I819bdd262c7118cbde4e6ffdc12aa3ac64569a96