1670 Commits

Author SHA1 Message Date
Ian Wienand
dc827de23d Add kerberos-client group
We duplicate the KDC settings over all our kerberos clients.  Add
clients to a "kerberos-client" group and set the variables in a group
file.

Change-Id: I25ed5f8c68065060205dfbb634c6558488003a38
2021-03-18 11:59:30 +11:00
Clark Boylan
75a64427a1 Improve meetpad env options for templating
The PUBLIC_URL is quoted which results in quotes ending up in our config
breaking etherpad base url setting in config.js. We remove the quotes as
they are not necessary.

We also remove the /p/ suffix from ETHERPAD_URL_BASE as this causes the
proxying to send extra /p/s to etherpad which results in problems.

Note these fixes appear to be necessary but are not sufficient to have
working meetpad proxying of etherpad. We also need to fix the nginx
meet.conf proxy settings to send valid Host headers. A followup change
will attempt to address that.

Change-Id: I0f59339a33267468ad5481858507a43cefa0021d
2021-03-17 12:47:43 -07:00
Clark Boylan
7b87c7c305 Disable xmpp websocket in jitsi meet config
We unforked our jitsi web container and discovered that etherpad doc
embedding was broken. In the process of debugging this the jitsi meet
services on meetpad were restarted which pulled in newer configs which
expect ENABLE_XMPP_WEBSOCKET to be enabled by default. Unfortunately
this wasn't quite working for us. Explicitly disabling this seems to
make audio and video calling work again. But doc sharing isn't even
attempted now.

Let's get this fix in as audio and video are important then we'll keep
debugging the etherpad doc sharing problem.

https://github.com/jitsi/docker-jitsi-meet/issues/902 has details from
others that hit this problem.

Note that part of the issue here seems to be that nginx is using the
default configs in the container found at /default and not the configs
we bind mount at /config. This at least seems to be why the proxying for
etherpad documents is broken.

Change-Id: I03fa9d331e6825b3b953a3573c0dd43c7be478a4
2021-03-17 11:38:56 -07:00
Zuul
77b1c14a9a Merge "Use upstream jitsi-meet web image" 2021-03-17 00:22:50 +00:00
Zuul
4524a92caf Merge "kerberos-kdc: role to manage Kerberos KDC servers" 2021-03-16 22:28:46 +00:00
Zuul
b133afedfd Merge "refstack: cleanup old puppet" 2021-03-16 22:21:03 +00:00
Ian Wienand
c1aff2ed38 kerberos-kdc: role to manage Kerberos KDC servers
This adds a role and related testing to manage our Kerberos KDC
servers, intended to replace the puppet modules currently performing
this task.

This role automates realm creation, initial setup, key material
distribution and replica host configuration.  None of this is intended
to run on the production servers which are already set up with an
active database, and the role should be effectively idempotent in
production.

Note that this does not yet switch the production servers into the new
groups; this can be done in a separate step under controlled
conditions and with related upgrades of the host OS to Focal.

Change-Id: I60b40897486b29beafc76025790c501b5055313d
2021-03-17 08:30:52 +11:00
Ian Wienand
018a14e34f refstack: cleanup old puppet
Remove old puppet configuration for the refstack service, which is now
managed by Ansible.

Change-Id: I6b6dfd0f8ef89a5362f64cfbc8016ba5b1a346b3
2021-03-17 07:06:53 +11:00
Clark Boylan
16a4bdce02 Don't always update gitea project descriptions
There is some correlation that running the manage-projects playbook
gives our gitea fits. The bulk of the work done here is in trying to
update the descriptions of all projects. There isn't a good way to see
if the description is already set first so we just try and ignore
errors. This creates potentially thousands of operations all at once and
could be why things are sad.

We move these operations under the always update flag which is not set
on normal runs. If we really need to converge to a good updated state we
can manually run the playbook/role with always update set.

We also don't set a limit on the number of ThreadPoolExecutor workers
which will default to 5 * NumProcs. Could be that tuning this down would
make gitea happier.

One other thought is that we may not be using request sessions properly
for connection reuse. In particular requests notes that you need to set
stream to False or read request content to return a connection back to
the pool for reuse. We might look into this for further improvements.
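
The worker-count idea above can be sketched as follows. This is a minimal
illustration, not the manage-projects code: update_description is a
hypothetical stand-in for the per-project API call, and MAX_WORKERS is an
assumed value that would need tuning against what gitea tolerates.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 4  # assumed value; not taken from the real role


def update_description(project):
    # Hypothetical stand-in for the real per-project API request.
    return f"updated {project}"


def update_all(projects, max_workers=MAX_WORKERS):
    """Run update_description over all projects with a bounded pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(update_description, p): p for p in projects}
        for future in as_completed(futures):
            project = futures[future]
            try:
                results[project] = future.result()
            except Exception as exc:
                # Record the error and carry on, mirroring "try and ignore".
                results[project] = exc
    return results
```

Passing max_workers explicitly caps how many requests are in flight at
once, instead of relying on the interpreter's default.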

Change-Id: I6e6fb1eb08303e9da7e38cf493d1871364340000
2021-03-16 13:06:16 -07:00
Zuul
e077281e4e Merge "refstack: fix backup script typo" 2021-03-16 05:43:21 +00:00
Ian Wienand
ea48ffc596 refstack: fix backup script typo
This got copied from another command that also had this typo.

Also, don't bother backing up the on-disk backups, as we back up
directly via the stream dumps.

Change-Id: Ie200a29eec2b1a0725a8872ab548bcb0f26980e6
2021-03-16 15:12:41 +11:00
Zuul
bc94f97de2 Merge "Enable srvr, stat and dump commands in the zk cluster" 2021-03-16 04:10:11 +00:00
Zuul
70079c5771 Merge "gitea-git-repos: update deprecated API path" 2021-03-16 04:00:30 +00:00
Clark Boylan
3f2dd0e681 Enable srvr, stat and dump commands in the zk cluster
Zookeeper supports a number of "4 letter" commands [0] which are useful
for debugging and general diagnostics. By default only srvr is enabled,
but we want to add stat and dump to see details on server and client
connection statuses.

We do this via the 4lw.commands.whitelist configuration option [1] and
not the docker image env vars because we're mounting a zoo.cfg in
already.
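
As a rough sketch of how these commands are used: a four-letter command
is sent as plain text over the client port, and the server replies and
closes the connection. The helper names below are hypothetical, and the
host and port are whatever your cluster uses.

```python
import socket


def whitelist_line(commands):
    """Build the zoo.cfg line that enables the given four-letter commands."""
    return "4lw.commands.whitelist=" + ", ".join(commands)


def four_letter_word(host, port, command, timeout=5.0):
    """Send one four-letter command and return the server's raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                # The server closes the connection once the reply is sent.
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, whitelist_line(["srvr", "stat", "dump"]) gives the line to
place in zoo.cfg, and four_letter_word(host, 2181, "stat") would query a
server, assuming 2181 is the client port in use.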

[0] https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_4lw
[1] https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_clusterOptions

Change-Id: I24ea9b37cd5766c9d393106e8eab34623cad1624
2021-03-15 16:57:21 -07:00
Ian Wienand
753f9520e6 refstack: add backup
We should be backing up the user-generated refstack data

Change-Id: I1bd5f0de283a4436967dcae6da9c5d9cd055697c
2021-03-12 15:18:04 +11:00
Ian Wienand
d33ce951c0 refstack: use CNAME for production server
The production server is trying to send itself to
refstack01.openstack.org, causing cross-site scripting issues.  In
production, use the CNAME, but use the FQDN for testing.

Fix up job file matchers while here.

Change-Id: I18a5067ee25c59c5eaa17b7c2d9bd5a942a9173d
2021-03-12 10:24:06 +11:00
Zuul
d8cfde1e22 Merge "refstack: Edit URL of public RefStackAPI" 2021-03-11 03:43:17 +00:00
Martin Kopec
834e39fc7e refstack: Edit URL of public RefStackAPI
The previous refstack server had 'api' in the endpoint
addresses of API calls. Let's try to set it in the new
instance as well to keep the same interface.

Also, fix the typo in the testinfra host match and in
the test name.

Change-Id: I7319990144396b3a753678975a09b0add3ac4465
2021-03-10 14:09:20 +11:00
James E. Blair
b768325480 Use upstream jitsi-meet web image
This has our change to open etherpad on join, so we should no longer need
to run a fork of the web server.  Switch to the upstream container image
and stop building our own.

Change-Id: I3e8da211c78b6486a3dcbd362ae7eb03cc9f5a48
2021-03-09 12:35:46 -08:00
Zuul
2a0ea75fb7 Merge "install-ansible: ensure stevedore" 2021-03-09 02:52:10 +00:00
Clark Boylan
a2fd912511 Replace ze09-12.openstack.org with ze09-12.opendev.org
These are new focal replacement servers. Because this is the last set of
replacements for the executors we also clean up the testing of the old
servers in the system-config-run-zuul job and the inventory group
checker job.

Change-Id: I111d42c9dfd6488ef69ff1a7f76062a73d1f37bf
2021-03-08 10:13:29 -08:00
Daniel Pawlik
97942432c5 Change get-pip url
The path for get-pip.py script in version 3.5 has been changed
with this commit [1].

[1] 2360f025eb

Change-Id: Ie13a6597c23c0a376f9feba2aed664e1129c5b60
2021-03-08 15:03:43 +01:00
Zuul
8998ee96b2 Merge "Update zuul-executor shutdown handling" 2021-03-04 21:48:20 +00:00
Ian Wienand
a12d2fce2b install-ansible: ensure stevedore
We have identified an issue with stevedore < 3.3.0 where the
cloud-launcher, running under ansible, makes stevedore hash a /tmp
path into an entry-point cache file it makes, causing a never-ending
expansion.

This appears to be fixed by [1] which is available in 3.3.0.  Ensure
we install this on bridge.  For good measure, add a ".disable" file as
we don't really need caches here.

There's currently 491,089 leaked files, so I didn't think it wise to
delete these in an ansible loop as it will probably time out the job.
We can do this manually once we stop creating them :)

[1] d7cfadbb7d

Change-Id: If5773613f953f64941a1d8cc779e893e0b2dd516
2021-03-04 08:29:01 +11:00
Clark Boylan
a42c0b704a Remove ze01.openstack.org
This server has been replaced by ze01.opendev.org running Focal. Let's
remove the old ze01.openstack.org from inventory so that we can delete
the server. We will follow this up with a rotation of new focal servers
being put in place.

This also renames the xenial executor in testing to ze12.openstack.org
as that will be the last one to be rotated out in production. We will
remove it from testing at that point as well.

We also remove a completely unused zuul-executor-opendev.yaml group_vars
file to avoid confusion.

Change-Id: Ida9c9a5a11578d32a6de2434a41b5d3c54fb7e0c
2021-03-02 10:21:59 -08:00
Ian Wienand
3f1d67b99f Add afsdb03 openstack.org
We are in the process of upgrading the AFS servers to focal.  As
explained by auristor (extracted from IRC below) we need 3 servers to
actually perform HA with the ubik protocol:

 the ubik quorum is defined by the list of voting primary ip addresses
 as specified in the ubik service's CellServDB file.  The server with
 the lowest ip address gets 1.5 votes and the others 1 vote.  To win
 election requires greater than 50% of the votes.  In a two server
 configuration there are a total of 2.5 votes to cast.  1.5 > 2.5/2 so
 afsdb02.openstack.org always wins regardless of what
 afsdb01.openstack.org says.  And afsdb01.openstack.org can never win
 because 1 < 2.5/2.  by adding a third ubik server to the quorum, the
 total votes cast are 3.5 and it always requires the vote of two
 servers to elect a winner ...  if afsdb03 is added with the highest
 ip address, then either afsdb01 or afsdb02 can be elected
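
The vote arithmetic quoted above can be checked with a small sketch. It
models only the weighting rule as described (lowest address gets 1.5
votes, the rest 1 each, and a win needs more than half of the total). The
function names are hypothetical and the addresses in the usage note are
documentation-range placeholders, not the real servers.

```python
import ipaddress


def ubik_votes(servers):
    """Map each server IP to its weight: 1.5 for the lowest IP, else 1."""
    lowest = min(servers, key=ipaddress.ip_address)
    return {ip: (1.5 if ip == lowest else 1.0) for ip in servers}


def can_win(servers, supporters):
    """Return True if the supporters' votes exceed half of all votes."""
    votes = ubik_votes(servers)
    total = sum(votes.values())
    cast = sum(votes[ip] for ip in supporters)
    return cast > total / 2
```

With two servers, say 192.0.2.1 and 192.0.2.2, the lower address wins on
its own (1.5 > 1.25) and the higher one never can (1 < 1.25). With a third
server added, no single server reaches the 1.75 threshold, but any two
together do.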

Add a third server which is a focal host and related configuration.

Change-Id: I59e562dd56d6cbabd2560e4205b3bd36045d48c2
2021-03-01 15:51:49 +11:00
Clark Boylan
85d923b74e Update zuul-executor shutdown handling
We update the docker-compose config for zuul-executor to better handle
shutdown. In particular we want to support zuul-executor
graceful which will pause the server then exit with rc 0 when all builds
complete. To do this we switch restart: always to restart: on-failure.
With the always setting docker simply restarts zuul-executor after a
graceful stop.
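
The difference between the two restart settings can be modelled roughly
as below. This is a simplified sketch of the policy semantics only; it
ignores manual stops and retry limits, and should_restart is a
hypothetical helper, not part of docker.

```python
def should_restart(policy, exit_code):
    """Simplified model: does the container come back after it exits?"""
    if policy == "always":
        # Restarted regardless of how the process exited.
        return True
    if policy == "on-failure":
        # Restarted only when the process exits non-zero.
        return exit_code != 0
    if policy == "no":
        return False
    raise ValueError(f"unhandled restart policy: {policy}")
```

Under this model a graceful stop that exits with rc 0 is restarted with
"always" but left stopped with "on-failure", which is the behaviour the
change relies on.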

We also remove the stop signal of SIGHUP with its long timeout. Zuul
executor does not seem to catch SIGHUP for anything anymore so this is
there for old behavior and can be cleaned up.

Change-Id: I5211b91025ce5a13648f3648db3b42d357ecd590
2021-02-26 08:12:30 -08:00
Clark Boylan
2a0508aa08 Add ze01.opendev.org
This is a focal replacement for ze01.openstack.org. Cleanup for
ze01.openstack.org will happen in a followup when we are happy with the
results of running zuul-executor on focal.

Change-Id: If1fef88e2f4778c6e6fbae6b4a5e7621694b64c5
2021-02-25 08:53:40 -08:00
Ian Wienand
f8ca888b2b install-docker: remove fix from prior change
This file is now removed (I0cbcd4694a4796573fe48383756be03597d2da0f);
get rid of this to avoid any confusion.

Change-Id: I837d1fccbfa2461eb1315eac54c2a017fcb86511
2021-02-25 09:19:02 +11:00
Ian Wienand
3303199ba6 install-docker: move rsyslog handler earlier
This syslog configuration is what sends any logs with a program-name
of "docker-<foo>" to /var/log/containers/foo.log.  However, at 98-
level the rules are after the default 50- rules, so we're seeing the
logs copied to both syslog and /var/log/containers.  Since this
contains a "stop" command, we should move this earlier before the
default rules and the docker logs will not be duplicated.
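
The ordering effect can be illustrated with a toy model. It assumes rule
files are applied in sorted filename order and that a "stop" ends
processing for that message. The file names and the "10-" prefix are
hypothetical; the change only says the rules move ahead of the 50- level.

```python
def default_rule(program):
    # Default rules: everything goes to syslog, processing continues.
    return "/var/log/syslog", False


def docker_rule(program):
    # Docker rule: matching messages go to a per-container log, then stop.
    prefix = "docker-"
    if program.startswith(prefix):
        name = program[len(prefix):]
        return f"/var/log/containers/{name}.log", True
    return None, False


def route(program, rule_files):
    """Return the destinations a message reaches, in file-name order."""
    destinations = []
    for name in sorted(rule_files):
        destination, stop = rule_files[name](program)
        if destination:
            destinations.append(destination)
        if stop:
            break
    return destinations
```

With the docker rule in a 98- file, a "docker-foo" message reaches syslog
first and then the container log. With the rule in an earlier file, it is
written to the container log only.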

Change-Id: I0cbcd4694a4796573fe48383756be03597d2da0f
2021-02-25 09:16:16 +11:00
Zuul
d1ac0aee2d Merge "etherpad: fix robots.txt" 2021-02-24 00:02:04 +00:00
Zuul
89d73e42f7 Merge "gitea: fix db backup script" 2021-02-23 07:23:01 +00:00
Zuul
70467d8a82 Merge "Stop using mysqlclient ssl flag" 2021-02-23 05:00:42 +00:00
Zuul
6b88e37a50 Merge "service-borg-backup: preload backup server facts" 2021-02-23 03:21:07 +00:00
Ian Wienand
08dba9d026 service-borg-backup: preload backup server facts
As described inline, ensure that minimal facts for the backup servers
are loaded before running the backup roles on hosts, so they can read
the ansible_ssh_host_key_ed25519_public fact for each backup server
and ensure it is accepted.

Update the other comments slightly as well.

Change-Id: I1f207ca0770d58f61a89f9ade0bd26cebc982c62
2021-02-23 13:04:20 +11:00
Ian Wienand
029dfb55a8 gitea: fix db backup script
I introduced this typo with I500062c1c52c74a567621df9aaa716de804ffae7.
Luckily Ibb63f19817782c25a5929781b0f6342fe4c82cf0 has alerted us to
this problem.

Change-Id: I02bf2f4fa1041642a719100e9591bf5cd1a0bf49
2021-02-23 02:00:20 +00:00
Zuul
4d85fc521a Merge "Use dstat to record performance of system-config-run hosts" 2021-02-23 00:13:59 +00:00
Zuul
1b2435c349 Merge "backups: remove all bup" 2021-02-21 22:41:41 +00:00
James E. Blair
b6cbb52447 Add pull tasks for nodepool/zuul
So we can stop/pull/start, move the pull tasks to their own files
and add a playbook that invokes them.

Change-Id: I4f351c1d28e5e4606e0a778e545a3a805525ac71
2021-02-19 15:42:40 -08:00
Zuul
464bb363e9 Merge "grafana: update to 7.4.2" 2021-02-19 06:03:21 +00:00
Zuul
9db55a55f3 Merge "borg-backup: send explicit email on backup failure" 2021-02-19 05:20:01 +00:00
Ian Wienand
7577439ff8 grafana: update to 7.4.2
This includes a fix for I216528a76307189d8d87bd2fcfeff95c6ceb53cc.
Now it's released we can be a bit more explicit about why we added the
workaround.

Change-Id: Ibaf1850549b5e7ec3622418b650bc5e59a289ab6
2021-02-19 09:54:31 +11:00
Ian Wienand
5a1b8ac179 grafana: take some screenshots during testing
Take some simple screenshots for basic validation of any new releases.

Change-Id: I52770032a6cc91d76da23194f58474f5ceeaed38
2021-02-17 10:43:26 +11:00
Clark Boylan
1560b01f7e Use dstat to record performance of system-config-run hosts
We have seen some poor performance from gitea which may be related to
manage project updates. Start a dstat service which logs to a csv file
on our system-config-run job hosts in order to collect performance info
from our services in pre merge testing. This will include gitea and
should help us evaluate service upgrades and other changes from a
performance perspective before they hit production.

Change-Id: I7bdaab0a0aeb9e1c00fcfcca3d114ae13a76ccc9
2021-02-16 14:31:30 -08:00
Ian Wienand
39ffc685d6 backups: remove all bup
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.

For reference, the servers being backed up at this time are:

 borg-ask01
 borg-ethercalc02
 borg-etherpad01
 borg-gitea01
 borg-lists
 borg-review-dev01
 borg-review01
 borg-storyboard01
 borg-translate01
 borg-wiki-update-test
 borg-zuul01

This removes the old bup backup hosts, the no-longer used ansible
roles for the bup backup server and client roles, and any remaining
bup related configuration.

For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.

Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
2021-02-16 16:00:28 +11:00
Zuul
8360a7ceab Merge "Run gerrit 3.2 and 3.3 functional tests" 2021-02-16 04:27:46 +00:00
Ian Wienand
5ca69113fd borg-backup: send explicit email on backup failure
This sets a global BORG_UNDER_CRON=1 environment variable for
production hosts and makes the borg-backup script send an email if any
part of the backup job appears to fail (this avoids spamming ourselves
if we're testing backups, etc).
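
The gating described above amounts to something like the following
sketch. It is written in Python purely for illustration and makes no
claim about how the borg-backup script is implemented;
should_send_failure_email is a hypothetical helper.

```python
import os


def should_send_failure_email(exit_codes, environ=None):
    """Mail only when running under cron and some backup step failed."""
    environ = os.environ if environ is None else environ
    under_cron = environ.get("BORG_UNDER_CRON") == "1"
    failed = any(code != 0 for code in exit_codes)
    return under_cron and failed
```

A manual test run without BORG_UNDER_CRON set never triggers mail, even
if a step fails.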

We should ideally never get this email, but if we do it's something we
want to investigate quickly.  There's nothing worse than thinking
backups are working when they aren't.

Change-Id: Ibb63f19817782c25a5929781b0f6342fe4c82cf0
2021-02-16 14:49:38 +11:00
Zuul
94fe3610e5 Merge "borg-backup-server: make sure to append verification logs" 2021-02-16 03:14:30 +00:00
Ian Wienand
c7de005738 grafana: ensure snapshots api returns a 403
Change-Id: I216528a76307189d8d87bd2fcfeff95c6ceb53cc
2021-02-15 17:01:15 +11:00
Ian Wienand
ece90fb7f7 borg-backup-server: make sure to append verification logs
We don't want to overwrite every run, but rather append to the log
file.

Change-Id: I304caedecbf6a9552f314636ca82a543ef16a8b6
2021-02-15 14:45:03 +11:00