This was missed during recent updates; this UserList needs to be on
all servers to allow bos, vos and backup commands.
Update the documentation to reflect the centralised copy.
Change-Id: I8ada3d5035bb7ef77b19ce6aaffb48335974a124
We are in the process of upgrading the AFS servers to focal. As
explained by auristor (extracted from IRC below) we need 3 servers to
actually perform HA with the ubik protocol:
the ubik quorum is defined by the list of voting primary ip addresses
as specified in the ubik service's CellServDB file. The server with
the lowest ip address gets 1.5 votes and the others 1 vote. To win
election requires greater than 50% of the votes. In a two server
configuration there are a total of 2.5 votes to cast. 1.5 > 2.5/2 so
afsdb02.openstack.org always wins regardless of what
afsdb01.openstack.org says. And afsdb01.openstack.org can never win
because 1 < 2.5/2. By adding a third ubik server to the quorum, the
total votes cast are 3.5 and it always requires the vote of two
servers to elect a winner ... if afsdb03 is added with the highest
ip address, then either afsdb01 or afsdb02 can be elected
Add a third server which is a focal host and related configuration.
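
The vote arithmetic quoted above can be checked with a short sketch.
This is an illustration only, not how ubik itself is implemented: the
IP addresses are hypothetical documentation addresses, chosen so that
afsdb02 has the lowest address and afsdb03 the highest, and the helper
names ubik_votes and wins are made up for this example.

```python
from ipaddress import ip_address


def ubik_votes(servers):
    """Map each server name to its vote weight.

    `servers` maps a name to its primary IP address. The server with the
    lowest address gets 1.5 votes and every other server gets 1 vote.
    """
    lowest = min(servers, key=lambda name: ip_address(servers[name]))
    return {name: (1.5 if name == lowest else 1.0) for name in servers}


def wins(votes, supporters):
    """True if `supporters` together hold more than 50% of all votes."""
    return sum(votes[s] for s in supporters) > sum(votes.values()) / 2


# Hypothetical addresses (assumption, not the real server addresses).
two_servers = {"afsdb01": "192.0.2.20", "afsdb02": "192.0.2.10"}
three_servers = dict(two_servers, afsdb03="192.0.2.30")

two = ubik_votes(two_servers)
three = ubik_votes(three_servers)

print(wins(two, ["afsdb02"]))               # 1.5 > 2.5/2
print(wins(two, ["afsdb01"]))               # 1 < 2.5/2
print(wins(three, ["afsdb02"]))             # 1.5 < 3.5/2
print(wins(three, ["afsdb01", "afsdb03"]))  # 2 > 3.5/2
```

With two servers the lowest-address host wins on its own vote; with
three, no single host holds a majority, so any two must agree.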
Change-Id: I59e562dd56d6cbabd2560e4205b3bd36045d48c2
This is a new Focal-based host, which we want for its more recent
rsync, which we hope causes fewer issues when resyncing things to AFS
volumes.
See 4918594aa472010a8a112f5f4ed0a471a3351a91 for discussion of the
original issues; we have found that without "-t" all new data seems to
be copied continuously. Empirical testing shows later rsync doesn't
have this issue.
Depends-On: https://review.opendev.org/736859
Change-Id: Iebfffdf8aea6f123e36f264c87d6775771ce2dd8
We use project-config for gerrit, gitea and nodepool config. That's
cool, because we can clone that from zuul too and make sure that each
prod run we're doing runs with the contents of the patch in question.
Introduce a flag file that can be touched in /home/zuulcd that will
block zuul from running prod playbooks. By default, if the file is
there, zuul will wait for an hour before giving up.
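
The wait-then-give-up behaviour described above could look roughly like
the following sketch. It is not the actual playbook logic: the function
name, the polling interval and the flag file name DISABLE-PROD-RUNS are
all hypothetical, since the message does not name the file.

```python
import tempfile
import time
from pathlib import Path


def wait_for_flag_clear(flag, timeout=3600, poll=30):
    """Block while `flag` exists, for at most `timeout` seconds.

    Returns True once the flag file is gone and False if it is still
    present when the timeout expires (one hour by default).
    """
    deadline = time.monotonic() + timeout
    while flag.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


# Demonstration in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as home:
    flag = Path(home) / "DISABLE-PROD-RUNS"  # hypothetical file name
    print(wait_for_flag_clear(flag, timeout=0, poll=0))  # no flag present
    flag.touch()
    print(wait_for_flag_clear(flag, timeout=0, poll=0))  # flag blocks run
```

A caller would run the prod playbooks only when the function returns
True, and report a failure otherwise.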
Rename zuulcd to zuul
To better align prod and test, name the zuul user zuul.
Change-Id: I83c38c9c430218059579f3763e02d6b9f40c7b89
This script helps restart the AFS servers, which is useful when
updating parameters. It can also enable audit logging.
It can also stop and start the servers, although it's unlikely we'd
want all the servers offline at the same time so stopping has a
warning included.
Documentation is updated to refer to the helper script.
Change-Id: Idcb3e43a3f6e614cdb787d4334e692a98bffdd15
As documented in [1]:
If the number next to "GotSomeSpaces" or any of the "GSS*" fields is
greater than 0, then the fileserver ran out of callback space and had
to prematurely revoke callback promises from clients in order to free
up space.
Here are our stats on afs01:
$ xstat_fs_test localhost -collID 3 -onceonly
Starting up the xstat_fs service, no debugging, one-shot operation
------------------------------------------------------------
13547865 DeleteFiles
1849223729 DeleteCallBacks
45049055 BreakCallBacks
2098382037 AddCallBack
174 GotSomeSpaces
7800 DeleteAllCallBacks
20778 nFEs
21184 nCBs
1500000 nblks
43425561 CBsTimedOut
0 nbreakers
8 GSS1
4 GSS2
5 GSS3
169 GSS4
4 GSS5
So as noted, the server ran out of callback space a few times.
Raising it takes only a little memory, but will help performance.
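
The check described in [1] can be sketched as a small parser. This
assumes each counter line has the "<count> <name>" shape shown in the
output above; the function name is hypothetical and the sample is a
subset of the afs01 figures.

```python
import re


def callback_space_shortfalls(xstat_output):
    """Return the GotSomeSpaces and GSS* counters that are above 0."""
    counters = {}
    for line in xstat_output.splitlines():
        # Counter lines look like "<count> <name>"; skip everything else.
        match = re.fullmatch(r"\s*(\d+)\s+(\w+)\s*", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return {
        name: value
        for name, value in counters.items()
        if (name == "GotSomeSpaces" or name.startswith("GSS")) and value > 0
    }


# Subset of the afs01 output quoted above.
sample = """\
Starting up the xstat_fs service, no debugging, one-shot operation
------------------------------------------------------------
45049055 BreakCallBacks
174 GotSomeSpaces
0 nbreakers
8 GSS1
169 GSS4
"""

shortfalls = callback_space_shortfalls(sample)
print(shortfalls)
```

Any non-empty result means the fileserver ran out of callback space at
some point since it started.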
Thanks to Jeffrey Altman (auristor) for pointing this out.
[1] https://www.openafs.org/pages/newsletter/newsletter-2013-03-volume004-issue05.html
Change-Id: I2ad33dd8918cb559634d2c5b8c4e4e7f2d6d4051
In sphinx, we have a :cgit_file: directive that makes links to files.
Thing is - we're not using cgit anymore. So just rename it to git_file.
Change-Id: I80aca5fb3cc84281e29843944fea33e6f4d9fe6f
There are a lot of these, so we are doing them in chunks. This fixes
the custom roles.
Remove the git and jjb docs, since we don't use them anymore.
Change-Id: I0c5b74f7b73315dac93bce6be0d920cddb94fb58
Add info on how to kinit and aklog if not using Debuntu deb.conf to set
the correct realm and cell settings.
Change-Id: I80a698649f03863b73399873cf190fda4fa41776
This ate a good chunk of my day before a more AFS-savvy colleague
pointed out that a mountpoint within a volume is just a special kind
of file record and so needed the parent volume released before it
would appear in the read-only path.
Change-Id: Ic3d717d70c8bf2548447550472a52849dd85ffd3
The production directory is a relic from the puppet environment concept,
which we do not use. Remove it.
The puppet apply tests run puppet locally, where the production
environment is still needed, so don't update the paths in
tools/prep-apply.sh.
Depends-On: https://review.openstack.org/592946
Change-Id: I82572cc616e3c994eab38b0de8c3c72cb5ec5413
This modernises the openstack-infra documentation by switching to
openstackdocstheme. Update dependencies as required.
To remove non-relevant stuff from conf.py, I have just taken the demo
file from openstackdocstheme and lightly modified it.
It seems later sphinx has included its own ":file:" role, which now
conflicts. Change it to ":cgit_file:" in our documentation. Remove
the custom header template, which no longer applies. Add the
post-2.0-pbr sphinx-based warning-as-error, which fixes the original
problem I actually noticed: errors could slip through the gate
tests :)
Change-Id: Ic7bec57b971bb4c75fc839e7269d1f69a576b85c
A simple walkthrough of using an AFS superuser to perform write
operations under an AFS read-write path, including authenticating
and unauthenticating.
Change-Id: If27376745b43f94f27f104bca9309035d265ee72
Jeffrey Altman has pointed out that our settings are not optimal for
our use cases. Turning up threads and callbacks is a start. We
should evaluate the other settings too.
Add notes on how to apply settings manually.
Change-Id: I1405b21f97c1ac2d3bd99ffbba18e5fd0ff959b1
During manual runs, you want to do this without a timeout. Add a
flag; I always end up copying the script and manually editing this
stuff in. Add a quick note in the AFS docs.
Change-Id: I239bc1a0b5928673b42cc67291bb519d5f5d2471
This includes some basic info on the new mirror host reverse proxy
caches for resources that aren't simple/easy/practical for proper
mirroring.
Change-Id: If71fa6bf1769ef82ab3a4d2c8a5e78005fc6d7e5
The location of our Puppet modules has changed now that they are split
from system-config; update the documentation accordingly.
Change-Id: I4d4adc5d41f50dd92fbd642ac30f95c327a416b2