diff --git a/doc/source/afs.rst b/doc/source/afs.rst
index abf6f2923e..bf8e726227 100644
--- a/doc/source/afs.rst
+++ b/doc/source/afs.rst
@@ -425,3 +425,75 @@ place for Apache on these hosts. This avoids management overheads of
 a completely new service deployment such as Squid or a caching docker
 registry daemon.
 
+No Outage Server Maintenance
+----------------------------
+
+afsdb0X.openstack.org
+~~~~~~~~~~~~~~~~~~~~~
+
+We have redundant AFS DB servers. You can take one down without causing
+a service outage as long as the other remains up. To do this safely::
+
+  root@afsdb01:~# bos shutdown afsdb01.openstack.org -wait -localauth
+  root@afsdb01:~# bos status afsdb01.openstack.org -localauth
+  Instance ptserver, temporarily disabled, currently shutdown.
+  Instance vlserver, temporarily disabled, currently shutdown.
+
+Then perform your maintenance on afsdb01. When done, a reboot will
+automatically restart the bos service, or you can manually restart
+the openafs-fileserver service::
+
+  root@afsdb01:~# service openafs-fileserver start
+
+Finally, check that the service is back up and running::
+
+  root@afsdb01:~# bos status afsdb01.openstack.org -localauth
+  Instance ptserver, currently running normally.
+  Instance vlserver, currently running normally.
+
+Now you can repeat the process against afsdb02.
+
+afs0X.openstack.org
+~~~~~~~~~~~~~~~~~~~
+
+Taking down the actual fileservers is slightly more complicated
+but works similarly. Before taking a fileserver down, we need to
+either make sure that nothing needs the RW volumes it hosts or
+move those RW volumes to another fileserver.
+
+To ensure nothing needs the RW volumes, you can hold the various
+file locks on hosts that publish to AFS and/or remove cron entries
+that perform vos releases or volume writes.
+
+If instead you need to move the RW volume, the first step is to check
+where the volumes live::
+
+  root@afsdb01:~# vos listvldb -localauth
+  VLDB entries for all servers
+
+  mirror
+      RWrite: 536870934     ROnly: 536870935
+      number of sites -> 3
+         server afs01.dfw.openstack.org partition /vicepa RW Site
+         server afs01.dfw.openstack.org partition /vicepa RO Site
+         server afs01.ord.openstack.org partition /vicepa RO Site
+
+We see that if we want to allow writes to the mirror volume and take
+down afs01.dfw.openstack.org, we will have to move the volume to one
+of the other servers::
+
+  root@afsdb01:~# screen # use screen as this may take quite some time.
+  root@afsdb01:~# vos move -id mirror -toserver afs01.ord.openstack.org -topartition vicepa -fromserver afs01.dfw.openstack.org -frompartition vicepa -localauth
+
+When that is done (use the listvldb command above to check), it is now
+safe to take down afs01.dfw.openstack.org while still having writers to
+the mirror volume. We use the same process as for the db server::
+
+  root@afsdb01:~# bos shutdown afs01.dfw.openstack.org -localauth
+  root@afsdb01:~# bos status afs01.dfw.openstack.org -localauth
+  Auxiliary status is: file server shut down.
+
+Perform maintenance, then restart as above and check the status again::
+
+  root@afsdb01:~# bos status afs01.dfw.openstack.org -localauth
+  Auxiliary status is: file server running.
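
Review note (not part of the patch): the "check where the volumes live" step could be scripted. The following is only a minimal sketch, assuming the `vos listvldb` output looks like the sample in this change, with a volume name alone on its own line followed by `server ... partition ... RW Site` lines. The function name `rw_volumes_on` and the embedded sample text are hypothetical and exist purely for illustration.

```python
import re

# Sample text modelled on the `vos listvldb` output quoted in the patch.
SAMPLE_LISTVLDB = """\
VLDB entries for all servers

mirror
    RWrite: 536870934     ROnly: 536870935
    number of sites -> 3
       server afs01.dfw.openstack.org partition /vicepa RW Site
       server afs01.dfw.openstack.org partition /vicepa RO Site
       server afs01.ord.openstack.org partition /vicepa RO Site
"""

# Matches a site line and captures host, partition and site type.
SITE_RE = re.compile(r"^\s*server\s+(\S+)\s+partition\s+(\S+)\s+(RW|RO)\s+Site")


def rw_volumes_on(listvldb_text, server):
    """Return the names of volumes whose RW site is on `server`.

    Assumption: a line holding a single token is a volume name, and the
    site lines that follow it belong to that volume.
    """
    volumes = []
    current = None
    for line in listvldb_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        match = SITE_RE.match(line)
        if match:
            host, _partition, kind = match.groups()
            if kind == "RW" and host == server and current is not None:
                volumes.append(current)
        elif len(tokens) == 1:
            current = tokens[0]  # start of a new volume entry
    return volumes


if __name__ == "__main__":
    # Volumes that would need a `vos move` (or quiesced writers)
    # before afs01.dfw.openstack.org is shut down.
    print(rw_volumes_on(SAMPLE_LISTVLDB, "afs01.dfw.openstack.org"))
```

An empty result for a server would suggest it hosts only RO sites, so under the reasoning in the patch no `vos move` should be needed before shutting it down.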