From 3f1d67b99fd3cbea7e7186f62e60ebf4b4c27cfb Mon Sep 17 00:00:00 2001
From: Ian Wienand
Date: Mon, 1 Mar 2021 15:51:49 +1100
Subject: [PATCH] Add afsdb03 openstack.org

We are in the process of upgrading the AFS servers to focal.

As explained by auristor (extracted from IRC below) we need 3 servers
to actually perform HA with the ubik protocol:

  The ubik quorum is defined by the list of voting primary ip
  addresses as specified in the ubik service's CellServDB file. The
  server with the lowest ip address gets 1.5 votes and the others 1
  vote. To win election requires greater than 50% of the votes.

  In a two server configuration there are a total of 2.5 votes to
  cast. 1.5 > 2.5/2 so afsdb02.openstack.org always wins regardless
  of what afsdb01.openstack.org says. And afsdb01.openstack.org can
  never win because 1 < 2.5/2.

  By adding a third ubik server to the quorum, the total votes cast
  are 3.5 and it always requires the vote of two servers to elect a
  winner ... if afsdb03 is added with the highest ip address, then
  either afsdb01 or afsdb02 can be elected.

Add a third server which is a focal host and related configuration.
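
The vote arithmetic above can be checked with a minimal Python sketch.
It encodes only the simplified model from the quote (the lowest IP gets
1.5 votes, every other server gets 1, and a winner needs strictly more
than half of all votes), not the real ubik election logic. The function
names and short host labels are hypothetical; the addresses are the
CellServDB entries in this patch.

```python
import ipaddress


def vote_weights(servers):
    """servers maps host label -> IPv4 string.

    Simplified model: the lowest address gets 1.5 votes, all others 1.
    """
    lowest = min(servers, key=lambda name: ipaddress.IPv4Address(servers[name]))
    return {name: 1.5 if name == lowest else 1.0 for name in servers}


def has_quorum(weights, up):
    """True if the servers in `up` hold strictly more than half of all votes."""
    total = sum(weights.values())
    return sum(weights[name] for name in up) > total / 2


def quorum_after_each_failure(servers):
    """For each server taken down, report whether the survivors keep quorum."""
    weights = vote_weights(servers)
    return {
        down: has_quorum(weights, [name for name in servers if name != down])
        for down in servers
    }


# Addresses taken from the CellServDB change in this patch.
two = {"afsdb01": "104.130.136.20", "afsdb02": "23.253.200.228"}
three = dict(two, afsdb03="104.130.158.72")

print(quorum_after_each_failure(two))
print(quorum_after_each_failure(three))
# With three servers no single server (even the 1.5-vote one) wins alone.
print(has_quorum(vote_weights(three), ["afsdb02"]))
```

Under this model, losing afsdb02 (the lowest address) in the two server
setup leaves no quorum, while with three servers any single failure
still leaves a majority.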

Change-Id: I59e562dd56d6cbabd2560e4205b3bd36045d48c2
---
 doc/source/afs.rst                               | 16 +++++++++++-----
 hiera/common.yaml                                |  1 +
 inventory/base/hosts.yaml                        |  7 +++++++
 .../roles/openafs-server-config/files/CellServDB |  1 +
 roles/openafs-client/templates/CellServDB        |  1 +
 5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/doc/source/afs.rst b/doc/source/afs.rst
index 79f926a0d5..f1b2108977 100644
--- a/doc/source/afs.rst
+++ b/doc/source/afs.rst
@@ -23,6 +23,7 @@ At a Glance
 :Hosts:
   * afsdb01.openstack.org (a vldb and pts server in DFW)
   * afsdb02.openstack.org (a vldb and pts server in ORD)
+  * afsdb03.openstack.org (a second vldb and pts server in DFW)
   * afs01.dfw.openstack.org (a fileserver in DFW)
   * afs02.dfw.openstack.org (a second fileserver in DFW)
   * afs01.ord.openstack.org (a fileserver in ORD)
@@ -58,8 +59,11 @@ Our implementation follows the common recommendation to colocate the VLDB and
 PTS servers, and so they both run on our afsdb* servers. These
 servers all have the same information and communicate with each
 other to keep in sync and automatically provide high-availability
-service. For that reason, one of our DB servers is in the DFW region,
-and the other in ORD.
+service. As described in
+``__ the Ubik
+protocol requires three servers to maintain availability; for that
+reason, two of our DB servers are in the DFW region, and the other in
+ORD.
 
 Fileservers contain volumes, each of which is a portion of the file
 space provided by that cell. A volume appears as at least one
@@ -614,7 +618,7 @@ afsdb0X.openstack.org
 ~~~~~~~~~~~~~~~~~~~~~
 
 We have redundant AFS DB servers. You can take one down without causing
-a service outage as long as the other remains up. To do this safely::
+a service outage as long as the others remain up. To do this safely::
 
   root@afsdb01:~# bos shutdown afsdb01.openstack.org -wait -localauth
   root@afsdb01:~# bos status afsdb01.openstack.org -localauth
@@ -633,7 +637,7 @@ Finally check that the service is back up and running::
   Instance ptserver, currently running normally.
   Instance vlserver, currently running normally.
 
-Now you can repeat the process against afsdb02.
+Now you can repeat the process against afsdb02 or afsdb03.
 
 afs0X.openstack.org
 ~~~~~~~~~~~~~~~~~~~
@@ -683,12 +687,14 @@ Perform maintenance, then restart as above and check the status again::
 DNS Entries
 -----------
 
-AFS uses the following DNS entries::
+AFS uses the following DNS entries which indicate an even balance::
 
   _afs3-prserver._udp.openstack.org. 300 IN SRV 10 10 7002 afsdb01.openstack.org.
   _afs3-prserver._udp.openstack.org. 300 IN SRV 10 10 7002 afsdb02.openstack.org.
+  _afs3-prserver._udp.openstack.org. 300 IN SRV 10 10 7002 afsdb03.openstack.org.
   _afs3-vlserver._udp.openstack.org. 300 IN SRV 10 10 7003 afsdb01.openstack.org.
   _afs3-vlserver._udp.openstack.org. 300 IN SRV 10 10 7003 afsdb02.openstack.org.
+  _afs3-vlserver._udp.openstack.org. 300 IN SRV 10 10 7003 afsdb03.openstack.org.
 
 Be sure to update them if volume location and PTS servers change. Also
 note that only A (IPv4 address) records are used in the SRV data. Since OpenAFS
diff --git a/hiera/common.yaml b/hiera/common.yaml
index 1c47c0270a..bf6f8f00ed 100644
--- a/hiera/common.yaml
+++ b/hiera/common.yaml
@@ -206,6 +206,7 @@ cacti_hosts:
 - afs01.ord.openstack.org
 - afsdb01.openstack.org
 - afsdb02.openstack.org
+- afsdb03.openstack.org
 - apps.openstack.org
 - ask.openstack.org
 - backup01.ord.rax.opendev.org
diff --git a/inventory/base/hosts.yaml b/inventory/base/hosts.yaml
index f7c6715506..bba9f33236 100644
--- a/inventory/base/hosts.yaml
+++ b/inventory/base/hosts.yaml
@@ -42,6 +42,13 @@ all:
         region_name: ORD
       public_v4: 23.253.200.228
       public_v6: 2001:4801:7824:104:805d:9ae0:cb8d:3a86
+    afsdb03.openstack.org:
+      ansible_host: 104.130.158.72
+      location:
+        cloud: openstackci-rax
+        region_name: DFW
+      public_v4: 104.130.158.72
+      public_v6: 2001:4800:7818:104:be76:4eff:fe04:2952
     ask01.openstack.org:
       ansible_host: 104.239.149.165
       location:
diff --git a/playbooks/roles/openafs-server-config/files/CellServDB b/playbooks/roles/openafs-server-config/files/CellServDB
index 28c01bf3ac..7e772c9a91 100644
--- a/playbooks/roles/openafs-server-config/files/CellServDB
+++ b/playbooks/roles/openafs-server-config/files/CellServDB
@@ -1,3 +1,4 @@
 >openstack.org #Cell name
 104.130.136.20 #afsdb01.openstack.org
 23.253.200.228 #afsdb02.openstack.org
+104.130.158.72 #afsdb03.openstack.org
diff --git a/roles/openafs-client/templates/CellServDB b/roles/openafs-client/templates/CellServDB
index 86c22ff900..f3bc88735d 100644
--- a/roles/openafs-client/templates/CellServDB
+++ b/roles/openafs-client/templates/CellServDB
@@ -1,6 +1,7 @@
 >openstack.org #OpenStack
 104.130.136.20 #afsdb01.openstack.org
 23.253.200.228 #afsdb02.openstack.org
+104.130.158.72 #afsdb03.openstack.org
 >grand.central.org #GCO Public CellServDB 28 Jan 2013
 18.9.48.14 #grand.mit.edu
 128.2.203.61 #penn.central.org