From 3dcfe61d2f8ef9c5d4b6186c703e50c77114badb Mon Sep 17 00:00:00 2001
From: Ian Wienand
Date: Wed, 11 Sep 2019 13:15:29 +1000
Subject: [PATCH] fedora mirror update : add sleep

As described inline, this should make our mirror pulses more robust
against timeouts.

This is probably ripe for turning into more of a library situation for
all the other "vos release" calls too. But one thing at a time ... I
think we test with this for a while to see if stability returns.

Change-Id: I041a290053e4e8ceba80785598a5945e5adcf6f1
---
 .../mirror-update/files/fedora-mirror-update | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/playbooks/roles/mirror-update/files/fedora-mirror-update b/playbooks/roles/mirror-update/files/fedora-mirror-update
index b317c3e5bc..ca1ee276d7 100755
--- a/playbooks/roles/mirror-update/files/fedora-mirror-update
+++ b/playbooks/roles/mirror-update/files/fedora-mirror-update
@@ -115,6 +115,35 @@ echo_ts "... done"
 
 date --iso-8601=ns | $K5START tee $BASE/timestamp.txt
 
+# Now sleep for 20 minutes. openafs "pads" its incremental
+# replication on "vos release" by -15 minutes to account for clock
+# skew between hosts.
+#
+# We can get into a negative feedback loop with this, particularly if
+# we have a series of big updates, or run things by hand to avoid
+# timeouts.
+#
+# Consider the case of a large mirror pulse (perhaps a new distro
+# release is included, etc.). The "Last Update" time on the volume
+# will indicate when this run finished.
+#
+# The last 15 minutes of that run could have brought in a significant
+# amount of data. Now we move on to the next mirror pulse, and the
+# "vos release" below will try to sync the remote R/O volume from
+# "Last Update - 15 minutes" to now(). If you include the data from
+# this pulse, we are now dragging across potentially *a lot* of data;
+# enough to make the whole thing time out. Then the volume is locked,
+# and we keep putting more data on top with each cron run making it
+# even worse.
+#
+# By sleeping here for 15+ minutes and doing a trivial write, we can
+# ensure that when the *next* release says "sync from Last Update - 15
+# minutes" it will *only* include this trivial write, and not
+# potentially this entire mirror pulse data too.
+
+sleep $(( 20 * 60 ))
+date --iso-8601=ns | $K5START tee $BASE/timestamp.txt
+
 echo_ts "Running vos release."
 k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos release -v $MIRROR_VOLUME | \
     while IFS= read -r line; do echo_ts "$line"; done
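
The timing argument in the inline comment can be checked with a little arithmetic. The following is only a rough sketch and not part of the patch: the function names `release_window_start` and `window_overlaps_pulse`, the `PAD_SECONDS` variable and the example epoch are all made up for illustration, and the 15-minute padding is taken from the comment above rather than from the openafs source.

```shell
#!/bin/bash
# Hypothetical illustration only; none of these names exist in the
# real fedora-mirror-update script.

# Assumed padding, per the inline comment ("Last Update - 15 minutes").
PAD_SECONDS=$(( 15 * 60 ))

# Print the start of the window the next "vos release" would sync
# from, given the volume's "Last Update" time in epoch seconds.
release_window_start() {
    local last_update=$1
    echo $(( last_update - PAD_SECONDS ))
}

# Succeed (exit 0) if the padded window reaches back before the end of
# the mirror pulse, i.e. pulse data would be dragged across again.
window_overlaps_pulse() {
    local pulse_end=$1 last_update=$2
    (( $(release_window_start "$last_update") < pulse_end ))
}

pulse_end=1568171729   # arbitrary example epoch for the end of the pulse

# Without the sleep, "Last Update" is the end of the pulse itself.
if window_overlaps_pulse "$pulse_end" "$pulse_end"; then
    echo "no sleep: window overlaps pulse"
fi

# With a 20 minute sleep followed by the trivial timestamp write.
last_update=$(( pulse_end + 20 * 60 ))
if ! window_overlaps_pulse "$pulse_end" "$last_update"; then
    gap=$(( $(release_window_start "$last_update") - pulse_end ))
    echo "20 min sleep: window starts ${gap}s after pulse"
fi
```

Under these assumptions the 20 minute sleep leaves a 5 minute margin beyond the padding, which is presumably why the patch sleeps for 20 minutes rather than exactly 15.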