We found a bug in master that will prevent us from merging a fix;
downgrade the scheduler to 4.1.0 to get the fix in.
Change-Id: Ie9ad75177ab58b34e20cafab496ba7af6f082551
So that we can stop/pull/start, move the pull tasks into their own
files and add a playbook that invokes them.
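A minimal sketch of what such a pull-only playbook could look like
(the file name, host group, and role layout here are illustrative,
not necessarily what the repo uses):

```yaml
# zuul_pull.yaml - hypothetical playbook that only pulls new images
- hosts: zuul-scheduler
  tasks:
    - name: Pull new scheduler images without restarting anything
      include_role:
        name: zuul-scheduler
        tasks_from: pull
```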
Change-Id: I4f351c1d28e5e4606e0a778e545a3a805525ac71
This avoids the conflict with the zuul user (1000) on the test
nodes. The executor will continue to use the default username
of 'zuul' as the ansible_user in the inventory.
This change also touches the zk and nodepool deployment to use
variables for the usernames and uids to make changes like this
easier. No changes are intended there.
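As a rough sketch, parameterizing the uid/username handling might look
like this (the variable names and uid value are illustrative, not the
actual ones in the deployment):

```yaml
# group_vars sketch - illustrative names and values only
zookeeper_username: zookeeper
zookeeper_uid: 10001

# in the role's tasks:
- name: Create the zookeeper user with a configurable uid
  user:
    name: "{{ zookeeper_username }}"
    uid: "{{ zookeeper_uid }}"
```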
Change-Id: Ib8cef6b7889b23ddc65a07bcba29c21a36e3dcb5
We noticed that our zuul scheduler was running out of disk space, and
one of the causes is that we pull all of the wonderful new zuul
images without pruning the old ones. This happened because we only
pruned when (re)starting services, and we don't do that automatically
with Zuul.
Address this by always pruning after pulling even if we don't restart
services. This should be safe because prune will leave the latest tagged
images as well as the running images.
This should keep our disk consumption down.
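A sketch of the prune step that could run right after the pull tasks
(the task name is illustrative; `docker image prune -f` removes only
dangling images, so tagged and running images survive, matching the
safety claim above):

```yaml
- name: Prune unused docker images after pulling
  command: docker image prune -f
```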
Change-Id: Ibdd22ac42d86781f1e87c3d11e05fd8f99677167
We're running these in containers now. Please don't try to start
them the old way.
failed_when: false is needed because we can't disable the old service
in the gate when there is no service file installed.
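The pattern described is roughly the following (the service name and
module arguments are illustrative):

```yaml
- name: Disable the old zuul-scheduler init service
  service:
    name: zuul-scheduler
    enabled: false
    state: stopped
  # Don't fail in the gate, where no service file is installed
  failed_when: false
```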
Change-Id: Ia4560f385fc98e23f987a67a1dfa60c3188816b6
We use the zuul_scheduler_start flag to determine if we want to start
the zuul-scheduler when new containers show up. Unfortunately we weren't
setting zuul_scheduler_start in prod so we failed with this error:
  error while evaluating conditional (zuul_scheduler_start | bool): 'zuul_scheduler_start' is undefined
Fix this by treating an unset var as equivalent to a truthy value.
We do this instead of always setting the var to false in prod because
it simplifies testing.
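The fix amounts to applying a default before the bool cast, along
these lines:

```yaml
when: zuul_scheduler_start | default(true) | bool
```

With `default(true)`, an undefined var no longer raises the conditional
evaluation error and behaves like an explicitly truthy setting.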
Change-Id: I1f1a86e80199601646c7f2dec2a91c5d65d77231
If we need to start and stop, it's best to use playbooks.
We already have tasks files with start commands in each role,
so put the stop commands into similar task files.
Make the restart playbook import_playbook the stop and start
playbooks to reduce divergence.
Use the graceful shutdown pattern from the gerrit docker-compose
to stop the zuul scheduler.
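One common shape for that graceful-shutdown pattern in docker-compose
is a generous stop grace period on the service, so that a plain
`docker-compose stop` waits before killing the container (the service
name and timeout here are assumptions, not the actual gerrit settings):

```yaml
services:
  scheduler:
    stop_grace_period: 5m
```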
Change-Id: Ia20124553821f4b41186bce6ba2bff6ca2333a99
We don't want to HUP all the processes in the container, we just
want zuul to reconfigure. Use the smart-reconfigure command.
Also - start the scheduler in the gate job.
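A task invoking smart-reconfigure inside the running container could
look roughly like this (the compose service name is illustrative):

```yaml
- name: Smart-reconfigure the zuul scheduler
  command: docker-compose exec -T scheduler zuul-scheduler smart-reconfigure
```

This asks the scheduler to reload its configuration without signaling
every process in the container.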
Change-Id: I66754ed168165d2444930ab1110e95316f7307a7
Zuul is publishing lovely container images, so we should
go ahead and start using them.
We can't use containers for zuul-executor because of the
docker->bubblewrap->AFS issue, so install from pip there.
Don't start any of the containers by default, which should
let us safely roll this out and then do a rolling restart.
For things (like web or mergers) where it's safe to do so,
a followup change will swap the flag.
Change-Id: I37dcce3a67477ad3b2c36f2fd3657af18bc25c40