20 Commits

Author SHA1 Message Date
James E. Blair
b6cbb52447 Add pull tasks for nodepool/zuul
So we can stop/pull/start, move the pull tasks to their own files
and add a playbook that invokes them.

Change-Id: I4f351c1d28e5e4606e0a778e545a3a805525ac71
2021-02-19 15:42:40 -08:00
James E. Blair
bfa60880a1 Remove old service cleanups from zuul
These cleanup tasks have all run and we no longer need to carry them.

Change-Id: I6130d1c2fbfe39ea339f1e18f3306b221b0e12e1
2021-01-22 15:51:06 -08:00
Ian Wienand
10f5a23e4b zuul-web: fix zuul.openstack.org location match
In I4e5f803b9d4fb6c2351cf151a085b93a7fd20f60 I put the wrong thing in
the zuul.openstack.org config; for that site we want to cache
/api/status, not the tenant path.

Change-Id: Iffbd870aeff496b9c259206f866af3a90a4349db
2020-09-15 08:34:10 +10:00
Ian Wienand
8a2289f70a zuul-web: rework caching
mod_mem_cache was removed in Apache 2.4 so all the bits of
configuration gated by the IfModule are currently irrelevant.

The replacement is socache; the in-memory version is "shmcb" (it can
also hook up to memcache, etc.).  Enable the socache module, and switch
the cache matching parts to use socache and then fall back to disk cache
(the manual describes this fallback behaviour [1]).

The other part of this is to turn the CacheQuickHandler off.  The
manual says this about it [2]:

  In the default enabled configuration, the cache operates within the
  quick handler phase. This phase short circuits the majority of
  server processing, and represents the most performant mode of
  operation for a typical server. The cache bolts onto the front of
  the server, and the majority of server processing is avoided.

I won't claim to fully understand how our mod_rewrite rules and
mod_proxy all hang together with phases and what-not.  But empirically
with this turned on (default) we do not seem to get any caching on the
tenant status pages, and with it turned off we do.

I've deliberately removed IfModule gating as well.  This actually hid
the problem and made it much more difficult to diagnose; it is much
better if these directives just fail to start Apache if we do not have
the modules we expect to have.

[1] https://httpd.apache.org/docs/2.4/mod/mod_cache_socache.html
[2] https://httpd.apache.org/docs/2.4/mod/mod_cache.html#cachequickhandler

Change-Id: I4e5f803b9d4fb6c2351cf151a085b93a7fd20f60
2020-09-14 13:59:53 +10:00
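
The caching rework described in 8a2289f70a can be sketched roughly as the fragment below. This is a minimal illustration, not the configuration from the change itself: the location pattern is hypothetical, and it assumes the cache, cache_socache, socache_shmcb and cache_disk modules are enabled.

```apache
# No <IfModule> guards: Apache should refuse to start if a module is missing.

# Run the cache in the normal handler phase instead of the quick handler.
CacheQuickHandler off

# "shmcb" is the in-memory shared object cache provider.
CacheSocache shmcb

# Hypothetical pattern standing in for the Zuul status paths.
<LocationMatch "^/api/tenant/[^/]+/status">
    # Try the in-memory cache first, then fall back to the disk cache.
    CacheEnable socache
    CacheEnable disk
</LocationMatch>
```

Leaving out the IfModule gating means a missing module shows up as a startup failure rather than as silently disabled caching, which is the behaviour the commit message argues for.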
Ian Wienand
0177b40618 zuul-web: move LogFormat combined-cache into config
These two values overwrite each other; move them into common configuration.

The "cache-status" is a verbose string, so quote it.

Change-Id: I3cc4627de3d6a0de1adcfed6b424fc3ed0099245
2020-09-14 11:14:28 +10:00
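
A log format along the lines described in 0177b40618 might look like the sketch below. The field list is assumed to be the stock Debian "combined" format with the mod_cache status appended, and the log file name is hypothetical; the actual format in the change may differ.

```apache
# "combined" fields plus the mod_cache decision for the request.
# cache-status is a verbose, multi-word string, so it is quoted.
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" \"%{cache-status}e\"" combined-cache

# Hypothetical log file name.
CustomLog ${APACHE_LOG_DIR}/zuul_access.log combined-cache
```

Defining the format once in shared configuration avoids two vhosts each declaring their own `combined-cache` and overwriting one another.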
Clark Boylan
35c9783036 Use LocationMatch to cache zuul api statuses
We need a regex to match the url path for zuul statuses. Our existing
setup assumed this would work in a CacheEnable directive but it seems
that it does not. Move this into a LocationMatch which explicitly
supports regexes.

Change-Id: I9df06d2af31ce6550e537f4594640487cca1d735
2020-09-10 13:17:18 -07:00
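
The move described in 35c9783036 can be illustrated with the fragment below. The regex is an assumption made for illustration, since the commit message does not give the pattern that was used.

```apache
# LocationMatch takes a regular expression, so the tenant name can vary.
# Hypothetical pattern for per-tenant status URLs.
<LocationMatch "^/api/tenant/[^/]+/status">
    # Inside a Location-style section the url-string argument is optional.
    CacheEnable disk
</LocationMatch>
```

A `CacheEnable disk /some/path` directive outside such a section takes a url-string rather than a regex, which is consistent with the commit's observation that the regex form did not match there.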
Clark Boylan
9c68191ce8 Improve logging and cache config for zuul web proxy
We attempt to cache things served by zuul-web in our apache proxy. This
is to reduce the load on the zuul-web process which has to query
gearman, the sql database, and eventually the zookeeper database to
produce its responses.

Things are currently operating slowly and it isn't clear if we're
caching properly. To check that better, update our logging format to
record cache hits and misses. Also drop an unnecessary .* in the
CacheEnable url-strings for /static/ as it is unclear if the .* is
treated as a regex here.

Change-Id: Ib57c085fa15365b89b3276e037339dbeddb094e3
2020-09-10 11:41:08 -07:00
James E. Blair
09935ff328 Run Zuul as the zuuld user
This avoids the conflict with the zuul user (1000) on the test
nodes.  The executor will continue to use the default username
of 'zuul' as the ansible_user in the inventory.

This change also touches the zk and nodepool deployment to use
variables for the usernames and uids to make changes like this
easier.  No changes are intended there.

Change-Id: Ib8cef6b7889b23ddc65a07bcba29c21a36e3dcb5
2020-05-20 13:17:28 -07:00
Zuul
9437572ce3 Merge "Pull and prune docker images together" 2020-05-07 23:16:58 +00:00
Clark Boylan
c0fd3e0894 Pull and prune docker images together
We noticed that our zuul scheduler was running out of disk and one of
the causes of this is we are pulling all of the wonderful new zuul
images and not pruning them. This happens because we were only pruning
when (re)starting services and we don't do that automatically with Zuul.
Address this by always pruning after pulling even if we don't restart
services. This should be safe because prune will leave the latest tagged
images as well as the running images.

This should keep our disk consumption down.

Change-Id: Ibdd22ac42d86781f1e87c3d11e05fd8f99677167
2020-05-07 12:51:09 -07:00
Zuul
1c657be9e8 Merge "Cache static zuul resources in apache" 2020-05-07 17:10:53 +00:00
Zuul
239bb4e09d Merge "Configure htcacheclean for zuul-web" 2020-05-07 17:10:51 +00:00
Monty Taylor
c836437925 Remove old init scripts and services for zuul/nodepool
We're running these in containers now. Please do not try to start
them the old way.

failed_when false is because we can't disable the old service
in the gate if there is no service file installed.

Change-Id: Ia4560f385fc98e23f987a67a1dfa60c3188816b6
2020-05-06 17:13:58 -05:00
Clark Boylan
3ad3ea7bc5 Revert "Clear LD_PRELOAD variable on zuul-web containers"
This reverts commit 140b95a2d0f6b9ac6509879b41df0afec53b71f6.

The images have removed jemalloc entirely so we don't need to clear
LD_PRELOAD manually anymore.

Change-Id: I21ae61095dd621e4e5c401187a8419a0329b1970
2020-05-05 16:07:59 -07:00
Clark Boylan
140b95a2d0 Clear LD_PRELOAD variable on zuul-web containers
We have been having memory leak issues with zuul-web on our move from
running on the host with python3.5 to running in containers with
python3.7 and python3.8. One other thing that changed was we added
LD_PRELOAD settings to use jemalloc instead of normal libc provided
malloc. In an effort to rule this out disable jemalloc in the zuul-web
containers.

Change-Id: Icf03b60266f876dd7c322e8c8f7c207b692d3ad7
2020-05-04 13:00:52 -07:00
Clark Boylan
608f56ab82 Configure htcacheclean for zuul-web
We are starting to use the apache2 mod_cache_disk functionality more now
and during use the cache has grown into the 1.5GB range. The
htcacheclean process is cleaning up every 2 hours which is how it is
getting behind with its limit of 300MB. Reduce the interval to 15
minutes by supplying an /etc/default/apache-htcacheclean config.

Note we cache status.json files which are only valid for a very short
period of time. This likely explains the quick growth of the cache.

Change-Id: Iff00fb1806796ef6db26e53e026c533c47a902b4
2020-04-30 16:23:51 -07:00
Clark Boylan
8aab93a4d6 Cache static zuul resources in apache
We are running zuul-web out of a container now which is forcing all http
requests through to cherrypy (eg we no longer serve static resources
from apache directly). Alleviate some of the pressure on cherrypy by
caching static resources in apache.

Change-Id: I77d0df4b4853e4dff3177862a248cdf4efa33765
2020-04-29 17:12:27 -07:00
Clark Boylan
6bc23598d3 Improve zuul-web apache config
Compress css and javascript content as they can be quite large for zuul.

Also, cache status json results when using the non whitelabeled api
paths for zuul.opendev.org. This should improve performance for those
status files.

Change-Id: I7b965b27a88d5fda4d43be31c39989994334989c
2020-04-27 15:08:08 -07:00
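
The compression part of 6bc23598d3 could be expressed with a directive like the one below. This is a sketch only: the MIME type list is an assumption, and it relies on mod_deflate and mod_filter being enabled.

```apache
# Compress CSS and JavaScript responses before sending them to the client.
# The MIME types listed here are assumed, not taken from the change.
AddOutputFilterByType DEFLATE text/css application/javascript
```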
Monty Taylor
2a7c755a08 Rework zuul start/stop/restart playbooks for docker
If we need to start and stop, it's best to use playbooks.

We already have tasks files with start commands in each role,
so put the stop commands into similar task files.

Make the restart playbook import_playbook the stop and start
playbooks to reduce divergence.

Use the graceful shutdown pattern from the gerrit docker-compose
to stop the zuul scheduler.

Change-Id: Ia20124553821f4b41186bce6ba2bff6ca2333a99
2020-04-27 09:34:50 -05:00
Monty Taylor
f0b77485ec Run Zuul using Ansible and Containers
Zuul is publishing lovely container images, so we should
go ahead and start using them.

We can't use containers for zuul-executor because of the
docker->bubblewrap->AFS issue, so install from pip there.

Don't start any of the containers by default, which should
let us safely roll this out and then do a rolling restart.
For things (like web or mergers) where it's safe to do so,
a followup change will swap the flag.

Change-Id: I37dcce3a67477ad3b2c36f2fd3657af18bc25c40
2020-04-24 09:18:44 -05:00