16 Commits

Author SHA1 Message Date
Ian Wienand
1ac445b1d9 nodepool-base: prefer ZK IPv6 addresses
The current loop here uses the ansible_host value of the ZK servers,
which we have set to the IPv4 address in the inventory.

nb03 is constantly dropping out of ZK; for the record, the logs show:

 2021-04-21 05:56:15,151 WARNING kazoo.client: Connection dropped: socket connection error: Connection reset by peer
 2021-04-21 05:56:15,151 WARNING kazoo.client: Transition to CONNECTING
 2021-04-21 05:56:15,151 INFO kazoo.client: Zookeeper connection lost
 2021-04-21 05:56:15,152 INFO kazoo.client: Connecting to 23.253.90.246(23.253.90.246):2281, use_ssl: True
 2021-04-21 05:56:15,176 INFO kazoo.client: Zookeeper connection established, state: CONNECTED

and this happens every few minutes.  This cloud does IPv4 behind a NAT
and it seems very likely this is related.

So the primary motivation here is to see if using IPv6 clears this up,
giving us some datapoints.  I also think that our other nodepool
hosts should all be fine to use ZK over IPv6.  However, in the
gate we may have cases where hosts don't have IPv6 addresses, so this
looks for the v6 address and, if not found, falls back to the current
ansible_host behaviour.

Change-Id: Ifde86ddd632662f36bcbe2a0dc99660f06b01ac3
2021-04-21 16:56:07 +10:00
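
For illustration only, the address selection described in the nodepool-base commit above (use the ZK server's IPv6 address when it has one, otherwise fall back to ansible_host) could be sketched in Python as follows. This is not the role's actual template; the `public_v6` variable name, the inventory layout, and both function names are assumptions made for the example, and port 2281 is simply the TLS port that appears in the log excerpt.

```python
from typing import Dict, List, Mapping, Optional

# TLS client port taken from the kazoo log excerpt in the commit message.
ZK_TLS_PORT = 2281


def zk_address(hostvars: Mapping[str, object]) -> Optional[str]:
    """Return the IPv6 address if the host has one, else ansible_host.

    'public_v6' is a hypothetical inventory variable name used for this
    sketch; 'ansible_host' is the standard Ansible connection variable.
    """
    v6 = hostvars.get("public_v6")
    if v6:
        return str(v6)
    host = hostvars.get("ansible_host")
    return str(host) if host else None


def zk_hosts(inventory: Mapping[str, Mapping[str, object]],
             port: int = ZK_TLS_PORT) -> List[Dict[str, object]]:
    """Build a list of host/port entries, one per ZK server, sorted by name."""
    return [
        {"host": zk_address(hostvars), "port": port}
        for _name, hostvars in sorted(inventory.items())
    ]


if __name__ == "__main__":
    # Documentation-range addresses, not real servers.
    example = {
        "zk01": {"ansible_host": "203.0.113.10", "public_v6": "2001:db8::10"},
        "zk02": {"ansible_host": "203.0.113.11"},  # no IPv6, e.g. in the gate
    }
    for entry in zk_hosts(example):
        print(entry)
```

A host without an IPv6 address, as may happen in the gate, simply keeps the existing IPv4 behaviour.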
James E. Blair
7a32463f9d Revert "Revert "Add Zookeeper TLS support""
This reverts commit 05021f11a29a0213c5aecddf8e7b907b7834214a.

This switches Zuul and Nodepool to use Zookeeper TLS.  The ZK
cluster is already listening on both ports.

Change-Id: I03d28fb75610fbf5221eeee28699e4bd6f1157ea
2020-07-15 15:45:48 -07:00
Zuul
b1c2a99ff2 Merge "Zookeeper: listen on plain and TLS ports" 2020-06-19 22:12:21 +00:00
Clark Boylan
b0364059aa No log the make nodepool zk hosts task
Ansible has a tendency to log too much. Make it log less.

Change-Id: Ic32332430d90ff4cb00564943c9281765aa72fb1
2020-06-18 14:52:08 -07:00
Clark Boylan
f7e92ee669 Improve ansible yaml output for humans
We use ansible's to_nice_yaml output filter when writing ansible
datastructures to yaml. This has a default indent of 4, but we humans
usually write yaml with an indent of 2. Make the generated yaml more
similar to what we humans write and set the indent to 2.

Change-Id: I3dc41b54e1b6480d7085261bc37c419009ef5ba7
2020-06-18 10:02:11 -07:00
James E. Blair
a514aa0f98 Zookeeper: listen on plain and TLS ports
To prepare for switching to TLS, set up TLS certs for Zookeeper and
all of Nodepool and Zuul, but do not have them connect over TLS yet.
We have observed problems with Kazoo using TLS in production.  This
will let us run the ZK quorum using TLS internally, and have Zuul
and Nodepool connect over plaintext while also exposing the TLS
client port so that we can perform some more production tests.

Change-Id: If93b27f5b55be42be1cf6ee23258127fab5ce9ea
2020-06-17 10:38:59 -07:00
James E. Blair
05021f11a2 Revert "Add Zookeeper TLS support"
This reverts commit 29825ac18b58145f007f64b2998357445b8fdd91.

We observed this issue in production:
https://github.com/python-zk/kazoo/issues/587

Revert until we find a fix.

Change-Id: Ib7b8e3b06770a83b39458d09d2b1e655bd94bd22
2020-06-16 11:15:48 -07:00
James E. Blair
29825ac18b Add Zookeeper TLS support
This creates TLS certs for Zookeeper, uses them inside the ZK
quorum, and configures Nodepool and Zuul to use them as well.

A full system restart of all ZK-related components will be required
after merging this patch.

Change-Id: I0cb96a989f3d2c7e0563ce8899f2a5945ea225b3
2020-06-15 11:19:47 -07:00
James E. Blair
09935ff328 Run Zuul as the zuuld user
This avoids the conflict with the zuul user (1000) on the test
nodes.  The executor will continue to use the default username
of 'zuul' as the ansible_user in the inventory.

This change also touches the zk and nodepool deployment to use
variables for the usernames and uids to make changes like this
easier.  No changes are intended there.

Change-Id: Ib8cef6b7889b23ddc65a07bcba29c21a36e3dcb5
2020-05-20 13:17:28 -07:00
Monty Taylor
15b662b37a Use ansible_host instead of ansible_default_ip* for zk
Our zk config is a little too brittle. Let's just use the inventory
vars instead of detected network facts.

Change-Id: I288990edf587bc8394c9473388a858f46efb0691
2020-05-11 19:36:02 +00:00
Monty Taylor
8b1b70c77e Configure nodepool to use logging config
We have a logging config to log to /var/log/nodepool but we weren't
using it. Start using it.

Add logging config to nodepool-builder

We should log nodepool builder to /var/log/nodepool too.

Change-Id: I6e7196dc12e8c1bfc54274432b94cf53629bdf3d
2020-05-06 11:18:19 -05:00
Monty Taylor
8d7075b02f Run zookeeper cluster in nodepool jobs
Rather than running a local zookeeper, just run a real zookeeper.
Also, get rid of nb01-test and just use nb04 - what could possibly
go wrong?

Dynamically write zookeeper host information to nodepool.yaml

So that we can run an actual zk using the new zk role on hosts in
ansible inventory, we need to write out the ip addresses of the
hosts that we build in zuul. This means having the info baked in
to the file in project-config isn't going to work.

We can do this in prod too, it shouldn't hurt anything.

Increase timeout for run-service-nodepool

We need to fix the playbook, but we'll do that after we get the
puppet gone.

Change-Id: Ib01d461ae2c5cec3c31ec5105a41b1a99ff9d84a
2020-04-29 16:18:25 -05:00
Monty Taylor
ebae022d07 Use project-config from zuul instead of direct clones
We use project-config for gerrit, gitea and nodepool config. That's
cool, because we can clone that from zuul too and make sure that each
prod run we're doing runs with the contents of the patch in question.

Introduce a flag file that can be touched in /home/zuulcd that will
block zuul from running prod playbooks. By default, if the file is
there, zuul will wait for an hour before giving up.

Rename zuulcd to zuul

To better align prod and test, name the zuul user zuul.

Change-Id: I83c38c9c430218059579f3763e02d6b9f40c7b89
2020-04-15 12:29:33 -05:00
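
As a rough sketch of the flag-file behaviour described in the commit above (block while the file exists, give up after an hour), the wait loop might look like the following in Python. The real change implements this in the Zuul playbooks, not in Python; the function name, the polling interval, and the example file name are assumptions for illustration, since the commit message names only the directory.

```python
import os
import time


def wait_for_flag_removal(flag_path: str,
                          timeout: float = 3600.0,
                          poll_interval: float = 10.0) -> bool:
    """Wait until flag_path no longer exists.

    Returns True as soon as the file is absent, or False if it is still
    present once `timeout` seconds have elapsed (one hour by default,
    matching the commit message). The polling interval is an assumption.
    """
    deadline = time.monotonic() + timeout
    while os.path.exists(flag_path):
        if time.monotonic() >= deadline:
            return False  # flag still present: give up
        time.sleep(poll_interval)
    return True


if __name__ == "__main__":
    # Hypothetical flag file name; only the directory comes from the commit.
    flag = "/home/zuulcd/EXAMPLE_FLAG"
    if wait_for_flag_removal(flag, timeout=5.0, poll_interval=1.0):
        print("no flag file, safe to run prod playbooks")
    else:
        print("flag file still present, giving up")
```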
Ian Wienand
1979d6b160 nodepool-builder: deploy from container
This deploys the nodepool-builder container and verifies it has
started in testinfra.

Change-Id: I8a717d06f1291a4112b2753641ff88f074cf0b31
2020-03-11 09:16:24 +11:00
Ian Wienand
e7f1062d51 Add install zookeeper role; use for nodepool-builder testing
This adds a simple role to install Zookeeper.

Add an option to nodepool-base to use this role to install Zookeeper.

Use this in the nodepool-builder gate testing where we are just
validating that the nodepool-builder container starts and is ready to
accept connections.  It needs a zookeeper to talk to, even though it
is not going to do anything.

Change-Id: I4ae89a51e454be4ee53ad4e04407162aaa8d9f9a
2020-03-06 14:02:52 +11:00
Ian Wienand
281425a44d Add initial Ansible for nodepool hosts
This is a start at ansible-deployed nodepool environments.

We rename the minimal-nodepool element to nodepool-base-legacy, and
keep running that for the old nodes.

The groups are updated so that only the .openstack.org hosts will run
puppet.  Essentially they should remain unchanged.

We start a nodepool-base element that will replace the current
puppet-<openstackci|nodepool> deployment parts.  For step one, this
grabs project-config and links in the elements and config file.

A testing host is added for gate testing which should trigger these
roles.  This will build into a full deployment test of the builder
container.

Change-Id: If0eb9f02763535bf200062c51a8a0f8793b1e1aa
Depends-On: https://review.opendev.org/#/c/710700/
2020-03-06 14:02:52 +11:00