We move robots.txt to custom/ instead of custom/public/ as
custom/public/ is now served at /assets/ via the gitea webserver and we
need robots.txt at the root. Related to this, we update image URLs to be
prefixed with AssetUrlPrefix so that if this path changes again in the
future we automatically accommodate that.
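As a sketch (the template path and filename are illustrative, not the
exact files touched), the URL change in a customized template looks
roughly like:

```html
<!-- before: absolute path breaks if the asset mount point moves -->
<img src="/img/opendev-sm.png">
<!-- after: Gitea's AssetUrlPrefix template variable tracks the serving path -->
<img src="{{AssetUrlPrefix}}/img/opendev-sm.png">
```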
Change-Id: I8ce5fe8ff342617ff156a401be8418d593fd35c4
Previously we were doing this weekly. Gerrit does this daily. "Split"
the difference and do gitea every other day.
We have noticed that replication to gitea can be slow at times. One idea
is that the less packed repos on the gitea side may make negotiating the
updates slower. Pack more often to see if this helps.
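A minimal crontab sketch of the new cadence (the script name and hour
are assumptions, not the deployed values); a day-of-month step is how
cron approximates "every other day":

```
# run gitea repo maintenance every other day at 04:00
0 4 */2 * * root /usr/local/bin/gitea-git-gc
```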
Change-Id: I8961007dce3e448bfdbf1c5f3e8dfc5ec8eb82fb
We now depend on the reverse proxy not only for abuse mitigation but
also for serving .well-known files with specific CORS headers. To
reduce complexity and avoid traps in the future, make it non-optional.
Change-Id: I54760cb0907483eee6dd9707bfda88b205fa0fed
This makes a number of changes, and in-depth testing is probably
warranted.
* Bump the golang version to 1.16 to match upstream's dockerfile
golang version.
* Bump the nodejs version to latest stable which is v16.x and
consume it from the buster repo since our image is based on buster.
* Bump the gitea version to 1.14.4.
* Rename all of the opendev-.*.png logo files to logo-.*.png as the
names of these files are no longer customizable. The reason for this
is the user settable json manifest has been removed.
* We keep opendev-sm.png because Gerrit apparently loads this?
* Update html templates to be in line with 1.14.4's templates.
* Use the gitea `admin user create` command instead of `admin
create-user`. While I can't find removal or deprecation notices for
the previous command the current docs note you should use the new
version and the old one is failing in CI.
Change-Id: I0a05ebb963cca5be88aeb2f859bfbeefa0f9c8e0
When we added Apache as a filtering proxy on our Gitea backends in
order to more easily mitigate resource starvation, we did not set
any tuning to tell it when to recycle worker processes. As a result,
backends may continue serving requests with workers which pre-date
certificate rotation. This problem has also become more broadly
prevalent throughout our services with the introduction of Let's
Encrypt's 3-month certificate expirations as compared to our
previous 2-year certificates.
Add the same MaxConnectionsPerChild tuning to our Gitea backend
proxies as we use for our static sites and mirror servers.
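A sketch of the tuning (the threshold value here is illustrative; the
static and mirror server configs define the real number):

```apache
# Recycle each worker process after this many connections so long-lived
# workers eventually stop serving with pre-rotation certificates.
MaxConnectionsPerChild 8192
```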
Change-Id: I77d89385178a30f7dc5d04bedd0ab3772865c09f
I introduced this typo with I500062c1c52c74a567621df9aaa716de804ffae7.
Luckily Ibb63f19817782c25a5929781b0f6342fe4c82cf0 has alerted us to
this problem.
Change-Id: I02bf2f4fa1041642a719100e9591bf5cd1a0bf49
Because the bulk of this traffic originates with our load balancer we
need to use port info to differentiate between actual source clients in
the load balancer logs. That info is currently missing so add it in.
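For a layer 4 frontend the client source port is the %cp field in
haproxy's log format; a hedged sketch (the section name is illustrative):

```
listen balance_git_https
    mode tcp
    # the standard TCP log format begins with %ci:%cp, i.e. client ip:port
    option tcplog
```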
Change-Id: I737e6373c09669f0321b656ecd4b137b94be38a4
Add facility to borg-backup role to run a command and save the output
of it to a separate archive file during the backup process.
This is mostly useful for database backups. Compressed on-disk dumps
are terrible for differential backups because revisions have
essentially no common data. By saving the uncompressed stream
directly from mysqldump, we allow borg the chance to de-duplicate,
saving considerable space on the backup servers.
This is implemented for our ansible-managed servers currently doing
dumps. We also add it to the testinfra.
This also separates the archive names for the filesystem and stream
backup with unique prefixes so they can be pruned separately.
Otherwise we end up keeping only one of the stream or filesystem
backups which isn't the intention. However, due to issues with
--append-only mode we are not issuing prune commands at this time.
Note the dump commands are updated slightly, particularly with
"--skip-extended-insert" which was suggested by mordred and
significantly improves incremental diff-ability by being slightly more
verbose but keeping much more of the output stable across dumps.
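A sketch of the stream capture, assuming borg 1.1+ and a database named
gitea (the archive naming is illustrative):

```shell
# uncompressed, one-row-per-INSERT output keeps successive dumps mostly
# identical, which is what lets borg de-duplicate across archives
mysqldump --opt --single-transaction --skip-extended-insert gitea \
  | borg create --stdin-name gitea.sql ::mysql-gitea-{now} -
```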
Change-Id: I500062c1c52c74a567621df9aaa716de804ffae7
This bumps our golang image up to buster-1.15 from buster-1.14 as gitea
bumps their minimum to 1.13 and I figure we should keep up to date.
The templates are updated to accommodate the new gitea templates. Primary
changes here are removal of icon sizes when specified and using imported
templates to simplify bits of code we weren't changing anyway.
We install openssh-server from buster-backports on our gitea-ssh image.
The reason for this is we pull in gitea's sshd_config from gitea itself
and the updated gitea wants to set options that older openssh in buster
proper doesn't support. Accommodate this with the newer openssh found in
backports.
We add a new favicon.svg to override the new default gitea svg favicon
which is served otherwise.
One other thing to call out is that gitea 1.13.0 added support for
kanban and similar project management tooling. We have explicitly
disabled this along with the wiki, issues and pull requests via
app.ini's repository.DISABLE_REPO_UNITS setting. You can find out more
about this setting here:
https://docs.gitea.io/en-us/config-cheat-sheet/#repository-repository
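The app.ini shape, per the cheat sheet above (the exact unit list is a
sketch; production may differ):

```ini
[repository]
; hide wiki, issues, pull requests and the new 1.13 projects (kanban) unit
DISABLE_REPO_UNITS = repo.issues,repo.pulls,repo.wiki,repo.projects
```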
Change-Id: I4c483f90c7495ee1f80eacd2c79c38836aa6f483
We're using logrotate to keep a small number of db backups locally. We
write these backups to disk compressed. We don't want logrotate to
recompress them. This is unnecessary extra work.
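A logrotate stanza sketch (path and retention are assumptions):

```
/var/backups/gitea-mysql/gitea.sql.gz {
    daily
    rotate 7
    missingok
    # the dump is written to disk already gzipped; skip recompression
    nocompress
}
```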
Change-Id: Iafe1628ff421f47cf3e5cbee14998eeceb60be4c
This started with me wondering why gerritbot was putting all its
output into /var/log/syslog -- it turns out Xenial docker is
configured to use the journald driver (which forwards to syslog) and Bionic
onwards uses json-file.
Both are sub-optimal; but particularly the json-file because we lose
the logs when the container dies. This proposes moving to a more
standard model of having the containers log to syslog and redirecting
that to files on disk.
Install a rsyslog configuration to capture "docker-*" program names
and put them into logfiles in /var/log/containers. Also install
rotation for these files.
In an initial group of docker-compose files, setup logging to syslog
which should then be captured into these files. Add some basic
testing.
If this works OK, I think we can standardise our docker-compose files
like this to capture the logs the same everywhere.
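Roughly, the two halves look like this (filenames and service name are
illustrative). The rsyslog side splits by program name:

```
# /etc/rsyslog.d/docker.conf
$template DockerContainerLog,"/var/log/containers/%programname%.log"
if $programname startswith 'docker-' then -?DockerContainerLog
& stop
```

and the docker-compose side tags each container's syslog stream:

```yaml
services:
  gerritbot:
    logging:
      driver: syslog
      options:
        tag: docker-gerritbot
```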
Change-Id: I940a5b05057e832e2efad79d9a2ed5325020ed0c
The user agent filter has been turned into a reusable Ansible role
containing a macro definition. Add that role and replace the
hard-coded copy of the user agent filter here with that
UserAgentFilter macro.
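With mod_macro the consuming vhost reduces to a single Use line; the
definition below is an illustrative sketch, not the role's actual
contents (the UA pattern is invented for the example):

```apache
<Macro UserAgentFilter>
    # patterns here stand in for the shared filter list
    BrowserMatch "ExampleBadBot" blockedagent
</Macro>
Use UserAgentFilter
```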
Change-Id: Ic24a38c93f0f68fab9ef1168de91ffad477fe13c
The existing filtered UAs seem to catch the bulk of the traffic but
there are a few common ones that are still sneaking through. Add four
new rules for these cases.
All are MSIE variants from version 6 to 9, old enough that we should be
able to block them safely without impacting real users.
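A hedged sketch of the MSIE rules with mod_setenvif and Require (the
exact patterns deployed may differ):

```apache
# legacy MSIE 6-9 user agents; modern browsers never send these
BrowserMatch "MSIE [6-9]\." blockedagent
<Location "/">
    <RequireAll>
        Require all granted
        Require not env blockedagent
    </RequireAll>
</Location>
```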
Change-Id: I8ae59f38de8b30bd06e1643ddbccf81ea32858aa
Gitea has added a STACKTRACE_LEVEL config option to set which log level
will also generate stack traces when logging. We want them for at least
Error and above, so set this to Error for now. In particular there
seems to be a commit cache issue producing errors that would be much
easier to debug with stack traces.
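The corresponding app.ini fragment:

```ini
[log]
; log records at level Error and above also capture a stack trace
STACKTRACE_LEVEL = Error
```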
Change-Id: I0491373ef143dfa753c011d02e3c670c699d2a52
The Apache 3081 proxy allows us to do layer 7 filtering on incoming
requests. However, it was returning 502 errors because it proxies to
https://localhost and the certificate doesn't match (see
SSLProxyCheckPeerName directive). However, we can't use the full
hostname in the gate because our self-signed certificate doesn't cover
that.
Add a variable and proxy to localhost in the gate, and the full
hostname in production. This avoids us having to turn off
SSLProxyCheckPeerName.
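A sketch of the vhost with the new variable (variable name and port are
illustrative):

```apache
SSLProxyEngine on
# gitea_proxy_target is "localhost" in the gate (matching the self-signed
# cert) and the server's full hostname in production
ProxyPass "/" "https://{{ gitea_proxy_target }}:3000/"
ProxyPassReverse "/" "https://{{ gitea_proxy_target }}:3000/"
```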
Change-Id: Ie12178a692f81781b848beb231f9035ececa3fd8
We had assigned a value of 300 to this setting but gitea ignored it and
continued to use a 30 second timeout instead. Rereading docs and code it
appears that we may need a unit to accompany the value. Set it to 300s
instead of 300.
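The app.ini fragment, assuming this is the indexer startup timeout
discussed in the neighboring change:

```ini
[indexer]
; a bare 300 was ignored; Go duration values need a unit suffix
STARTUP_TIMEOUT = 300s
```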
Change-Id: I763092c0371a15a417313ed05a9fd27d0e6e7f93
The default indexer timeout is 30 seconds. During a recent gitea restart
gitea01 hit this timeout five times: 150 seconds. Increase the timeout
to double that value: 300 seconds.
This is important to ensure that our graceful restarts are in fact
graceful. We don't want the sshd container running while web is being
restarted multiple times. Doing so can lead to lost replication events
from gerrit.
Change-Id: I1f9253ccd6fbb055f848e186f478651454fee7e0
I476674036748d284b9f51e30cc2ffc9650a50541 did not open port 3081 so
the proxy isn't visible. Also this group variable is a better place
to update the setting.
Change-Id: Iad0696221bb9a19852e4ce7cbe06b06ab360cf11
We have decided to go with the layer 7 reject rules; enable the
reverse proxy for production hosts.
Change-Id: I476674036748d284b9f51e30cc2ffc9650a50541
As described inline, this crawler is causing us problems as it hits
the backends indiscriminately. Block it via the known UA strings,
which luckily are old so should not cause real client issues.
Change-Id: I0d78a8b625b69f600e00e8b3ea64576e0fdb84d9
This adds an option to have an Apache based reverse proxy on port 3081
forwarding to 3000. The idea is that we can use some of the Apache
filtering rules to reject certain traffic if/when required.
It is off by default, but tested in the gate.
Change-Id: Ie34772878d9fb239a5f69f2d7b993cc1f2142930
We use the Ctx.Req object's RemoteAddr value as it should include the
IP:port combo according to https://golang.org/pkg/net/http/#Request. The
default template uses Ctx.RemoteAddr which Macaron attempts to parse for
x-forwarded-for values but this has the problem of stripping out any
port info.
The port info is important for us because we are doing layer 4 load
balancing and not http l7 load balancing. That means the ip:port
mappings are necessary to map between haproxy and gitea logs.
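The template change amounts to swapping the leading field; aside from
that swap, this mirrors Gitea's documented default access log template:

```ini
[log]
ENABLE_ACCESS_LOG = true
; Ctx.Req.RemoteAddr preserves ip:port; Ctx.RemoteAddr strips the port
ACCESS_LOG_TEMPLATE = {{.Ctx.Req.RemoteAddr}} - {{.Identity}} {{.Start.Format "[02/Jan/2006:15:04:05 -0700]" }} "{{.Ctx.Req.Method}} {{.Ctx.Req.URL.RequestURI}} {{.Ctx.Req.Proto}}" {{.ResponseWriter.Status}} {{.ResponseWriter.Size}}
```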
Change-Id: Icea0d3d815c9d8dd2afe2b1bae627510c1d76f99
This will write an NCSA style access.log file to the logs volume.
This will let us see user agents, etc, to aid in troubleshooting.
Change-Id: I64457f631861768928038676545067b80ef7a122
In places like crontab entries we use full paths to executables because
PATH is different under cron. Unfortunately, this meant we broke
docker-compose commands using /usr/bin/docker-compose when we installed
it under /usr/local/bin/docker-compose. In particular this impacted
database backups on gitea nodes and etherpad.
Update these paths so that everything is happy again.
Change-Id: Ib001baab419325ef1a43ac8e3364e755a6655617
We want to use stop_grace_period to manage gerrit service stops. This
feature was added in docker-compose 1.10 but the distro provides 1.5.
Work around this by installing docker-compose from pypi.
This seems like a useful feature and we want to manage docker-compose
the same way globally so move docker-compose installation into the
install-docker role.
New docker-compose has slightly different output that we must check for
in the gitea start/stop machinery. We also need to check for different
container name formatting in our test cases. We should pause here and
consider if this has any upgrade implications for our existing services.
Change-Id: Ia8249a2b84a2ef167ee4ffd66d7a7e7cff8e21fb
Gerrit replication plugin is good about retrying replication if its
connectivity to the remote fails. It however thinks everything is happy
if it can connect and push even when gitea-web isn't running.
Make the whole replication system happier by stopping gitea-ssh before
other services and starting it after other services. This way gerrit
should fail to replicate until gitea is ready for it to ssh in.
Change-Id: I3440d8dd8a01a3aaf5d18c9c2ca48e7ead63856f
To make it clear that Docker Hub is but one of many possible registries,
update our usage of FROM and image: lines to include docker.io in the
path.
There are a few other FROM lines for the gitea images which are handled
in a separate stack.
Change-Id: I6fafd5f659ad19de6951574afc9a6b6a4cf184df
1.10 introduces a PASSWORD_COMPLEXITY setting with a default value
of lower,upper,digit,spec - which requires passwords to have an
upper, lower, digit and special character. Our example password does
not have this, so set the PASSWORD_COMPLEXITY setting. We could
alternately leave it at the default and ensure that our passwords
meet the spec.
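The setting lives in app.ini's security section; the value below is
illustrative, not necessarily what was deployed:

```ini
[security]
; default is lower,upper,digit,spec; relax it to match our example password
PASSWORD_COMPLEXITY = lower,digit
```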
The sshd_config file is templated now, so we can set the listen port
via env var.
Change-Id: I6e4b595eabb9c6885d78fff1109ea9f602e89ef7
This is the first step in managing the opendev.org cert with LE. We
modify gitea01.opendev.org only to request the cert so that if this
breaks the other 7 giteas can continue to serve opendev.org. When we are
happy with the results we can merge the followup change to update the
other 7 giteas.
Depends-On: https://review.opendev.org/694182
Change-Id: I9587b8c2896975aa0148cc3d9b37f325a0be8970
Randomising the time of this job should help avoid a thundering herd
of I/O intensive operations in the gitea environment.
Change-Id: I035f7781a397665357b6d039b989ab9fe6a46b8a
Add the full remote_puppet_git playbook that we actually use in
production so that we can test the whole kit and caboodle. For
now don't add a review.o.o server to the mix, because we aren't
testing anything about it.
Change-Id: If1112a363e96148c06f8edf1e3adeaa45fc7271c
Sadly, as readable as the use of the uri module to do the interactions
with gitea is, more recent Ansible changed how subprocesses are forked
and this makes iterating over all the projects in projects.yaml take
an incredibly long amount of time.
Instead of doing it in yaml, make a python module that takes the list
one time and does looping and requests calls. This should make it be
possible to run the actual gitea creation playbook in integration tests.
Change-Id: Ifff3291c1092e6df09ae339c9e7dddb5ee692685
During a db recovery to rebuild a host using the existing db backups
resulted in a corrupt mysql.proc table. The issue seemed to be
attempting to restore the mysql database. Instead of dumping all
databases, let's just back up the one we care about: gitea.
Change-Id: Ia2c87b62736fda1c8a9ce77126e383ec74990b4a
The stdout progress feed from `git gc` is fairly verbose and
targeted at audiences running it interactively. Since our cron for
this iterates over thousands of repositories on our Gitea servers,
we don't need to send the progress info to all our sysadmins by
E-mail. Instead use the --quiet option to the gc subcommand so that
progress output will be suppressed.
If this still proves too verbose (as in, continues to result in
E-mail to root even when there are no failures), we can try
redirecting stdout to /dev/null.
Change-Id: Idc06e48cbf85e127a343c2a3cf51a35e6ed09685
This isn't added as a separate role because it relies heavily on gitea
deployment specifics (docker-compose, service names, etc). If we
end up running more services with docker-compose and databases we can
probably make this reconsumable.
Change-Id: I7b9084a8a90a86f73f5b24de505978d3f286850b
As new change refs accumulate, replication pushes and page loads
will take longer as git stats all of the refs/ files. To avoid
that, pack refs and gc every week to keep the number of files
and space used minimal.
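Per repository the weekly job reduces to two commands; a self-contained
demonstration on a scratch repo (the real cron iterates over the Gitea
repository tree instead):

```shell
# build a throwaway repo to show the maintenance steps
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" -c user.email=ci@example.com -c user.name=ci \
    commit --quiet --allow-empty -m 'seed commit'
# collapse per-ref files under refs/ into a single packed-refs file
git -C "$repo" pack-refs --all
# then gc to compact objects and reclaim space
git -C "$repo" gc --quiet
test -f "$repo/.git/packed-refs" && echo refs-packed
```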
Change-Id: Iff273ebbc25a512ab7e12b8418ceb30e7c722f92
This has a few emergency local patches while we wait for them to
appear in an upstream release.
This updates the modified templates to match the changes in 1.8.0
upstream.
This also disables the oauth2 service, which is new in 1.8.0.
Without disabling this, gitea tries to generate a JWT secret and
write it to the file, which in our case is read only. If we want
to enable it, we need to add a new JWT_SECRET setting.
Change-Id: I969682bce6ff25b7614ce9265097307ee9cbc6cb
Co-Authored-By: Monty Taylor <mordred@inaugust.com>
This is a first step toward making smaller playbooks which can be
run by Zuul in CD.
Zuul should be able to handle missing projects now, so remove it
from the puppet_git playbook and into puppet.
Make the base playbook be merely the base roles.
Make service playbooks for each service.
Remove the run-docker job because it's covered by service jobs.
Stop testing that puppet is installed in testinfra. It's accidentally
working due to the selection of non-puppeted hosts only being on
bionic nodes and not installing puppet on bionic. Instead, we can now
rely on actually *running* puppet when it's important, such as in the
eavesdrop job. Also remove the installation of puppet on the nodes in
the base job, since it's only useful to test that a synthetic test
of installing puppet on nodes we don't use works.
Don't run remote_puppet_git on gitea for now - it's too slow. A
followup patch will rework gitea project creation to not take hours.
Change-Id: Ibb78341c2c6be28005cea73542e829d8f7cfab08