This change adds fields such as host OS, version and platform to the
core beats output, giving extra query/filter capabilities.
Change-Id: Iff61bb4402eaa45b8f1c134a6a39cebe6613cbf3
The previous code would terminate the play immediately if any hosts
in the environment did not have a journal directory. This change runs
the journalbeat install role selectively on hosts that have the journal
directory, and skips hosts that do not.
In addition, a legacy task to stop the play after uninstallation is removed;
this functionality is currently broken.
Change-Id: I412e3594c4b2292caafafb580bb4ede9ccfd3944
Previously the beat setup tasks were tagged with 'setup' but the include
statements were not, so the tasks were always skipped when using '--tags
setup'. This change adds tags to the includes so that the tasks are executed
as expected.
Change-Id: If16069cd273d84a22b229b8140e5a8d56eed86d1
While the use of "to_json" resulted in a string, it also created a
JSON-escaped string full of slashes, which then corrupts Kibana index patterns.
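
The escaping effect can be reproduced outside of Ansible. The following is
a minimal Python sketch of the general double-encoding problem, not the
actual template code; it assumes "to_json" behaves like json.dumps, and the
`fields` value is a hypothetical stand-in for an index-pattern attribute.

```python
import json

# Hypothetical stand-in for a Kibana index-pattern attribute.
fields = [{"name": "host.os", "type": "string"}]

# Encoding the structure once yields plain JSON with no escape characters.
encoded_once = json.dumps(fields)

# Encoding the already-encoded string again escapes every quote,
# which is the "string full of slashes" effect described above.
encoded_twice = json.dumps(encoded_once)

print(encoded_once)
print(encoded_twice)
```

Decoding the second value twice recovers the original structure, which
shows that the extra escaping comes only from the redundant encoding step.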
Change-Id: I2c26ab9dd4930226f3e554c2f9bed5e382cdafa5
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
To upload the Kibana dashboard behind http proxies we need to:
* Point to a local mirror for nodejs if one is defined
* Install nodejs directly with apt rather than the magic script
* Be explicit about no_proxy when uploading the dashboard
This change also uploads the dashboard only once rather than on
each elastic-logstash node.
Change-Id: I4695d6fe6f85d9120f83abc9a92c54ac3ad68c95
The PrivateDevices option, in LXC containers, causes more problems than
it solves. This change removes that option generally so that it's no
longer a problem no matter how the installation of the ELK services is
done.
Change-Id: I7f881ac0da9eb6154f8e6b977df815736ac04264
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The hashicorp plugin for vault requires the hvac library. This change
adds the required lib to the embedded ansible.
Change-Id: If24767a647bc3fb359e67bac46ca4a626bbc6e54
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The client_inactivity_timeout defaults to 60s, and there is a constant
background of errors making
noise in the logs of the form:
ERROR pipeline/output.go:121 Failed to publish events: write tcp 10.11.3.128:40770->10.11.128.94:5044: write: connection reset by peer
Advice seems to be to increase client_inactivity_timeout [1]. Tested in one
environment and the log noise is gone.
[1] https://discuss.elastic.co/t/solved-filebeat-logstash-connection-reset-by-peer/87012
Change-Id: Ia93867826a0c32192e3c37ea101f9a95a29e3d00
The JSON visualization templates are limited to the filebeat-*
index-pattern. To avoid incompatibility between versions
of Kibana, this patch sets the index-pattern of the
openstack visualizations to the global index-pattern *.
Change-Id: I2c41b94b0a9c323ecac47db0dd1e4075ad6a432a
oslo.log has a default pattern for logging all of the entries
with context, so let's use that in a common place to avoid duplicating
all the information.
Change-Id: I7f326221c01f53710f3adbc5fc2d416bec6aef8f
At the moment, we're adding an extra field called "logdate" rather
than using the built-in timestamp. This makes things go to the
right field.
Change-Id: I5e56d01692b7205418e6aba89d1c7c44fa1abfef
Filebeat does not ship anything tagged with oslofmt; the
openstack tag gives us all we need to parse things correctly.
Change-Id: I614e4bc5d85559540a9d616407da993ed90de87e
Ceph has a problem where log entries that were introduced as debug
messages are logged as normal messages. They generate a lot of useless
extra messages and overload the ELK cluster.
This was fixed in 12.2.9, which is not out yet, so let's work
around it for now by not shipping them.
Change-Id: I36a503b7380ce62c65570232a18d2179a98ecfa1
This patch adds support for ARM64 beats. Unfortunately, Elastic does
not publish any packages, so this points at local builds. Also, it
looks like Packetbeat fails to build, so for now we just don't do
anything about it on ARM.
Change-Id: I1889ce51f1a4c13c311165b8b76dde7c71ecfa2d
The logstash and elasticsearch performance can be improved by using
async index options, pulling back the refresh interval, and by not
fingerprinting every document.
* Async translog allows elasticsearch to run fsync in the
background instead of blocking.
* The refresh interval will now be 5x the number of replicas with a cap
of 30. This integer is representative of the seconds between index
refresh calls, which greatly lowers the load generated across the
cluster.
* All documents were fingerprinted before writing to the cluster. This
was a costly operation as elasticsearch will do a forward lookup on all
documents with a preset ID, resulting in 100's, if not 1000's, of extra
reads. The purpose of the fingerprint function is to limit repeated
writes, so to keep some of this functionality the fingerprint function is
now only added to documents with messages.
* G1 garbage collection is now enabled by default when the heap size is
> 6GiB. Early versions of elasticsearch did not recommend this setting;
however, it has since stabilized in recent releases.
* JVM options have been moved into the elasticsearch and logstash roles
allowing these tasks to trigger service restarts when changes are made.
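
The refresh interval and G1 rules above can be sketched in Python as
follows. This is only an illustration of the arithmetic described in this
message, not the role's actual template logic; the function names and the
MiB-based heap argument are assumptions made for the example.

```python
def refresh_interval_seconds(replica_count: int, multiplier: int = 5, cap: int = 30) -> int:
    """Seconds between index refresh calls: 5x the replica count, capped at 30."""
    return min(replica_count * multiplier, cap)


def g1_enabled(heap_size_mib: int, threshold_gib: int = 6) -> bool:
    """G1 garbage collection is enabled only when the heap is larger than 6GiB."""
    return heap_size_mib > threshold_gib * 1024


# One replica refreshes every 5s; six or more replicas hit the 30s cap.
for replicas in (1, 2, 6, 10):
    print(replicas, refresh_interval_seconds(replicas))
```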
Change-Id: I805129b207ad4db182ae6e59b6ec78eb3e246b54
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This patch provides the user a way to enable/disable beats by
overriding the {beatname}_service_state variable for the beats
the user wants to receive data from.
There are use cases where users want only a subset of the
beats provided, mostly to avoid unnecessary use of bandwidth
on data that wouldn't be used. This patch handles that use case
by enabling/disabling the service after install, keeping the service
installed in case the user needs it.
Change-Id: I2251095d7fcfc48a239fe9d4984269503cc835da
The node roles would apply attributes to hosts if an override was set or
if a node was part of a given group as determined through auto-detection.
This change will now add nodes to a given role when set manually and
will ensure no extra nodes are added to the role if the count meets or
exceeds what's required to run the service.
Change-Id: Ied5f564f0328488d3359ec4dc8e9ad17fefe5eaf
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
When upgrading or updating a template, fields within kibana will not be
updated until they're manually refreshed. This change uses the kibana
API to gather field information from the indexes and update kibana
automatically.
Change-Id: Ia5de566521d79da070f4377d1d7cb4d9786447b4
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Ensures that any monitoring indexes are made with replicas in a cluster
setup, which will ensure we're able to monitor the growth of ES indexes.
The curator action plugin timer was updated to use two different timer
files instead of combining them into one timer.
Change-Id: I2184ac4ec0b75e442ee8ae6ca8bd2c6f04d51401
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The curator action plugin does not use a logical OR when parsing
multiple filters. The only way to do this is to run curator with
different action filter files.
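
The difference between chained filters and separate runs can be modeled in
Python. This is a simplified sketch of the behaviour described above, not
Curator's API; the index names, prefix filters, and `apply_filters` helper
are all hypothetical.

```python
from typing import Callable, List

IndexFilter = Callable[[str], bool]


def apply_filters(indices: List[str], filters: List[IndexFilter]) -> List[str]:
    """Keep only the indices that satisfy every filter (logical AND)."""
    return [name for name in indices if all(f(name) for f in filters)]


indices = [
    "filebeat-2018.10.01",
    "metricbeat-2018.10.01",
    "packetbeat-2018.10.01",
]


def is_filebeat(name: str) -> bool:
    return name.startswith("filebeat-")


def is_metricbeat(name: str) -> bool:
    return name.startswith("metricbeat-")


# Both filters in one run: no index matches both prefixes at once.
combined = apply_filters(indices, [is_filebeat, is_metricbeat])

# One run per filter, as with separate action filter files.
separate = apply_filters(indices, [is_filebeat]) + apply_filters(indices, [is_metricbeat])

print(combined)
print(separate)
```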
Change-Id: I97c93c87d6254f79831f2a177098ea52a3a3a49d
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
* We don't need to create the containers as they are created during the
initial run.
* Remove quoting in favor of {% raw %} blocks
Change-Id: Ied696ad0882169d523a60a900788e7c2ba1d3fa3
The log rotate configuration was leaving too many logs in place and
allowing them to grow too large. This tunes up the log rotation process
to ensure we're retaining information but not excessively.
Change-Id: If0f02352ee2c274f4c589b05630d28126ceba2ab
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Presently the node role assignment is only automatic. Auto selection
makes the assumption every node is identical; however, in many deployments
a deployer may want to assign node roles to specific hardware, thereby
optimizing resources and improving general performance. This change
adds and documents the ability to set the node roles within an ansible
inventory.
Change-Id: I22a2b636cb1441f17e575439b55ca64f9c7b0336
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change allows this playbook to be run using an older version of
ansible. This change is necessary for my use case where I am running all
OSA and related playbooks in a docker container locally for a Newton
deploy.
The use of Newton OSA's ansible bootstrap script means that the
openstack-ansible wrapper my workflow uses requires Ansible 2.1, which does
not support `include_tasks`. This change addresses that problem by replacing
`include_tasks` in the playbook that needs to be run using
openstack-ansible with `include`, which produces the desired result.
Change-Id: I8b2a0217e851d022ee40cbdd8bc8045e18d5a07d
Also, halve the loadbalancer default disk size; this is the only group
I'm somewhat confident doesn't need 90 GB of disk.
Change-Id: I40b46c8d978cdefbed8c4cd5586c7ded0fe318dc
This is essentially a noop that establishes a healthy DRY pattern of
group var definitions in the pxe_server subgroups.
Change-Id: Ie458f0684f720879c21ca639320814ef12d5dc4e
... to allow usage of host vars and group vars by looping the part of
the play that reifies the VM templates over pxe_servers rather than
vm_hosts. Instead of looping over pxe_servers hosts on each task, loop
over and delegate to vm_hosts on each task.
What this makes possible is the definition of default_vm_storage on a
per-host basis. The specific use case I have in mind is to allocate a
larger share of VG space to compute nodes without artificially limiting
how much space the compute nodes can have by bloating every other VM
in proportion.
Change-Id: I5f9a14038d59af7740acb64cce7f83fd88e5555a