Logstash and elasticsearch performance can be improved by using
async index options, lengthening the refresh interval, and not
fingerprinting every document.
* Async translog allows elasticsearch to run fsync in the
background instead of blocking.
* The refresh interval will now be 5x the number of replicas, capped
at 30. This integer represents the number of seconds between index
refresh calls, which greatly lowers the load generated across the
cluster.
* All documents were previously fingerprinted before being written to
the cluster. This was a costly operation, because elasticsearch does a
forward lookup on every document with a preset ID, which results in
hundreds, if not thousands, of extra reads. The fingerprint function
exists to limit repeated writes, so to keep some of this functionality
it is now only applied to documents with messages.
* G1 garbage collection is now enabled by default when the heap size is
> 6GiB. Early versions of elasticsearch did not recommend this setting;
however, it has since stabilized in recent releases.
* JVM options have been moved into the elasticsearch and logstash roles
allowing these tasks to trigger service restarts when changes are made.
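The refresh, translog, and fingerprint rules above can be sketched in Python. This is an illustration under stated assumptions, not the roles' actual templates. `index.refresh_interval` and `index.translog.durability` are real Elasticsearch index settings. The helper names are hypothetical, and SHA1 stands in for whatever fingerprint method the Logstash filter is configured with. The zero-replica case is not described above, and this sketch simply yields "0s" for it.

```python
import hashlib
from typing import Optional


def index_settings(replicas: int, cap_seconds: int = 30) -> dict:
    """Hypothetical helper mirroring the index rules described above."""
    # Refresh interval: 5 seconds per replica, capped at cap_seconds.
    refresh = min(5 * replicas, cap_seconds)
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": f"{refresh}s",
            # Async translog: fsync happens in the background, not per request.
            "translog": {"durability": "async"},
        }
    }


def document_id(doc: dict) -> Optional[str]:
    """Hypothetical: fingerprint only documents that carry a message."""
    message = doc.get("message")
    if not message:
        # No preset ID, so elasticsearch can auto-generate one and avoid
        # the lookup it performs for documents with a preset ID.
        return None
    # SHA1 is an assumption standing in for the configured fingerprint method.
    return hashlib.sha1(message.encode("utf-8")).hexdigest()
```

For example, `index_settings(2)` sets a 10 second refresh interval, and `index_settings(10)` hits the 30 second cap.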
Change-Id: I805129b207ad4db182ae6e59b6ec78eb3e246b54
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The multi-logstash pipeline setup, while amazingly fast, was crashing
and causing index errors when under high load for a long period of time.
Because of the crashing behavior, and because the folks from Elastic
describe multi-pipeline queues as "beta" at this time, the
logstash pipelines have been converted back into a single pipeline.
The memory-backed queue options are now limited by a RAM disk (tmpfs),
which ensures that a burst within the queue does not cause OOM issues.
This keeps the deployment highly performant while limiting memory usage.
Memory-backed queues will be enabled when the underlying system is
using "rotational" media, as detected by ansible facts. This will
ensure a fast and consistent experience across all deployment types.
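The media check above can be sketched in Python against the shape of the `ansible_devices` fact, where each device reports `rotational` as the string "1" or "0". This is a minimal sketch and assumes that a single rotational device is enough to qualify. The function name is hypothetical, and the tmpfs sizing is not modelled.

```python
def use_memory_backed_queue(ansible_devices: dict) -> bool:
    """Hypothetical: decide whether to enable the memory-backed queue."""
    # Ansible reports spinning media with rotational == "1".
    return any(
        device.get("rotational") == "1"
        for device in ansible_devices.values()
    )


facts = {"sda": {"rotational": "1"}, "nvme0n1": {"rotational": "0"}}
print(use_memory_backed_queue(facts))
```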
Pipeline/ml/template/dashboard setup has been added to the beat
configurations. This ensures beats are properly configured even when
running in an isolated deployment, outside of normal operations, where
beats are generally configured on the first data node.
Change-Id: Ie3c775f98b14f71bcbed05db9cb1c5aa46d9c436
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The apache2 monitoring process requires a couple of interactions to
deploy successfully. This change ensures that if the apache2 monitoring
fails in any way, it does not block the deployment.
Change-Id: Ibe35197a1c65f4abe9e4870c07ee15f37f9a58ab
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
* Implements G1 GC optionally. The variable `elastic_g1gc_enabled` has
been added with a default of false. If this option is set to true and
the system has more than 4GiB of RAM, G1GC will be enabled.
* Adds new thread options
* Better constraints on coordination nodes
* Interface recovery speed has been limited
* Buffer size is now set correctly
* Serialize elk deployment so that upgrades are non-impacting
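The optional G1GC rule can be sketched in Python. This assumes the RAM figure comes in MiB, as from the `ansible_memtotal_mb` fact. It also assumes the fallback is CMS, which is the collector in the stock Elasticsearch 6.x `jvm.options`. `-XX:+UseG1GC` and `-XX:+UseConcMarkSweepGC` are real HotSpot flags. The function name is hypothetical.

```python
from typing import List


def gc_flags(elastic_g1gc_enabled: bool, memtotal_mb: int) -> List[str]:
    """Hypothetical: choose JVM GC flags from the opt-in variable and host RAM."""
    # G1GC only when the deployer opts in AND the host has more than 4GiB.
    if elastic_g1gc_enabled and memtotal_mb > 4 * 1024:
        return ["-XX:+UseG1GC"]
    # Assumed fallback: CMS, as in the default Elasticsearch 6.x jvm.options.
    return ["-XX:+UseConcMarkSweepGC"]
```

Because the default is false, large hosts keep the fallback collector unless `elastic_g1gc_enabled` is set explicitly.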
Change-Id: I89224eeaf4ed29c3bb1d7f8010b69503dbc74e11
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Logstash is able to handle ArcSight events; this PR enables that
capability.
Change-Id: Id220c671cc5d7cb7ee33fb53e2ae4185d579fc2a
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change adds retention policy weighting based on experience with the
indexes in production on large-scale clouds.
Change-Id: I0d09d4cfc68f70fe790170d5d54f1585616c5524
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The heartbeat probe assumed that the deployment would always be an
OSA one by using the group "utility_all" as a deployment target. This
change moves heartbeat to the first three kibana nodes by default,
which corrects the previous assumption.
Change-Id: Ic1b90eb94dd20dc2273542333de47bfd690af1dd
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The following options will reduce cluster pressure and generally
improve search performance.
Change-Id: I1619680db1fd595503f0845b182d6f6ce4c59f3c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The curator retention policies will now query the storage nodes within
a given deployment and set a suitable index retention policy based on
the total amount of storage each index is assumed to produce every day.
To minimize the storage required and optimize search performance,
several actions are now being taken:
* Indexes will be shrunk after a quarter of their retention time.
* Indexes will be deleted should they exceed the retention time.
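The two retention actions can be sketched in Python. The `retention_days` formula is a hypothetical stand-in for "fit the assumed daily index size into the available storage". The real policy also applies weighting that this sketch does not model. Both function names are illustrative only.

```python
def retention_days(total_storage_gib: float, daily_index_gib: float) -> int:
    """Hypothetical: how many days of an index fit into the available storage."""
    return max(1, int(total_storage_gib // daily_index_gib))


def retention_action(age_days: float, retention: int) -> str:
    """Hypothetical: map an index's age to the actions listed above."""
    if age_days > retention:
        # Past the retention time: delete.
        return "delete"
    if age_days >= retention / 4:
        # Past a quarter of the retention time: shrink.
        return "shrink"
    return "keep"


days = retention_days(1000, 50)
print(days, retention_action(6, days))
```

With 1000GiB available and 50GiB per day, this keeps 20 days, and a 6-day-old index is past the 5-day quarter mark, so it is a shrink candidate.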
Change-Id: I8bf548620b5404d25deaadba8fda93452ef64fa0
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Remove spaces in service names, and don't duplicate the protocol, as
heartbeat includes it in the monitor.scheme and monitor.id
fields by default.
Change-Id: If7633dd5ca23c22eff37a8b7140fff4bf0911432
These files provide an alternative for those who want their
custom dashboards on kibana. The playbook setupKibanaDashboards.yml
installs elasticdump and uses it to load a simple dashboard into
kibana's index. The dashboard collects logs from filebeat.
Change-Id: Ibb3407b1f19eac5f7cda753e00c3bc6f3ff16da7
Now that the v2.0 API has been removed, we don't have a reason to
include deployment instructions for two separate applications on
different ports.
Change-Id: I0c8451207afec77c9a8071ca8035337ffd0ac9f0
This change will allow a deployer to directly ship data from
logstash into Kafka.
Change-Id: I5de0caf270c8ced8111ac099cb91a70814f80259
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
To ensure users can upgrade packages, the variable
`"{{ elk_package_state | default('present') }}"` has been added
to all package installs.
Change-Id: I0238d9e1ed991cb1480bd924f2d5a09687890da3
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Elasticsearch can be used as a smart load balancer for all traffic,
which removes the requirement for a VIP and moves the cluster to a
mesh topology. All of the Kibana nodes will now run elasticsearch as
a coordinating node.
* Kibana will now connect to elasticsearch on localhost.
* All of the beats have been set up to use the new mesh topology.
* JVM memory management has been updated to reflect the additional
services.
More on node assignments can be found here:
* https://www.elastic.co/guide/en/elasticsearch/reference/6.2/modules-node.html#modules-node
* The readme has been updated to reflect these changes.
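The Kibana-host layout above can be sketched in Python as settings data. `node.master`, `node.data`, and `node.ingest` set to false is the coordinating-only configuration in the Elasticsearch 6.x node documentation. `elasticsearch.url` is the Kibana 6.x setting. Port 9200, the function name, and the dict layout are assumptions for illustration.

```python
# Coordinating-only node: routes and balances requests, holds no data,
# is not master-eligible, and runs no ingest pipelines.
COORDINATING_ONLY = {
    "node.master": False,
    "node.data": False,
    "node.ingest": False,
}


def kibana_host_config(es_port: int = 9200) -> dict:
    """Hypothetical: settings for a Kibana host in the mesh topology."""
    return {
        "elasticsearch.yml": dict(COORDINATING_ONLY),
        # Kibana talks to the local coordinating node, so no VIP is needed.
        "kibana.yml": {"elasticsearch.url": f"http://localhost:{es_port}"},
    }
```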
Change-Id: I769e0251072f5dbde56fcce7753236d37d5c3b19
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This adds the ability to include logstash log parsing filters for
various openstack and service logs. These filters are disabled by
default and can be enabled by toggling the deploy_logstash_filters
variable.
Change-Id: I5c46f78f232d3fb604283ae623cd3975a8346c7c
Several API services return HTTP 300 to indicate they are up; this
change adds the ability to check for that.
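The check can be illustrated in Python. This is not heartbeat's configuration, only a sketch of the same idea. It assumes OpenStack-style APIs that answer their root URL with "300 Multiple Choices" for version discovery. `ACCEPTED_STATUSES` and the function names are hypothetical. Note that `urllib` raises `HTTPError` for status codes it does not follow, which includes 300.

```python
import urllib.error
import urllib.request

# Hypothetical: status codes treated as "up"; 300 covers version discovery.
ACCEPTED_STATUSES = (200, 300)


def status_is_up(code: int) -> bool:
    """Return True when the status code counts as healthy."""
    return code in ACCEPTED_STATUSES


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 300 is not a redirect urllib follows, so it surfaces here.
        return err.code


def is_up(url: str) -> bool:
    return status_is_up(http_status(url))
```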
Change-Id: Ic85f6cff3bc225b29ae0e3e8fbd19eceece00441
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
At present we're collecting too much info by default. We're seeing
over 500GB on an environment of fewer than 50 nodes in just two weeks.
While we don't expect the data set to grow much larger given the use of
curator, this change lowers the default collection intervals of the
various beats and updates the retention / detection policies so we're
not storing too much information.
To correct a unicode problem with py2, the host index loops have been
updated.
Curator has also been updated to run every day.
Change-Id: Ic202eb19806d1b805fa314d3d8bde05b286740e0
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Curator has been added to automatically maintain the cluster with
sensible defaults for data retention.
The index shard counts have been modified so that they're determined by
the size of the initial cluster. While these shard counts can be
modified post deployment by reindexing the data, that is not being done
at this time.
Depends-On: https://review.openstack.org/c/565807
Change-Id: I249d715ae5241ab57c4117b14377e4d07cb6e984
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
A deployer may want to run these tools within a legacy environment
(running Ansible <2.4) but will find deploying these playbooks
impossible due to the use of new-ish task syntax, roles, and modules.
This change gives deployers options when running within legacy
environments by providing everything needed to deploy these playbooks
using embedded ansible.
Change-Id: Ic99b93017129321b2eb8b773a77f7fa478cc8dc7
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
* Added options for the rollback plan so that if a rollback is executed
all beat packages will be removed.
* Additional updates to streamline elk, fix container bind mounts, and
fix the use of group information for metric and heartbeat information.
* Readme information has been fixed
Change-Id: Icd070259db5b19d289d10033b1f055125f56e18c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The Elastic Stack has the ability to get application performance metrics
using the built-in APM server. This change implements the APM server in
an existing ELK environment.
Change-Id: Ie6f533b81cfdb0c6a4ba2f33fd3b9f0a3e49a1fc
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
With the option to deploy grafana, the following changes allow a user to
automatically connect ELK and Grafana.
Change-Id: Ic8e64a31d860940c6863f46ce558908d5ef8f8e7
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change introduces grafana into the stack which gives us a great
way to visualize the data. The grafana role from cloudalchemy is being
used for the bulk of the deployment.
Because the grafana deployment playbook is now standalone, the mentions
of grafana in the other ops directories have been removed.
Change-Id: I23e1c96cd1fda7ece9b86a69f9f0326913de714d
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Most of the changes in this PR are for style and to adapt the playbooks
so that the system can operate on a multi-node cloud.
Functional change includes the removal of mainline Java 8 in favor of
OpenJDK 8.
A site playbook was added to allow an operator to just run everything.
Old tools that no longer function within the stack have been removed.
Packetbeat was added to the install list.
Auditbeat was added to the install list.
All of the config files have been updated for the recent ElasticStack
6.x changes.
Change-Id: I01200ad4772ff200b9c5c93f8f121145dfb88170
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This addition is an update of the current elk_metrics, which will install Elasticsearch, Logstash and Kibana 6.x.
It also includes a configuration guide for haproxy endpoints.
Change-Id: Iac4dec6d17bc75433e5fe672f3b9781536b8e619