The multi-logstash pipeline setup, while amazingly fast, was crashing
and causing index errors when under high load for a long period of time.
Because of the crashing behavior, and because the folks from Elastic
describe multi-pipeline queues as "beta" at this time, the logstash
pipelines have been converted back into a single pipeline.
The memory backed queue options are now limited by a ram disk (tmpfs),
which ensures that a burst within the queue does not cause OOM issues
and keeps the deployment highly performant while limiting memory
usage. Memory backed queues will be enabled when the
underlying system is using "rotational" media as detected by ansible
facts. This will ensure a fast and consistent experience across all
deployment types.
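As a rough illustration of the rotational-media decision described
above, the check could be sketched as follows. The shape of the
`devices` mapping, the `tmpfs_queue` and `tmpfs_size_mb` keys, and the
default size are assumptions made for this sketch; they are not taken
from the playbooks.

```python
def any_rotational(devices):
    """Return True if any block device reports rotational media.

    `devices` is a hypothetical mapping of device name to a dict of
    facts, where "rotational" is "1" for spinning disks.
    """
    return any(
        str(facts.get("rotational", "0")) == "1"
        for facts in devices.values()
    )


def queue_plan(devices, tmpfs_size_mb=1024):
    """Decide whether the queue should live on a size-capped tmpfs."""
    if any_rotational(devices):
        # The tmpfs size caps how much memory a burst can consume.
        return {"tmpfs_queue": True, "tmpfs_size_mb": tmpfs_size_mb}
    return {"tmpfs_queue": False, "tmpfs_size_mb": 0}
```

With this sketch, a host that has one spinning disk gets the capped
memory backed queue, while an all-SSD host does not.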
Pipeline/ml/template/dashboard setup has been added to the beat
configurations which will ensure beats are properly configured even
when running in an isolated deployment and outside of normal operations
where beats are generally configured on the first data node.
Change-Id: Ie3c775f98b14f71bcbed05db9cb1c5aa46d9c436
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The list of elasticsearch hosts was being randomized too much, which
resulted in a performance issue. This change reduces the entropy and
ensures that the list of hosts is correctly ordered such that localhost
is always used first and other nodes in the cluster will be used as a
fall back.
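The ordering described above can be sketched roughly as follows. The
function name, the `seed` parameter, and the choice to still shuffle
the fallback nodes are assumptions for illustration; the commit only
states that localhost comes first and other nodes act as fallbacks.

```python
import random


def order_hosts(hosts, local_host, seed=None):
    """Return hosts with local_host first and the rest as fallbacks.

    Shuffling the fallback nodes is an assumption of this sketch; it
    keeps some spread across the cluster without moving localhost.
    """
    rng = random.Random(seed)
    others = [h for h in hosts if h != local_host]
    rng.shuffle(others)
    if local_host in hosts:
        return [local_host] + others
    return others
```

Calling `order_hosts(["10.0.0.2", "127.0.0.1", "10.0.0.3"], "127.0.0.1")`
always yields a list that starts with `127.0.0.1`.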
Change-Id: Ifb551a6e01b5c0e1f62c1466a3d5b344a3c5da97
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Upgrading the ELK stack to 6.4.0 leaves logstash only listening on
an ipv6 address and thereby unable to receive existing beats inputs.
This change makes the jvm prefer binding to ipv4 addresses.
Change-Id: I04a0fdbcb253a0a6a3bcc3759eb0b9d0f1962621
There is no guarantee that all container IP addresses will be included
in an existing no_proxy environment variable. This will cause failures
when an http proxy is configured, but the proxy does not allow traffic
to 'hairpin' back to internal addresses.
This change forces no_proxy to the specific address of the kibana
and coordinator endpoints when the uri module is used to load dashboards
and configure rollups.
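A minimal sketch of the idea, assuming the endpoint is known as a URL:
the environment handed to the HTTP call gets `no_proxy` set to exactly
that endpoint's host. The helper name and the example address are
hypothetical and are not part of the playbooks.

```python
import os
from urllib.parse import urlsplit


def env_with_no_proxy(url, base_env=None):
    """Return a copy of the environment with no_proxy forced to url's host."""
    env = dict(os.environ if base_env is None else base_env)
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("URL has no host: %r" % url)
    # Overwrite rather than append, so an incomplete inherited value
    # cannot send the request through the proxy.
    env["no_proxy"] = host
    return env
```

For example, `env_with_no_proxy("http://172.29.236.10:5601/api")` sets
`no_proxy` to `172.29.236.10` regardless of what was inherited.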
Change-Id: I669334c722cce79459b522e6e2d7e1aaec49ef24
This increases the default value on elastic hosts from 32k to 1M which
improves general stability, especially on high traffic hosts.
Change-Id: I18f3e7005d2798dd4008215c7aa949cc37084f5c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
It looks like when using a normal include, Ansible tries to include
all sorts of extra things, further breaking the environment by trying
to resolve inaccessible variables.
This patch refactors all of those includes into macros, so only a
single file is included and memory usage decreases because the entire
context is no longer copied over.
Change-Id: Ie8733c7d52b1fc5bde484855988bddf6a06dbe00
The apache2 monitoring process requires a couple of interactions to deploy
successfully. This change will ensure that if the apache2 monitoring
fails, in any way, it does not block the deployment.
Change-Id: Ibe35197a1c65f4abe9e4870c07ee15f37f9a58ab
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The existing device monitoring was relying on block UUIDs which are
not always populated within a container. This change converts the UUID
use to the device path, which is always available regardless of the
container engine type.
Change-Id: Iec1493a58d9aab8b2a9cb67b9202e1751606bbc4
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The current setup was using processor cores from ansible facts which in
a multi-core, single socket system could result in 1. Using the
processor count will return the logical processor count giving us a more
performant setup when the compute power is present.
Change-Id: Ia5b63d45691f58e848d05cc4a4e5f353b993a347
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The KVM collector would attempt to use a network connection which
normally results in failure. This changes the collector to use a file
system socket by default.
Change-Id: Id1698a95644c6a6d5102e371a7266794196393c8
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
When deploying metricbeat, some container environments will have a
mount point within the container that differs from that of the host,
as in the case of an LXC container with an LVM backed bind mount or an
NFS share. This change checks whether the filesystem within a container
has any mounts and compares the result to that of the provided physical
host. If the resulting list has more than zero entries, the fs checks
will be deployed within the container environment.
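One way to picture the comparison is shown below. Treating it as
"mounts seen in the container that the host does not have" is an
assumption of this sketch, as are the function names; the commit does
not spell out the exact comparison.

```python
def container_only_mounts(container_mounts, host_mounts):
    """Return mount points present in the container but not on the host."""
    host = set(host_mounts)
    return [m for m in container_mounts if m not in host]


def deploy_fs_checks(container_mounts, host_mounts):
    """Deploy fs checks in the container only when the list is non-empty."""
    return len(container_only_mounts(container_mounts, host_mounts)) > 0
```

A container with an extra bind mount such as `/var/lib/mysql` would get
the fs checks, while one that mirrors the host's mounts would not.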
Change-Id: Iae5827f4e7e0a85eb733128b54d6ef4c8721537a
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
When running on a system where there is no journal or no systemd, skip
the journalbeat playbooks.
Change-Id: I92c804e8eb2ab2f9b86eca09fc51d19be66c7190
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The embedded bootstrap process can support all of our OSes; this change
ensures that's possible.
Change-Id: I730bb775aa5e9f87609ea885142d7361203cbb2c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The basic index template will ensure that the replication policy is
enforced on the revolving indexes we know will exist.
Change-Id: I1e3edcfd00a73cbdd328d50e8ba6492ac2248b72
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Now that the roll-up has been implemented the original shrink method is
no longer required or useful. This change cleans things up.
Change-Id: I24fd5b4daafc2f48ee5a3421f6b58b157a7aff6c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change adds SUSE 42.3 support to the elastic telemetry solutions.
Change-Id: Ibe93ea0d1ead9e7fe6da16d89989cfe5ade0f43e
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change will help with organization throughout the stack.
Change-Id: I2ad865db534ae1d377bbdecd4b421ee0fc802536
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The new option logstash_syslog_input_enabled has been added which will
allow users to enable a direct syslog input. When enabled, messages will
be processed via logstash and sent directly to elasticsearch.
Change-Id: Icb7712ecb8aae3d7f99df80ae1c5cd647a15ce83
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This job will create an ELK cluster using nspawn containers.
Change-Id: I11eefee65cf738b9915ccab9c0470538ef1b2cec
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change adds Ubuntu 14.04 support to the project.
Change-Id: I20695e19409b63c6e1def4ccf8929c6d52be647e
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change adds the scaffolding required to get multi-distro support
running in the roles. The change breaks up our playbooks converting all
of the tasks into various roles with internal dependencies. While this
will improve execution time, the change is being done to reduce
boilerplate and to allow us to build on the pattern used in OSA to provide
multi-distro capabilities.
A side effect of this change is a major improvement in idempotency. The
playbooks should now be 100% idempotent.
All of the templates have been left in the main playbook directory. This
was done to help ease the transition. In a future PR the template
structure will be moved into the roles where it needs to be.
The main variable file has been left intact. This file will be carved
up into role defaults in a future PR.
Change-Id: I938a10564128ce4078fa12edcf614dcdbd684b25
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
When there's only one host, the current variable results in 'false'
instead of 0, which is a jinja-ism. However, due to java-isms, "false"
is evaluated to 5, which makes the index retention policy very wrong.
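The class of bug can be illustrated with a short sketch. Python stands
in for Jinja here, and both expressions are hypothetical; the actual
variable in the playbooks is not shown in this message. The point is
only that a short-circuit expression can yield a boolean where a number
is expected, and that an explicit integer cast avoids it.

```python
def buggy_value(host_count):
    """Short-circuit expression: yields False, not 0, for a single host."""
    return host_count > 1 and host_count - 1


def fixed_value(host_count):
    """Same expression, forced to an integer before it is rendered."""
    return int(host_count > 1 and host_count - 1)
```

Rendered into a config file, the first form would print `False` for a
single host, while the second prints `0`.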
Change-Id: I2668e17c1cf15fe47842ff349ffa4f71c70257e5
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The logstash groks were running in line using the legacy method which uses
lexical sorting of all logstash filter files and loads them in order. While
this works, it forces all data to travel through all filters.
This change makes use of the logstash multi-pipeline capabilities
using a distributor and fork pattern. This allows data to flow through
logstash more quickly and not block whenever there's an issue with an
output plugin.
Fingerprints are generated using SHA1 when there's a message and a UUID
when not. This will ensure we're not duplicating log entries, which will
help speed up transactions and further reduce the storage required.
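The fingerprint rule can be sketched as follows. This is a stand-alone
illustration of the idea rather than the logstash filter configuration
itself; the `event` dict and the function name are assumptions.

```python
import hashlib
import uuid


def fingerprint(event):
    """Return a SHA1 of the message when present, else a random UUID."""
    message = event.get("message")
    if message:
        # Identical messages hash to the same value, so a repeated
        # entry maps onto the same document ID.
        return hashlib.sha1(message.encode("utf-8")).hexdigest()
    # With no message there is nothing to hash, so fall back to a
    # unique identifier.
    return uuid.uuid4().hex
```

Two events carrying the same message produce the same fingerprint,
while events without a message always receive distinct ones.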
Change-Id: I38268e33b370da0f1e186ecf65911d4a312c3e6a
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This allows alternate algorithms to be developed and enables the
same tasks to be included in a test to verify the planned retention
does not exceed the cluster storage capacity.
Change-Id: Ie3d80d6cfad16b946ccd790859bc7cd92b90fdef
* Implements G1 GC optionally. The variable `elastic_g1gc_enabled` has
been added with a default of false. If this option is set to true and
the system has more than 4GiB of RAM, G1GC will be enabled.
* Adds new thread options
* Better constraints for coordination nodes
* Interface recover speed has been limited
* Buffer size is now set correctly
* Serialize elk deployment so that upgrades are non-impacting
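The G1GC condition from the first item above can be written as a small
predicate. Only `elastic_g1gc_enabled` comes from this change; the
function name and the byte-based memory argument are assumptions for
the sketch.

```python
GIB = 1024 ** 3


def use_g1gc(elastic_g1gc_enabled, total_ram_bytes):
    """G1GC is used only when opted in and RAM exceeds 4GiB."""
    return bool(elastic_g1gc_enabled) and total_ram_bytes > 4 * GIB
```

With the default of false, the collector choice is unchanged no matter
how much memory the host has.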
Change-Id: I89224eeaf4ed29c3bb1d7f8010b69503dbc74e11
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Logstash is able to handle arcsight events; this PR enables that
capability.
Change-Id: Id220c671cc5d7cb7ee33fb53e2ae4185d579fc2a
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The beats were all enabled with the default settings to enable
dashboards and templating when the specific beat starts. In a large
scale environment this creates a DDOS against Kibana as all beats begin
uploading templates and dashboards, clobbering one another. This change
moves the dashboard config into a common template and sets everything
using sane defaults so that we're not inadvertently killing our clusters
when rolling restarts happen, like in the event of an upgrade.
Change-Id: Ib48ea34a350335b72c3e3df941853c405072446a
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change adds the xpack monitoring capabilities to all of the core
beats we deploy.
Change-Id: Ib09388b83e18953cb180cdb93fec34e5917cc82c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>