205 Commits

Author SHA1 Message Date
Kevin Carter
0d4a4a92c7
Converge the logstash pipelines and enhance memory backed queues
The multi-logstash pipeline setup, while amazingly fast, was crashing
and causing index errors when under high load for a long period of time.
Because of the crashing behavior and the fact that the folks from
Elastic describe multi-pipeline queues to be "beta" at this time the
logstash pipelines have been converted back into a single pipeline.

The memory backed queue options are now limited by a ram disk (tmpfs),
which ensures that a burst within the queue does not cause OOM issues
and keeps the deployment highly performant while limiting memory usage.
Memory backed queues will be enabled when the
underlying system is using "rotational" media as detected by ansible
facts. This will ensure a fast and consistent experience across all
deployment types.

Pipeline/ml/template/dashboard setup has been added to the beat
configurations which will ensure beats are properly configured even
when running in an isolated deployment and outside of normal operations
where beats are generally configured on the first data node.

Change-Id: Ie3c775f98b14f71bcbed05db9cb1c5aa46d9c436
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-16 23:44:58 -05:00
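The rotational-media check described in commit 0d4a4a92c7 can be sketched roughly as follows. This is a minimal illustration, not the playbook's actual logic: it assumes the usual shape of Ansible's `ansible_devices` fact, where each device entry carries a `rotational` key holding "1" or "0", and the function names are hypothetical.

```python
def has_rotational_media(ansible_devices):
    """Return True if any block device reports itself as rotational.

    Assumes each device entry has a "rotational" key set to "1" or "0",
    as commonly seen in Ansible's ansible_devices fact.
    """
    return any(
        str(device.get("rotational", "0")) == "1"
        for device in ansible_devices.values()
    )


def memory_backed_queue_enabled(ansible_devices):
    # Hypothetical helper: enable the memory backed queue only when the
    # host is using spinning disks.
    return has_rotational_media(ansible_devices)


if __name__ == "__main__":
    devices = {"sda": {"rotational": "1"}, "nvme0n1": {"rotational": "0"}}
    print(memory_backed_queue_enabled(devices))
```

A host with only solid-state devices would return `False` here and keep its default queue behavior.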
Zuul
be70a2078c Merge "Set the max user watches to 1M" 2018-09-16 03:05:37 +00:00
Kevin Carter
a98035e177
Correct elasticsearch list entropy
The list of elasticsearch hosts was being randomized too much which
results in a performance issue. This change reduces the entropy and
ensures that the list of hosts is correctly ordered such that localhost
is always used first and other nodes in the cluster will be used as a
fall back.

Change-Id: Ifb551a6e01b5c0e1f62c1466a3d5b344a3c5da97
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-12 13:13:13 -05:00
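The ordering described in commit a98035e177 (localhost first, other cluster nodes as fallbacks) might look like the sketch below. The function name and the host strings are hypothetical, and the fully deterministic sort of the fallbacks is an assumption; the actual change may retain some shuffling among the remaining nodes.

```python
def order_hosts(local_host, cluster_hosts):
    """Return the host list with local_host first and the rest as fallbacks.

    The fallbacks are de-duplicated and sorted so the result is stable
    between runs (an assumption made for this sketch).
    """
    fallbacks = sorted(h for h in set(cluster_hosts) if h != local_host)
    return [local_host] + fallbacks


if __name__ == "__main__":
    hosts = order_hosts(
        "127.0.0.1:9200",
        ["10.0.0.3:9200", "10.0.0.2:9200", "127.0.0.1:9200"],
    )
    print(hosts)
```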
Zuul
42f7f896b4 Merge "Enforce no_proxy when setting up ELK dashboards and rollups" 2018-09-10 22:15:48 +00:00
Jonathan Rosser
1b267c475c Ensure logstash listens on ipv4 address
Upgrading the ELK stack to 6.4.0 leaves logstash only listening on
an ipv6 address and thereby unable to receive existing beats inputs.

This change makes the jvm prefer binding to ipv4 addresses.

Change-Id: I04a0fdbcb253a0a6a3bcc3759eb0b9d0f1962621
2018-09-10 21:14:21 +00:00
Jonathan Rosser
c2d3c44fd8 Enforce no_proxy when setting up ELK dashboards and rollups
There is no guarantee that all container IP addresses will be included
in an existing no_proxy environment variable. This will cause failures
when an http proxy is configured, but the proxy does not allow traffic
to 'hairpin' back to internal addresses.

This change forces no_proxy to the specific address of the kibana
and coordinator endpoints when the uri module is used to load dashboards
and configure rollups.

Change-Id: I669334c722cce79459b522e6e2d7e1aaec49ef24
2018-09-10 21:14:11 +00:00
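As an illustration of the no_proxy override in commit c2d3c44fd8, the sketch below builds an environment in which `no_proxy` is forced to a specific list of endpoint addresses. The helper name and the example addresses are hypothetical, and setting the upper-case `NO_PROXY` variant as well is an assumption of this sketch rather than something the commit states.

```python
import os


def env_with_no_proxy(endpoints, base_env=None):
    """Return a copy of the environment with no_proxy forced to endpoints.

    Any existing no_proxy value is replaced rather than extended, so the
    request is guaranteed to bypass the proxy for exactly these addresses.
    """
    env = dict(os.environ if base_env is None else base_env)
    value = ",".join(endpoints)
    env["no_proxy"] = value
    env["NO_PROXY"] = value  # assumption: cover tools that read the upper-case name
    return env


if __name__ == "__main__":
    base = {"http_proxy": "http://proxy.example:3128", "no_proxy": "localhost"}
    print(env_with_no_proxy(["172.29.236.10", "172.29.236.11"], base))
```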
Kevin Carter
bb4954b598 Set the max user watches to 1M
This increases the default value on elastic hosts from 32k to 1M which
improves general stability, especially on high traffic hosts.

Change-Id: I18f3e7005d2798dd4008215c7aa949cc37084f5c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-10 21:12:14 +00:00
Zuul
31d0cff14c Merge "Allow using custom publish host" 2018-09-08 20:53:51 +00:00
Mohammed Naser
b1ffb5e36c Allow using custom publish host
Change-Id: I232bca0854e2a7cd19304bc1301dff73e15c9bda
2018-09-03 15:48:58 +00:00
Mohammed Naser
4933114a94 Refactor templates to use a single macro template file
It looks like when using a normal include, Ansible tries to include
all sorts of extra things, further breaking the environment by trying
to resolve inaccessible variables.

This patch refactors all of those includes to macros therefore including
a single file and also decreasing the memory usage to avoid copying
over the entire context.

Change-Id: Ie8733c7d52b1fc5bde484855988bddf6a06dbe00
2018-09-02 20:04:28 +00:00
Kevin Carter
1c56b7f034
Add option block to ensure apache2 is enabled correctly
The apache2 monitoring process requires a couple interactions to deploy
successfully. This change will ensure that if the apache2 monitoring
fails, in any way, it does not block the deployment.

Change-Id: Ibe35197a1c65f4abe9e4870c07ee15f37f9a58ab
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-29 15:39:08 -05:00
Kevin Carter
111fda8d87
Add device monitoring in containers
The existing device monitoring was relying on block UUIDs which are
not always populated within a container. This change switches from UUIDs
to device paths, which are always available regardless of container
engine type.

Change-Id: Iec1493a58d9aab8b2a9cb67b9202e1751606bbc4
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-29 12:20:00 -05:00
Kevin Carter
3a67604fe3
add missing environment entry-point
Change-Id: I4c016219189c90437df43afcb1d43a9476e17ad0
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-28 00:01:00 -05:00
Kevin Carter
ce9007cda5
Thread pools should be based on processor counts
The current setup was using processor cores from ansible facts which in
a multi-core, single socket system could result in 1. Using the
processor count will return the logical processor count giving us a more
performant setup when the compute power is present.

Change-Id: Ia5b63d45691f58e848d05cc4a4e5f353b993a347
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-25 01:51:16 -05:00
Kevin Carter
42603ab112
When collecting KVM data use a local connection
The KVM collector would attempt to use a network connection which
normally results in failure. This changes the collector to use a file
system socket by default.

Change-Id: Id1698a95644c6a6d5102e371a7266794196393c8
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-25 00:02:00 -05:00
Kevin Carter
d69c75e8e4
Add FS detection in metricbeat for containers
When deploying metricbeat there are some container environments that
will have a mount point within the container that may differ from that
of the host; like in the case of an LXC container with an LVM backed
bind mount or an NFS share. This change will now check if the filesystem
within a container has any mounts and compare the result to that of the
provided physical host. If the list is >0 the fs checks will be deployed
within the container environment.

Change-Id: Iae5827f4e7e0a85eb733128b54d6ef4c8721537a
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-17 16:24:47 -05:00
Kevin Carter
57754a4346 Make journalbeat install detect if it should install
When running on a system where there is no journal or no systemd skip
the journalbeat playbooks.

Change-Id: I92c804e8eb2ab2f9b86eca09fc51d19be66c7190
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-16 04:45:53 +00:00
Kevin Carter
cd299ee1ce
Extend the embedded bootstrap process
The embedded bootstrap process can support all of our OSes; this change
ensures that's possible.

Change-Id: I730bb775aa5e9f87609ea885142d7361203cbb2c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-15 23:25:53 -05:00
Kevin Carter
d62666b455
Add basic index templates
The basic index template will ensure that the replication policy is
enforced on the revolving indexes we know will exist.

Change-Id: I1e3edcfd00a73cbdd328d50e8ba6492ac2248b72
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-15 21:35:31 -05:00
Kevin Carter
6da0fca375 Update curator to better metric storage
Now that the roll-up has been implemented the original shrink method is
no longer required or useful. This change cleans things up.

Change-Id: I24fd5b4daafc2f48ee5a3421f6b58b157a7aff6c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-14 22:37:28 +00:00
Kevin Carter
e4c84aa28d
Add Redhat to the ELK deployment capabilities
Change-Id: Id34e046a546f8d0878843596f53e400165e37c6e
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-13 18:59:57 -05:00
Kevin Carter
bf6a8d85e7
Add SUSE support
This change adds SUSE 42.3 support to the elastic telemetry solutions.

Change-Id: Ibe93ea0d1ead9e7fe6da16d89989cfe5ade0f43e
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-12 23:53:23 -05:00
Kevin Carter
8db0238749 Move most of the variables into the roles
Change-Id: I82a48c554c164c7166c1a0d4e3192332af5024fb
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-13 03:20:33 +00:00
Kevin Carter
45df59ed7e
move the bulk of the templates into the new roles
This change will help with organization throughout the stack.

Change-Id: I2ad865db534ae1d377bbdecd4b421ee0fc802536
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-12 22:19:19 -05:00
Zuul
8d08863b42 Merge "Add syslog input into logstash" 2018-08-11 19:30:28 +00:00
Kevin Carter
b9fa34d42e Add syslog input into logstash
The new option logstash_syslog_input_enabled has been added which will
allow users to enable a direct syslog input. When enabled, messages will
be processed via logstash and sent directly to elasticsearch.

Change-Id: Icb7712ecb8aae3d7f99df80ae1c5cd647a15ce83
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-11 03:08:32 -05:00
Kevin Carter
c1b2a863ca
Adds a job to test ELK using a clustered setup
This job will create an ELK cluster using nspawn containers.

Change-Id: I11eefee65cf738b9915ccab9c0470538ef1b2cec
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-11 00:48:37 -05:00
Kevin Carter
79c3a3cf93
Add trusty support to the project
This change adds Ubuntu 14.04 support to the project.

Change-Id: I20695e19409b63c6e1def4ccf8929c6d52be647e
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-11 00:00:03 -05:00
Kevin Carter
6aa88dd7b7
Add bionic job to elk_metrics_6x testing
Change-Id: I67bbfa116c45a82eb8b5bc191d19d203493f0b00
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-09 00:44:32 -05:00
Kevin Carter
3a0b3d2cde
Convert playbooks into roles
This change adds the scaffolding required to get multi-distro support
running in the roles. The change breaks up our playbooks converting all
of the tasks into various roles with internal dependencies. While this
will improve execution time, the change is being done to reduce boiler
plate and to allow us to build on the pattern used in OSA to provide
multi-distro capabilities.

A side effect of this change is a major improvement in idempotency. The
playbooks should now be 100% idempotent.

All of the templates have been left in the main playbook directory. This
was done to help ease the transition. In a future PR the template
structure will be moved into the roles where it needs to be.

The main variable file has been left intact. This file will be carved
up into role defaults in a future PR.

Change-Id: I938a10564128ce4078fa12edcf614dcdbd684b25
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-09 00:41:05 -05:00
Kevin Carter
ca23f9a987 Update index retention policy to ensure it's an int
The current variable, when there's only one host, will result in 'false'
instead of 0, which is a jinja-ism. However, due to java-isms, "false" is
evaluated to 5, and that makes the index retention policy very wrong.

Change-Id: I2668e17c1cf15fe47842ff349ffa4f71c70257e5
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-03 02:21:02 +00:00
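The class of bug fixed in commit ca23f9a987 can be shown with a Python analogy. This is not the Jinja expression from the playbooks; the function names and the short-circuit expression are hypothetical, chosen only to show how a boolean can leak out where an integer was expected and how an explicit cast avoids it.

```python
def raw_value(host_count):
    # A short-circuit expression like this yields the boolean False,
    # not the integer 0, when there is only one host.
    return host_count > 1 and host_count - 1


def safe_value(host_count):
    # Casting explicitly guarantees an integer regardless of host count.
    return int(raw_value(host_count))


if __name__ == "__main__":
    print(repr(raw_value(1)), repr(safe_value(1)))
    print(repr(raw_value(3)), repr(safe_value(3)))
```

Rendering the uncast value into a config file would write the word "False" where a number is expected, which is the kind of mismatch the commit message describes.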
Zuul
200e1dbe4c Merge "Convert logstash groks to a multi-pipeline setup" 2018-07-31 07:15:15 +00:00
Zuul
050fdbc880 Merge "Make index retention calculation pluggable" 2018-07-28 08:29:46 +00:00
Kevin Carter
b6343c57a4
Convert logstash groks to a multi-pipeline setup
The logstash groks were running in line using the legacy method which uses
lexical sorting of all logstash filter files and loads them in order. While
this works it makes it so all data has to travel through all filters.
This change makes use of the logstash multi-pipeline capabilities
using a distributor and fork pattern. This allows data to flow through
logstash more quickly and not block whenever there's an issue with an
output plugin.

Fingerprints use SHA1 when there's a message and UUID when not. This
will ensure we're de-duplicating log entries, which will help speed up
transactions and further reduce the storage required.

Change-Id: I38268e33b370da0f1e186ecf65911d4a312c3e6a
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-07-27 12:04:05 -05:00
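The fingerprinting rule from commit b6343c57a4 (SHA1 when a message is present, a UUID otherwise) can be sketched in a few lines. This is an illustration in Python rather than the Logstash filter configuration the commit changes; the `fingerprint` function and the event dictionary shape are assumptions.

```python
import hashlib
import uuid


def fingerprint(event):
    """Return a SHA1 hex digest of the message, or a random UUID without one."""
    message = event.get("message")
    if message:
        # Identical messages always hash to the same fingerprint.
        return hashlib.sha1(message.encode("utf-8")).hexdigest()
    # No message to hash, so fall back to a unique identifier.
    return uuid.uuid4().hex


if __name__ == "__main__":
    first = fingerprint({"message": "disk full on /var"})
    second = fingerprint({"message": "disk full on /var"})
    print(first == second)
    print(fingerprint({}) == fingerprint({}))
```

If the fingerprint is then used as the document identifier (an assumption here), repeated messages overwrite one another instead of being stored twice, while events without a message are always kept.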
Jonathan Rosser
eeb2e5a853 Make index retention calculation pluggable
This allows alternate algorithms to be developed and enables the
same tasks to be included in a test to verify the planned retention
does not exceed the cluster storage capacity.

Change-Id: Ie3d80d6cfad16b946ccd790859bc7cd92b90fdef
2018-07-26 21:47:44 +01:00
Zuul
a0780fb582 Merge "Further tune the playbooks, configs, and thread pool" 2018-07-26 20:37:01 +00:00
Kevin Carter
f69d391325 Further tune the playbooks, configs, and thread pool
* Implements G1 GC optionally. The variable `elastic_g1gc_enabled` has
  been added with a default of false. If this option is set true and the
  system has more than 4GiB of RAM G1GC will be enabled.
* Adds new thread options
* Better constrains coordination nodes
* Interface recover speed has been limited
* Buffer size is now set correctly
* Serialize elk deployment so that upgrades are non-impacting

Change-Id: I89224eeaf4ed29c3bb1d7f8010b69503dbc74e11
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-07-26 18:43:13 +00:00
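The G1 GC condition listed in commit f69d391325 reduces to a two-part check, sketched below. Only the `elastic_g1gc_enabled` variable name and the 4GiB threshold come from the commit message; the function name and the byte-based memory argument are hypothetical.

```python
GIB = 1024 ** 3


def use_g1gc(elastic_g1gc_enabled, total_memory_bytes):
    """Return True only when the option is set and RAM exceeds 4GiB."""
    return bool(elastic_g1gc_enabled) and total_memory_bytes > 4 * GIB


if __name__ == "__main__":
    print(use_g1gc(True, 8 * GIB))
    print(use_g1gc(True, 4 * GIB))
    print(use_g1gc(False, 8 * GIB))
```

The comparison is strict because the message says "more than 4GiB", so a host with exactly 4GiB keeps the default collector in this sketch.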
Kevin Carter
7b2e56885b Add arcsight ingestion into logstash
Logstash is able to handle arcsight events, this PR enables that
capability.

Change-Id: Id220c671cc5d7cb7ee33fb53e2ae4185d579fc2a
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-07-26 13:09:53 -05:00
Jonathan Rosser
39e9905d00 Allow mounting of shared filesystems for index backup/restore
Change-Id: I6590bd0b7560fe42bd82d1a8aa7932a45f067ca5
2018-07-25 17:01:32 +01:00
Zuul
506518a638 Merge "Correct kafka template file name" 2018-07-25 02:17:21 +00:00
Kevin Carter
4ab532b391 Correct kafka template file name
Change-Id: I7a658b31a2163519b1fdf7abb46cadbefcb4369a
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-07-24 20:32:56 -05:00
Kevin Carter
720c608011 Add unified setup and beat config
The beats were all enabled with the default settings to enable
dashboards and templating when the specific beat starts. In a large
scale environment this creates a DDOS against Kibana as all beats begin
uploading templates and dashboards clobbering one another. This change
moves the dashboard config into a common template and sets everything
using sane defaults so that we're not inadvertently killing our clusters
when rolling restarts happen, like in the event of an upgrade.

Change-Id: Ib48ea34a350335b72c3e3df941853c405072446a
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-07-24 18:34:44 -05:00
Zuul
230ea3a7bf Merge "Add xpack monitoring to the beats" 2018-07-24 21:40:37 +00:00
Zuul
72d1de3888 Merge "update default kibana elastic timeout" 2018-07-24 21:37:55 +00:00
Zuul
1957e959b3 Merge "Update task name" 2018-07-24 21:15:51 +00:00
Zuul
aaefe5c2db Merge "Add logrotate to the elasticstack" 2018-07-24 20:29:29 +00:00
Zuul
b62304f6be Merge "Update retention policy weighting" 2018-07-24 19:55:46 +00:00
Kevin Carter
c2c85215e6 Add xpack monitoring to the beats
This change adds the xpack monitoring capabilities to all of the core
beats we deploy.

Change-Id: Ib09388b83e18953cb180cdb93fec34e5917cc82c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-07-24 14:16:44 -05:00
Victor Palma
08a5f02a78 update default kibana elastic timeout
* set the default elasticsearch request timeout to 60 seconds

Change-Id: Ieac2c96315bbbcfe7cc2d2bff42d2ee15f23fb0b
2018-07-24 13:09:25 -05:00
Kevin Carter
9a20bfc49b Update task name
Change-Id: Ia0eb105a6200104590ffb9c2f35b93a96ffb66ef
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-07-24 12:43:25 -05:00