openstack-ansible-ops

Author	SHA1	Message	Date
Kevin Carter	814622cc6c	Improve logstash and elasticsearch performance The logstash and elasticsearch performance can be improved by using async index options, pulling back the refresh interval, and by not fingerprinting every document. * Async translog allows elasticsearch to run fsync in the background instead of blocking * The refresh interval will now be 5x the number of replicas with a cap of 30. This integer is representative of the seconds between index refresh calls, which greatly lowers the load generated across the cluster. * All documents were fingerprinted before writing to the cluster. This was a costly operation as elasticsearch will do a forward lookup on all documents with a preset ID, resulting in 100's, if not 1000's, of extra reads. The purpose of the fingerprint function is to limit repeated writes, so to keep some of this functionality the fingerprint function is now only added to documents with messages. * G1 garbage collection is now enabled by default when the heap size is > 6GiB. Early versions of elasticsearch did not recommend this setting; however, it has since stabilized in recent releases. * JVM options have been moved into the elasticsearch and logstash roles, allowing these tasks to trigger service restarts when changes are made. Change-Id: I805129b207ad4db182ae6e59b6ec78eb3e246b54 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-09-21 21:47:07 -05:00
Kevin Carter	0d4a4a92c7	Converge the logstash pipelines and enhance memory backed queues The multi-logstash pipeline setup, while amazingly fast, was crashing and causing index errors when under high load for a long period of time. Because of the crashing behavior and the fact that the folks from Elastic describe multi-pipeline queues as "beta" at this time, the logstash pipelines have been converted back into a single pipeline. The memory backed queue options are now limited by a ram disk (tmpfs), which ensures that a burst within the queue does not cause OOM issues and keeps the deployment highly performant while limiting memory usage. Memory backed queues will be enabled when the underlying system is using "rotational" media as detected by ansible facts. This will ensure a fast and consistent experience across all deployment types. Pipeline/ml/template/dashboard setup has been added to the beat configurations, which will ensure beats are properly configured even when running in an isolated deployment and outside of normal operations where beats are generally configured on the first data node. Change-Id: Ie3c775f98b14f71bcbed05db9cb1c5aa46d9c436 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-09-16 23:44:58 -05:00
Kevin Carter	1c56b7f034	Add option block to ensure apache2 is enabled correctly The apache2 monitoring process requires a couple interactions to deploy successfully. This change will ensure that if the apache2 monitoring fails, in any way, it does not block the deployment. Change-Id: Ibe35197a1c65f4abe9e4870c07ee15f37f9a58ab Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-08-29 15:39:08 -05:00
Kevin Carter	e4c84aa28d	Add Redhat to the ELK deployment capabilities Change-Id: Id34e046a546f8d0878843596f53e400165e37c6e Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-08-13 18:59:57 -05:00
Kevin Carter	8db0238749	Move most of the variables into the roles Change-Id: I82a48c554c164c7166c1a0d4e3192332af5024fb Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-08-13 03:20:33 +00:00
Zuul	a0780fb582	Merge "Further tune the playbooks, configs, and thread pool"	2018-07-26 20:37:01 +00:00
Kevin Carter	f69d391325	Further tune the playbooks, configs, and thread pool * Implements G1 GC optionally. The variable `elastic_g1gc_enabled` has been added with a default of false. If this option is set true and the system has more than 4GiB of RAM G1GC will be enabled. * Adds new thread options * Better constraints coordination nodes * Interface recover speed has been limited * Buffer size is now set correctly * Serialize elk deployment so that upgrades are non-impacting Change-Id: I89224eeaf4ed29c3bb1d7f8010b69503dbc74e11 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-07-26 18:43:13 +00:00
Kevin Carter	7b2e56885b	Add arcsight ingestion into logstash Logstash is able to handle arcsight events, this PR enables that capability. Change-Id: Id220c671cc5d7cb7ee33fb53e2ae4185d579fc2a Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-07-26 13:09:53 -05:00
Jonathan Rosser	39e9905d00	Allow mounting of shared filesystems for index backup/restore Change-Id: I6590bd0b7560fe42bd82d1a8aa7932a45f067ca5	2018-07-25 17:01:32 +01:00
Zuul	72d1de3888	Merge "update default kibana elastic timeout"	2018-07-24 21:37:55 +00:00
Victor Palma	08a5f02a78	update default kibana elastic timeout * set the default elasticsearch request timeout to 60 seconds Change-Id: Ieac2c96315bbbcfe7cc2d2bff42d2ee15f23fb0b	2018-07-24 13:09:25 -05:00
Kevin Carter	f7aad4832f	Update retention policy weighting This change adds retention policy weighting based on experience with the indexes in production in large scale clouds. Change-Id: I0d09d4cfc68f70fe790170d5d54f1585616c5524 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-07-24 09:46:42 -05:00
Kevin Carter	0ab9d82545	Move heartbeat from utility_all to kibana The heartbeat probe was making an assumption that the deployment will always be an OSA one by using the group "utility_all" as a deployment target. This change moves heartbeat to the first three kibana nodes by default, which corrects the previous assumption. Change-Id: Ic1b90eb94dd20dc2273542333de47bfd690af1dd Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-07-20 16:39:10 -05:00
Zuul	f59c4a76e0	Merge "Tidy Heartbeat service names"	2018-07-17 03:28:53 +00:00
Kevin Carter	7a32b5c9a9	Add additional ES cluster tuning The following options will reduce cluster pressure and generally improve search performance. Change-Id: I1619680db1fd595503f0845b182d6f6ce4c59f3c Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-07-16 22:52:40 +00:00
Kevin Carter	b6a9a6fc7a	Add dynamic retention policies to curator The curator retention policies will now query the storage nodes within a given deployment and set a suitable index retention policy based on the total amount of storage each index is assumed to produce every day. To ensure we're minimizing the storage required and optimizing search performance several actions are now being taken: * Indexes will be shrunk after a quarter of their retention time. * Indexes will be deleted should they exceed the retention time. Change-Id: I8bf548620b5404d25deaadba8fda93452ef64fa0 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-07-12 17:03:40 +00:00
Jonathan Rosser	eb893f1776	Tidy Heartbeat service names Remove spaces in service names, and don't duplicate the protocol as heartbeat includes these into the monitor.scheme and monitor.id fields by default. Change-Id: If7633dd5ca23c22eff37a8b7140fff4bf0911432	2018-07-10 09:39:33 +01:00
Kevin Carter	91dbd09353	Tune vars to better support an isolated deployment Change-Id: I93d33bed42976d20919f887ef8096b212a6559a2 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-07-09 23:47:40 -05:00
Zuul	09c412e8b0	Merge "Remove the unused port 35357"	2018-06-27 00:59:43 +00:00
Zuul	a9e2d93ec2	Merge "Add kibana custom dashboard"	2018-06-26 05:30:59 +00:00
Guilherme Steinmuller Pimentel	fde2f649bf	Add kibana custom dashboard These files provide an alternative for those who want their custom dashboards on kibana. The playbook setupKibanaDashboards.yml installs elasticdump and uses it to dump into kibana's index a simple dashboard that collects logs from filebeat. Change-Id: Ibb3407b1f19eac5f7cda753e00c3bc6f3ff16da7	2018-06-26 05:10:09 +00:00
ZhijunWei	38f6164556	Remove the unused port 35357 Now that the v2.0 API has been removed, we don't have a reason to include deployment instructions for two separate applications on different ports. Change-Id: I0c8451207afec77c9a8071ca8035337ffd0ac9f0	2018-06-23 00:07:51 -04:00
Kevin Carter	57756eefe2	Add kafka output plugin to logstash This change will allow a deployer to directly ship data from logstash into Kafka. Change-Id: I5de0caf270c8ced8111ac099cb91a70814f80259 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-06-21 13:09:28 +00:00
Jonathan Rosser	e3eb653b37	Add apm-server to loadbalancer Change-Id: I7442296d0ff984839e7f63ffcf82a77db722b72e	2018-06-18 14:24:56 +00:00
Kevin Carter	778002714c	Add upgrade task options To ensure users can upgrade packages the variable `"{{ elk_package_state | default('present') }}"` has been added to all package installs. Change-Id: I0238d9e1ed991cb1480bd924f2d5a09687890da3 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-06-14 19:30:29 -05:00
Kevin Carter	bc2937d9c9	Use elasticsearch coordinator nodes as smart LBs Elasticsearch can be used as a smart load balancer for all traffic, which will remove the requirement for a VIP and move the cluster to a mesh topology. All of the Kibana nodes will now run elasticsearch as a coordinator. * Kibana will now connect to elasticsearch on localhost. * All of the beats have been set up to use the new mesh topology. * jvm memory management has been updated to reflect the additional services. More on node assignments can be found here: * https://www.elastic.co/guide/en/elasticsearch/reference/6.2/modules-node.html#modules-node * The readme has been updated to reflect these changes. Change-Id: I769e0251072f5dbde56fcce7753236d37d5c3b19 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-06-13 23:37:48 -05:00
Dave Wilde	23ac2aa985	Add logstash filters This adds the ability to include logstash log parsing filters for various openstack and service logs. These filters are disabled by default and can be enabled by toggling the deploy_logstash_filters variable. Change-Id: I5c46f78f232d3fb604283ae623cd3975a8346c7c	2018-06-07 22:13:48 -05:00
Jonathan Rosser	eb73dd6e66	Point metricbeat rabbitmq collector to existing rabbitmq endpoint Change-Id: I9511a1da1a031b4b05bbbb108386cd5b56fd96e9	2018-06-05 12:56:29 +01:00
Jonathan Rosser	62f9508df2	Point metricbeat haproxy collector to existing stats endpoint Change-Id: I36e86746a851d48501bce7f91910761a08d20196	2018-06-05 12:56:28 +01:00
Jonathan Rosser	b2a66c9a18	Convert ELK repo location to a variable so it can be overridden Change-Id: I9e5b78960c891aae7f4e94317647668a77b08c58	2018-05-16 12:34:01 +01:00
Zuul	8ce8da08c8	Merge "update heartbeat vars to check response"	2018-05-12 19:55:34 +00:00
Kevin Carter	422b13fd86	update heartbeat vars to check response Several API services return 300 to indicate they are up; this change adds the ability to check for that. Change-Id: Ic85f6cff3bc225b29ae0e3e8fbd19eceece00441 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-05-11 16:10:23 -05:00
Kevin Carter	846a90d025	Tune down the collection intervals and default retention policy At present we're collecting too much info by default. We're seeing +500GB on a <50 node environment in just two weeks. While we don't expect the data set to grow much larger given the use of curator, this change lowers the default collection intervals of the various beats and updates the retention / detection policies so we're not storing too much information. To correct a unicode problem with py2 the host index loops have been updated. Curator has also been updated to run every day. Change-Id: Ic202eb19806d1b805fa314d3d8bde05b286740e0 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-05-11 16:04:14 +00:00
Kevin Carter	0c41b0fd70	Add curator and dynamic shard counts Curator has been added to automatically maintain the cluster with sensible defaults when it pertains to data retention. The index counts have been modified such that they're determined by the size of the initial cluster. While these shard counts can be modified post deployment by reindexing the data, it's not something being done at this time. Depends-On: https://review.openstack.org/c/565807 Change-Id: I249d715ae5241ab57c4117b14377e4d07cb6e984 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-05-02 12:11:30 -05:00
Kevin Carter	4e0c30ed16	Add documentation and tooling for legacy environments A deployer may want to run these tools within a legacy environment (running Ansible <2.4) but will find the deployment of these playbooks impossible due to the use of new-ish task syntax, roles, and modules. This change gives deployers options when running within legacy environments by providing everything needed to deploy these playbooks using embedded ansible. Change-Id: Ic99b93017129321b2eb8b773a77f7fa478cc8dc7 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-05-01 19:06:38 -05:00
Kevin Carter	49f63cabae	cleanup heartbeat config Change-Id: Iea30a4187e93fce252c603d4e188b2e475672b32 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-04-27 17:08:10 -05:00
Kevin Carter	ac286b0ac3	Update rollback plan and configs * Added options for the rollback plan so that if a rollback is executed all beat packages will be removed. * additional updates to streamline elk and fix container bindmounts, the use of group information for metric and heartbeat information. * Readme information has been fixed Change-Id: Icd070259db5b19d289d10033b1f055125f56e18c Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-04-26 16:41:51 -05:00
Kevin Carter	903b995d32	Add APM Server The elastic Stack has the ability to get application performance metrics using the built in APM server. This change implements the APM server in an existing ELK environment. Change-Id: Ie6f533b81cfdb0c6a4ba2f33fd3b9f0a3e49a1fc Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-04-16 08:36:48 -05:00
Kevin Carter	390314e18b	Add variables to connect ELK and Grafana With the option to deploy grafana the following changes allow a user to automatically connect ELK and Grafana. Change-Id: Ic8e64a31d860940c6863f46ce558908d5ef8f8e7 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-04-13 23:08:31 -05:00
Kevin Carter	969a30c6c7	Add grafana This change introduces grafana into the stack which gives us a great way to visualize the data. The grafana role from cloudalchemy is being used for the bulk of the deployment. Because the grafana deployment playbook is now standalone the mentions of grafana in the other ops directories have been removed. Change-Id: I23e1c96cd1fda7ece9b86a69f9f0326913de714d Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-04-13 10:31:34 -05:00
Kevin Carter	17fb37f075	Update elk 6.x playbooks Most of the changes in this PR are for style and to adapt the playbooks so that the system can operate on a multi-node cloud. Functional changes include the removal of mainline Java 8 in favor of OpenJDK 8. A site playbook was added to allow an operator to just run everything. Old tools that no longer function within the stack have been removed. Packetbeat was added to the install list. Auditbeat was added to the install list. All of the config files have been updated for the recent ElasticStack 6.x changes. Change-Id: I01200ad4772ff200b9c5c93f8f121145dfb88170 Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>	2018-04-11 03:11:44 -05:00
Per Abildgaard Toft	48e2b8e998	Updated version of ELK stack for openstack ansible This addition is an update of the current elk_metrics which will install Elasticsearch, Logstash and Kibana 6.x. It also includes a configuration guide for haproxy endpoints Change-Id: Iac4dec6d17bc75433e5fe672f3b9781536b8e619	2018-03-06 14:21:23 +00:00

42 Commits
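
The tuning rules described in commit 814622cc6c (refresh interval of 5x the replica count, capped at 30 seconds; G1 garbage collection when the heap exceeds 6GiB) can be illustrated with a small sketch. This is not the code from the roles themselves: the function names `refresh_interval_seconds` and `g1gc_enabled` are hypothetical and exist only to show the arithmetic stated in the commit message.

```python
def refresh_interval_seconds(replica_count: int) -> int:
    """Seconds between index refresh calls (hypothetical helper).

    Follows the rule from the commit message: 5x the number of
    replicas, with a cap of 30.
    """
    return min(5 * replica_count, 30)


def g1gc_enabled(heap_size_gib: float) -> bool:
    """Whether G1 GC would be on by default (hypothetical helper).

    The commit message states G1 is enabled when the heap size
    is greater than 6GiB.
    """
    return heap_size_gib > 6


if __name__ == "__main__":
    for replicas in (1, 2, 10):
        print(replicas, refresh_interval_seconds(replicas))
    for heap in (4, 6, 8):
        print(heap, g1gc_enabled(heap))
```

Under this reading, a cluster with two replicas refreshes every 10 seconds, and any cluster with six or more replicas sits at the 30 second cap.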