896 Commits

Author SHA1 Message Date
Zuul
439b269525 Merge "Enable setting mnaio disk size by pxe server group" 2018-09-25 16:10:00 +00:00
Zuul
907a5ff7c7 Merge "Use group vars to reduce redundancy in host vars" 2018-09-25 16:10:00 +00:00
Zuul
3195d3bf06 Merge "Change VM definition in deploy-vm mnaio playbook" 2018-09-25 16:09:59 +00:00
Zuul
b2c9cf4221 Merge "Add host metadata to core beats output" 2018-09-25 15:59:17 +00:00
Zuul
27ef77f601 Merge "Add tags for beats setup tasks" 2018-09-25 15:24:03 +00:00
Zuul
a9482e571b Merge "Convert refresh fact to strings" 2018-09-25 15:21:06 +00:00
Jonathan Rosser
1a48236ced Add host metadata to core beats output
This change adds fields such as host OS, version and platform to the
core beats output, giving extra query/filter capabilities.

Change-Id: Iff61bb4402eaa45b8f1c134a6a39cebe6613cbf3
2018-09-25 13:34:18 +01:00
Jonathan Rosser
ac46b2be6a Fix journalbeat installation for mixed environments
The previous code would terminate the play immediately if any hosts
in the environment did not have a journal directory. This change runs
the journalbeat install role selectively on hosts that have the journal
directory, and skips hosts that do not.
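
The selection described above can be illustrated outside Ansible. The following is a minimal Python sketch, not the playbook's actual logic: the helper name `hosts_with_journal` and the host names are hypothetical, and the journal directories are simulated with a temporary directory rather than real systemd paths.

```python
import os
import tempfile


def hosts_with_journal(journal_dirs):
    """Return the host names whose journal directory exists.

    journal_dirs maps a (hypothetical) host name to the path that
    would be checked on that host.
    """
    return [host for host, path in journal_dirs.items() if os.path.isdir(path)]


# Simulate two hosts locally: one with a journal directory, one without.
with tempfile.TemporaryDirectory() as root:
    present = os.path.join(root, "journal")
    os.mkdir(present)
    missing = os.path.join(root, "no-journal")

    selected = hosts_with_journal({"host-a": present, "host-b": missing})
    # Only the host with a journal directory is kept; the other is
    # skipped instead of aborting the whole run.
    print(selected)
```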

In addition, a legacy task to stop the play after uninstallation is removed;
this functionality is currently broken.

Change-Id: I412e3594c4b2292caafafb580bb4ede9ccfd3944
2018-09-25 12:34:21 +01:00
Jonathan Rosser
8cf20bfea2 Add tags for beats setup tasks
Previously the beat setup tasks were tagged with 'setup' but the include
statements were not, so the tasks were always skipped when using '--tags
setup'. This change adds tags to the includes so that the tasks are executed
as expected.

Change-Id: If16069cd273d84a22b229b8140e5a8d56eed86d1
2018-09-25 12:30:22 +01:00
Zuul
af4e551c09 Merge "Fix Kibana dashboard uploading for mirrors and proxies" 2018-09-25 03:10:53 +00:00
Zuul
a7c581e8b5 Merge "Add the pip package hvac to support hashicorp vault" 2018-09-25 03:10:27 +00:00
Kevin Carter
dfc919bb0e Convert refresh fact to strings
While the use of "to_json" resulted in a string, it also created a JSON-escaped
string full of slashes, which then corrupts kibana index patterns.
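
One way this kind of escaping can arise is when a value that is already a JSON string gets JSON-encoded a second time. The sketch below uses Python's standard `json` module to show the effect; the `fields` payload is a made-up example and is not taken from the playbook.

```python
import json

# Hypothetical field data, for illustration only.
fields = [{"name": "host.name", "type": "string"}]

# Encoding once yields a plain JSON string with no backslashes.
as_json = json.dumps(fields)

# Embedding that string in another document and encoding again
# escapes every inner quote with a backslash.
body = json.dumps({"fields": as_json})

print(as_json)
print(body)
```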

Change-Id: I2c26ab9dd4930226f3e554c2f9bed5e382cdafa5
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-24 21:53:48 -05:00
Zuul
40fd4f2433 Merge "Remove PrivateDevices" 2018-09-24 22:39:59 +00:00
Zuul
0e4dc162b4 Merge "Increase beat input inactivity timeout" 2018-09-24 20:30:43 +00:00
Zuul
3b086c860a Merge "Clean-up filtering for API requests" 2018-09-24 20:30:43 +00:00
Zuul
144ec7628d Merge "Create filter for contextual logs" 2018-09-24 20:30:42 +00:00
Jonathan Rosser
bc374b8688 Fix Kibana dashboard uploading for mirrors and proxies
To upload the Kibana dashboard behind http proxies we need to

* Point to a local mirror for nodejs if one is defined
* Install nodejs directly with apt rather than the magic script
* Be explicit about no_proxy when uploading the dashboard

This change also uploads the dashboard only once rather than on
each elastic-logstash node.

Change-Id: I4695d6fe6f85d9120f83abc9a92c54ac3ad68c95
2018-09-24 19:55:21 +00:00
Zuul
fb46c5a0f9 Merge "Use correct parsed timestamp" 2018-09-24 19:52:59 +00:00
Zuul
fde22a11aa Merge "Fix OpenStack Log Visualizations" 2018-09-24 19:52:58 +00:00
Kevin Carter
76aa98423e Remove PrivateDevices
The PrivateDevices option, in LXC containers, causes more problems than
it solves. This change removes that option generally so that it's no
longer a problem no matter how the installation of the ELK services is
done.

Change-Id: I7f881ac0da9eb6154f8e6b977df815736ac04264
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-24 13:16:47 -05:00
Kevin Carter
a98074ea74 Add the pip package hvac to support hashicorp vault
The hashicorp plugin for vault requires the hvac library. This change
adds the required lib to the embedded ansible.

Change-Id: If24767a647bc3fb359e67bac46ca4a626bbc6e54
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-24 11:09:22 -05:00
Jonathan Rosser
e8966f4568 Increase beat input inactivity timeout
This defaults to 60s, and there is a constant background of errors making
noise in the logs of the form:

ERROR pipeline/output.go:121 Failed to publish events: write tcp 10.11.3.128:40770->10.11.128.94:5044: write: connection reset by peer

Advice seems to be to increase client_inactivity_timeout [1]. Tested in one
environment and the log noise is gone.

[1] https://discuss.elastic.co/t/solved-filebeat-logstash-connection-reset-by-peer/87012

Change-Id: Ia93867826a0c32192e3c37ea101f9a95a29e3d00
2018-09-24 13:34:09 +01:00
Guilherme Steinmüller
baf16d19da Fix OpenStack Log Visualizations
The json visualization templates are limited to filebeat-*
index-pattern. To avoid incompatibility between versions
of kibana, this patch aims to set the index-pattern of the
openstack visualizations as the global index-pattern *.

Change-Id: I2c41b94b0a9c323ecac47db0dd1e4075ad6a432a
2018-09-24 00:03:24 +00:00
Mohammed Naser
db6533481a Clean-up filtering for API requests
This updates all of the pipelines for most projects' API requests
to provide cleaner information.

Change-Id: I5cb20a6c104b25d365fe03e4086272fa2965846a
2018-09-23 18:52:35 -04:00
Mohammed Naser
17c3563e27 Create filter for contextual logs
The oslo.log has a default pattern for logging all of the entries
with context, so let's use that in a common place to avoid duplicating
all the information.

Change-Id: I7f326221c01f53710f3adbc5fc2d416bec6aef8f
2018-09-23 17:35:44 -04:00
Mohammed Naser
72acd46a31 Use correct parsed timestamp
At the moment, we're adding an extra field called "logdate" rather
than using the built-in timestamp.  This makes things go to the
right field.

Change-Id: I5e56d01692b7205418e6aba89d1c7c44fa1abfef
2018-09-23 17:25:49 -04:00
Mohammed Naser
eb4e6731b5 Drop oslofmt tag from checks
Filebeat does not ship anything tagged with oslofmt; the
openstack tag gives us all we need to parse things correctly.

Change-Id: I614e4bc5d85559540a9d616407da993ed90de87e
2018-09-23 16:48:11 -04:00
Mohammed Naser
48d7b08773 Drop extra Ceph messages
Ceph has a problem where newly introduced debug messages are
logged at the normal level. They produce a lot of useless extra
messages and overload the ELK cluster.

This was fixed in 12.2.9 which is not out yet, so let's work
around it for now by avoiding shipping it.

Change-Id: I36a503b7380ce62c65570232a18d2179a98ecfa1
2018-09-23 13:48:14 -04:00
Mohammed Naser
12c9687437 Add ARM64 support
This patch adds support for ARM64 beats.  Unfortunately, Elastic does
not publish any packages, so this points at local builds.  Also, it
looks like Packetbeat fails to build, so for now we just don't do
anything about it on ARM.

Change-Id: I1889ce51f1a4c13c311165b8b76dde7c71ecfa2d
2018-09-23 16:17:04 +00:00
Kevin Carter
814622cc6c Improve logstash and elasticsearch performance
The logstash and elasticsearch performance can be improved by using
async index options, pulling back the refresh interval, and by not
fingerprinting every document.

* Async translog allows elasticsearch to run fsync in the
  background instead of blocking.
* The refresh interval will now be 5x the number of replicas with a cap
  of 30. This integer is representative of the seconds between index
  refresh calls, which greatly lowers the load generated across the
  cluster.
* All documents were fingerprinted before writing to the cluster. This
  was a costly operation as elasticsearch will do a forward lookup on all
  documents with a preset ID, resulting in 100's, if not 1000's, of extra
  reads. The purpose of the fingerprint function is to limit repeated
  writes, so to keep some of this functionality the fingerprint function is
  now only added to documents with messages.
* G1 garbage collection is now enabled by default when the heap size is
  > 6GiB. Early versions of elasticsearch did not recommend this setting;
  however, it has since stabilized in recent releases.
* JVM options have been moved into the elasticsearch and logstash roles,
  allowing these tasks to trigger service restarts when changes are made.
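
The refresh-interval and garbage-collection rules in the list above reduce to simple arithmetic. The following Python sketch restates them under assumptions: the function names are hypothetical, and it does not reproduce the role's actual templates or any special handling (for example of zero replicas) that the real code may have.

```python
def refresh_interval_seconds(replicas: int, cap: int = 30) -> int:
    """Seconds between index refreshes: 5x the replica count, capped at 30."""
    return min(5 * replicas, cap)


def use_g1gc(heap_gib: float, threshold_gib: float = 6) -> bool:
    """G1 is enabled only when the heap is strictly larger than 6 GiB."""
    return heap_gib > threshold_gib


for replicas in (1, 2, 6, 10):
    print(replicas, refresh_interval_seconds(replicas))

for heap in (4, 6, 8):
    print(heap, use_g1gc(heap))
```

With these rules the interval grows linearly with the replica count until it reaches the cap, so clusters with six or more replicas all refresh every 30 seconds.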

Change-Id: I805129b207ad4db182ae6e59b6ec78eb3e246b54
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-21 21:47:07 -05:00
Zuul
daffc177a1 Merge "Add more optionality when customizing node roles" 2018-09-21 16:31:54 +00:00
Zuul
28df8729c8 Merge "Add playbook to automatically refresh fields in kibana" 2018-09-21 16:31:53 +00:00
Zuul
f5f581aa0e Merge "Add variable to define a beat service state" 2018-09-20 21:18:33 +00:00
Guilherme Steinmüller
7430f6c8d5 Add variable to define a beat service state
This patch gives the user a way to enable or disable beats by
overriding the {beatname}_service_state variable according to
which beats the user wants to receive data from.

In some use cases users want only a subset of the beats
provided, mostly to avoid unnecessary use of bandwidth
for data that wouldn't be used. This patch supports that use case
by enabling or disabling the service after install, keeping the service
installed in case the user needs it.

Change-Id: I2251095d7fcfc48a239fe9d4984269503cc835da
2018-09-20 16:27:20 +00:00
Kevin Carter
1f9171082e Add more optionality when customizing node roles
The node roles would apply attributes to hosts if an override was set or
if a node was part of a given group as determined through auto-detection.
This change will now add nodes to a given role when set manually and
will ensure no extra nodes are added to the role if the count meets or
exceeds what's required to run the service.
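
One possible reading of this behaviour is sketched below in Python. It is an interpretation, not the playbook's implementation: the function `select_role_nodes`, its parameters, and the node names are all hypothetical.

```python
def select_role_nodes(manual, auto_candidates, required):
    """Pick the nodes for a role.

    Manually assigned nodes are always kept. Auto-detected candidates
    are only added while the role still has fewer nodes than required.
    """
    selected = list(manual)
    for node in auto_candidates:
        if len(selected) >= required:
            break  # count already meets or exceeds the requirement
        if node not in selected:
            selected.append(node)
    return selected


# Manual assignment already satisfies the requirement: nothing is added.
print(select_role_nodes(["n1", "n2", "n3"], ["n4", "n5"], required=3))

# Manual assignment is short: auto-detected nodes fill the gap.
print(select_role_nodes(["n1"], ["n2", "n3", "n4"], required=3))
```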

Change-Id: Ied5f564f0328488d3359ec4dc8e9ad17fefe5eaf
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-20 16:04:47 +00:00
Kevin Carter
218944a4e5 Add playbook to automatically refresh fields in kibana
When upgrading or updating a template, fields within kibana will not be
updated until they're manually refreshed. This change uses the kibana
API to gather field information from the indexes and update kibana
automatically.

Change-Id: Ia5de566521d79da070f4377d1d7cb4d9786447b4
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-20 10:27:19 -05:00
Kevin Carter
10ffc96ab1 Update monitoring index for replicas
Ensures that any monitoring indexes are made with replicas in a cluster
setup, which will ensure we're able to monitor the growth of ES indexes.

The curator action plugin timer was updated to use two different timer
files instead of combining them into one timer.

Change-Id: I2184ac4ec0b75e442ee8ae6ca8bd2c6f04d51401
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-20 00:35:17 -05:00
Zuul
b0ead67f54 Merge "Convert the curator action file into multiple files" 2018-09-20 00:18:27 +00:00
Kevin Carter
bebab50f10 Convert the curator action file into multiple files
The curator action plugin does not use a logical OR when parsing
multiple filters. The only way to do this is to run curator with
different action filter files.
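
The difference between one action with several filters and several separate actions can be modelled in a few lines. This Python sketch does not use Curator itself; the index data, the two predicates, and the helper names are invented for illustration, and the only behaviour taken from the message above is that filters within one action are not ORed together.

```python
# Hypothetical index inventory.
indices = [
    {"name": "old-small", "age_days": 24, "size_gb": 3},
    {"name": "new-large", "age_days": 2, "size_gb": 80},
    {"name": "new-small", "age_days": 1, "size_gb": 1},
]


def too_old(index):
    return index["age_days"] > 14


def too_big(index):
    return index["size_gb"] > 50


def apply_filters(items, filters):
    """Within one action every filter must match (logical AND)."""
    return [i for i in items if all(f(i) for f in filters)]


def run_actions(items, actions):
    """Separate actions are evaluated independently; results are combined."""
    selected = []
    for filters in actions:
        for item in apply_filters(items, filters):
            if item not in selected:
                selected.append(item)
    return selected


one_action = [i["name"] for i in apply_filters(indices, [too_old, too_big])]
two_actions = [i["name"] for i in run_actions(indices, [[too_old], [too_big]])]
print(one_action)
print(two_actions)
```

Splitting the filters across separate runs is what gives the "old OR large" selection that a single combined action cannot express.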

Change-Id: I97c93c87d6254f79831f2a177098ea52a3a3a49d
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-19 12:06:23 -05:00
Dave Wilde
e4bd1fdaed MNAIO ELK Updates
* We don't need to create the containers as they are created during the
  initial run.

* Remove quoting in favor of {% raw %} blocks

Change-Id: Ied696ad0882169d523a60a900788e7c2ba1d3fa3
2018-09-19 10:52:32 -05:00
Zuul
94d8f09b74 Merge "Make openstack-service-setup compatible with older ansible" 2018-09-19 06:58:35 +00:00
Zuul
8f59a6f97c Merge "Tune-up logrotate config" 2018-09-18 22:32:49 +00:00
Zuul
cf2e5dbdc3 Merge "Add capability to set node role" 2018-09-18 22:32:49 +00:00
Kevin Carter
3c96804a87 Tune-up logrotate config
The log rotate configuration was leaving too many logs in place and
allowing them to grow too large. This tunes up the logrotation process
to ensure we're retaining information but not excessively.

Change-Id: If0f02352ee2c274f4c589b05630d28126ceba2ab
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-18 13:20:27 -05:00
Kevin Carter
0b0efcb841 Add capability to set node role
Presently the node role assignment is only automatic. Auto selection
makes the assumption every node is identical; however, in many deployments
a deployer may want to assign node roles to specific hardware, thereby
optimizing resources and improving general performance. This change
adds and documents the ability to set the node roles within an ansible
inventory.

Change-Id: I22a2b636cb1441f17e575439b55ca64f9c7b0336
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-18 12:35:06 -05:00
Zuul
772dad3543 Merge "Deploy ELK in MNAIO" 2018-09-17 18:00:59 +00:00
Wayne Warren
53fe850aa3 Make openstack-service-setup compatible with older ansible
This change allows this playbook to be run using an older version of
ansible. This change is necessary for my use case where I am running all
OSA and related playbooks in a docker container locally for a Newton
deploy.

The use of Newton OSA's ansible bootstrap script means that the
openstack-ansible my workflow uses requires Ansible 2.1, which does not
support `include_tasks`. This change addresses that problem by replacing
`include_tasks` in the playbook that needs to be run using
openstack-ansible with `include` which produces the desired result.

Change-Id: I8b2a0217e851d022ee40cbdd8bc8045e18d5a07d
2018-09-17 11:57:14 -05:00
Wayne Warren
2bcbb26215 Enable setting mnaio disk size by pxe server group
Also, halve the loadbalancer default disk size; this is the only group
I'm somewhat confident doesn't need 90 GB of disk.

Change-Id: I40b46c8d978cdefbed8c4cd5586c7ded0fe318dc
2018-09-17 09:26:31 -05:00
Wayne Warren
0f5555328d Use group vars to reduce redundancy in host vars
This is essentially a noop that establishes a healthy DRY pattern of
group var definitions in the pxe_server subgroups.

Change-Id: Ie458f0684f720879c21ca639320814ef12d5dc4e
2018-09-17 09:26:31 -05:00
Wayne Warren
18dfdd8895 Change VM definition in deploy-vm mnaio playbook
... to allow usage of host vars and group vars by looping the part of
the play that reifies the VM templates over pxe_servers rather than
vm_hosts. Instead of looping over pxe_servers hosts on each task, loop
over and delegate to vm_hosts on each task.

What this makes possible is the definition of default_vm_storage on a
per-host basis. The specific use case I have in mind is to allocate a
larger share of VG space to compute nodes without artificially limiting
how much space the compute nodes can have by bloating every other VM
in proportion.

Change-Id: I5f9a14038d59af7740acb64cce7f83fd88e5555a
2018-09-17 09:26:27 -05:00