872 Commits

Author SHA1 Message Date
Mohammed Naser
4f03c51118 Add Instance ID to logs
This will parse the logs and grab the instance ID out of it.
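
As a rough illustration of the idea (not the pipeline code from this change), the sketch below pulls an instance ID out of a log line in Python. It assumes the line carries the oslo.log-style `[instance: <uuid>]` marker; the function name `extract_instance_id` is hypothetical.

```python
import re

# Assumption: the log line contains "[instance: <uuid>]" as emitted by
# oslo.log's instance format. The pattern matches a standard 8-4-4-4-12 UUID.
INSTANCE_RE = re.compile(
    r"\[instance: (?P<instance_id>"
    r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\]"
)


def extract_instance_id(line):
    """Return the instance UUID found in a log line, or None if absent."""
    match = INSTANCE_RE.search(line)
    return match.group("instance_id") if match else None
```

A line without the marker simply yields `None`, so callers can skip adding the field.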

Change-Id: I9ad0c0e8d6101cca1fc3c4a7cb5cabc3504e6e28
2018-09-27 14:56:52 -04:00
Mohammed Naser
aa647953e0 Refactor Filebeat configuration file
- Avoid checking item by item; modules and prospectors are always
  enabled, with an opt-in option to disable them
- Updated MySQL and Apache modules to point to the right path
- Improved and cleaned up tagging
- All the prospectors are managed using a variable

Change-Id: I2a091669d6a77fd2c89a073cf9071292793e2f6b
2018-09-27 14:54:51 -04:00
Mohammed Naser
db6533481a Clean-up filtering for API requests
This updates the pipelines for most projects' API requests
to provide cleaner information.

Change-Id: I5cb20a6c104b25d365fe03e4086272fa2965846a
2018-09-23 18:52:35 -04:00
Mohammed Naser
17c3563e27 Create filter for contextual logs
oslo.log has a default pattern for logging all of the entries
with context, so let's use that in a common place to avoid duplicating
all the information.

Change-Id: I7f326221c01f53710f3adbc5fc2d416bec6aef8f
2018-09-23 17:35:44 -04:00
Mohammed Naser
72acd46a31 Use correct parsed timestamp
At the moment, we're adding an extra field called "logdate" rather
than using the built-in timestamp.  This makes things go to the
right field.

Change-Id: I5e56d01692b7205418e6aba89d1c7c44fa1abfef
2018-09-23 17:25:49 -04:00
Mohammed Naser
eb4e6731b5 Drop oslofmt tag from checks
Filebeat does not ship anything tagged with oslofmt; the
openstack tag gives us all we need to parse things correctly.

Change-Id: I614e4bc5d85559540a9d616407da993ed90de87e
2018-09-23 16:48:11 -04:00
Mohammed Naser
48d7b08773 Drop extra Ceph messages
Ceph has a problem where some newly introduced debug messages are
logged at the normal level.  They generate a lot of useless extra
messages and overload the ELK cluster.

This was fixed in 12.2.9, which is not out yet, so let's work
around it for now by not shipping them.

Change-Id: I36a503b7380ce62c65570232a18d2179a98ecfa1
2018-09-23 13:48:14 -04:00
Mohammed Naser
12c9687437 Add ARM64 support
This patch adds support for ARM64 beats.  Unfortunately, Elastic does
not publish any packages, so this points at local builds.  Also, it
looks like Packetbeat fails to build, so for now we just don't do
anything about it on ARM.

Change-Id: I1889ce51f1a4c13c311165b8b76dde7c71ecfa2d
2018-09-23 16:17:04 +00:00
Kevin Carter
814622cc6c
Improve logstash and elasticsearch performance
The logstash and elasticsearch performance can be improved by using
async index options, pulling back the refresh interval, and by not
fingerprinting every document.

* Async translog allows elasticsearch to run fsync in the
  background instead of blocking.
* The refresh interval will now be 5x the number of replicas with a cap
  of 30. This integer is representative of the seconds between index
  refresh calls, which greatly lowers the load generated across the
  cluster.
* All documents were fingerprinted before writing to the cluster. This
  was a costly operation as elasticsearch will do a forward lookup on all
  documents with a preset ID, resulting in 100's, if not 1000's, of extra
  reads. The purpose of the fingerprint function is to limit repeated
  writes, so to keep some of this functionality the fingerprint function is
  now only added to documents with messages.
* G1 garbage collection is now enabled by default when the heap size is
  > 6GiB. Early versions of elasticsearch did not recommend this setting;
  however, it has since stabilized in recent releases.
* JVM options have been moved into the elasticsearch and logstash roles
  allowing these tasks to trigger service restarts when changes are made.
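
The two numeric rules in the list above can be sketched in Python as follows. This is only an illustration of the arithmetic described in the message, not the role's actual template logic; the function and parameter names are hypothetical.

```python
def refresh_interval_seconds(replica_count, multiplier=5, cap=30):
    """Seconds between index refresh calls: 5x the replica count, capped at 30."""
    return min(replica_count * multiplier, cap)


def use_g1_gc(heap_size_gib, threshold_gib=6):
    """G1 garbage collection is enabled only when the heap is larger than 6 GiB."""
    return heap_size_gib > threshold_gib
```

With these defaults, the cap takes effect from six replicas upward, and a heap of exactly 6 GiB does not enable G1.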

Change-Id: I805129b207ad4db182ae6e59b6ec78eb3e246b54
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-21 21:47:07 -05:00
Zuul
daffc177a1 Merge "Add more optionality when customizing node roles" 2018-09-21 16:31:54 +00:00
Zuul
28df8729c8 Merge "Add playbook to automatically refresh fields in kibana" 2018-09-21 16:31:53 +00:00
Zuul
f5f581aa0e Merge "Add variable to define a beat service state" 2018-09-20 21:18:33 +00:00
Guilherme Steinmüller
7430f6c8d5 Add variable to define a beat service state
This patch gives the user a way to enable or disable individual
beats by overriding the {beatname}_service_state variable according
to which beats the user wants to receive data from.

In some use cases users want only a subset of the provided beats,
mostly to avoid unnecessary bandwidth use for data that wouldn't be
used. This patch handles that case by enabling or disabling the
service after install, keeping the service installed in case the
user needs it.

Change-Id: I2251095d7fcfc48a239fe9d4984269503cc835da
2018-09-20 16:27:20 +00:00
Kevin Carter
1f9171082e Add more optionality when customizing node roles
The node roles would apply attributes to hosts if an override was set or
if a node was part of a given group as determined through auto-detection.
This change will now add nodes to a given role when set manually and
will ensure no extra nodes are added to the role if the count meets or
exceeds what's required to run the service.

Change-Id: Ied5f564f0328488d3359ec4dc8e9ad17fefe5eaf
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-20 16:04:47 +00:00
Kevin Carter
218944a4e5
Add playbook to automatically refresh fields in kibana
When upgrading or updating a template, fields within kibana will not be
updated until they're manually refreshed. This change uses the kibana
API to gather field information from the indexes and update kibana
automatically.

Change-Id: Ia5de566521d79da070f4377d1d7cb4d9786447b4
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-20 10:27:19 -05:00
Kevin Carter
10ffc96ab1
Update monitoring index for replicas
Ensures that any monitoring indexes are made with replicas in a cluster
setup, which will ensure we're able to monitor the growth of ES indexes.

The curator action plugin timer was updated to use two different timer
files instead of combining them into one timer.

Change-Id: I2184ac4ec0b75e442ee8ae6ca8bd2c6f04d51401
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-20 00:35:17 -05:00
Zuul
b0ead67f54 Merge "Convert the curator action file into multiple files" 2018-09-20 00:18:27 +00:00
Kevin Carter
bebab50f10 Convert the curator action file into multiple files
The curator action plugin does not use a logical OR when parsing
multiple filters. The only way to do this is to run curator with
different action filter files.

Change-Id: I97c93c87d6254f79831f2a177098ea52a3a3a49d
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-19 12:06:23 -05:00
Dave Wilde
e4bd1fdaed MNAIO ELK Updates
* We don't need to create the containers as they are created during the
  initial run.

* Remove quoting in favor of {% raw %} blocks

Change-Id: Ied696ad0882169d523a60a900788e7c2ba1d3fa3
2018-09-19 10:52:32 -05:00
Zuul
94d8f09b74 Merge "Make openstack-service-setup compatible with older ansible" 2018-09-19 06:58:35 +00:00
Zuul
8f59a6f97c Merge "Tune-up logrotate config" 2018-09-18 22:32:49 +00:00
Zuul
cf2e5dbdc3 Merge "Add capability to set node role" 2018-09-18 22:32:49 +00:00
Kevin Carter
3c96804a87
Tune-up logrotate config
The log rotate configuration was leaving too many logs in place and
allowing them to grow too large. This tunes up the logrotation process
to ensure we're retaining information but not excessively.

Change-Id: If0f02352ee2c274f4c589b05630d28126ceba2ab
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-18 13:20:27 -05:00
Kevin Carter
0b0efcb841
Add capability to set node role
Presently the node role assignment is only automatic. Auto selection
makes the assumption every node is identical; however, in many deployments
a deployer may want to assign node roles to specific hardware, thereby
optimizing resources and improving general performance. This change
adds and documents the ability to set the node roles within an ansible
inventory.

Change-Id: I22a2b636cb1441f17e575439b55ca64f9c7b0336
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-18 12:35:06 -05:00
Zuul
772dad3543 Merge "Deploy ELK in MNAIO" 2018-09-17 18:00:59 +00:00
Wayne Warren
53fe850aa3 Make openstack-service-setup compatible with older ansible
This change allows this playbook to be run using an older version of
ansible. This change is necessary for my use case where I am running all
OSA and related playbooks in a docker container locally for a Newton
deploy.

The use of Newton OSA's ansible bootstrap script means that the
openstack-ansible my workflow uses requires Ansible 2.1, which does not
support `include_tasks`. This change addresses that problem by replacing
`include_tasks` in the playbook that needs to be run using
openstack-ansible with `include` which produces the desired result.

Change-Id: I8b2a0217e851d022ee40cbdd8bc8045e18d5a07d
2018-09-17 11:57:14 -05:00
Kevin Carter
0d4a4a92c7
Converge the logstash pipelines and enhance memory backed queues
The multi-logstash pipeline setup, while amazingly fast, was crashing
and causing index errors when under high load for a long period of time.
Because of the crashing behavior and the fact that the folks from
Elastic describe multi-pipeline queues to be "beta" at this time the
logstash pipelines have been converted back into a single pipeline.

The memory backed queue options are now limited by a ram disk (tmpfs)
which will ensure that a burst within the queue does not cause OOM
issues, keeping the deployment highly performant while limiting memory
usage at the same time. Memory backed queues will be enabled when the
underlying system is using "rotational" media as detected by ansible
facts. This will ensure a fast and consistent experience across all
deployment types.

Pipeline/ml/template/dashboard setup has been added to the beat
configurations which will ensure beats are properly configured even
when running in an isolated deployment and outside of normal operations
where beats are generally configured on the first data node.

Change-Id: Ie3c775f98b14f71bcbed05db9cb1c5aa46d9c436
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-16 23:44:58 -05:00
Zuul
be70a2078c Merge "Set the max user watches to 1M" 2018-09-16 03:05:37 +00:00
Zuul
1d6c01ee57 Merge "MNAIO: Cater for galera bootstrap without a master" 2018-09-14 20:58:02 +00:00
Zuul
35a59a4cb6 Merge "update ironic scripts" 2018-09-12 23:03:00 +00:00
Kevin Carter
a98035e177
Correct elasticsearch list entropy
The list of elasticsearch hosts was being randomized too much, which
resulted in a performance issue. This change reduces the entropy and
ensures that the list of hosts is correctly ordered such that localhost
is always used first and other nodes in the cluster will be used as a
fall back.
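
The ordering described above can be sketched in Python like this. It is a minimal illustration, assuming the remaining nodes keep the order they were given in; the real change may still shuffle them. The name `order_hosts` is hypothetical.

```python
def order_hosts(hosts, local_host):
    """Put the local node first and keep the other nodes as fallbacks.

    Assumption: fallbacks stay in their original order. If the local node
    is not part of the cluster list, only the other nodes are returned.
    """
    others = [host for host in hosts if host != local_host]
    if local_host in hosts:
        return [local_host] + others
    return others
```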

Change-Id: Ifb551a6e01b5c0e1f62c1466a3d5b344a3c5da97
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-12 13:13:13 -05:00
jacky06
e5b8b8b13a Replace Chinese punctuation with English punctuation
Curly quotes (Chinese punctuation) are usually entered through a Chinese
input method. When read in an English context, they cause confusion.

Change-Id: I42ea55f2840eed70fe731119b259a5c625071e5b
Closes-Bug: #1792131
2018-09-12 13:09:10 +00:00
Dave Wilde
525887872b Deploy ELK in MNAIO
This enables the deployment of the elk_metrics_6x stack inside of an
MNAIO.

Change-Id: Ie611baee79c33d7cbab9f0865127ac5966475838
2018-09-11 20:17:24 +00:00
Cameron Loader
778ce9895f update ironic scripts
The scripts currently use 'ironic' commands, which are deprecated. This
patch converts them to openstack commands.

Change-Id: I1a16164a7b8e35a61938ec470def37fa52db9edb
2018-09-11 10:34:39 -06:00
Victor Palma
86a2402da9 change osquery defaults
* do not install debugging osquery packages
   * log to filesystem
   * turn off rsyslog

Change-Id: Iae91959847fc7bfd5184d157a44cd994dab397f3
2018-09-11 11:29:44 -05:00
Zuul
42f7f896b4 Merge "Enforce no_proxy when setting up ELK dashboards and rollups" 2018-09-10 22:15:48 +00:00
Jonathan Rosser
1b267c475c Ensure logstash listens on ipv4 address
Upgrading the ELK stack to 6.4.0 leaves logstash only listening on
an ipv6 address and thereby unable to receive existing beats inputs.

This change makes the jvm prefer binding to ipv4 addresses.

Change-Id: I04a0fdbcb253a0a6a3bcc3759eb0b9d0f1962621
2018-09-10 21:14:21 +00:00
Jonathan Rosser
c2d3c44fd8 Enforce no_proxy when setting up ELK dashboards and rollups
There is no guarantee that all container IP addresses will be included
in an existing no_proxy environment variable. This will cause failures
when an http proxy is configured, but the proxy does not allow traffic
to 'hairpin' back to internal addresses.

This change forces no_proxy to the specific address of the kibana
and coordinator endpoints when the uri module is used to load dashboards
and configure rollups.

Change-Id: I669334c722cce79459b522e6e2d7e1aaec49ef24
2018-09-10 21:14:11 +00:00
Kevin Carter
bb4954b598 Set the max user watches to 1M
This increases the default value on elastic hosts from 32k to 1M which
improves general stability, especially on high traffic hosts.

Change-Id: I18f3e7005d2798dd4008215c7aa949cc37084f5c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-10 21:12:14 +00:00
Zuul
58a7e461ac Merge "Move bootstrap-ansible and passwords to run_osa tag" 2018-09-10 17:35:06 +00:00
Zuul
31d0cff14c Merge "Allow using custom publish host" 2018-09-08 20:53:51 +00:00
Antony Messerli
cd3b3f047a Move bootstrap-ansible and passwords to run_osa tag
Allows for deployment/bootstrap of OSA to be skipped
by skipping run_osa while still allowing configuration
to be added during pre_config_osa.

Change-Id: I40b0c8209f03c7e9543c7c688f2ef8ba2ebdf72d
2018-09-07 17:01:29 -05:00
Jesse Pretorius
41d0e61f0c MNAIO: Cater for galera bootstrap without a master
There can be situations where a gvwstate.dat file is present
in at least one galera container, but the my_uuid and view_id
do not match in any of them. In this case, we should just pick
any container to be the master.

This patch caters for this situation, ensuring that the cluster
still bootstraps whenever the VM boots.
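
The selection rule can be sketched in Python as below. This is an illustration under stated assumptions, not the playbook's implementation: the `my_uuid` and `view_id` values are assumed to have been read from each container's gvwstate.dat already, "any container" is taken to mean the first one by name, and `pick_bootstrap_container` is a hypothetical name.

```python
def pick_bootstrap_container(states):
    """Choose which galera container should bootstrap the cluster.

    states maps a container name to a (my_uuid, view_id) tuple taken from
    its gvwstate.dat, or to None when the file is missing (assumed shape).
    """
    # Prefer a container whose my_uuid matches its view_id.
    for name in sorted(states):
        state = states[name]
        if state is not None and state[0] == state[1]:
            return name
    # No container matches: fall back to any container (first by name here).
    names = sorted(states)
    return names[0] if names else None
```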

Change-Id: If87cd9399b6624418f16910e4ddc046aaa22e5c5
2018-09-07 17:51:58 +01:00
Jesse Pretorius
2958a629c7 MNAIO: Ensure that nested virt is enabled on host
Nested virtualization is important to improve VM performance
and enabling it is crucial to ensuring that VM images built
on one host work on boot on other hosts because the environment
is consistent.

This patch adds a task to enable it if it is available.

Change-Id: I812d8399cf45fab94f0f46976c9415591d45e463
2018-09-06 17:16:21 +01:00
Jesse Pretorius
f437430212 MNAIO: Ensure that virt-networks are properly setup
Due to the rather terrible virt_net module, only one action
can be done on the virt networks at any one time. This means
that the current action of setting them to autostart has no
effect, because the module does not do it. Also, the current
action of disabling the default network and disabling it from
autostarting also does not take full effect. As such, after a
host reboot, the default network autostarts, and the other
networks are not started and the VMs cannot start. When trying
to resolve this by re-running the host setup, the play ignores
any existing virt networks - so the issue cannot be fixed.

This patch does the following:

1. Ensures that the default network does not autostart. This
   is done by splitting the disabling of the network, and the
   disabling of autostart into two tasks.
2. Changes the define/create action into a single action which
   will not change the network configuration if it is defined.
3. Implements the setting of the network as active, and the
   setting of it to autostart as two separate tasks. This
   ensures that both actions are actually implemented.

Change-Id: I608f2607824fac649f4e018d89094d57047134b3
2018-09-06 13:08:46 +01:00
Zuul
7a9f3ef7f4 Merge "Force filesystem type on swift format" 2018-09-05 20:11:20 +00:00
Antony Messerli
ad1a4bc9ef Force filesystem type on swift format
It currently seems to think that /dev/vmvg00/disk1
is used for btrfs, so force this operation to
ensure it's changed to xfs.

Change-Id: I0bcc9723fb33b557315422c3259a7ba2b75ceff6
2018-09-05 13:13:22 -05:00
Zuul
b80cb0366b Merge "Refactor templates to use a single macro template file" 2018-09-05 16:57:53 +00:00
Jesse Pretorius
7618619bf8 MNAIO: Implement retries for image downloads
The image downloads may fail, even with aria's built-in
retry mechanism. With this patch we ensure that ansible
will delay and retry again. This improves the chances of
success.

With this we also remove the '--quiet' default parameter
so that we get console output from the task if it does
ultimately fail. This is useful for diagnostic purposes.
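
The delay-and-retry behaviour described above is handled by ansible in the patch itself; the Python sketch below only illustrates the same pattern. The `download` callable, the default counts, and the name `download_with_retries` are hypothetical.

```python
import time


def download_with_retries(download, retries=3, delay_seconds=5, sleep=time.sleep):
    """Call download() until it succeeds, waiting between failed attempts.

    The last error is re-raised when every attempt fails, so the caller
    still sees the failure output for diagnosis.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return download()
        except Exception as exc:  # broad on purpose: any failure is retried
            last_error = exc
            if attempt < retries:
                sleep(delay_seconds)
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.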

Change-Id: Ieed41f06a22effb28463637184980a748791edfe
2018-09-05 14:20:50 +01:00
Jesse Pretorius
868a559840 MNAIO: Only run systemd daemon_reload when necessary
When the VMs are Ubuntu Trusty, this task causes total failure.
We should only try to do the daemon_reload if the system being
used supports it.

Change-Id: I557856045a7735c8f351df6350f777caae526b10
2018-09-04 19:23:42 +01:00