- Instead of checking item by item, always enable modules and
prospectors, with an opt-in option to disable them
- Updated MySQL and Apache modules to point to the right path
- Improved and cleaned up tagging
- All the prospectors are managed using a variable
Change-Id: I2a091669d6a77fd2c89a073cf9071292793e2f6b
oslo.log has a default pattern for logging all of the entries
with context, so let's use that in a common place to avoid duplicating
all the information.
Change-Id: I7f326221c01f53710f3adbc5fc2d416bec6aef8f
At the moment, we're adding an extra field called "logdate" rather
than using the built-in timestamp. This makes things go to the
right field.
Change-Id: I5e56d01692b7205418e6aba89d1c7c44fa1abfef
Filebeat does not ship anything tagged with oslofmt; the
openstack tag gives us all we need to parse things correctly.
Change-Id: I614e4bc5d85559540a9d616407da993ed90de87e
Ceph has a problem where newly introduced debug messages are
logged at the normal level. They produce a lot of useless extra
messages and overload the ELK cluster.
This was fixed in 12.2.9, which is not out yet, so let's work
around it for now by not shipping them.
Change-Id: I36a503b7380ce62c65570232a18d2179a98ecfa1
This patch adds support for ARM64 beats. Unfortunately, Elastic does
not publish any packages, so this points at local builds. Also, it
looks like Packetbeat fails to build, so for now we just don't do
anything about it on ARM.
Change-Id: I1889ce51f1a4c13c311165b8b76dde7c71ecfa2d
The logstash and elasticsearch performance can be improved by using
async index options, pulling back the refresh interval, and by not
fingerprinting every document.
* Async translog allows elasticsearch to run fsync in the
background instead of blocking
* The refresh interval will now be 5x the number of replicas, with a cap
of 30. This integer represents the seconds between index refresh calls,
which greatly lowers the load generated across the cluster.
* All documents were fingerprinted before writing to the cluster. This
was a costly operation as elasticsearch will do a forward lookup on all
documents with a preset ID, resulting in 100s, if not 1000s, of extra
reads. The purpose of the fingerprint function is to limit repeated
writes, so to keep some of this functionality the fingerprint function is
now only added to documents with messages.
* G1 garbage collection is now enabled by default when the heap size is
> 6GiB. Early versions of elasticsearch did not recommend this setting;
however, it has since stabilized in recent releases.
* JVM options have been moved into the elasticsearch and logstash roles
allowing these tasks to trigger service restarts when changes are made.
Change-Id: I805129b207ad4db182ae6e59b6ec78eb3e246b54
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
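The tuning rules from the commit above can be sketched in a few lines of Python. This is only an illustration of the arithmetic and the conditions described in the message, not the role's actual templates: the function names, the "message" key, and the choice of SHA-1 for the fingerprint are assumptions made for the example.

```python
import hashlib

MAX_REFRESH_SECONDS = 30   # cap stated in the commit message
G1_HEAP_THRESHOLD_GIB = 6  # G1 is enabled only above this heap size


def refresh_interval_seconds(replica_count):
    """Return 5x the replica count, capped at 30 seconds."""
    return min(5 * replica_count, MAX_REFRESH_SECONDS)


def use_g1_gc(heap_size_gib):
    """G1 is selected only when the heap is larger than 6 GiB."""
    return heap_size_gib > G1_HEAP_THRESHOLD_GIB


def document_id(doc):
    """Return a fingerprint ID only for documents that carry a message.

    The "message" key and SHA-1 are assumptions for this sketch.
    Documents without a message get None, so no preset ID is sent
    and the lookup for an existing document can be skipped.
    """
    message = doc.get("message")
    if not message:
        return None
    return hashlib.sha1(message.encode("utf-8")).hexdigest()
```

With two replicas the interval would be 10 seconds; with ten replicas the cap of 30 applies.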
This patch gives the user a way to enable/disable beats by
overriding the {beatname}_service_state variable for each beat,
according to which beats the user wants to receive data from.
There are some use cases where users want just a subset of the
beats provided, mostly to avoid unnecessary use of bandwidth
on data that wouldn't be used. This patch addresses that use case
by simply enabling/disabling the service after install, keeping the
service installed in case the user needs it later.
Change-Id: I2251095d7fcfc48a239fe9d4984269503cc835da
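As a rough illustration of the override described above, the lookup could work like the following Python sketch. Only the {beatname}_service_state naming comes from the commit message; the function name, the "started" default, and the "stopped" value are assumptions for the example.

```python
DEFAULT_STATE = "started"  # assumed default for this sketch


def beat_service_states(beats, overrides):
    """Map each beat to its service state.

    A beat keeps the default state unless the caller supplies a
    "<beatname>_service_state" override.
    """
    return {
        beat: overrides.get(f"{beat}_service_state", DEFAULT_STATE)
        for beat in beats
    }
```

For example, passing {"packetbeat_service_state": "stopped"} would leave every other beat in its default state while packetbeat stays installed but disabled.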
The node roles would apply attributes to hosts if an override was set or
if a node was part of a given group as determined through auto-detection.
This change will now add nodes to a given role when set manually and
will ensure no extra nodes are added to the role if the count meets or
exceeds what's required to run the service.
Change-Id: Ied5f564f0328488d3359ec4dc8e9ad17fefe5eaf
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
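One possible reading of the selection rule above, written as a Python sketch. The function and parameter names are hypothetical, and the exact fill order used by the playbooks is not stated in the commit message, so treat this as an interpretation rather than the implementation.

```python
def select_role_nodes(candidates, manual, required):
    """Pick the nodes that should hold a role.

    candidates: ordered list of auto-detected host names
    manual:     ordered list of hosts explicitly assigned to the role
    required:   number of nodes the service needs
    """
    # Manually assigned hosts are always part of the role.
    selected = list(manual)
    for host in candidates:
        # Stop once the count meets or exceeds what is required.
        if len(selected) >= required:
            break
        if host not in selected:
            selected.append(host)
    return selected
```

Under this reading, auto-detected hosts only fill the gap left by the manual assignments, and nothing is added once enough nodes hold the role.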
When upgrading or updating a template, fields within kibana will not be
updated until they're manually refreshed. This change uses the kibana
API to gather field information from the indexes and update kibana
automatically.
Change-Id: Ia5de566521d79da070f4377d1d7cb4d9786447b4
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Ensures that any monitoring indexes are made with replicas in a cluster
setup, which will ensure we're able to monitor the growth of ES indexes.
The curator action plugin timer was updated to use two different timer
files instead of combining them into one timer.
Change-Id: I2184ac4ec0b75e442ee8ae6ca8bd2c6f04d51401
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The curator action plugin does not use a logical OR when parsing
multiple filters. The only way to do this is to run curator with
different action filter files.
Change-Id: I97c93c87d6254f79831f2a177098ea52a3a3a49d
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
* We don't need to create the containers as they are created during the
initial run.
* Remove quoting in favor of {% raw %} blocks
Change-Id: Ied696ad0882169d523a60a900788e7c2ba1d3fa3
The log rotate configuration was leaving too many logs in place and
allowing them to grow too large. This tunes the log rotation process
to ensure we're retaining information, but not excessively.
Change-Id: If0f02352ee2c274f4c589b05630d28126ceba2ab
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Presently the node role assignment is only automatic. Auto selection
assumes every node is identical; however, in many deployments
a deployer may want to assign node roles to specific hardware, thereby
optimizing resources and improving general performance. This change
adds and documents the ability to set the node roles within an ansible
inventory.
Change-Id: I22a2b636cb1441f17e575439b55ca64f9c7b0336
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change allows this playbook to be run using an older version of
ansible. This change is necessary for my use case where I am running all
OSA and related playbooks in a docker container locally for a Newton
deploy.
The use of Newton OSA's ansible bootstrap script means that the
openstack-ansible my workflow uses requires Ansible 2.1, which does not
support `include_tasks`. This change addresses that problem by replacing
`include_tasks` in the playbook that needs to be run using
openstack-ansible with `include` which produces the desired result.
Change-Id: I8b2a0217e851d022ee40cbdd8bc8045e18d5a07d
The multi-logstash pipeline setup, while amazingly fast, was crashing
and causing index errors when under high load for a long period of time.
Because of the crashing behavior and the fact that the folks from
Elastic describe multi-pipeline queues to be "beta" at this time the
logstash pipelines have been converted back into a single pipeline.
The memory backed queue options are now limited by a ram disk (tmpfs),
which ensures that a burst within the queue does not cause OOM
issues, keeping the deployment highly performant while limiting memory
usage. Memory backed queues will be enabled when the
underlying system is using "rotational" media as detected by ansible
facts. This will ensure a fast and consistent experience across all
deployment types.
Pipeline/ml/template/dashboard setup has been added to the beat
configurations which will ensure beats are properly configured even
when running in an isolated deployment and outside of normal operations
where beats are generally configured on the first data node.
Change-Id: Ie3c775f98b14f71bcbed05db9cb1c5aa46d9c436
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The list of elasticsearch hosts was being randomized too much, which
resulted in a performance issue. This change reduces the entropy and
ensures that the list of hosts is correctly ordered such that localhost
is always used first and other nodes in the cluster will be used as a
fall back.
Change-Id: Ifb551a6e01b5c0e1f62c1466a3d5b344a3c5da97
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
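The ordering described above can be sketched as follows. This is a minimal Python illustration with hypothetical names; the real change lives in the playbook variables and may order the fallback nodes differently.

```python
def order_hosts(local_host, cluster_hosts):
    """Return the cluster hosts with the local node first.

    The remaining hosts keep their input order so that every run
    produces the same list. The local host is only listed when it
    is actually part of the cluster.
    """
    others = [host for host in cluster_hosts if host != local_host]
    if local_host in cluster_hosts:
        return [local_host] + others
    return others
```

Because the result depends only on the inputs, repeated runs hand clients the same list, with the other nodes acting purely as fallbacks.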
Curly quotes (Chinese punctuation) are usually entered through a Chinese
input method. When read in an English context, they cause confusion.
Change-Id: I42ea55f2840eed70fe731119b259a5c625071e5b
Closes-Bug: #1792131
The scripts currently use 'ironic' commands, which are deprecated. This
patch converts them to openstack commands.
Change-Id: I1a16164a7b8e35a61938ec470def37fa52db9edb
Upgrading the ELK stack to 6.4.0 leaves logstash only listening on
an ipv6 address and thereby unable to receive existing beats inputs.
This change makes the jvm prefer binding to ipv4 addresses.
Change-Id: I04a0fdbcb253a0a6a3bcc3759eb0b9d0f1962621
There is no guarantee that all container IP addresses will be included
in an existing no_proxy environment variable. This will cause failures
when an http proxy is configured, but the proxy does not allow traffic
to 'hairpin' back to internal addresses.
This change forces no_proxy to the specific address of the kibana
and coordinator endpoints when the uri module is used to load dashboards
and configure rollups.
Change-Id: I669334c722cce79459b522e6e2d7e1aaec49ef24
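The idea of pinning no_proxy to the endpoint being called can be shown with a short Python sketch. The function name and the example address are hypothetical; the actual change sets the environment on the Ansible uri tasks.

```python
from urllib.parse import urlsplit


def no_proxy_environment(endpoint_url, base_env=None):
    """Return a copy of the environment with no_proxy forced to the
    host of the endpoint being called.

    Any inherited no_proxy value is replaced rather than extended,
    because there is no guarantee it covers the internal address.
    """
    host = urlsplit(endpoint_url).hostname
    if host is None:
        raise ValueError(f"no host found in {endpoint_url!r}")
    env = dict(base_env or {})
    env["no_proxy"] = host
    return env
```

Other proxy variables are passed through untouched, so requests to external addresses still use the configured proxy.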
This increases the default value on elastic hosts from 32k to 1M which
improves general stability, especially on high traffic hosts.
Change-Id: I18f3e7005d2798dd4008215c7aa949cc37084f5c
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Allows for deployment/bootstrap of OSA to be skipped
by skipping run_osa while still allowing configuration
to be added during pre_config_osa.
Change-Id: I40b0c8209f03c7e9543c7c688f2ef8ba2ebdf72d
There can be situations where a gvwstate.dat file is present
in at least one galera container, but the my_uuid and view_id
do not match in any of them. In this case, we should just pick
any container to be the master.
This patch caters for this situation, ensuring that the cluster
still bootstraps whenever the VM boots.
Change-Id: If87cd9399b6624418f16910e4ddc046aaa22e5c5
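The fallback described above could look like the following Python sketch. It assumes gvwstate.dat contains lines of the form "my_uuid: <uuid>" and "view_id: <type> <uuid> <seq>"; the function names and the choice of the first container by sorted name as the fallback are assumptions for the example.

```python
def parse_gvwstate(text):
    """Extract my_uuid and the view UUID from gvwstate.dat content."""
    my_uuid = None
    view_uuid = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields = value.split()
        if key.strip() == "my_uuid" and fields:
            my_uuid = fields[0]
        elif key.strip() == "view_id" and len(fields) >= 2:
            # Assumed layout: <view type> <view uuid> <sequence>
            view_uuid = fields[1]
    return my_uuid, view_uuid


def choose_bootstrap_container(states):
    """Pick the container that should bootstrap the cluster.

    states maps container name -> gvwstate.dat content, or None
    when the file is absent in that container.
    """
    names = sorted(states)
    if not names:
        return None
    for name in names:
        content = states[name]
        if content is None:
            continue
        my_uuid, view_uuid = parse_gvwstate(content)
        if my_uuid is not None and my_uuid == view_uuid:
            return name
    # No container has matching IDs: fall back to any container.
    return names[0]
```

The final return is the case the patch caters for: a state file exists somewhere, nothing matches, and the cluster still needs a master to bootstrap from.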
Nested virtualization is important to improve VM performance
and enabling it is crucial to ensuring that VM images built
on one host work on boot on other hosts because the environment
is consistent.
This patch adds a task to enable it if it is available.
Change-Id: I812d8399cf45fab94f0f46976c9415591d45e463
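How the task detects availability is not spelled out in the message. One common approach, sketched here in Python as an assumption, is to look for the vmx (Intel) or svm (AMD) CPU flag and emit the matching KVM module option; the function name is hypothetical.

```python
def nested_modprobe_option(cpu_flags):
    """Return the modprobe option line that enables nested
    virtualization, or None when the CPU offers no hardware
    virtualization flag.

    cpu_flags: iterable of flag names, e.g. from /proc/cpuinfo
    """
    flags = set(cpu_flags)
    if "vmx" in flags:   # Intel VT-x
        return "options kvm_intel nested=1"
    if "svm" in flags:   # AMD-V
        return "options kvm_amd nested=1"
    return None
```

Returning None on hosts without either flag mirrors the "if it is available" condition: nothing is configured where nesting cannot work.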
Due to the rather terrible virt_net module, only one action
can be done on the virt networks at any one time. This means
that the current action of setting them to autostart has no
effect, because the module does not do it. Also, the current
action of disabling the default network and disabling it from
autostarting also does not take full effect. As such, after a
host reboot, the default network autostarts, and the other
networks are not started and the VMs cannot start. When trying
to resolve this by re-running the host setup, the play ignores
any existing virt networks - so the issue cannot be fixed.
This patch does the following:
1. Ensures that the default network does not autostart. This
is done by splitting the disabling of the network, and the
disabling of autostart into two tasks.
2. Changes the define/create action into a single action which
will not change the network configuration if it is defined.
3. Implements the setting of the network as active, and the
setting of it to autostart as two separate tasks. This
ensures that both actions are actually implemented.
Change-Id: I608f2607824fac649f4e018d89094d57047134b3
It currently seems to think that /dev/vmvg00/disk1
is used for btrfs, so force this operation to
ensure it's changed to xfs.
Change-Id: I0bcc9723fb33b557315422c3259a7ba2b75ceff6
The image downloads may fail, even with aria's built-in
retry mechanism. With this patch we ensure that ansible
will delay and retry again. This improves the chances of
success.
With this we also remove the '--quiet' default parameter
so that we get console output from the task if it does
ultimately fail. This is useful for diagnostic purposes.
Change-Id: Ieed41f06a22effb28463637184980a748791edfe
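The delay-and-retry behaviour added around the download can be illustrated with a small Python sketch. The function name, defaults, and the injectable sleep are assumptions for the example; the patch itself relies on Ansible's own retry handling around the task.

```python
import time


def run_with_retries(action, retries=3, delay=5.0, sleep=time.sleep):
    """Call action until it succeeds or the attempts are used up.

    action returns True on success. On success the number of
    attempts made is returned; after the final failure a
    RuntimeError is raised so the caller sees the problem.
    """
    for attempt in range(1, retries + 1):
        if action():
            return attempt
        if attempt < retries:
            # Wait before trying again; no wait after the last try.
            sleep(delay)
    raise RuntimeError(f"action failed after {retries} attempts")
```

The outer retry sits on top of whatever the download tool already does internally, which is why it helps even when the tool's built-in retries are exhausted.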
When the VMs are Ubuntu Trusty, this task causes total failure.
We should only try to do the daemon_reload if the system being
used supports it.
Change-Id: I557856045a7735c8f351df6350f777caae526b10