Getting started with the ELK Stack monitoring solution
Elk Hunting
ELK Stack is a powerful monitoring system known for efficient log management and versatile visualization. This hands-on workshop will help you take your first steps with setting up your own ELK Stack monitoring solution.
Today's networks require a monitoring solution with industrial-strength log management and analytics. One option that has gained popularity in recent years is ELK Stack [1]. The free and open source ELK Stack collection is maintained by a company called Elastic. (According to the website, the company has recently changed the name of the project to Elastic Stack, but the previous name is still in common usage.) ELK Stack is not a single tool but a collection of tools (Figure 1). The ELK acronym highlights the importance of the collection's three most important utilities. At the heart of the stack, Elasticsearch collects and maintains data, providing an engine, based on Apache Lucene, for searching through it. Logstash serves as the log processing pipeline, collecting data from a multitude of sources, transforming it, then sending it to a chosen "stash." (Keep in mind that, despite its name, Logstash itself does not preserve any data.) Kibana provides a user-friendly interface for querying and visualizing the data.
A bundle of tiny apps called Beats specializes in collecting data and feeding it to Logstash or Elasticsearch. The Beats include:
- Filebeat – probably the most popular and commonly used member of the Beats family. Filebeat is a log shipper that assigns a subordinate, called a harvester, to each log to be read and fed into Logstash.
- Heartbeat – an app that asks a simple question: Are you alive? Then it ships this information, along with the response time, to Elasticsearch. In other words, it is a more advanced ping.
- Winlogbeat – monitors a Windows-based infrastructure, streaming Windows event logs to Elasticsearch and Logstash.
- Metricbeat – collects metrics from your systems and services. Metrics include CPU, memory, and disk usage, as well as data for Redis, Nginx, and much more. Metricbeat is a lightweight way to collect system and service data.
The collection also comes with several plugins that enhance functionality for the entire stack.
ELK Stack is popular in today's distributed environments because of its strong support for log management and analytics. Before you roll out a solution as complex and powerful as ELK Stack, though, you'll want to start by trying it out and experimenting with it in a test environment. It is easy to find overviews and short intros to ELK Stack, but it is a little more difficult to study the details. This workshop is a hands-on look at what it takes to get ELK Stack up and running.
ELK Installation
ELK Stack has lots of pieces, so it helps to use an automated deployment and configuration tool for the installation. I will use Ansible in this example. I hope to write this in a simple way that will be easy to follow even if you aren't familiar with Ansible, but see the Ansible project website [2] if you need additional information.
Listing 1 shows an Ansible playbook for installing the ELK Stack base applications. The first few lines define a few settings specific to Ansible itself, such as declaring that the execution will be local (and won't require an SSH network connection). The become: true setting asks Ansible to run all commands with sudo, which lets you run this playbook as the default Vagrant user instead of logging in as root. The tasks section lists the steps that will be executed in the playbook. There are multiple ways to install ELK Stack; Listing 1 uses the yum package manager and specifies a package repository. I specify the exact version numbers for the Elasticsearch, Logstash, and Kibana packages to make it easier to install the correct plugins later.
Listing 1
Ansible Playbook: elk-setup.yml
---
- hosts: localhost
  connection: local
  gather_facts: false
  become: true
  tasks:
    - name: Add Elasticsearch OpenSource repo
      yum_repository:
        name: Elasticsearch-OS
        baseurl: https://artifacts.elastic.co/packages/oss-7.x/yum
        description: ELK OpenSource repo
        gpgcheck: false

    - name: Install ELK stack
      yum:
        name: "{{ item }}"
      loop:
        - elasticsearch-oss-7.8.0-1
        - logstash-oss-7.8.0-1
        - kibana-oss-7.8.0-1
Once the software is installed, you need to run it as a service. You could use systemctl, but Listing 2 carries on using Ansible.
Listing 2
Are Elasticsearch and Kibana Enabled?
- name: Start ELK services
  service:
    name: "{{ item }}"
    enabled: true
    state: started
  loop:
    - elasticsearch
    - kibana
The command in Listing 3 checks to ensure that Elasticsearch is running locally at the default port 9200.
Listing 3
Is Elasticsearch Running?
[vagrant@ELK ~]$ curl localhost:9200
{
  "name" : "ELK",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "7.8.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
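The same check can be scripted. The following Python sketch parses the JSON body that Elasticsearch returns on port 9200 (the body below is the sample response from Listing 3; in a live check you would fetch it with urllib.request.urlopen instead of using a literal string):

```python
import json

# Sample response body, as returned by `curl localhost:9200` (Listing 3);
# a real health check would fetch this over HTTP.
response_body = """
{
  "name" : "ELK",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "7.8.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
"""

info = json.loads(response_body)
version = info["version"]["number"]
print(f"node {info['name']} runs Elasticsearch {version}")
```

A script like this is handy in provisioning pipelines, where you want to fail fast if the installed version does not match the plugins you are about to install.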
Configuring ELK
ELK Stack is set up in a virtual machine and is only listening on localhost, so if you try to open Kibana or Elasticsearch in the host's browser, it won't work. You need to change the network.host setting in the YAML file to 0.0.0.0 to enable network operations.
The Elasticsearch YAML file is usually /etc/elasticsearch/elasticsearch.yml, and Kibana and Logstash follow the same pattern. (The YAML config files installed with the RPM packages are quite verbose, though many of the settings are commented out.)
The most important change is to set network.host to 0.0.0.0. Keep in mind that Elasticsearch treats this change as a signal that you are running a production environment, and a production environment is expected to run in a cluster. Because I am working on a single node, I need to set discovery.seed_hosts: [] – an empty list – in order to disable the cluster discovery features.
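Put together, the non-default lines in elasticsearch.yml for this single-node setup come down to something like the following sketch (based only on the settings discussed above; everything else stays at its packaged default):

```yaml
# /etc/elasticsearch/elasticsearch.yml (non-default values only)
network.host: 0.0.0.0      # listen on all interfaces, not just localhost
discovery.seed_hosts: []   # single node: disable cluster discovery
```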
The same applies to the Kibana dashboard: You need to set the server.host value to 0.0.0.0 and restart the service.
You can use Ansible to help you get the default config YAML files for Kibana and Elasticsearch (Listing 4). Store them in the files subdirectory of the playbook directory. Then you can make the required updates and use Ansible to replace the files. You'll need to restart the service if you make changes to the configuration.
Listing 4
elk-setup.yml: Getting the Files
- name: Copy file with Elasticsearch config
  copy:
    src: files/elasticsearch.yml
    dest: /etc/elasticsearch/elasticsearch.yml
    owner: root
    group: elasticsearch
    mode: '0660'
  notify: restart_elasticsearch

- name: Copy file with Kibana config
  copy:
    src: files/kibana.yml
    dest: /etc/kibana/kibana.yml
    owner: root
    group: kibana
    mode: '0660'
  notify: restart_kibana

handlers:
  - name: Restart Elasticsearch
    service:
      name: elasticsearch
      state: restarted
    listen: restart_elasticsearch

  - name: Restart Kibana
    service:
      name: kibana
      state: restarted
    listen: restart_kibana
Listing 4 uses a notify directive to create notifications that will be monitored in the handlers section.
Collecting Data with Beats
Now that the ELK services are up and running, I'll show you how to use Metricbeat and Filebeat to collect data. As I mentioned previously, Metricbeat is designed to collect system and service metrics, and Filebeat collects data from logfiles.
The first step is to set up a dummy Nginx application that will serve as a monitored node (Listing 5).
Listing 5
Provision a Monitored Node
---
# ...
tasks:
  - name: Add epel-release repo
    yum:
      name: epel-release
      state: present

  - name: Install Nginx
    yum:
      name: nginx
      state: present

  - name: Insert Index Page
    template:
      src: index.html.j2
      dest: /usr/share/nginx/html/index.html

  - name: Start Nginx
    service:
      name: nginx
      state: started
Most of the tasks in Listing 5 are self-explanatory except the third one, which takes a local file with Jinja2 formatting and renders it to the chosen destination. In this case, I insert a hostname to display it on an HTTP page (Listing 6).
Listing 6
index.html.j2: Minimal HTML File
<!doctype html>
<html>
  <head>
    <title>{{ hostname }} dummy page</title>
  </head>
  <body>
    <h1>Host {{ hostname }}</h1>
    <p>Welcomes You</p>
  </body>
</html>
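To see what the template task actually does, here is a toy Python stand-in for the Jinja2 variable substitution that Ansible's template module performs (render() is illustrative only; the real engine also supports loops, filters, and conditionals):

```python
import re

def render(template, variables):
    """Toy Jinja2-style substitution: replace {{ var }} with its value."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",                 # {{ hostname }} and friends
        lambda m: str(variables[m.group(1)]),   # look the name up
        template,
    )

print(render("<h1>Host {{ hostname }}</h1>", {"hostname": "slave0"}))
```

Ansible fills the variables from host facts and playbook vars; here, hostname would typically come from the inventory.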
I'll use Metricbeat to collect statistics on the monitored node. The YAML file in Listing 7 shows a Metricbeat configuration file that will collect data on the CPU, RAM, disk usage, and a few other metrics.
Listing 7
metricbeat.yml
metricbeat.modules:
- module: system
  period: 30s
  metricsets:
    - cpu     # CPU usage
    - load    # CPU load averages
    - service # systemd service information
  # Configure the metric types that are included by these metricsets.
  cpu.metrics: ["percentages", "normalized_percentages"]
- module: nginx
  metricsets: ["stubstatus"]
  period: 10s
  hosts:
    - "http://127.0.0.1"
  server_status_path: "/nginx_status"
tags:
  - slave
  - test
#fields:
#  hostname: ${HOSTNAME:?Missing hostname env variable}
processors:
  - fingerprint:
      fields: ['.*']
      ignore_missing: true
output.elasticsearch.hosts: ["172.22.222.222:9200"]
setup.kibana.host: "http://172.22.222.222:5601"
setup.dashboards.enabled: true
Metricbeat supports several different modules dedicated to monitoring different services. One of the most commonly used is the system module, which collects metrics related to the system. Some of the metrics have individual configuration settings, such as the cpu.metrics setting shown in Listing 7.
The Metricbeat config file contains three notable sections: tags, fields, and processors. The tags section adds new list-type fields. In the fields section, you can append custom key-value entries that are sent as part of the JSON event.
Beats environment variables behave like environment variables in Bash and take the form ${VAR_NAME}. You can provide a default value to use if no other value is found with ${VAR_NAME:some_default_value}. To enforce the presence of the variable, use ${VAR_NAME:?error message}, in which case Metricbeat will fail to start and log an error message if the environment variable is not found. The most advanced modifiers appear in the processors section. Processor settings can dynamically adjust events – in this case, computing fingerprints from chosen fields. There are many kinds of processors, performing tasks such as conditionally adding or removing fields or even executing simple JavaScript snippets that modify the event data.
Another popular module for metrics collection is the Nginx module, which collects numbers from the Nginx status page. However, before you can use the Nginx module, you need to enable the status page for scraping.
Listing 8 shows the section of the nginx.conf configuration file that enables metrics and configures security so that requests for the status page must come from the host itself. Because the scraper will collect metrics every few seconds, there is no point in logging each request, so the access_log setting is turned off.
Listing 8
nginx.conf Excerpt
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    allow ::1;
    deny all;
}
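The stub_status page is a tiny plain-text report. A minimal Python parser shows what the Metricbeat stubstatus metricset extracts from it (parse_stub_status() is a hypothetical helper for illustration, not Metricbeat code):

```python
def parse_stub_status(text):
    """Parse the plain-text Nginx stub_status page into a metrics dict."""
    lines = text.strip().splitlines()
    # Line 1: "Active connections: N"
    active = int(lines[0].split(":")[1])
    # Line 3: three counters under "server accepts handled requests"
    accepts, handled, requests = (int(n) for n in lines[2].split())
    # Line 4: "Reading: R Writing: W Waiting: Q" - take every second token
    reading, writing, waiting = (int(n) for n in lines[3].split()[1::2])
    return {"active": active, "accepts": accepts, "handled": handled,
            "requests": requests, "reading": reading,
            "writing": writing, "waiting": waiting}
```

If you curl http://127.0.0.1/nginx_status on the monitored node, you can feed the output straight into a function like this to verify the numbers Metricbeat will ship.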
Listing 9 shows the Ansible playbook section that deploys Nginx and Metricbeat.
Listing 9
Deploying Nginx and Metricbeat
(...)
- name: Copy Nginx config
  copy:
    src: nginx.conf
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
  notify: restart_nginx

- name: Install Beats
  yum:
    name: "{{ item }}"
  loop:
    - metricbeat-7.8.0-1
    - filebeat-7.8.0-1

- name: Start Beats services
  service:
    name: "{{ item }}"
    enabled: true
    state: started
  loop:
    - metricbeat
    - filebeat

- name: Copy file with Metricbeat config
  copy:
    src: metricbeat.yml
    dest: /etc/metricbeat/metricbeat.yml
    owner: root
    group: root
    mode: '0644'
  notify: restart_metricbeat
Log Management
Filebeat attends to tasks related to log collection. Filebeat has some built-in parsers for commonly recognized logfile types, such as syslog, Nginx logs, and a few more.
The filebeat.yml file in Listing 10 shows two modules for handling system logs and Nginx logs. Both modules are built into the base Filebeat application and provide functionality to break the lines of the log into events that can be sent directly to Elasticsearch.
Listing 10
filebeat.yml
filebeat.modules:
- module: nginx
  access:
    var.paths: ["/var/log/nginx/access.log"]
  error:
    var.paths: ["/var/log/nginx/error.log"]
- module: system
  syslog:
    var.paths: ["/var/log/messages"]
  auth:
    var.paths: ["/var/log/secure"]
setup.kibana.host: "http://172.22.222.222:5601"
setup.dashboards.enabled: true
output.elasticsearch.hosts: ["172.22.222.222:9200"]
Listing 11 shows an event stored to Elasticsearch by Filebeat (with some insignificant parts removed for brevity). This sample event originates from a log entry like the following:
Sep 28 13:49:07 slave0 sudo[17900]: vagrant : TTY=pts/1 ; PWD=/home/vagrant ; USER=root ; COMMAND=/bin/vim /etc/filebeat/filebeat.yml
Listing 11
Store in Elasticsearch via Filebeat
$ curl -s "localhost:9200/filebeat-7.8.0-2020.09.28/_search?pretty=true&sort=@timestamp&size=1"

"_source" : {
  "agent" : {
    "hostname" : "slave0",
    "type" : "filebeat"
  },
  "process" : {
    "name" : "sudo",
    "pid" : 17900
  },
  "log" : {
    "file" : {
      "path" : "/var/log/secure"
    }
  },
  "fileset" : {
    "name" : "auth"
  },
  "input" : {
    "type" : "log"
  },
  "@timestamp" : "2020-09-28T13:49:07.000Z",
  "system" : {
    "auth" : {
      "sudo" : {
        "tty" : "pts/1",
        "pwd" : "/home/vagrant",
        "user" : "root",
        "command" : "/bin/vim /etc/filebeat/filebeat.yml"
      }
    }
  },
  "related" : {
    "user" : [
      "vagrant"
    ]
  },
  "service" : {
    "type" : "system"
  },
  "host" : {
    "hostname" : "slave0"
  }
}
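The transformation from the raw sudo line to the structured event can be sketched in Python. This is a simplified, hypothetical parser that mimics the fields the auth fileset extracts; the real pipeline also parses the timestamp and handles many other message formats:

```python
import re

# One sudo entry from /var/log/secure, captured as named groups.
SUDO_RE = re.compile(
    r"(?P<month>\w{3}) +(?P<day>\d+) (?P<time>[\d:]+) (?P<host>\S+) "
    r"sudo\[(?P<pid>\d+)\]: +(?P<caller>\S+) : "
    r"TTY=(?P<tty>\S+) ; PWD=(?P<pwd>\S+) ; "
    r"USER=(?P<user>\S+) ; COMMAND=(?P<command>.*)")

def parse_sudo_line(line):
    """Build a nested dict shaped like the Listing 11 event (subset)."""
    m = SUDO_RE.match(line)
    if m is None:
        return None
    return {
        "process": {"name": "sudo", "pid": int(m.group("pid"))},
        "host": {"hostname": m.group("host")},
        "related": {"user": [m.group("caller")]},
        "system": {"auth": {"sudo": {
            "tty": m.group("tty"), "pwd": m.group("pwd"),
            "user": m.group("user"), "command": m.group("command")}}},
    }
```

Feeding in the sample line from above yields the same process, host, and system.auth.sudo fields that appear in the stored event.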
Keep in mind that Filebeat is designed to work with structured and known log types. If you aim to track unspecified logs, you need to use Logstash.
Logstash
In order to track and ship unstructured logs, you have to use the plain log input, and its output should go through Logstash, unless you are only interested in the timestamp and message fields. You also need Logstash when it should act as an aggregator for multiple logging pipelines.
Despite its name, Logstash does not stash any logs. It receives log data and processes it with filters (Figure 2). In this case, processing means transforming, by removing unnecessary elements or splitting objects into terms that can serve as JSON objects and be sent to a tool like Elasticsearch. When the logs are processed, you have to define the output: The output could go to Elasticsearch, a file, a monitoring tool like StatsD, or one of several other output options.
The logstash.yml file shown in Listing 12 presents only the non-default values. You need to point Logstash at the pipeline configuration and specify where it will keep temporary data.
Listing 12
Exceptions from logstash.yml
$ cat files/logstash.yml | grep -vP '^#'
path.data: /var/lib/logstash
pipeline.id: main
path.config: "/etc/logstash/conf.d/pipeline.conf"
http.host: 0.0.0.0
Pipelines are sets of input/filter/output rules. You can define multiple pipelines and list them in the pipelines.yml file. When you have a single pipeline, you can specify it directly in the main config. The pipeline.conf file (Listing 13) lets you specify the input, filter, and output for the pipeline. The input options include reading from a file, listening on port 514 for syslog messages, reading from a Redis server, and processing events sent by Beats. A pipeline can also receive input from services such as the CloudWatch monitoring tool or the RabbitMQ message broker.
Listing 13
pipeline.conf
input {
  beats {
    port => 5044
  }
}
filter {
  if [fileset][module] == "system" {
    if [fileset][name] == "auth" {
      (...)
    }
    else if [fileset][name] == "syslog" {
      grok {
        match => {
          "message" => ["%{SYSLOGTIMESTAMP:[system][syslog][timestamp]} %{SYSLOGHOST:[system][syslog][hostname]} \
            %{DATA:[system][syslog][program]}(?:\[%{POSINT:[system][syslog][pid]}\])?: \
            %{GREEDYMULTILINE:[system][syslog][message]}"]
        }
        pattern_definitions => {
          "GREEDYMULTILINE" => "(.|\n)*"
        }
      }
    }
  }
  else if [fileset][module] == "nginx" {
    (...)
  }
}
output {
  elasticsearch {
    hosts => localhost
    manage_template => false
    index => "logstash-%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
Filters transform data in various ways to prepare it for later storage or processing. Some popular filter options include:
- grok filter – transforms unstructured lines into structured data.
- csv – converts CSV content into a list of elements.
- geoip – assigns geographic coordinates to a given IP address.
The output settings define where the data goes after filtering, which might be to Elasticsearch, email, a local file, or a database.
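The index option in Listing 13 uses %{...} references that Logstash resolves per event. The following toy Python resolver shows the idea (interpolate() is a hypothetical helper; Logstash's sprintf format supports far more, including full date patterns – only the %{+YYYY.MM.dd} form from Listing 13 is handled here):

```python
import re
from datetime import datetime

def interpolate(pattern, event, ts):
    """Resolve Logstash-style %{[a][b]} and %{+YYYY.MM.dd} references."""
    def repl(m):
        ref = m.group(1)
        if ref.startswith("+"):
            # Date math reference: only %{+YYYY.MM.dd} in this sketch
            return ts.strftime("%Y.%m.%d")
        value = event
        for key in re.findall(r"\[([^\]]+)\]", ref):
            value = value[key]     # walk the nested field path
        return str(value)
    return re.sub(r"%\{([^}]+)\}", repl, pattern)
```

With an event carrying @metadata from Filebeat, the Listing 13 index pattern expands to a daily index name such as logstash-filebeat-7.8.0-2020.09.28.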
As you can see in Listing 13, the sample pipeline starts by waiting for Beats input on port 5044. The most complex part of Listing 13 is the filter. In this case, the filter will parse syslog, audit.log, Nginx, and error logs, each of which has a different syntax.
Most filters are self-explanatory, but grok [3] requires a comment: It is a plugin that parses information in one format and transforms it into another (JSON, in this case). To speed up the process, you can use a built-in pattern, like IPORHOST or DATA. There are already hundreds of grok patterns available, but you can define your own, like the GREEDYMULTILINE pattern in Listing 13.
A pattern in grok has the format %{SYNTAX:SEMANTIC}, where SYNTAX is a regex (or another SYNTAX with regex) and SEMANTIC is a human-readable name that you bind to the matched expression. When the transformation is complete, you can output the data – in this case, to Elasticsearch.
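To make the %{SYNTAX:SEMANTIC} mechanics concrete, here is a miniature grok compiler in Python. It expands pattern references into named regex groups; only three stock patterns are defined in this sketch, whereas the real pattern library ships hundreds:

```python
import re

# A tiny subset of the stock grok pattern library (illustration only).
PATTERNS = {
    "WORD": r"\w+",
    "POSINT": r"[1-9][0-9]*",
    "GREEDYDATA": r".*",
}

def grok_compile(expression):
    """Expand %{SYNTAX:SEMANTIC} references into a compiled regex."""
    def repl(m):
        syntax, semantic = m.group(1), m.group(2)
        regex = PATTERNS[syntax]
        if semantic:
            return f"(?P<{semantic}>{regex})"  # bind the match to a name
        return f"(?:{regex})"                  # match without capturing
    return re.compile(re.sub(r"%\{(\w+)(?::(\w+))?\}", repl, expression))
```

Compiling %{WORD:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:message} and matching it against a line like "sudo[17900]: session opened for user root" pulls out the program, pid, and message fields, which is exactly what the syslog grok in Listing 13 does with its longer patterns. (Real grok semantics can also be nested field paths like [system][syslog][pid]; this sketch allows simple names only.)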
Kibana and Logtrail
Kibana is a visualization dashboard system that helps you view and analyze data obtained through other ELK components (Figure 3 and 4). Kibana supports hundreds of dashboards, allowing you to visualize different kinds of data in many useful ways. A system of plugins makes it easy to configure Kibana to display different kinds of data.
In this example, I'll show you how to set up Kibana to use Logtrail, a popular plugin for searching and visualizing logfiles.
You have to call kibana-plugin to install Logtrail (Listing 14). (Of course, you also could have installed it with Ansible.)
Listing 14
Installing Logtrail
[root@ELK ~]# cd /usr/share/kibana/bin/
[root@ELK bin]# ./kibana-plugin --allow-root install \
  https://github.com/sivasamyk/logtrail/releases/download/v0.1.31/logtrail-7.8.0-0.1.31.zip
The Logtrail config file (Listing 15) lets you define which index/indices it should take data from, as well as some display settings and mappings for essential fields such as the timestamp, hostname, and message. You can also add more fields and define custom message formats.
Listing 15
logtrail.json
{
  "index_patterns" : [
    {
      "es": {
        "default_index": "logstash-*"
      },
      "tail_interval_in_seconds": 10,
      "display_timestamp_format": "MMM DD HH:mm:ss",
      "fields" : {
        "mapping" : {
          "timestamp" : "@timestamp",
          "hostname" : "agent.hostname",
          "message": "message"
        }
      },
      "color_mapping" : {
        "field": "log.file.path",
        "mapping" : {
          "/var/log/nginx/access.log": "#00ff00",
          "/var/log/messages": "#0000ff",
          "/var/log/secure": "#ff0000"
        }
      }
    }
  ]
}
Security
The last step is to introduce some security to the stack. Up to now, anyone with network access to the stack could read or modify the data. The ELK base configuration does not include any kind of access restrictions, but you can add security through plugins. Two options are the paid Elastic X-Pack Security plugin [4] and the OpenDistro [5] security plugin.
It is worth noting that another option would be to use a proxy service like Apache or Nginx to enforce authorization, but for consistency, I'll stick with a dedicated solution.
The basic scenario is this: A user presents credentials that are verified against access backends. When the user's identity is confirmed, the security plugin assigns privileges and roles to the user (Figure 5).
When the OpenDistro plugin is enabled, Kibana presents a login panel (Figure 6).
The configuration for the OpenDistro plugin is stored in a few YAML files in /usr/share/elasticsearch/plugins/opendistro_security/securityconfig/.
As you can see in Listing 16, the YAML file for the security plugin is organized by user account. The hash is a hashed password generated with the hash.sh script, which is located in the tools subdirectory of the plugin directory. The opendistro_security_roles entry lets you specify any of the predefined roles. Most of the roles are self-explanatory, but a word is needed about the logstash role, since it also includes permissions to write Beats indices. If you want to create your own roles, you have to modify the action_groups.yml, roles.yml, and roles_mapping.yml files, which are located in the plugin's securityconfig subdirectory. The config file can also refer to roles assigned in an authentication system such as LDAP or Active Directory.
Listing 16
internal_users.yml
# All passwords are:
# qwerty
_meta:
  type: "internalusers"
  config_version: 2

admin:
  hash: "$2y$12$N5/i8SBuGv9c8vI5fYNWFe2otKwYPbAfBpNObFjCDpRJQp0k55bfC"
  reserved: true
  hidden: true
  opendistro_security_roles:
    - all_access
  description: "Demo admin user"

kibanaserver:
  hash: "$2y$12$N5/i8SBuGv9c8vI5fYNWFe2otKwYPbAfBpNObFjCDpRJQp0k55bfC"
  reserved: true
  hidden: false
  opendistro_security_roles:
    - kibana_server
  description: "Demo kibanaserver user"

kibana:
  hash: "$2y$12$N5/i8SBuGv9c8vI5fYNWFe2otKwYPbAfBpNObFjCDpRJQp0k55bfC"
  reserved: false
  opendistro_security_roles:
    - kibana_user
    - readall_and_monitor
  description: "Demo kibana user"

logstash:
  hash: "$2y$12$N5/i8SBuGv9c8vI5fYNWFe2otKwYPbAfBpNObFjCDpRJQp0k55bfC"
  reserved: true
  hidden: false
  opendistro_security_roles:
    - logstash
  description: "Demo Logstash & Beats user"
You can mark a user, role, role mapping, or action group as reserved. Resources that have the reserved flag set to true can't be changed using the REST API or Kibana. Reserved resources are not returned by the REST API and are not visible in Kibana.
In order to further harden your ELK stack, you can generate SSL certificates and enable them in Elasticsearch, then add user credentials to the Kibana server as well as all Beats. In the long run, however, it is a good idea to plug your stack into a company authentication service, such as Okta or LDAP.
Summary
ELK is an amazing solution that allows users to swiftly explore the status of the infrastructure. Although it was originally designed to handle logging, with later iterations and plugins, it has become a fully functional MAL (Monitoring-Alerting-Logging) tool. This article has touched on a few of the many potential options. Other notable features include fully configurable alerting, machine learning, anomaly detectors, and a performance analyzer.
Infos
- ELK Stack: https://www.elastic.co/elastic-stack
- Ansible: https://www.ansible.com/
- grok Filter: https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns
- X-Pack Security Plugin: https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-xpack.html
- OpenDistro: https://opendistro.github.io/for-elasticsearch-docs/
- Code in this Article: https://github.com/szandala/ELK