Centralize your logs with ELK Stack

Centralizing logs with the ELK Stack is a powerful way to streamline and optimize the management of log data within an organization. The ELK Stack, comprised of Elasticsearch, Logstash, and Kibana, offers a comprehensive solution for collecting, parsing, storing, and visualizing logs from various sources in a centralized location. This setup not only enhances visibility into system behaviors and application performance but also facilitates troubleshooting, debugging, and proactive monitoring. Centralizing logs with the ELK Stack empowers businesses to gain valuable insights, detect anomalies, and make data-driven decisions more effectively.

In this article, we will set up an ELK stack to gather Nginx web service logs from a source node and visualize them.

The most efficient way to set up the stack is to dedicate a separate node to each service, with roughly the following resources:

  1. Elasticsearch: RAM: minimum of 4GB (preferably 8GB or more for better performance); Storage: sufficient disk space for data storage (depends on the amount of log data); CPU: dual-core or higher
  2. Logstash: RAM: minimum of 2GB; Storage: enough space for temporary data processing and cache; CPU: dual-core or higher
  3. Kibana: RAM: minimum of 1GB; Storage: depends on the saved visualizations and dashboards; CPU: dual-core or higher

In this article, however, we will use a single node for the entire stack, with the specs below:

RAM: 8GB

CPU: 8 vCPU (1 core per socket)

Storage: 200GB

OS: Ubuntu 22.04 LTS

So, let's configure the stack components one by one.

Elasticsearch:

  • Import GPG Key:

curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch |sudo gpg --dearmor -o /usr/share/keyrings/elastic.gpg        
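
If apt on your system cannot fetch packages over HTTPS, you may also need the apt-transport-https package (on Ubuntu 22.04 it is usually already available as a transitional package, so this step is often unnecessary):

sudo apt install apt-transport-https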

  • Add repository to server:

echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list        

  • Install the package:

apt update && apt install elasticsearch        

  • Configure the package: Elasticsearch is now installed and ready to be configured. The configuration file is located at '/etc/elasticsearch/elasticsearch.yml'. Elasticsearch uses the YAML format for its configuration, which means we need to maintain the indentation; be sure not to add any extra spaces as you edit this file. The elasticsearch.yml file provides configuration options for your cluster, node, paths, memory, network, discovery, and gateway. Most of these options are preconfigured in the file, but you can change them according to your needs. By default, Elasticsearch listens on localhost:9200. For the purposes of our single-server demonstration, we will leave the default configuration.
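
For illustration, the kinds of options you will find (commented out) in elasticsearch.yml look roughly like the snippet below. These are example values from the stock file, shown only as a sketch; you do not need to change them for this single-node demo.

#cluster.name: my-application
#node.name: node-1
#network.host: 192.168.0.1
#http.port: 9200
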
  • Start the service and access:

systemctl start elasticsearch

curl -XGET "localhost:9200"        

If you can reach the Elasticsearch endpoint properly, you will see output similar to the one below:

Output
{
  "name" : "ELK-STACK",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "TOkdWnIlTNCEMzCzxYZcog",
  "version" : {
    "number" : "7.17.15",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "0b8ecfb4378335f4689c4223d1f1115f16bef3ba",
    "build_date" : "2023-11-10T22:03:46.987399016Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
        
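
Optionally (an extra step not in the original flow), you can also enable the service so that it starts automatically after a reboot:

systemctl enable elasticsearch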

Kibana:

  • According to the documentation, you should install the Kibana dashboard after Elasticsearch, because every ELK Stack component's package comes from the repository we configured earlier. So, if you have already installed Elasticsearch, you can install the Kibana dashboard directly:

apt update && apt install kibana        

  • After the installation, we have to start the related service:

systemctl start kibana && systemctl status kibana        
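
Kibana's own settings live in /etc/kibana/kibana.yml. We keep the defaults in this single-node setup, but for reference the commonly adjusted keys look like the following (example values from the stock file, shown only as a sketch):

#server.port: 5601
#server.host: "localhost"
#elasticsearch.hosts: ["http://localhost:9200"]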

  • By default, Kibana listens on localhost:5601. We will keep this default in this article and configure a reverse proxy on Nginx that forwards requests and responses from port 443 or 80 to 5601.

nano /etc/nginx/conf.d/your_domain.conf
        
server {
    listen 80;

    server_name your_domain;

    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}
        

  • We will also create an htpasswd user to secure the dashboard. The command below creates a user with an encrypted password (openssl will prompt you for the password interactively) and stores them in the htpasswd.users file under the /etc/nginx directory. With this configuration, when someone tries to access your Kibana dashboard, Nginx will prompt them with a login box and check the given credentials against that file.

echo "kibanaadmin:`openssl passwd -apr1`" | sudo tee -a /etc/nginx/htpasswd.users         

  • After all the configuration, just check the configuration syntax. If everything is OK, restart the Nginx server and open the endpoint in a browser.

nginx -t        
systemctl restart nginx        
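
If UFW is enabled on this node (an assumption; skip this if you are not using UFW), remember to allow web traffic through to Nginx:

sudo ufw allow 'Nginx Full'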

In order to establish a connection with Elasticsearch, Kibana will prompt you with an enrollment token request popup. You can generate one with the help of the elasticsearch-create-enrollment-token tool. Just run the command below and paste the token into the popup box.


/usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana        

After this step, Kibana will ask for a verification code to complete the secure configuration. Run the command below to get the verification code:


/usr/share/kibana/bin/kibana-verification-code        

If all steps were completed successfully, Kibana will redirect you to its home page.

Logstash:

  • The Logstash package is also included in the Elasticsearch repository. If you have already configured the repository, run the command below to install the Logstash package:

apt update && apt install logstash        

After installing Logstash, you can move on to configuring it. Logstash’s configuration files reside in the /etc/logstash/conf.d directory. For more information on the configuration syntax, you can check out the configuration reference that Elastic provides. As you configure the file, it’s helpful to think of Logstash as a pipeline which takes in data at one end, processes it in one way or another, and sends it out to its destination (in this case, the destination being Elasticsearch). A Logstash pipeline has two required elements, input and output, and one optional element, filter. The input plugins consume data from a source, the filter plugins process the data, and the output plugins write the data to a destination.
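
As a quick, optional smoke test of the pipeline concept, you can run Logstash with an inline configuration that reads from stdin and writes to stdout. Type a line, watch it come back as a structured event, then press Ctrl-C to exit:

/usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout { } }'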

Before configuring Logstash, let's talk about Filebeat, which we will use in this article.

What is filebeat and its purposes?

Filebeat is a lightweight data shipping tool offered by Elastic, the same company behind Elasticsearch, Kibana, and Logstash (ELK Stack). Its primary purpose is to collect, parse, and forward log data or other structured data from various sources to Elasticsearch or Logstash for further processing, indexing, and visualization.

Here's a brief breakdown of Filebeat's functionalities:

  1. Log Collection: Filebeat is designed to efficiently tail and read log files, parsing them into structured JSON documents. It supports various log file formats and can also ingest data from other sources like system logs, network logs, and more.
  2. Lightweight: It's built to be lightweight and resource-efficient, making it suitable for deployment on edge devices, servers, or containers without significant performance overhead.
  3. Shipper: Filebeat acts as a shipper that sends harvested log data to centralized locations such as Elasticsearch or Logstash, ensuring data is centralized for analysis and visualization.
  4. Modules: Filebeat offers predefined modules for various applications and services, simplifying the setup for common log formats and structures like Apache logs, MySQL logs, system logs, etc.
  5. Scalability: It's scalable and can handle large volumes of log data by distributing data across multiple instances if needed.

In summary, Filebeat plays a crucial role in the ELK Stack ecosystem by facilitating the efficient and reliable movement of log data from multiple sources to the centralized Elasticsearch or Logstash for indexing and analysis, contributing to effective log management and monitoring.

So, let's install Filebeat and configure it to ship the logs we define.

apt update && apt install filebeat        

The package should be installed and configured on each source node separately. The configuration of the service is located in '/etc/filebeat/filebeat.yml'.
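
As an aside, the predefined modules mentioned earlier are an alternative to configuring inputs by hand; for example, Filebeat's Nginx module can be enabled with the command below. In this article, however, we will define a plain log input ourselves:

filebeat modules enable nginx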

First of all, comment out the elasticsearch output section and uncomment the logstash section as shown below. With this change, instead of sending logs directly to the Elasticsearch service, Filebeat will send them to the Logstash service, and Logstash will filter the logs according to the configuration that we will define in the next steps.

#output.elasticsearch:
  # Array of hosts to connect to.
  #  hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"


output.logstash:
  # The Logstash hosts
  hosts: ["ip_of_logstash:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"        

Now just add the below config under filebeat.inputs block to set which log files will filebeat ships to the logstash service:

- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  fields:
    log_type: nginx-access
        

The log_type field will help us handle this log in the Logstash configuration.

After the configuration, save the file and start the Filebeat service on the node where it is installed:

systemctl start filebeat        
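
Optionally, you can ask Filebeat to validate its configuration file and to test connectivity to the configured Logstash output. Note that the output test will only succeed once Logstash is up and listening on port 5044, which we configure in the next step:

filebeat test config
filebeat test output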

Create a configuration file under the /etc/logstash/conf.d directory. In this article, we will create a single config for each node that collects log data, so naming each file after the node's hostname is good practice. But be aware that every filename should end with the .conf extension so that Logstash can read it.

Example config content:

input {
  beats {
    port => 5044
  }
}
filter {
    # Add any specific filters for processing if needed
    

    if [fields][log_type] == 'nginx-access' {
        grok {
            match => {
                "message" => "%{IPORHOST:client_ip} - %{DATA:user_ident} \[%{HTTPDATE:timestamp}\] \"%{WORD:http_method} %{DATA:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code:int} %{NUMBER:body_sent_bytes:int}"
            }
        }

        date {
            match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
            target => "@timestamp"
        }

        mutate {
            add_field => { "source_type" => "%{[fields][log_type]}" }
            rename => {
                "response_code" => "http_response_code"
                "client_ip" => "user_client_ip"
                "body_sent_bytes" => "bytes_sent"
            }
            convert => {
                "http_response_code" => "integer"
                "bytes_sent" => "integer"
            }
        }
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "hostname-%{+YYYY.MM.dd}"
    }
}
        

In this configuration, the input section determines the port on which the logs will be collected (based on the Filebeat configuration, the port number will be 5044), the filter block parses the Nginx access log into a form that is more readable and better suited for visualization, and the output section sends the filtered logs to the Elasticsearch service. After the configuration, just restart the services below.
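
To make the grok pattern concrete, here is a made-up Nginx access log line and the main fields the filter above would extract from it (illustrative values only):

203.0.113.5 - - [10/Nov/2023:12:00:00 +0000] "GET /index.html HTTP/1.1" 200 512

After the grok, date, and mutate stages, this event would carry, among others, user_client_ip=203.0.113.5, http_method=GET, request=/index.html, http_response_code=200, and bytes_sent=512, with @timestamp taken from the bracketed date.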

The index value in the output block tells Kibana which source the logs came from. It will differ based on the node that sends the logs.
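
Before restarting, you can optionally have Logstash validate the pipeline syntax. The -t (--config.test_and_exit) flag is part of stock Logstash; the file name below is a placeholder for whatever you named your config:

/usr/share/logstash/bin/logstash --path.settings /etc/logstash -t -f /etc/logstash/conf.d/your_node.conf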

systemctl restart logstash elasticsearch         

After a successful restart, check the status of the index that we set in the configuration with the help of the curl command below, run on the Elasticsearch node.

curl -XGET "localhost:9200/_cat/indices?v"        

In the output, you should see the index name. If you can see it, it means you have successfully completed all of the steps.

So, let's visualize the logs with Kibana.

Access the dashboard in a browser. Go to the hamburger menu on the left side -> Management -> Stack Management.

In Stack Management, go to the Index Patterns page in the left sidebar. Create an index pattern, give it a name, and select the '@timestamp' option in the timestamp field. The index pattern name should match the name of the index in the Logstash configuration. Since Logstash appends the date to the end of the given index name, cover all the dates with a wildcard like 'index_name-*'. You should see the matching index in the source list on the right.

Once you have created the index pattern, go to the hamburger menu -> Analytics -> Discover. Select the index pattern in the select field at the top left. All the logs will appear in the Discover view.

The left panel shows all the fields contained in the log data, and the histogram above the documents shows the number of logs over time. If you can see the logs in this panel, it means you can properly visualize the data. Just go to the hamburger menu -> Dashboard -> Create dashboard -> Create visualization and select the index pattern at the top left. With the help of the official tutorial video, you can visualize the data however you like.

Tutorial Link

Example visualized dashboard:

Centralizing logs with the ELK Stack presents a transformative approach to log management. By leveraging Elasticsearch, Logstash, and Kibana in unison, organizations can streamline the collection, parsing, storage, and visualization of logs from diverse sources. This centralized approach not only enhances visibility into system behaviors and application performance but also empowers proactive monitoring, troubleshooting, and data-driven decision-making. ELK Stack’s unified platform offers scalability, flexibility, and the ability to glean actionable insights from vast amounts of log data, making it an indispensable tool for modern businesses striving for enhanced operational efficiency and robust data analysis.

References:

How to install ELK Stack on Ubuntu 22.04

Configure Elasticsearch in Kibana

ChatGPT




