Centralized log management for AEM

What is Centralized log management?

We know that almost every running application generates logs. These logs describe how the application is operating, and whenever a problem occurs we need to get to the bottom of it and uncover the cause; in that case, logs are the only source of truth.

It would be very arduous to find the problem if we do not have logs.

When it comes to a heavy production application running across servers, a huge volume of logs is generated, and it becomes very challenging to gather and visualize these logs in a single place. This is where the concept of centralized log management comes in.

Centralized log management (CLM) is a solution that consolidates log data from many sources into a central database, from where the log data can be redirected to any location.

Apart from various data management features, an ideal CLM system is also expected to support the analysis of log data and clear presentation of outcomes after analysis.

What can we do with Centralized log management?

  1. Parsing of logs.
  2. Storing logs from multiple sources in a single target.
  3. Easy searching of log data.
  4. Visualization of logs in the form of graphs, tables, etc.
  5. Easy retention of logs.
  6. Access control (who can see which logs).
  7. Alerting based on metrics.

CLM for AEM (Adobe Experience Manager)

AEM is a content management solution for building websites, mobile apps, and forms, and it makes it very easy to manage marketing content and assets.

Like other tools, AEM generates a large number of logs that are enormously important to developers. So here we are going to set up an EFK stack (a CLM solution) for monitoring and managing AEM logs.

Listed below are the log files generated by AEM that developers use most frequently.

  • error.log
  • access.log
  • request.log
  • history.log
  • audit.log
  • stderr.log
  • stdout.log
  • upgrade.log

These logs are present inside the crx-quickstart/logs folder of the AEM directory.
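For example, on the author instance you can watch the error log directly. The path below assumes the same install location (/aem/author) used in the Fluent-bit configuration later in this post; adjust it to your setup.

tail -f /aem/author/crx-quickstart/logs/error.log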

We are also going to write grok patterns to parse a few of these logs and display them in Kibana.

EFK Stack

Elasticsearch, Fluent-bit, and Kibana, when used together, form the EFK stack, an open-source centralized log management stack.

Elasticsearch is the central component of the EFK stack. It is used to store, search, and analyze data in near real-time, and it responds very rapidly because it searches an index directly instead of scanning raw text. It allows us to index and query large amounts of data, and it supports full-text search as well as structured search. It also provides REST APIs to store and search the data, and to create and delete indexes.
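As a quick illustration of those REST APIs, here is a minimal sketch against a local, unsecured Elasticsearch listening on port 9200 (the index name demo-logs and the document fields are made up for the example):

# store a document (this also creates the index if it does not exist)
curl -X POST "http://localhost:9200/demo-logs/_doc" -H 'Content-Type: application/json' -d '{"message": "hello", "level": "INFO"}'

# search the index
curl "http://localhost:9200/demo-logs/_search?q=level:INFO"

# delete the index
curl -X DELETE "http://localhost:9200/demo-logs"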

Fluent-bit is used to collect, transform, and ship log data to Elasticsearch, so Fluent-bit is a log collector and processor. It is open-source and multi-platform, and it is distributed as packages for specific Enterprise Linux distributions under the name td-agent-bit. Here we are using td-agent-bit to gather the logs and ship them to Elasticsearch.

Kibana is a very powerful data visualization tool that acts as a user interface for monitoring and managing Elasticsearch data. It allows us to create very intuitive dashboards, query the data based on filters, and explore Elasticsearch data very efficiently.

Configuring EFK Stack

We will be using a single Elasticsearch and Kibana instance here. A td-agent-bit can be installed on each server from which we want to collect data.

For demonstration purposes, we are using a single td-agent-bit.

We are using the Elasticsearch and Kibana Docker images and running them with docker-compose. Here is the docker-compose.yml file that can be used to run Elasticsearch and Kibana. By default, Elasticsearch runs on port 9200 and Kibana on port 5601.

version: "3"
services:
elasticsearch:
image: elasticsearch
container_name: elasticsearch
hostname: elasticsearch
environment:
- "discovery.type=single-node"
ports:
- 9200:9200
networks:
- elkstack
kibana:
image: kibana
container_name: kibana
hostname: kibana
ports:
- 5601:5601
links:
- elasticsearch:elasticsearch
depends_on:
- elasticsearch
networks:
- elkstack
networks:
elkstack:
driver: bridge

Here “discovery.type=single-node” states that we are using a single Elasticsearch node; we are not using an Elasticsearch cluster.
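To bring the stack up and confirm that both services are reachable on the ports configured above, a quick sanity check (assuming Docker Compose is installed) looks like this:

docker-compose up -d
curl http://localhost:9200      # should return a JSON document with cluster and version details
curl -I http://localhost:5601   # Kibana starts answering once it has finished initializing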

Now, let’s install td-agent-bit (the Fluent-bit agent) on the machine where the logs are present. Listed below are the steps to install the agent on Ubuntu, Debian, CentOS, and Red Hat machines.

Installing td-agent-bit on Ubuntu

Here we are using the Fluent-bit repository to download the td-agent-bit package, so first we need the public key to access that repository. Here is the command to add the public key of the fluent-bit repository server.

wget -qO - https://packages.fluentbit.io/fluentbit.key | sudo apt-key add -

Next, add the URL of the fluent-bit repository to our sources list by inserting the below-mentioned line into the file /etc/apt/sources.list.

deb https://packages.fluentbit.io/ubuntu/bionic bionic main

Next, we have to update the apt database. So run the below-mentioned command.

sudo apt-get update

Now, we are ready to install td-agent-bit on our machine.

sudo apt-get install td-agent-bit

Installing td-agent-bit on Debian

Add the URL of the fluent-bit repository to our sources list by inserting the line for your Debian release into the file /etc/apt/sources.list. The key, update, and install commands are the same as for Ubuntu and are repeated after the release lines below.

For Debian 10

deb https://packages.fluentbit.io/debian/buster buster main

For Debian 9

deb https://packages.fluentbit.io/debian/stretch stretch main

For Debian 8

deb https://packages.fluentbit.io/debian/jessie jessie main
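Then add the repository key, update the apt database, and install the package, exactly as on Ubuntu:

wget -qO - https://packages.fluentbit.io/fluentbit.key | sudo apt-key add -
sudo apt-get update
sudo apt-get install td-agent-bit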

Installing td-agent-bit on CentOS and Red Hat Linux

Create a new file named td-agent-bit.repo in the directory /etc/yum.repos.d/. Follow the instructions below for the same.

cd /etc/yum.repos.d/
vim td-agent-bit.repo

Paste the below-mentioned lines in the file.

[td-agent-bit]
name = TD Agent Bit
baseurl = https://packages.fluentbit.io/centos/7/$basearch/
gpgcheck=1
gpgkey=https://packages.fluentbit.io/fluentbit.key
enabled=1

Then run the following yum command to download and install the td-agent-bit service on your machine.

yum install td-agent-bit

Now, you can use the below-mentioned command to start the td-agent-bit service on your machine.

sudo service td-agent-bit start

And to check the status of the td-agent-bit service, you can use the following command.

sudo service td-agent-bit status

Now that we have td-agent-bit installed on our machine, it’s time to configure it. Move to the /etc/td-agent-bit directory on your machine. There you will find two files:

  • td-agent-bit.conf — This is the main configuration file where we can define the service, input, filter, and output accordingly.
  • parsers.conf — This file contains the parsers or grok patterns to parse the logs and segregate them according to our requirements.

Let’s write some configuration to parse AEM logs, send them to Elasticsearch, and then display that data in Kibana.

Some parsers for Apache and Nginx logs are already present in the parsers.conf file. We may either keep them or remove them according to our requirements.

Open up the parsers.conf file and append the lines below to parse the AEM author’s access, error, and request logs.

[PARSER]
    Name author_access
    Format regex
    Regex ^(?<AccessIp>.*)\s-\s(?<user>.*)\s+(?:\d{2}\/\w{3}\/\d{4}\:\d{2}\:\d{2}\:\d{2} \+\d{4}) \"(?<method>\w{3,4}) \/(?<msg>.*)$

[PARSER]
    Name author_error
    Format regex
    Regex ^(?:.*) \*(?<logLevel>.*)\* (?<msg>.*)$

[PARSER]
    Name author_request
    Format regex
    Regex ^(?<RequestTime>.*) \[(?<requestid>.*)\] -> (?<requestMethod>.*) (?<requestPath>.*) (?<protocol>.*)\n(?<ResponseTime>.*) \[(?<responseid>.*)\] <- (?<responseCode>.*) (?<mimeType>.*) (?<responseTime>.*)$
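As a quick sanity check of the author_error parser, here is a hypothetical error.log line (the timestamp, thread, and class name are made up for illustration) and the fields the regex above would extract from it:

14.01.2022 12:03:27.817 *ERROR* [qtp2037324811-84] org.example.SomeService Something went wrong

# logLevel = ERROR
# msg      = [qtp2037324811-84] org.example.SomeService Something went wrong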

Next, we will configure the td-agent-bit.conf file to take input, filter, and output the log data to Elasticsearch. So, open up the td-agent-bit.conf file and add the configuration mentioned below.

[SERVICE]
    Flush 5
    Daemon Off
    Log_Level info
    Log_File /var/log/fluent-logs/fluent-bit.log
    Parsers_File parsers.conf

#author-access
[INPUT]
    Name tail
    Tag author-access
    Path /aem/author/crx-quickstart/logs/access.log
    Refresh_Interval 5
    Mem_Buf_Limit 100MB
    Buffer_Chunk_Size 128k
    Buffer_Max_Size 4096k
    DB fluent-bit.db

[FILTER]
    Name parser
    Match author-access
    Key_Name log
    Parser author_access

[OUTPUT]
    Name es
    Match author-access
    Host localhost
    Port 9200
    Index author-access
    Type author-logs
    Retry_Limit 1

#author-error
[INPUT]
    Name tail
    Tag author-error
    Path /aem/author/crx-quickstart/logs/error.log
    Refresh_Interval 5
    Mem_Buf_Limit 100MB
    Buffer_Chunk_Size 128k
    Buffer_Max_Size 4096k
    DB fluent-bit.db

[FILTER]
    Name parser
    Match author-error
    Key_Name log
    Parser author_error

[OUTPUT]
    Name es
    Match author-error
    Host localhost
    Port 9200
    Index author-error
    Type author-logs
    Retry_Limit 1



#author-request
[INPUT]
    Name tail
    Tag author-request
    Path /aem/author/crx-quickstart/logs/request.log
    Refresh_Interval 5
    Mem_Buf_Limit 100MB
    Buffer_Chunk_Size 128k
    Buffer_Max_Size 4096k
    DB fluent-bit.db

[FILTER]
    Name parser
    Match author-request
    Key_Name log
    Parser author_request

[OUTPUT]
    Name es
    Match author-request
    Host localhost
    Port 9200
    Index author-request
    Type author-logs
    Retry_Limit 1

After adding the above configuration to both files, restart the td-agent-bit service using the following command.

service td-agent-bit restart
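To confirm that records are actually being read and shipped, tail Fluent-bit's own log file (the Log_File path set in the [SERVICE] section above); any parser or Elasticsearch connection errors will show up there.

sudo tail -f /var/log/fluent-logs/fluent-bit.log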

Getting data on Elasticsearch and Kibana

Now, open the Kibana dashboard in your browser at http://localhost:5601.

Now, go to Stack Management -> Index Management, and you will see all three indexes (author-access, author-error, and author-request).
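The same thing can be verified from the command line through Elasticsearch's cat API:

curl 'http://localhost:9200/_cat/indices?v'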

So, it’s time to create an index pattern so that the indexes can easily be grouped and we can query the data. In my case, I have created an index pattern named author* because every index starts with the author prefix. You can create multiple index patterns according to your use case.

While creating the index pattern, choose the timestamp option to sort the logs based on time.

Now, go to Kibana -> Discover and choose the index pattern from the drop-down. You can then filter the logs by writing a query in the KQL (Kibana Query Language) bar at the top, and narrow the results by choosing a time range from the time picker next to the KQL bar.
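For example, with the author* index pattern selected, KQL queries like the ones below (using the field names defined in our parsers; the values are only illustrative) narrow down the results:

logLevel : "ERROR"
responseCode : "404"
method : "GET" and AccessIp : "10.0.0.15"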

Conclusion

In conclusion, the EFK stack is one of the easiest ways to set up centralized log management for any application. The same steps can be followed to set up CLM for any kind of application, whether it is a PHP application, a Node application, or something else. All that needs to change is the path of the log files that need to be shipped to Elasticsearch.

The next thing is to write regex patterns according to the log format or the kind of segregation required.
