Manual Log Parsing with Cut, AWK and Python
This is a quick tutorial aimed at infosec newbies, or people who are new to Linux in general, who are interested in fast ways to parse logs without the aid of a SIEM or other log aggregation tools.
Sometimes we find ourselves in situations where we don’t have the aid of log aggregation software and we need to extract information from a log file quickly. I’m going to share three different ways a person can do this from the Linux CLI. There are certainly many ways one can go about this.
Let’s say we must extract the date, the alert, and the error code from a Suricata log. Logs can grow to a tremendous size, and doing this one line at a time would be a fool's errand. It’s a good thing we have all these awesome tools to aid us.
Cut
When using the cut command, we want to set a delimiter (a character or symbol that separates words, data, or characters) before choosing the fields we want to extract; otherwise cut treats the entire line as one field. For this log, we will use the white space between words by passing " ".
We set the delimiter with -d and then choose which fields to extract with -f. We can then redirect that output to another file by using '>' and specifying a new file name.
Command: cut -d " " -f 1,5,7,8 log.txt
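To make the command concrete, here is a runnable sketch using a made-up, Suricata-style line (the log contents and field positions are hypothetical, invented only to illustrate the syntax):

```shell
# Write one hypothetical sample line to log.txt (contents invented for illustration).
printf '07/18/2023-14:22:05.123 IDS event sig ATTACK_RESPONSE triggered [Error 31]\n' > log.txt

# Split on spaces and keep fields 1, 5, 7 and 8
# (the date, the alert name, and the two fields holding the error code).
cut -d " " -f 1,5,7,8 log.txt
```

On this sample line, cut prints the date, the alert name, and the bracketed error code, joined by the same space delimiter.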
AWK
AWK is a domain-specific language for scripting and data extraction. By default, AWK treats the white space in the text as the delimiter between fields. When using AWK we need to give it an action to execute, so we will use the 'print' command to print to our terminal. We then tell it which fields we want by writing '$' followed by the field number.
Command: awk '{print $1,$5,$7,$8}' log.txt
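As a quick sketch against a made-up, Suricata-style line (contents hypothetical, for illustration only), AWK pulls out the same fields:

```shell
# Hypothetical sample line, invented for illustration.
printf '07/18/2023-14:22:05.123 IDS event sig ATTACK_RESPONSE triggered [Error 31]\n' > log.txt

# Print fields 1, 5, 7 and 8; the commas insert awk's default
# output separator, a single space, between them.
awk '{print $1,$5,$7,$8}' log.txt
```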
Python
Python is my favorite way to parse data. It’s certainly not as quick to carry out as cut or awk, but it offers many more options, such as looping through the data to search for specific info or formatting the output.
We first create a variable for the log file we want to use, opening it with an 'open' statement and 'r' to read it.
log = open("log.txt", "r")
Next, we use flow control to go through the log, using "strip" to remove the newline character at the end of each line, then "split" to break the line at a delimiter, in this case the space between words (" "). This turns each line into a list so we can extract the data individually.
for line in log:
    lineFormat = line.strip().split(" ")
After that, we extract the fields we want to use and save them to variables. If you’re unfamiliar with the way a list is indexed, the first item in a list is always index 0, the second index 1, and so on. Note that a slice like lineFormat[6:8] returns a new list containing the items at indexes 6 and 7.
    date = lineFormat[0]
    alert = lineFormat[4]
    errorCode = lineFormat[6:8]
Next, we print the information to the terminal, using an f-string to make it look pretty. This can also be carried out with modules such as pprint, pandas, etc.
    print(f"Date: {date} ----- Alert: {alert} ----- Error: {errorCode}")
Lastly, we must close the file.
log.close()
If this information can help even one person, then I've done my job. Happy parsing!
P.S. You can also pipe the output through "tr" with "-d" to delete characters (in this case the brackets around the error code) and use "awk" with "OFS" (output field separator) to add white space between the fields, making it easier to read.
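A minimal sketch of that pipeline, again using a made-up, Suricata-style line (contents hypothetical): awk's -v OFS=... sets the output field separator, and tr -d '[]' strips the brackets.

```shell
# Hypothetical sample line, invented for illustration.
printf '07/18/2023-14:22:05.123 IDS event sig ATTACK_RESPONSE triggered [Error 31]\n' > log.txt

# Join the selected fields with three spaces (OFS),
# then delete the square brackets with tr.
awk -v OFS='   ' '{print $1,$5,$7,$8}' log.txt | tr -d '[]'
```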