Manual Log Parsing with Cut, AWK and Python

This is a quick tutorial for infosec newcomers, or anyone new to Linux in general, who wants a fast way to parse logs without the aid of a SIEM or other log aggregation tool.

Sometimes we find ourselves in situations where we don’t have the aid of log aggregation software and we need to extract information from a log file quickly. I’m going to share three different ways a person can do this from the Linux CLI. There are certainly many ways one can go about this.

Let’s say we must extract the date, the alert and the error code from a Suricata log. Logs can grow tremendously large, and extracting these fields one line at a time would be a fool's errand. It’s a good thing we have all these awesome tools to aid us.

Cut

When using the cut command, we first set a delimiter (the character that separates one field from the next) before choosing the fields we want to extract. By default cut splits on tabs, so a line that contains no tab is viewed as one field and printed in full. For this log, we will split on the single space between words by using " ".

We set the delimiter with -d and then choose which fields to extract with -f. We can then send that output to another file by adding '>' followed by a new file name.

Command: cut -d " " -f 1,5,7,8 log.txt
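To make the field selection concrete, here is a minimal Python sketch of what that cut command does to a single line. The function name cut_fields and the sample line are made up for illustration; the sample is not real Suricata output, it just has a field in each position the command selects.

```python
def cut_fields(line, delimiter=" ", fields=(1, 5, 7, 8)):
    """Return the chosen 1-based fields of one line, roughly like cut -d -f."""
    line = line.rstrip("\n")
    # cut prints a line unchanged when it does not contain the delimiter.
    if delimiter not in line:
        return line
    parts = line.split(delimiter)
    # cut emits fields in file order and skips numbers past the end of the line.
    selected = [parts[n - 1] for n in sorted(fields) if n <= len(parts)]
    return delimiter.join(selected)


# Hypothetical log line, invented for this example.
sample = "2024-01-01 a b c ALERT d [1:100:1] [Priority:2] e"
print(cut_fields(sample))  # 2024-01-01 ALERT [1:100:1] [Priority:2]
```

Note that every single space counts as a delimiter here, so two spaces in a row produce an empty field, just as they do with cut.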


AWK

AWK is a domain-specific language for text processing and data extraction. By default, AWK treats runs of spaces and tabs in the text as the delimiter between fields. When using AWK we need to give it an action to execute, so we will use the 'print' command to print to our terminal. We then tell it which fields to print by writing '$' followed by the field number, so $1 is the first field.

Command: awk '{print $1,$5,$7,$8}' log.txt
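One practical difference from cut is how repeated spaces are handled. The sketch below imitates AWK's default field splitting in Python, where str.split() with no argument also splits on runs of whitespace. The function name awk_print and the sample line (note the double space after the date) are invented for illustration.

```python
def awk_print(line, fields=(1, 5, 7, 8), ofs=" "):
    """Roughly mimic awk '{print $1,$5,$7,$8}' for one line of text."""
    # With no argument, split() treats any run of whitespace as one separator.
    parts = line.split()
    # awk gives an empty string for a field number past the end of the record.
    selected = [parts[n - 1] if n <= len(parts) else "" for n in fields]
    return ofs.join(selected)


# Hypothetical log line with two spaces after the date.
sample = "2024-01-01  a b c ALERT d [1:100:1] [Priority:2] e"
print(awk_print(sample))      # 2024-01-01 ALERT [1:100:1] [Priority:2]
print(sample.split(" ")[4])   # c  (splitting on single spaces shifts the fields)
```

If a log pads its columns with extra spaces, the whitespace-run behaviour is usually the one you want.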


Python

Python is my favorite way to parse data. It’s certainly not as quick to write as cut or awk, but it offers many more options, such as looping through the data to search for specific info or formatting the output.

We first open the log file with an 'open' statement, passing "r" to read it, and store the resulting file object in a variable.

log = open("log.txt", "r")

Next, we use a for loop to go through the log line by line. We use "strip" to remove the newline character at the end of each line, then "split" the line at a delimiter, which here is the space between the words (" "). This turns each line into a list so we can extract the data individually.

for line in log:

    lineFormat = line.strip().split(" ")

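After that, we extract the fields we want to use and save them to variables. These lines belong inside the for loop, so they stay indented. If you’re unfamiliar with the way a list is indexed, the first item in the list is always at index 0, the second at index 1, and so on, so field 1 from cut and awk is index 0 here. The slice [6:8] returns a list holding the items at indexes 6 and 7.

    date = lineFormat[0]

    alert = lineFormat[4]

    errorCode = lineFormat[6:8]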

Next, still inside the loop, we print the information to the terminal, using an f-string to make it look pretty. This can also be carried out by using modules such as pprint, pandas, etc.

    print(f"Date: {date} ----- Alert: {alert} ----- Error: {errorCode}")

Lastly, once the loop has finished, we must close the file.

log.close()
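Putting the steps together, one possible way to arrange the script is sketched below. It swaps the explicit open and close calls for a with block, which closes the file automatically, and it skips lines with fewer than eight fields so short or blank lines are passed over. The function names parse_log and print_report are my own for this example, and the field positions are the same ones used above.

```python
def parse_log(lines):
    """Yield (date, alert, error_code) for each line with at least eight fields."""
    for line in lines:
        fields = line.strip().split(" ")
        if len(fields) < 8:
            continue  # skip blank or short lines rather than raise IndexError
        # error_code is a two-item list because of the [6:8] slice.
        yield fields[0], fields[4], fields[6:8]


def print_report(path):
    """Print one formatted row per usable line in the file at path."""
    # The with block closes the file for us, even if an error occurs.
    with open(path, "r") as log:
        for date, alert, error_code in parse_log(log):
            print(f"Date: {date} ----- Alert: {alert} ----- Error: {error_code}")
```

Calling print_report("log.txt") would then produce the same kind of output as the step-by-step version.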


If this information helps even one person, then I've done my job. Happy parsing!

Daniel McNally

Cybersecurity and Information Assurance Analyst

1 year ago

I know I'm talking to myself here, but you can also pipe the output through "tr" with "-d" to delete characters (in this case, the brackets around the error code) and set "OFS" (the output field separator) in "awk" to add white space between the fields, making the output easier to read.
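The same cleanup can be sketched in Python, for anyone following the Python route. Here str.translate deletes the bracket characters, much as tr -d would, and the separator argument plays the role of OFS. The names DELETE_BRACKETS and format_fields, and the sample values, are made up for this example.

```python
# Translation table that deletes "[" and "]" wherever they appear.
DELETE_BRACKETS = str.maketrans("", "", "[]")


def format_fields(fields, separator="    "):
    """Strip brackets from each field and join them with a wider separator."""
    cleaned = [field.translate(DELETE_BRACKETS) for field in fields]
    return separator.join(cleaned)


# Hypothetical field values, invented for illustration.
print(format_fields(["2024-01-01", "ALERT", "[1:100:1]"], separator=" | "))
# 2024-01-01 | ALERT | 1:100:1
```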
