Linux IR - AI-Assisted Malware Analysis

Introduction

Incident response often has to be fast. We are chasing an active attacker and trying to get control of a situation before anything gets worse. It is not often that we have the luxury of time.

This means that sometimes we have to find "quick and dirty" ways to resolve issues. In turn, this often carries some sacrifices - we give up a deep understanding and, instead, rely on a high-level review to make decisions.

Where this becomes most apparent in Linux intrusions is when dealing with malware. Even if we have the right skills and knowledge to take apart complex ELF files, it is rare that we can put aside enough time to get value from doing so.

In this article, I will look at ways you can get a rapid, first-pass assessment by running some basic commands and sharing the output with ChatGPT (or any AI/LLM platform).

Note 1: None of this is a replacement for having trained, capable, and experienced reverse engineers available. The "quick and dirty" approach is just to help you deal with the early stage of the incident. You absolutely need a deeper understanding at some point.

Note 2: I am going to use ChatGPT here but you need to consider your own situation. You need to make sure you have appropriate approvals to upload data to public platforms and you need to understand enough that you can identify where the platform is making a mistake (or just plain old lying to you). This is 100% not a replacement for skilled staff, it is just a way to improve your efficiency.

AI platforms have limits. They can be great for quick assessments but rarely stand up to scrutiny...

Overview

High-level workflow

The high-level workflow has four steps:

  • Discover a suspicious file during an incident.
  • Run basic analysis commands and save the output as text files.
  • Share the output with your AI/LLM platform, along with a prompt to support your analysis.
  • Review the response and act on any new knowledge.

The good news is that a lot of this is definitely scriptable and you could easily build this into your SOAR application with API calls to the AI/LLM platform. In this article we will focus on the basic commands to run and some suggested prompts.

A lot of this process will rely on you keeping good notes during your IR work. It should go without saying that this is critically important!

Basic Analysis Commands

In this article, I will use malware.elf as a placeholder for the file you are analysing.

Basic Information

  • Note the filename, where it was discovered, and any immediately relevant information or clues about its behaviour.
  • Collect file metadata and a hash:

file malware.elf > file_data.txt        

The output, redirected here to a text file, records the file type; for an ELF sample it will typically identify the architecture, how the binary is linked, and whether it is stripped.

sha1sum malware.elf > sha1hash.txt        
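If you prefer, the basic information steps can be captured in one pass. A minimal sketch follows; the basic_info.txt filename and the /bin/sh fallback are my own choices, not from the article:

```shell
#!/bin/sh
# Collect basic metadata for a sample into a single evidence file.
# SAMPLE defaults to /bin/sh purely so this sketch runs end-to-end;
# point it at your suspicious file in real use.
SAMPLE="${1:-/bin/sh}"
OUT="basic_info.txt"

{
  echo "== file =="
  file "$SAMPLE"
  echo "== sha1 =="
  sha1sum "$SAMPLE"
  echo "== size and timestamps =="
  stat "$SAMPLE"
} > "$OUT" 2>&1
```

Keeping the section markers in one file makes it easier to upload a single attachment later rather than juggling several small ones.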

Static Analysis

readelf

The readelf command is used to extract information from an ELF file. It provides details on things such as the section headers, program headers, and symbols. It has a range of command line arguments which allow you to specify the items you are interested in but, for our purposes, we will use the -a argument to collect everything.

readelf -a malware.elf > readelf_evidence.txt        

The output which is redirected into the text file should look something like this:

Example readelf output

objdump

objdump serves a similar purpose but focuses on disassembly. Again, there is a range of options we can use to specify what we want but, for speed and simplicity, we will use the -D argument to disassemble all the sections, not just the ones expected to contain instructions. Be aware that this is very noisy and might produce several hundred thousand lines of output. It might also produce nothing, depending on how the malware was compiled.

objdump -D malware.elf > objdump.txt
Example Objdump output

strings

Often overlooked, strings is an excellent way of identifying human-readable content in a file. You can specify the minimum string length to report and, as a general guide, I find starting with -n8 is a good approach.

strings -n8 malware.elf > strings.txt

Example output from running strings on a malware sample

ldd

Finally, a quick check of any shared libraries the binary links against with the ldd command. Be aware that ldd works by invoking the dynamic loader, so treat it with some of the same caution as dynamic analysis; objdump -p or readelf -d give a safer view of the needed libraries.

ldd malware.elf > ldd.txt        

This varies in effectiveness; many malware families (especially ransomware) are statically linked and will come up empty here.

ldd output from a SIDEWALK malware sample

Dynamic Analysis

Next is checking how the application behaves. This presents an element of risk: for strace and ltrace, the binary must be made executable and is actually run. Always approach any dynamic analysis with caution.

gdb

You can use gdb (the GNU Debugger) to get a rapid assessment of some key components of the file. You can do this manually or, for speed, use some command line arguments to extract commonly analysed elements.

For our purposes this command will be useful, most of the time.

gdb malware.elf -ex 'info files' -ex 'disassemble main' -ex 'info functions' -ex 'info variables' -ex 'backtrace' -ex 'quit' > gdb.txt        

This is an example of how the output might look.

Example output from a SIDEWALK malware sample

For some malware samples this won't work - especially if they have a lot of protection mechanisms built in, so don't be too surprised if the output is effectively blank. A good reverse engineer will work around this but that is outside the scope of our rapid assessment.

Example from a Sodinokibi ransomware sample

While gdb is generally safe to run, the next two commands significantly increase the risk: the file must be made executable and the binary actually runs on the system. If this is a malware sample, it will likely be able to deploy its payloads.

Only carry out these steps if you have an isolated analysis environment or sandbox where the impact from (for example) deploying ransomware would be limited.

strace

This command runs the binary and records the system calls it makes. In its most basic use, it will monitor the process until it exits, which can take a considerable time. It is often a good idea to use the timeout command to avoid the analysis hanging.

strace -o strace.txt -f -tt -e trace=all -s 256 ./malware.elf        

In this example, strace will write to an output file (-o), follow child processes (-f) and add a microsecond-precision timestamp to each entry (-tt). It will trace all system calls (-e trace=all) and capture up to 256 characters of each string argument (-s 256).

In use, you have to specify the path correctly, so either use full paths or a ./ prefix to point to files in the current folder.

Alternatively, you can run the command with a timeout. This makes it much faster to run but can miss malicious activity if there are delays built in (a common anti-analysis technique).

timeout 10 strace -o strace.txt -f -e trace=all -c ./malware.elf        

This example will kill the process after 10 seconds (timeout 10) and summarise the trace (-c). The summary gives per-syscall counts and timings rather than individually timestamped calls.

Example of strace output against a SIDEWALK malware sample with per-call timestamps

ltrace

ltrace is similar to strace in that it also runs the file; this time, however, it intercepts and records dynamic library calls.

ltrace -o ltrace.txt -f -S -C -tt ./malware.elf        

In this example, the arguments are:

  • -o: write output to a file.
  • -f: follow child processes.
  • -S: also show system calls.
  • -C: demangle C++ symbols to make the output more readable.
  • -tt: microsecond-precision timestamps.

This is also a good candidate for using the timeout command to prevent it hanging for long periods of time.

timeout 30 ltrace -o ltrace.txt -f -S -C -tt ./malware.elf        
Example ltrace output from a SIDEWALK malware sample

Scripting it

When we run the same set of commands every time, the process is crying out to be scripted, and this is no exception.

A starter example using a bash script to run the commands above is available at https://for577.com/analysisscript

This is not likely to be perfect in every environment, instead, it should be seen as a starting point to build your own. If you have access to an AI/LLM with an API gateway you could even look to automate the submission and response parts.
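As an illustration of what such a script might look like, here is a minimal sketch of the static-analysis steps only. The evidence directory layout is my own choice, and this is not the script linked above; it deliberately skips ldd (which invokes the loader) and the dynamic strace/ltrace steps, which belong in an isolated sandbox:

```shell
#!/bin/sh
# Minimal static triage sketch: run the static commands from this article
# against a sample and collect the output in an evidence directory.
# SAMPLE defaults to /bin/sh so the sketch runs; pass your sample path
# as the first argument in real use.
SAMPLE="${1:-/bin/sh}"
EVIDENCE="evidence_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$EVIDENCE"

file "$SAMPLE"        > "$EVIDENCE/file_data.txt"        2>&1
sha1sum "$SAMPLE"     > "$EVIDENCE/sha1hash.txt"         2>&1
readelf -a "$SAMPLE"  > "$EVIDENCE/readelf_evidence.txt" 2>&1
strings -n8 "$SAMPLE" > "$EVIDENCE/strings.txt"          2>&1
# objdump -D is very noisy; keep it in its own file so it can be
# dropped if the AI platform rejects large uploads.
objdump -D "$SAMPLE"  > "$EVIDENCE/objdump.txt"          2>&1

echo "Evidence written to $EVIDENCE"
```

Redirecting stderr into each evidence file means a missing tool or a hostile binary that confuses a parser still leaves a record of what happened.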

Working with "AI"

Once you have carried out the basic analysis, it is time to ask your AI platform for help. The exact syntax and process will depend on your investigation (and the platform) but, in general, it works best to provide the evidence files and ask for assistance in analysing them.
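If your platform offers an API, this evidence-plus-prompt submission can be automated. Below is a minimal sketch against the OpenAI chat completions HTTP endpoint; the model name, evidence file list, and helper names are my own assumptions, so adapt them to whatever platform your organisation has approved:

```python
import json
import os
import urllib.request


def build_submission(evidence_files, prompt):
    """Bundle evidence files and an analyst prompt into one chat payload."""
    parts = [prompt]
    for path in evidence_files:
        with open(path, encoding="utf-8", errors="replace") as fh:
            parts.append(f"\n--- {os.path.basename(path)} ---\n{fh.read()}")
    return {
        "model": "gpt-4o",  # assumption: use whatever model your platform offers
        "messages": [{"role": "user", "content": "".join(parts)}],
    }


def submit(payload, api_key):
    """POST the payload to an OpenAI-compatible endpoint (untested sketch)."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same approval caveats apply here as with the web interface: an API call is still an upload of potentially sensitive incident data to a third party.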

Most LLM tools will struggle to make a definitive "malware / not malware" assessment, and you need to be wary of their answers. Instead, it is better to have the tool explain what the software does and then make the determination yourself.

Some example prompts which have worked well for me are:

I've attached the output from file, strings, and readelf run against an unknown file. Review the data and provide a summary of what this file is likely to do

(This was for a suspected ransomware sample and no dynamic analysis took place)

With this information, ChatGPT was able to quickly identify the file as probably ransomware. In this case, it was correct and the sample was Sodinokibi (https://www.virustotal.com/gui/file/a322b230a3451fd11dcfe72af4da1df07183d6aaf1ab9e062f0e6b14cf6d23cd)

Correctly identified Sodinokibi

It is worth noting that this detection was almost entirely down to the strings data.

Another example is:

Attached is the output from gdb, strings, file, ldd, readelf, strace and ltrace for a suspicious file. Read them and provide a summary of what this application is likely to do. Also provide a yara rule to hunt for other samples.

With this sample, the objdump output was 15 MB in size and was rejected by ChatGPT. However, the other files were sufficient for a good determination. Although the response was verbose, it correctly identified that this was a C2 implant.

ChatGPT output based on analysis of a SIDEWALK implant.
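The size problem mentioned above can be mitigated by trimming large evidence files before submission. A minimal sketch follows; the 200,000-character default is an arbitrary illustration, not a documented platform limit:

```python
def truncate_evidence(path, max_chars=200_000):
    """Return file content trimmed to a size the AI platform will accept,
    keeping the start and end (which often carry the most useful context)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    return text[:half] + "\n[... truncated ...]\n" + text[-half:]
```

For disassembly output specifically, it is often better still to filter to the sections you care about (for example .text) before uploading, rather than truncating blindly.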

The Yara rules might need some tweaking but definitely provide a good starting point for incident responders.

Conclusion

Using an AI/LLM can, in the right hands, speed up the incident response cycle and help free up DFIR staff for other tasks. It is important to remember that it still needs skilled, knowledgeable staff to get good results.

In the examples here, it took under four minutes to collect the data on the Sodinokibi sample and get a determination, without using API calls. The SIDEWALK sample took slightly longer (six minutes) but generated more useful data thanks to the dynamic analysis.

It is definitely worth adding this to your DFIR tool box if your organisation allows you to submit data to AI platforms, or if you have an internal AI tool.
