Linux IR - AI-Assisted Malware Analysis

Introduction

Incident response often has to be fast. We are chasing an active attacker and trying to get control of a situation before anything gets worse. It is not often that we have the luxury of time.

This means that sometimes we have to find "quick and dirty" ways to resolve issues. In turn, this often carries some sacrifices - we give up a deep understanding and, instead, rely on a high-level review to make decisions.

Where this becomes most apparent in Linux intrusions is when dealing with malware. Even if we have the right skills and knowledge to take apart complex ELF files, it is rare that we can put aside enough time to get value from doing so.

In this article, I will look at ways you can get a rapid, first-pass assessment by running some basic commands and sharing the output with ChatGPT (or any AI/LLM platform).

Note 1: None of this is a replacement for having trained, capable, and experienced reverse engineers available. The "quick and dirty" approach is just to help you deal with the early stage of the incident. You absolutely need a deeper understanding at some point.

Note 2: I am going to use ChatGPT here but you need to consider your own situation. You need to make sure you have appropriate approvals to upload data to public platforms and you need to understand enough that you can identify where the platform is making a mistake (or just plain old lying to you). This is 100% not a replacement for skilled staff, it is just a way to improve your efficiency.

AI platforms have limits. They can be great for quick assessments but rarely stand up to scrutiny...

Overview

High-level workflow

The high-level workflow has four steps:

  • Discover a suspicious file during an incident.
  • Run basic analysis commands and save the output as text files.
  • Share the output with your AI/LLM platform, along with a prompt to support your analysis.
  • Review the response and act on any new knowledge.

The good news is that a lot of this is definitely scriptable and you could easily build this into your SOAR application with API calls to the AI/LLM platform. In this article we will focus on the basic commands to run and some suggested prompts.

A lot of this process will rely on you keeping good notes during your IR work. It should go without saying that this is critically important!

Basic Analysis Commands

In this article, I will use malware.elf as a placeholder for the file you are analysing.

Basic Information

  • Note the filename, where it was discovered, and any immediately relevant information or clues about its behaviour.
  • Collect file metadata and a hash:

file malware.elf > file_data.txt        

The output, redirected here to a text file, records the file type; for an ELF sample it will typically identify the architecture, how the binary is linked, and whether it is stripped.

sha1sum malware.elf > sha1hash.txt        
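If you prefer, the basic information steps can be captured in one pass. A minimal sketch follows; the basic_info.txt filename and the /bin/sh fallback are my own choices, not from the article:

```shell
#!/bin/sh
# Collect basic metadata for a sample into a single evidence file.
# SAMPLE defaults to /bin/sh purely so this sketch runs end-to-end;
# point it at your suspicious file in real use.
SAMPLE="${1:-/bin/sh}"
OUT="basic_info.txt"

{
  echo "== file =="
  file "$SAMPLE"
  echo "== sha1 =="
  sha1sum "$SAMPLE"
  echo "== size and timestamps =="
  stat "$SAMPLE"
} > "$OUT" 2>&1
```

Keeping the section markers in one file makes it easier to upload a single attachment later rather than juggling several small ones.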

Static Analysis

readelf

The readelf command is used to extract information from an ELF file. It provides details on things such as the section headers, program headers, and symbols. It has a range of command line arguments which allow you to specify the items you are interested in but, for our purposes, we will use the -a argument to collect everything.

readelf -a malware.elf > readelf_evidence.txt        

The output which is redirected into the text file should look something like this:

Example readelf output

objdump

objdump serves a similar purpose but focuses on disassembly. Again, there is a range of options we can use to specify what we want but, for speed and simplicity, we will use the -D argument to disassemble all the sections, not just the ones expected to contain instructions. Be aware that this is very noisy and might produce several hundred thousand lines of output. It might also produce nothing, depending on how the malware was compiled.

objdump -D malware.elf > objdump.txt
Example Objdump output

strings

Often overlooked, strings is an excellent way of identifying human-readable content in a file. You can specify the minimum string length to report and, as a general guide, I find starting with -n8 is a good approach.

strings -n8 malware.elf > strings.txt

Example output from running strings on a malware sample

ldd

Finally, a quick check of any shared libraries the binary links against with the ldd command. Be aware that ldd works by invoking the dynamic loader, so treat it with some of the same caution as dynamic analysis; objdump -p or readelf -d give a safer view of the needed libraries.

ldd malware.elf > ldd.txt        

This varies in effectiveness; many malware families (especially ransomware) are statically linked and will come up empty here.

ldd output from a SIDEWALK malware sample

Dynamic Analysis

Next is checking how the application behaves. This presents an element of risk: for strace and ltrace, the binary must be made executable and is actually run. Always approach any dynamic analysis with caution.

gdb

You can use gdb (the GNU Debugger) to get a rapid assessment of some key components of the file. You can do this manually or, for speed, use some command line arguments to extract commonly analysed elements.

For our purposes this command will be useful, most of the time.

gdb malware.elf -ex 'info files' -ex 'disassemble main' -ex 'info functions' -ex 'info variables' -ex 'backtrace' -ex 'quit' > gdb.txt        

This is an example of how the output might look.

Example output from a SIDEWALK malware sample

For some malware samples this won't work - especially if they have a lot of protection mechanisms built in, so don't be too surprised if the output is effectively blank. A good reverse engineer will work around this but that is outside the scope of our rapid assessment.

Example from a Sodinokibi ransomware sample

While gdb is generally safe to run, the next two commands significantly increase the risk: the file must be made executable and the binary actually runs on the system. If this is a malware sample, it will likely be able to deploy its payloads.

Only carry out these steps if you have an isolated analysis environment or sandbox where the impact from (for example) deploying ransomware would be limited.

strace

This command runs the binary and records the system calls it makes. In its most basic use, it will monitor the process until it exits, which can take a considerable time. It is often a good idea to use the timeout command to avoid the analysis hanging.

strace -o strace.txt -f -tt -e trace=all -s 256 ./malware.elf        

In this example, strace will write to an output file (-o), follow child processes (-f) and add a microsecond-precision timestamp to each entry (-tt). It will trace all system calls (-e trace=all) and capture up to 256 characters of each string argument (-s 256).

In use, you have to specify the path correctly, so either use full paths or a ./ prefix to point to files in the current folder.

Alternatively, you can run the command with a timeout. This makes it much faster to run but can miss malicious activity if there are delays built in (a common anti-analysis technique).

timeout 10 strace -o strace.txt -f -e trace=all -c ./malware.elf        

This example will kill the process after 10 seconds (timeout 10) and summarise the trace (-c). The summary gives per-syscall counts and timings rather than individually timestamped calls.

Example of strace output against a SIDEWALK malware sample with per-call timestamps

ltrace

ltrace is similar to strace in that it also runs the file; this time, however, it intercepts and records dynamic library calls.

ltrace -o ltrace.txt -f -S -C -tt ./malware.elf        

In this example, the arguments are:

  • -o: write output to a file.
  • -f: follow child processes.
  • -S: also show system calls.
  • -C: demangle C++ symbols to make the output more readable.
  • -tt: microsecond-precision timestamps.

This is also a good candidate for using the timeout command to prevent it hanging for long periods of time.

timeout 30 ltrace -o ltrace.txt -f -S -C -tt ./malware.elf        
Example ltrace output from a SIDEWALK malware sample

Scripting it

When we run the same set of commands every time, the process is crying out to be scripted, and this is no exception.

A starter example using a bash script to run the commands above is available at https://for577.com/analysisscript

This is not likely to be perfect in every environment, instead, it should be seen as a starting point to build your own. If you have access to an AI/LLM with an API gateway you could even look to automate the submission and response parts.
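As an illustration of what such a script might look like, here is a minimal sketch of the static-analysis steps only. The evidence directory layout is my own choice, and this is not the script linked above; it deliberately skips ldd (which invokes the loader) and the dynamic strace/ltrace steps, which belong in an isolated sandbox:

```shell
#!/bin/sh
# Minimal static triage sketch: run the static commands from this article
# against a sample and collect the output in an evidence directory.
# SAMPLE defaults to /bin/sh so the sketch runs; pass your sample path
# as the first argument in real use.
SAMPLE="${1:-/bin/sh}"
EVIDENCE="evidence_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$EVIDENCE"

file "$SAMPLE"        > "$EVIDENCE/file_data.txt"        2>&1
sha1sum "$SAMPLE"     > "$EVIDENCE/sha1hash.txt"         2>&1
readelf -a "$SAMPLE"  > "$EVIDENCE/readelf_evidence.txt" 2>&1
strings -n8 "$SAMPLE" > "$EVIDENCE/strings.txt"          2>&1
# objdump -D is very noisy; keep it in its own file so it can be
# dropped if the AI platform rejects large uploads.
objdump -D "$SAMPLE"  > "$EVIDENCE/objdump.txt"          2>&1

echo "Evidence written to $EVIDENCE"
```

Redirecting stderr into each evidence file means a missing tool or a hostile binary that confuses a parser still leaves a record of what happened.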

Working with "AI"

Once you have carried out the basic analysis, it is time to ask your AI platform for help. The exact syntax and process will depend on your investigation (and the platform) but, in general, it works best to provide the evidence files and ask for assistance in analysing them.
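If your platform offers an API, this evidence-plus-prompt submission can be automated. Below is a minimal sketch against the OpenAI chat completions HTTP endpoint; the model name, evidence file list, and helper names are my own assumptions, so adapt them to whatever platform your organisation has approved:

```python
import json
import os
import urllib.request


def build_submission(evidence_files, prompt):
    """Bundle evidence files and an analyst prompt into one chat payload."""
    parts = [prompt]
    for path in evidence_files:
        with open(path, encoding="utf-8", errors="replace") as fh:
            parts.append(f"\n--- {os.path.basename(path)} ---\n{fh.read()}")
    return {
        "model": "gpt-4o",  # assumption: use whatever model your platform offers
        "messages": [{"role": "user", "content": "".join(parts)}],
    }


def submit(payload, api_key):
    """POST the payload to an OpenAI-compatible endpoint (untested sketch)."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same approval caveats apply here as with the web interface: an API call is still an upload of potentially sensitive incident data to a third party.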

Most LLM tools will struggle to make a definitive "malware / not malware" assessment, and you need to be wary of their answers. Instead, it is better to have the tool explain what the software does and then make the determination yourself.

Some example prompts which have worked well for me are:

I've attached the output from file, strings, and readelf run against an unknown file. Review the data and provide a summary of what this file is likely to do

(This was for a suspected ransomware sample and no dynamic analysis took place)

With this information, ChatGPT was able to quickly identify the file as probably ransomware. In this case, it was correct and the sample was Sodinokibi (https://www.virustotal.com/gui/file/a322b230a3451fd11dcfe72af4da1df07183d6aaf1ab9e062f0e6b14cf6d23cd)

Correctly identified Sodinokibi

It is worth noting that this detection was almost entirely down to the strings data.

Another example is:

Attached is the output from gdb, strings, file, ldd, readelf, strace and ltrace for a suspicious file. Read them and provide a summary of what this application is likely to do. Also provide a yara rule to hunt for other samples.

With this sample, the objdump output was 15 MB in size and was rejected by ChatGPT. However, the other files were sufficient for a good determination. Although the response was verbose, it correctly identified that this was a C2 implant.

ChatGPT output based on analysis of a SIDEWALK implant.
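The size problem mentioned above can be mitigated by trimming large evidence files before submission. A minimal sketch follows; the 200,000-character default is an arbitrary illustration, not a documented platform limit:

```python
def truncate_evidence(path, max_chars=200_000):
    """Return file content trimmed to a size the AI platform will accept,
    keeping the start and end (which often carry the most useful context)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    return text[:half] + "\n[... truncated ...]\n" + text[-half:]
```

For disassembly output specifically, it is often better still to filter to the sections you care about (for example .text) before uploading, rather than truncating blindly.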

The Yara rules might need some tweaking but definitely provide a good starting point for incident responders.

Conclusion

Using an AI/LLM can, in the right hands, speed up the incident response cycle and help free up DFIR staff for other tasks. It is important to remember that it still needs skilled, knowledgeable staff to get good results.

In the examples here, it took under four minutes to collect the data on the Sodinokibi sample and get a determination, without using API calls. The SIDEWALK sample took slightly longer (six minutes) but generated more useful data thanks to the dynamic analysis.

It is definitely worth adding this to your DFIR tool box if your organisation allows you to submit data to AI platforms, or if you have an internal AI tool.
