Linux IR - AI-Assisted Malware Analysis
Introduction
Incident response often has to be fast. We are chasing an active attacker and trying to get control of a situation before anything gets worse. It is not often that we have the luxury of time.
This means that sometimes we have to find "quick and dirty" ways to resolve issues. In turn, this often carries some sacrifices - we give up a deep understanding and, instead, rely on a high-level review to make decisions.
This becomes most apparent on Linux intrusions when dealing with malware. Even if we have the skills and knowledge to take apart complex ELF files, we can rarely set aside enough time to get real value from doing so.
In this article, I will look at ways you can get a superficial assessment by running some basic commands and sharing the data with ChatGPT (or any AI/LLM platform).
Note 1: None of this is a replacement for having trained, capable, and experienced reverse engineers available. The "quick and dirty" approach is just to help you deal with the early stage of the incident. You absolutely need a deeper understanding at some point.
Note 2: I am going to use ChatGPT here but you need to consider your own situation. You need to make sure you have appropriate approvals to upload data to public platforms and you need to understand enough that you can identify where the platform is making a mistake (or just plain old lying to you). This is 100% not a replacement for skilled staff, it is just a way to improve your efficiency.
Overview
The high-level workflow basically has four steps:
The good news is that a lot of this is definitely scriptable and you could easily build this into your SOAR application with API calls to the AI/LLM platform. In this article we will focus on the basic commands to run and some suggested prompts.
A lot of this process will rely on you keeping good notes during your IR work. It should go without saying that this is critically important!
Basic Analysis Commands
In this article, I will use malware.elf as a placeholder for the file you are analysing.
Basic Information
file malware.elf > file_data.txt
This is an example of what the output should look like, although in use this would be redirected to a text file.
sha1sum malware.elf > sha1hash.txt
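Because so much of this process depends on good notes, it can help to capture the file type, size and several hashes in one place. The function below is a minimal sketch of that idea; the name basic_info and the output file basic_info.txt are hypothetical, and it assumes GNU coreutils (md5sum, sha1sum, sha256sum) are available.

```bash
# Hypothetical helper (not part of the original workflow): record the file type,
# size and common hashes for one sample in a single notes file.
basic_info() {
    local sample="$1" out="${2:-basic_info.txt}"
    {
        echo "sample: $sample"
        echo "collected_utc: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        # file may be missing from minimal sandboxes, so check before calling it
        if command -v file >/dev/null 2>&1; then
            echo "type: $(file -b "$sample")"
        else
            echo "type: (file command not installed)"
        fi
        echo "size_bytes: $(wc -c < "$sample" | tr -d ' ')"
        # Hashing stdin avoids the escaped output the *sum tools use for odd filenames
        echo "md5: $(md5sum < "$sample" | cut -d' ' -f1)"
        echo "sha1: $(sha1sum < "$sample" | cut -d' ' -f1)"
        echo "sha256: $(sha256sum < "$sample" | cut -d' ' -f1)"
    } > "$out"
}
```

In use, this would be called as basic_info malware.elf basic_info.txt, and the SHA-256 line can be pasted straight into a VirusTotal search.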
Static Analysis
readelf
The readelf command extracts information from an ELF file, such as the section headers, program headers and symbols. It has a range of command line arguments that let you specify which items you are interested in but, for our purposes, we will use the -a argument to collect all of this information at once.
readelf -a malware.elf > readelf_evidence.txt
The output which is redirected into the text file should look something like this:
objdump
objdump overlaps with readelf, but it can also disassemble code. Again, there is a range of options we can use to specify which objects we want but, for speed and simplicity, we will use the -D argument to disassemble all sections, not just the ones expected to contain instructions. Be aware that this is very noisy and might produce several hundred thousand lines of output. It might also produce nothing useful, depending on how the malware was compiled.
objdump -D malware.elf > objdump.txt
strings
Often overlooked, strings is an excellent way of identifying human-readable content in a file. You can set the minimum string length with -n and, as a general guide, I find starting with -n8 is a good approach.
strings -n8 malware.elf > strings.txt
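Since a quick assessment can hinge on the strings output, it can help to pull out IOC-like values (URLs, IP addresses, system paths) before submitting anything. This is a rough sketch assuming the strings output was saved to a text file; the function name ioc_candidates and the regular expressions are my own illustrative choices and will produce false positives.

```bash
# Hypothetical helper: pull IOC-like values out of saved strings output.
# The patterns are deliberately loose; expect false positives and review by hand.
ioc_candidates() {
    local strings_file="$1"
    {
        # URLs (http, https, ftp)
        grep -Eo '(https?|ftp)://[^[:space:]"<>]+' "$strings_file"
        # Dotted-quad IPv4 addresses (not range-checked, so version numbers can match)
        grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' "$strings_file"
        # Absolute paths under common Linux directories
        grep -Eo '/(etc|tmp|var|dev|proc|usr|bin|sbin|home|root|opt|lib)/[^[:space:]"]*' "$strings_file"
    } | sort -u
}
```

For example, ioc_candidates strings.txt > ioc_candidates.txt gives a short list you can check against threat intelligence while the AI platform works on the full data.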
ldd
Finally, run a quick check of any libraries the binary depends on with the ldd command.
ldd malware.elf > ldd.txt
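This varies in effectiveness, with lots of malware families (especially ransomware) coming up empty here, typically because they are statically linked. Also be aware that ldd can, in some circumstances, execute the program it is inspecting; its man page warns against using it on untrusted executables, so only run it inside your analysis environment.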
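A lower-risk way to see the declared dependencies is to read the NEEDED entries from the ELF dynamic section, which readelf prints without running the file. The sketch below assumes the usual readelf -d line format (for example "(NEEDED) Shared library: [libc.so.6]"); needed_libs is a hypothetical name.

```bash
# Hypothetical helper: list DT_NEEDED shared libraries from `readelf -d` output.
# readelf only parses the file; it never executes the sample.
needed_libs() {
    # Reads readelf -d output on stdin and prints one library name per line
    sed -n 's/.*(NEEDED).*\[\(.*\)]$/\1/p'
}
```

Usage would be readelf -d malware.elf | needed_libs > needed_libs.txt. A statically linked sample has no dynamic section, so empty output is expected for many ransomware families.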
Dynamic Analysis
Next is checking how the application behaves when it runs. This carries an element of risk: for strace and ltrace, the binary has to be made executable (chmod +x) and is actually run. Always approach any analysis with caution.
gdb
You can use gdb (the GNU Debugger) to get a rapid assessment of some key components of the file. You can do this interactively or, for speed, use command line arguments to extract commonly analysed elements.
For our purposes, this command will be useful most of the time.
gdb malware.elf -ex 'info files' -ex 'disassemble main' -ex 'info functions' -ex 'info variables' -ex 'backtrace' -ex 'quit' > gdb.txt
This is an example of how the output might look.
For some malware samples this won't work - especially if they have a lot of protection mechanisms built in, so don't be too surprised if the output is effectively blank. A good reverse engineer will work around this but that is outside the scope of our rapid assessment.
While gdb is generally safe to run here (the command above inspects the file without starting it), the next two commands significantly increase the risk. The file must be executable and the binary actually runs on the system. If this is a malware sample, it is likely to be able to run its payloads.
Only carry out these steps if you have an isolated analysis environment or sandbox where the impact from (for example) deploying ransomware would be limited.
strace
strace runs the program under ptrace and records every system call it makes. In its most basic form, it will monitor the process until it exits, which can take a considerable time. It is often a good idea to use the timeout command to avoid hangs.
strace -o strace.txt -f -tt -e trace=all -s 256 ./malware.elf
In this example, strace will write to an output file (-o), follow child processes (-f) and prefix each entry with a microsecond-precision timestamp (-tt). It will trace all system calls (-e trace=all) and capture up to 256 characters of each string argument (-s 256).
When running the sample, specify its path explicitly: either use a full path or prefix ./ for a file in the current folder.
Alternatively, you can run the command with a timeout. This makes it much faster to run but can miss malicious activity if there are delays built in (a common anti-analysis technique).
timeout 10 strace -o strace.txt -f -e trace=all -c ./malware.elf
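This example kills the process after 10 seconds (timeout 10) and summarises the trace (-c). Note that -c replaces the per-call log with a table of call counts, time and errors for each system call, so the individual calls, their arguments and their timestamps are not recorded.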
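A full strace log from the first command can be large, so it can be worth reducing it before reading it or sending it to an AI platform. This is a sketch assuming the log was produced with -f and -tt (PID, then timestamp, then the call); the list of "interesting" system calls and both function names are my own assumptions, so extend them to suit the sample.

```bash
# Hypothetical helper: keep only calls that often matter in triage
# (process execution, networking, file creation/deletion, permissions).
interesting_syscalls() {
    local log="$1"
    grep -E '(^|[[:space:]])(execve|execveat|connect|bind|listen|accept4?|socket|openat|open|creat|unlink|unlinkat|rename|renameat2?|chmod|fchmodat|kill|ptrace|clone3?|fork|vfork)\(' "$log"
}

# Hypothetical helper: count how often each system call appears, most frequent first.
# Skips the optional PID and timestamp prefixes; "resumed" and exit lines are ignored.
syscall_counts() {
    sed -nE 's/^([0-9]+[[:space:]]+)?([0-9:.]+[[:space:]]+)?([a-z_0-9]+)\(.*/\3/p' "$1" \
        | sort | uniq -c | sort -rn
}
```

Running interesting_syscalls strace.txt > strace_filtered.txt usually gives a much smaller file, while syscall_counts strace.txt gives a per-call overview similar to -c without giving up the detailed log.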
ltrace
ltrace is similar to strace in that it runs the file; however, this time it intercepts and records dynamic library calls.
ltrace -o ltrace.txt -f -S -C -tt ./malware.elf
In this example, the options are:
-o output file
-f follow child processes
-S show system calls as well as library calls
-C demangle C++ symbol names to make the output more readable
-tt microsecond timestamps
This is also a good candidate for using the timeout command to prevent it hanging for long periods of time.
timeout 30 ltrace -o ltrace.txt -f -S -C -tt ./malware.elf
Scripting it
When we run the same set of commands every time, the process is crying out to be scripted, and this case is no exception.
A starter example using a bash script to run the commands above is available at https://for577.com/analysisscript
This is not likely to be perfect in every environment; instead, see it as a starting point for building your own. If you have access to an AI/LLM with an API gateway, you could even automate the submission and response parts.
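For illustration only, here is a minimal sketch of what such a script might look like for the static steps. It is not the script at the link above, the function names run_tool and triage_static are hypothetical, and it assumes a bash shell. Dynamic steps (strace/ltrace) are deliberately left out so it never executes the sample, and ldd is left out for the same reason (readelf -a already includes the dynamic section).

```bash
#!/usr/bin/env bash
# Hypothetical starter: run the non-executing collection steps from this article
# into one evidence folder. Missing tools are noted rather than aborting the run.

# Run a command if its tool is installed, otherwise record that it was skipped.
run_tool() {
    local out="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$out" 2>&1 || echo "[exit status $?]" >> "$out"
    else
        echo "[skipped: $1 not installed]" > "$out"
    fi
}

triage_static() {
    local sample="$1" dir="${2:-evidence_$(date -u +%Y%m%dT%H%M%SZ)}"
    mkdir -p "$dir"
    run_tool "$dir/file_data.txt"        file "$sample"
    run_tool "$dir/sha1hash.txt"         sha1sum "$sample"
    run_tool "$dir/readelf_evidence.txt" readelf -a "$sample"
    run_tool "$dir/objdump.txt"          objdump -D "$sample"
    run_tool "$dir/strings.txt"          strings -n8 "$sample"
    # Batch mode disables paging/prompts; these gdb commands inspect but do not run the file
    run_tool "$dir/gdb.txt" gdb -batch "$sample" -ex 'info files' -ex 'disassemble main' \
        -ex 'info functions' -ex 'info variables'
    echo "$dir"
}
```

A typical call would be triage_static malware.elf, which prints the name of the evidence folder it created so you can zip it up or pass it to the next step.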
Working with "AI"
Once you have carried out the basic analysis, it is time to ask your AI platform for help. The exact syntax and process here will depend on your investigation (and the platform) but, in general, it works best to provide the evidence and ask for assistance in analysing the files.
Most LLM tools will struggle to make a definite "malware/not malware" assessment and you need to be wary of their answers. Instead, it is better to have the tool explain what the software does and then make the determination yourself.
Some example prompts which have worked well for me are:
I've attached the output from file, strings, and readelf run against an unknown file. Review the data and provide a summary of what this file is likely to do
(This was for a suspected ransomware sample and no dynamic analysis took place)
With this information, ChatGPT was able to quickly identify the file as probably ransomware. In this case, it was correct and the sample was Sodinokibi (https://www.virustotal.com/gui/file/a322b230a3451fd11dcfe72af4da1df07183d6aaf1ab9e062f0e6b14cf6d23cd)
It is worth noting that this detection was almost entirely down to the strings data.
Another example is:
Attached is the output from gdb, strings, file, ldd, readelf, strace and ltrace for a suspicious file. Read them and provide a summary of what this application is likely to do. Also provide a yara rule to hunt for other samples.
With this sample, the objdump output was 15 MB and was rejected by ChatGPT. However, the other files were sufficient for a good determination: although the response was verbose, it correctly identified the sample as a C2 implant.
The YARA rules might need some tweaking but definitely provide a good starting point for incident responders.
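One way to avoid a single oversized file (like that 15 MB objdump output) being rejected is to bundle the evidence into one text file and cap how much of each file is included. This is a sketch under my own assumptions: the function name bundle_evidence is hypothetical, and the per-file byte limit is something you will need to tune, because upload limits vary between platforms and are not covered here.

```bash
# Hypothetical helper: concatenate evidence files into one upload-ready text file,
# truncating each one so a huge output cannot push the bundle past a size limit.
# Usage: bundle_evidence OUTPUT MAX_BYTES_PER_FILE FILE...
bundle_evidence() {
    local out="$1" max_bytes="$2"; shift 2
    : > "$out"
    local f size
    for f in "$@"; do
        [ -f "$f" ] || continue          # silently skip steps that produced no file
        size=$(wc -c < "$f" | tr -d ' ')
        {
            printf '===== %s (%s bytes' "$(basename "$f")" "$size"
            if [ "$size" -gt "$max_bytes" ]; then
                printf ', truncated to first %s bytes' "$max_bytes"
            fi
            printf ') =====\n'
            head -c "$max_bytes" "$f"
            printf '\n\n'
        } >> "$out"
    done
}
```

For example, bundle_evidence bundle.txt 200000 file_data.txt strings.txt readelf_evidence.txt gdb.txt strace.txt ltrace.txt objdump.txt produces one file with a header per tool, and the header makes it clear to both you and the model when a file has been cut short.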
Conclusion
Using an AI/LLM can, in the right hands, speed up the incident response cycle and help free up DFIR staff for other tasks. It is important to remember that it does need skilled, knowledgeable staff to get good results.
In the examples here, it took under four minutes to collect the data on the Sodinokibi sample and reach a determination, without using API calls. The SIDEWALK sample (the C2 implant in the second example) took slightly longer (six minutes) but generated more useful data thanks to the dynamic analysis.
It is definitely worth adding this to your DFIR toolbox if your organisation allows you to submit data to AI platforms, or if you have an internal AI tool.