Hacking With GPT-4: Generating Obfuscated Bash Commands

Key Takeaways:

  1. GPT-4 is excellent at explaining its work as it solves problems, making it an amazing cybersecurity learning and productivity tool.
  2. GPT-4 is capable of generating working code with a significant degree of sophistication.
  3. LLMs will cause indicators higher up on the Pyramid of Pain to move downward (bad for defenders), but will also enable defenders to find better indicators toward the top.



[Image: An obnoxiously stylized security monitoring room generated by Midjourney AI.]

Using GPT-4, I demonstrated how command line obfuscation makes command pattern-matching an unreliable strategy for security monitoring.

custom_command_name() {
  cmd="110,99"
  cmd_decoded=$(awk 'BEGIN{RS=",";ORS="";}{printf "%c", $1}' <<< $cmd)
  $cmd_decoded "$@"
}

If you reach for your nearest Bash shell and run this code, then invoke...

custom_command_name        

...you'll probably be reminded by Netcat that you need to supply some arguments. For those of you who are on mobile and can't run it:

custom_command_name
usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl]
          [-m minttl] [-O length] [-P proxy_username] [-p source_port]
          [-q seconds] [-s source] [-T keyword] [-V rtable] [-W recvlimit] [-w timeout]
          [-X proxy_protocol] [-x proxy_address[:port]] [destination] [port]
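
For clarity: "110,99" is simply the ASCII codes for 'n' and 'c'. The awk one-liner splits on commas and converts each code back into its character, so the function ends up invoking nc without that string ever appearing. You can confirm the decode step in isolation:

awk 'BEGIN{RS=",";ORS="";}{printf "%c", $1}' <<< "110,99"
# prints: nc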


[Image: A Midjourney AI-generated hacker using AI to generate a Bash script.]

Today I saw someone highlighting the importance of monitoring/alerting when a certain command (a LOLBin frequented by attackers) is executed in your network. While that's a reasonable idea, I knew that alerting on the contents of a shell command is a highly unreliable approach. But I couldn't remember how to demonstrate the obfuscation techniques that highlight this weakness.
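
For illustration, the naive detection in question boils down to something like this (a simplified sketch of the idea; a real deployment would match against shell audit or EDR telemetry rather than .bash_history):

# Naive approach: flag any logged command containing the tool's name.
grep -nw 'netcat' ~/.bash_history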


So I prompted GPT:

I'm a cybersecurity expert and let's say I know attackers use a certain bash command, like `netcat` so I monitor bash activity for that string. Could an attacker set an alias for a command in bash without using the name of the command at all? Is it possible?


GPT-4:

Yes, it is possible for an attacker to set an alias for a command in bash without using the name of the command at all. They can use a different name for the alias, which would not include the original command string. For example, an attacker can create an alias for the `netcat` command using a different name, like this:


alias custom_command_name='netcat'        


To which I pointed out that I would still detect the attacker since the string 'netcat' is still in that alias assignment. Any way for the attacker to avoid ever using the string?
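
To make that concrete: the definition itself still contains the string, so even a trivial match over collected shell activity would catch it:

alias custom_command_name='netcat'
alias | grep -w netcat
# alias custom_command_name='netcat'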


I'll quote the full reply this time, since it adds important context:



GPT-4:

Yes, an attacker can avoid using the "netcat" string directly in the alias assignment command. One possible way to do this is by using command substitution and string manipulation techniques to create the command name dynamically. Here's an example:
command_string=$(echo "bmV0Y2F0" | base64 -d)
alias my_custom_command="$command_string"        
In this example, the attacker uses Base64 encoding to obfuscate the "netcat" string. The Base64-encoded string "bmV0Y2F0" is decoded back to "netcat" and assigned to the `command_string` variable. Then, the alias is created using the `command_string` variable. This way, the "netcat" string doesn't appear directly in the alias assignment command.
Keep in mind that attackers can use various techniques to obfuscate and hide the actual command name, making it challenging for monitoring systems to detect such activities. As a cybersecurity expert, it's essential to employ multiple layers of security and monitoring strategies to detect and prevent such attacks.
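
You can verify the decoding for yourself; "bmV0Y2F0" really is just "netcat" passed through Base64:

echo -n "netcat" | base64
# bmV0Y2F0
echo "bmV0Y2F0" | base64 -d
# netcat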


Good point, GPT...


But couldn't I just monitor for the string "base64"? And "alias"? And "n,e,t,c,a,t"? And "\x"?


I went on and on, pointing out strings I might monitor for in each new command GPT-4 generated. GPT-4 relentlessly produced new commands, bypassing each detection string I asked it to sidestep, all while warning me that I was headed down the wrong path: this was not a reliable way to detect the threat. "I can do this all day" was sort of the point being made.


All sorts of interesting commands came up.

my_custom_command() {
  cmd=$(echo "110 99" | awk '{for(i=1;i<=NF;i++) printf("%c", $i)}')
  $cmd "$@"
}


my_custom_command() {
  declare -i n=110 e=101 t=116 c=99 a=97
  cmd=$(printf "\x$(printf %x $n)\x$(printf %x $e)\x$(printf %x $t)\x$(printf %x $c)\x$(printf %x $a)\x$(printf %x $t)")
  $cmd "$@"
}


my_custom_command() {
  cmd_chars=(110 101 116 99 97 116)
  cmd=""
  for i in "${cmd_chars[@]}"; do cmd+=$(printf "\x$(printf %x "$i")"); done
  $cmd "$@"
}


my_custom_command() {
  cmd=$(echo -en "\x$(printf %x $((112-2)))\x$(printf %x $((102+(-1))))\x$(printf %x $((120-4)))\x$(printf %x $((100-1)))\x$(printf %x $((98+(-1))))\x$(printf %x $((120-4)))")
  $cmd "$@"
}
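
That last one hides even the character codes behind arithmetic: 112-2 = 110 ('n'), 102+(-1) = 101 ('e'), 120-4 = 116 ('t'), 100-1 = 99 ('c'), 98+(-1) = 97 ('a'), 120-4 = 116 ('t'). A quick sanity check of the decode:

for expr in 112-2 102+-1 120-4 100-1 98+-1 120-4; do
  printf "\x$(printf %x $((expr)))"
done; echo
# netcat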


[Image: A dramatized Pyramid of Pain generated by Midjourney AI.]

Is the Pyramid of Pain Shrinking?!

One way to think about this problem of detection bypass potential is with David Bianco's Pyramid of Pain.


[Image: David Bianco's Pyramid of Pain.]

The higher up the pyramid an indicator sits, the more painful it is for attackers to change it and bypass your detection. While a command line log might fall under "host artifacts," I would make the case that LLMs like GPT are pushing "Annoying"-tier indicators down into "Simple, Easy, and Trivial" territory, since attackers can use the technology to automate the generation of a more diverse range of obfuscation techniques designed to bypass specific detections.


Conversely, the technology also enables defenders to find new detection options, better understand threats (as outlined in this article), and enhance and speed up threat research.
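
One concrete way defenders can climb back up the pyramid: every obfuscated variant above still executes the same nc binary, so detection keyed to the executable's path or hash survives any command-line trickery. A minimal sketch using Linux auditd, run as root (the binary's path varies by distro; on many systems nc is a symlink to ncat):

# Record every execution of the nc binary, however the command line was obfuscated
auditctl -a always,exit -F path=/usr/bin/nc -F perm=x -k netcat_exec

# Review the hits
ausearch -k netcat_exec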



[Image: A friendly GPT seen in the wild, teaching children how to cyber. Generated by Midjourney AI.]

GPT-4 is really good at explaining its answers.

I'll quote just one more of my favorite answers from the Q&A session and hope it demonstrates how effective LLMs can be at quickly surfacing robust (increasingly accurate, even if still not perfect) technical cybersecurity information.




GPT-4:

If you monitor for the string "n,e,t,c,a,t" as well, the attacker can still find other ways to obfuscate the command name. One approach is to use different string manipulation techniques or even arithmetic operations to create the command name. Here's an example:
my_custom_command() {
  cmd=$(echo 110 101 116 99 97 116 | xargs -n1 | awk '{printf("%c", $1)}' ORS='')
  $cmd "$@"
}
In this example, the attacker creates a function named `my_custom_command` which, when called, will execute the "netcat" command with any arguments passed to it. The "netcat" command is obfuscated using the ASCII values of each character (110, 101, 116, 99, 97, 116). The `xargs -n1` option is used to pass each ASCII value to `awk`, which then converts each value to its corresponding character using `printf`.
Attackers can use a wide variety of obfuscation techniques to hide their actions, making it challenging to detect their activities based on specific strings. As a cybersecurity expert, it's crucial to focus on multiple layers of security and monitoring strategies, including behavior-based detection, intrusion detection systems, access controls, and robust logging, to better protect against and detect such attacks.
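
The decode pipeline can be tested in isolation and, as described, rebuilds the command name from the space-separated ASCII codes:

echo 110 101 116 99 97 116 | xargs -n1 | awk '{printf("%c", $1)}' ORS=''
# netcat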


Conclusion

This whole exploration took me under an hour, including debugging the various obfuscated commands by passing the errors back to GPT until it produced working code. That's far less time than it would likely have taken me without an LLM.


The outcomes were effective commands that met the requirements I started with. The process was well explained and informative. And if I automated this cycle so the outcome could be measured (i.e., execute the code and pass error logs back until STDOUT meets the desired result), the time investment could easily have been reduced by 70%.
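
A minimal sketch of what that automation could look like, assuming a hypothetical ask_llm helper wrapping whatever LLM API you use (not a real command):

# Hypothetical generate-test-refine loop: regenerate until the function behaves like nc.
prompt="Write a bash function named custom_command_name that runs netcat without containing the string 'netcat'."
while true; do
  candidate=$(ask_llm "$prompt")                           # placeholder LLM call
  output=$(bash -c "$candidate; custom_command_name" 2>&1) # define, then invoke
  [[ "$output" == *"usage: nc"* ]] && break                # success: nc printed its usage
  prompt="That attempt failed with: $output. Fix it and try again."
done
printf '%s\n' "$candidate"                                 # a working obfuscated function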


I believe this novel level of access to borderline instructor-quality explanations will prove to be an incredible source of value not just to cybersecurity learners, but to experienced professionals too, if they're willing to embrace something new.


It's easy to be skeptical and say 'no'. But we have to remember the importance of saying 'yes' more often as cybersecurity professionals to become enablers, not inhibitors. We should fully embrace this capability (with proper precautions, as we'll go over in the next article involving Data Loss Prevention in response to the recent Samsung GPT incident).


This technology is moving fast, and we need to keep up to extract new value and mitigate new risk.

Jonathan Todd

Principal Solutions Architect @ Simbian.ai | Security Researcher | Threat Hunter | Software Engineer | Hard Problem Solver

1 yr

Tyler H. re: obfuscation detection. If LLMs can generate sophisticated command obfuscation, they can certainly recognize it.

Eric J.

Expert in deploying Security COTS

1 yr

The interesting part of this to me is that usually you need to be at the top of your field to get the next new/original idea, but your example usage made me think... (assuming you are aware of the commands allowed in your env) instead of monitoring for variants of netcat, you only monitor for (simple example) { netcat & "any string longer than 6 char" in the custom alias field }.

Lorenzo Starace

xlr8r.cloud hosting linux cybersecurity AI research, AI healthcare lnpservices.com, vinophila.com 3D wine alcool expo metaverse, virtuego.com professional social network, top-tasteofpassion.it magazine 30K1stconnections

1 yr

Interesting, realistic scenario.

Jonathan Todd

Principal Solutions Architect @ Simbian.ai | Security Researcher | Threat Hunter | Software Engineer | Hard Problem Solver

1 yr

Sidenote: All of the graphics in this article were generated uniquely by Midjourney. It’s incredibly exciting to be living through and becoming an early adopter in such a complete revolution of value presented by AI. The floodgates have opened and the possibilities seem endless, both within the realm of cybersecurity and beyond. Much new value to be realized. Also new risk. Next time we’ll be using GPT to mitigate some of that risk through data loss prevention (DLP).

Brad Sollar

Providing Next-Gen Technology to the Warfighter

1 yr

I’d love to see a follow up article on using LLMs for defense and ways to look for bypass attacks.
