A Memory Dump Analyzer in Rust
Luis Soares, M.Sc.
Lead Software Engineer | Blockchain & ZK Protocol Engineer | Rust | C++ | Web3 | Solidity | Golang | Cryptography | Author
Analyzing binary files and memory dumps is a common task in software development, especially in cybersecurity, reverse engineering, and low-level programming.
In this article, we will build a memory and hex dump analyzer in Rust that provides an interactive UI to view, navigate, and search through binary data.
By the end, you’ll have a tool capable of detecting specific byte patterns and ASCII strings and displaying them in an organized way.
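1. Project Overview
Our Rust Dump Analyzer will allow us to view, navigate, and search binary data, and to detect known byte patterns and ASCII strings along with their offsets.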
We’ll implement the tool with Rust’s crossterm and ratatui libraries to build an interactive command-line interface.
2. Setting Up the Project
Begin by creating a new Rust project:
cargo new rust-dump-analyzer
cd rust-dump-analyzer
Add the following dependencies to Cargo.toml:
[dependencies]
crossterm = "0.28" # the crossterm release line that ratatui 0.29 builds against
memchr = "2.5"
ratatui = "0.29" # for building the UI
3. Implementing the Core Functionality
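Our analyzer’s core functions will focus on reading binary data, detecting ASCII strings, detecting known file patterns, and generating a hex dump.
Let’s break down each of these components in detail.
Reading Binary Data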
The first task is reading the binary data from a file. We’ll implement a function called read_dump_file that opens a file, reads its entire contents into a byte vector, and returns that vector, passing any I/O error back to the caller.
Here’s the full implementation of the read_dump_file function:
use std::fs::File;
use std::io::{self, Read};

fn read_dump_file(filename: &str) -> io::Result<Vec<u8>> {
    let mut file = File::open(filename)?;
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer)?;
    Ok(buffer)
}
This function will be used to load binary files, providing raw data for further analysis in subsequent functions.
Detecting ASCII Strings
In binary files, ASCII strings often represent readable text or meaningful data. We want to identify these strings and their positions.
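Our find_ascii_strings function will scan a chunk byte by byte, collect runs of printable ASCII characters (including spaces), and return every run of at least min_length bytes together with its absolute offset in the file.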
Here’s the complete implementation of find_ascii_strings:
fn find_ascii_strings(chunk: &[u8], chunk_offset: usize, min_length: usize) -> Vec<(String, usize)> {
    let mut result = Vec::new();
    let mut current_string = Vec::new();
    let mut start_index = 0;

    for (i, &byte) in chunk.iter().enumerate() {
        if byte.is_ascii_graphic() || byte == b' ' {
            // Remember where the current printable run began
            if current_string.is_empty() {
                start_index = i;
            }
            current_string.push(byte);
        } else if current_string.len() >= min_length {
            // A non-printable byte ends a run that is long enough to report
            result.push((
                String::from_utf8_lossy(&current_string).to_string(),
                chunk_offset + start_index,
            ));
            current_string.clear();
        } else {
            current_string.clear();
        }
    }

    // Report a run that extends to the end of the chunk
    if current_string.len() >= min_length {
        result.push((
            String::from_utf8_lossy(&current_string).to_string(),
            chunk_offset + start_index,
        ));
    }
    result
}
This function will be used to detect readable text within binary data, which can often reveal metadata, file names, and other useful information.
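To make the behaviour concrete, here is a minimal sketch of the same scan written against slices rather than a temporary buffer. The name printable_runs is hypothetical and not part of the original tool; it is meant to return the same results as find_ascii_strings for the same arguments.

```rust
/// Hypothetical restatement of `find_ascii_strings`: walks the chunk once and
/// slices out each run of printable ASCII (graphic characters or spaces).
fn printable_runs(chunk: &[u8], chunk_offset: usize, min_length: usize) -> Vec<(String, usize)> {
    let is_printable = |b: u8| b.is_ascii_graphic() || b == b' ';
    let mut out = Vec::new();
    let mut i = 0;

    while i < chunk.len() {
        if !is_printable(chunk[i]) {
            i += 1;
            continue;
        }
        // `start..i` will cover one maximal printable run
        let start = i;
        while i < chunk.len() && is_printable(chunk[i]) {
            i += 1;
        }
        if i - start >= min_length {
            out.push((
                String::from_utf8_lossy(&chunk[start..i]).into_owned(),
                chunk_offset + start,
            ));
        }
    }
    out
}
```

For the input b"\x00\x01Hello\x00ab\x00World!!\xff" with a chunk offset of 0x100 and a minimum length of 4, this reports "Hello" at 0x102 and "World!!" at 0x10B, while "ab" is dropped as too short.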
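Detecting Known File Patterns
Many file formats have specific “magic numbers”: unique byte sequences at the beginning of the file. Detecting these patterns can help identify embedded files or known data structures within the binary dump.
Our detect_patterns function will search a chunk for each known signature with memchr’s memmem::find and return the pattern name and absolute offset of every match, including overlapping ones.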
Here’s the complete detect_patterns implementation:
use memchr::memmem;

#[derive(Debug, Clone)]
struct Pattern {
    name: &'static str,
    bytes: &'static [u8],
}

fn detect_patterns(chunk: &[u8], chunk_offset: usize, patterns: &[Pattern]) -> Vec<(String, usize)> {
    let mut results = Vec::new();
    for pattern in patterns {
        let mut start = 0;
        while let Some(pos) = memmem::find(&chunk[start..], pattern.bytes) {
            let actual_pos = chunk_offset + start + pos;
            results.push((pattern.name.to_string(), actual_pos));
            start += pos + 1;
        }
    }
    results
}
With this function, our analyzer can identify embedded files and other known structures within the binary data.
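Because every pattern name is already a &'static str, the results could carry that reference instead of allocating a new String per match. The sketch below illustrates the idea under two assumptions: the name detect_patterns_borrowed is hypothetical, and the search uses the standard library’s slice windows() in place of memmem so the example has no dependencies.

```rust
#[derive(Debug, Clone)]
struct Pattern {
    name: &'static str,
    bytes: &'static [u8],
}

/// Hypothetical variant of `detect_patterns` that returns the borrowed
/// pattern name. Uses a plain sliding-window comparison instead of memmem.
fn detect_patterns_borrowed(
    chunk: &[u8],
    chunk_offset: usize,
    patterns: &[Pattern],
) -> Vec<(&'static str, usize)> {
    let mut results = Vec::new();
    for pattern in patterns {
        // `windows(0)` panics, so skip empty signatures
        if pattern.bytes.is_empty() {
            continue;
        }
        for (pos, window) in chunk.windows(pattern.bytes.len()).enumerate() {
            if window == pattern.bytes {
                results.push((pattern.name, chunk_offset + pos));
            }
        }
    }
    results
}
```

The sliding-window search is simpler but slower than memmem on large inputs, so in the real tool it is probably better to keep memmem::find and change only the return type.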
Generating a Hex Dump
A hex dump allows users to see the raw byte data in a readable format, often with corresponding ASCII characters. This is crucial for analyzing binary data because it presents both the raw hex values and readable characters side-by-side.
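The hex_dump function will print one row per bytes_per_row bytes: the row’s offset in hexadecimal, each byte as two hex digits, and an ASCII column in which non-printable bytes (and spaces) appear as dots.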
Here’s the complete hex_dump function:
fn hex_dump(chunk: &[u8], chunk_offset: usize, bytes_per_row: usize) {
    for (i, line) in chunk.chunks(bytes_per_row).enumerate() {
        // Print offset in hexadecimal
        print!("{:08X} ", chunk_offset + i * bytes_per_row);

        // Print each byte in hexadecimal
        for byte in line {
            print!("{:02X} ", byte);
        }

        // Pad if row is incomplete
        if line.len() < bytes_per_row {
            print!("{:width$}", "", width = (bytes_per_row - line.len()) * 3);
        }

        // Print ASCII representation
        print!(" |");
        for &byte in line {
            if byte.is_ascii_graphic() {
                print!("{}", byte as char);
            } else {
                print!(".");
            }
        }
        println!("|");
    }
}
This function can be called to display chunks of data in hex format, enabling users to see the raw byte values alongside any readable text.
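For an interactive UI, or for tests, it may be more convenient to build each row as a String than to print it directly. The following is a minimal sketch of that idea; format_hex_row and hex_dump_lines are hypothetical names, and the layout is intended to match what hex_dump prints. As with hex_dump, bytes_per_row is assumed to be greater than zero.

```rust
use std::fmt::Write;

/// Hypothetical helper: formats one hex dump row (offset, hex bytes, ASCII).
fn format_hex_row(row: &[u8], offset: usize, bytes_per_row: usize) -> String {
    let mut line = format!("{:08X} ", offset);

    // Hex column; writing into a String cannot fail
    for byte in row {
        write!(line, "{:02X} ", byte).unwrap();
    }
    // Pad a short final row so the ASCII column stays aligned
    for _ in row.len()..bytes_per_row {
        line.push_str("   ");
    }

    // ASCII column: non-graphic bytes (including spaces) become dots
    line.push_str(" |");
    for &byte in row {
        line.push(if byte.is_ascii_graphic() { byte as char } else { '.' });
    }
    line.push('|');
    line
}

/// Hypothetical helper: returns the whole dump as one String per row.
fn hex_dump_lines(chunk: &[u8], chunk_offset: usize, bytes_per_row: usize) -> Vec<String> {
    chunk
        .chunks(bytes_per_row)
        .enumerate()
        .map(|(i, row)| format_hex_row(row, chunk_offset + i * bytes_per_row, bytes_per_row))
        .collect()
}
```

For example, format_hex_row(b"Hi!\x00", 0x10, 4) yields the row "00000010 48 69 21 00  |Hi!.|".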
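Putting It All Together
With these core functions in place, our dump analyzer can read a binary file, detect ASCII strings and known patterns, and display data in a hex dump format. In summary: read_dump_file loads the file into a byte vector, find_ascii_strings reports printable runs and their offsets, detect_patterns reports the offsets of known signatures, and hex_dump prints the offset, hex bytes, and ASCII column for each row.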
Next, we can use these functions within an interactive UI to create a powerful tool for analyzing memory dumps and binary files. Here’s an example of how they might be used together:
fn main() -> io::Result<()> {
    let filename = "test_dump.bin";
    let chunk_size = 1024;
    let min_string_length = 4;
    let patterns = vec![
        Pattern { name: "PDF", bytes: b"%PDF" },
        Pattern { name: "JPEG", bytes: &[0xFF, 0xD8, 0xFF, 0xE0] },
        Pattern { name: "ZIP", bytes: &[0x50, 0x4B, 0x03, 0x04] },
        Pattern { name: "PNG", bytes: &[0x89, 0x50, 0x4E, 0x47] },
    ];

    let data = read_dump_file(filename)?;
    for chunk_offset in (0..data.len()).step_by(chunk_size) {
        let chunk = &data[chunk_offset..chunk_offset + chunk_size.min(data.len() - chunk_offset)];

        // Display hex dump
        hex_dump(chunk, chunk_offset, 16);

        // Detect ASCII strings
        let ascii_strings = find_ascii_strings(chunk, chunk_offset, min_string_length);
        for (string, addr) in ascii_strings {
            println!("ASCII String '{}' found at 0x{:X}", string, addr);
        }

        // Detect patterns
        let detected_patterns = detect_patterns(chunk, chunk_offset, &patterns);
        for (pattern, addr) in detected_patterns {
            println!("Pattern '{}' found at 0x{:X}", pattern, addr);
        }
    }
    Ok(())
}
This code will process the file in chunks, displaying each section as a hex dump and reporting any ASCII strings or known patterns found.
Generating a Dump File for Testing
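One consequence of scanning fixed-size chunks independently is that a signature or string straddling a chunk boundary is not seen whole by either chunk. A possible remedy, sketched below with hypothetical helper names and a standard-library search in place of memmem, is to extend each chunk by the signature length minus one and to count only matches that start inside the chunk proper. The sketch assumes chunk_size is greater than zero.

```rust
/// Hypothetical helper: every offset at which `needle` occurs in `data`.
fn find_all(data: &[u8], needle: &[u8]) -> Vec<usize> {
    if needle.is_empty() {
        return Vec::new();
    }
    data.windows(needle.len())
        .enumerate()
        .filter(|(_, window)| *window == needle)
        .map(|(pos, _)| pos)
        .collect()
}

/// Hypothetical helper: chunked scan that still sees boundary-straddling matches.
fn find_all_chunked(data: &[u8], needle: &[u8], chunk_size: usize) -> Vec<usize> {
    // Extra bytes borrowed from the next chunk so a match can complete
    let overlap = needle.len().saturating_sub(1);
    let mut hits = Vec::new();

    for start in (0..data.len()).step_by(chunk_size) {
        let end = (start + chunk_size + overlap).min(data.len());
        for pos in find_all(&data[start..end], needle) {
            // Matches starting in the overlap belong to the next chunk;
            // skipping them here avoids reporting the same offset twice.
            if pos < chunk_size {
                hits.push(start + pos);
            }
        }
    }
    hits
}
```

With a ZIP signature placed at offset 6 of a 16-byte buffer and a chunk size of 8, scanning the two 8-byte chunks separately finds nothing, while find_all_chunked reports offset 6. Since read_dump_file already loads the whole file into memory, searching the full buffer once is another simple option.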
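Here’s a simple Rust program that creates a memory dump file with known patterns, ASCII strings, and random data for testing our implementation. It uses the rand crate with the 0.8 API (thread_rng and gen), so it needs rand = "0.8" in the Cargo.toml of whichever project builds it.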
use std::fs::File;
use std::io::{self, Write};
use rand::Rng;

fn main() -> io::Result<()> {
    let mut file = File::create("test_dump.bin")?;

    // Insert a PDF header signature at the beginning (9 bytes)
    file.write_all(b"%PDF-1.4\n")?;

    // Fill with random data for padding
    let padding: Vec<u8> = (0..1024).map(|_| rand::thread_rng().gen()).collect();
    file.write_all(&padding)?;

    // Insert a JPEG signature just past the 1 KB mark (offset 1033)
    file.write_all(b"\xFF\xD8\xFF\xE0")?;

    // More padding
    let padding: Vec<u8> = (0..1024).map(|_| rand::thread_rng().gen()).collect();
    file.write_all(&padding)?;

    // Insert an ASCII string just past the 2 KB mark (offset 2061)
    file.write_all(b"Hello, this is a test ASCII string.")?;

    // Add more random data to bring the file to roughly 1 MB
    let padding: Vec<u8> = (0..1024 * 1024 - 4096).map(|_| rand::thread_rng().gen()).collect();
    file.write_all(&padding)?;

    println!("Generated test_dump.bin with known patterns for testing.");
    Ok(())
}
You can also generate a dump file using operating system tools, such as gcore or memdump on Linux.
Running the Analyzer
You can now run the Analyzer tool with:
cargo run --bin dump /your_dump_file.bin
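Note that the main function shown earlier hardcodes "test_dump.bin", so a path passed on the command line only has an effect if main reads it. A minimal sketch of one way to do that follows; dump_path_from_args is a hypothetical helper, and falling back to test_dump.bin when no argument is given is an assumption rather than the behaviour of the published tool.

```rust
/// Hypothetical helper: picks the dump path from the first command-line
/// argument, falling back to the test file used earlier in the article.
fn dump_path_from_args<I: Iterator<Item = String>>(mut args: I) -> String {
    // Index 0 is the program name, so the first real argument is at index 1
    args.nth(1).unwrap_or_else(|| "test_dump.bin".to_string())
}
```

In main, the hardcoded line would then become let filename = dump_path_from_args(std::env::args()); and the call to read_dump_file would take &filename.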
You can now build on this foundation by adding an interactive UI to improve usability.
The full implementation is available on my GitHub.
Reader comment (3 weeks ago): In `detect_patterns`, you can simply insert the &'static str into the results Vec; there isn't any reason to convert the names into heap-allocated Strings there. Note: Vecs and other generic constructs do not need owned data at all; they also accept & and &mut in any T generic parameter. You can create Option<&mut String> or Vec<&str>! Just remember that when using references in generics you need to use lifetimes. It is very cool that a Vec<&'a str> will correctly track the lifetime 'a of the source &strs you put inside it, meaning it won't let the &strs go out of scope before the Vec itself. And 'static is also a lifetime, one that means the data is valid for the entire duration of the program, so it can always be safely used inside anything.