Why I Ditched Libraries to Decode Image Files

Sivuyile Sifuba

Data Specialist | BSc(Eng) in Mechatronics (UCT)

Published: January 13, 2025

One of my favourite areas of study is digital signal processing (DSP). What makes this field particularly exciting is its practical nature—I get to apply mathematical concepts to solve real-world problems. As someone deeply passionate about mathematics, especially applied mathematics, DSP offers the perfect balance between theory and practice. This is also why I enjoy image processing, a fascinating subfield of DSP.

Why Image Processing?

Image processing stands out because it allows you to visualize results. Unlike abstract calculations, where results are often theoretical, image processing lets you see the direct impact of the math you apply. The interplay between mathematics, programming, and visual feedback makes it incredibly rewarding.

When working with images digitally, most people rely on libraries such as Pillow or OpenCV to simplify tasks like reading, writing, and manipulating image data. These libraries abstract away the underlying complexity of how images are stored and processed on a computer. While this abstraction is convenient, I wanted to challenge myself to understand the lower-level details—how image data is structured, stored, and interpreted by computers.

For this purpose, I chose to work with the BMP (bitmap) file format. The BMP format is known for its simplicity, with a straightforward structure and an easy-to-understand compression scheme (when compression is applied). By focusing on BMP, I could learn how to process image files without relying on external libraries.

Understanding and Reading BMP Files

To achieve this, I decided to use Python. Python is a versatile programming language that supports multiple paradigms, including procedural, object-oriented, and functional programming. For this task, I chose a procedural programming approach, which breaks down the program into a series of functions, making the process more intuitive and easier to explain.

Let’s start by importing the io module, which is essential for handling the file operations required to work with BMP files.

import io

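This module is part of Python’s standard library and provides tools for handling input and output operations. It distinguishes three main types of I/O:

Text I/O: Used for handling text data, with encoding and decoding done for you.
Binary I/O: Used for handling raw binary data, such as image files. It is also called buffered I/O because a buffering layer improves performance during read and write operations.
Raw I/O: Unbuffered, low-level access that serves as the building block for the other two and is rarely used directly.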

For this exercise, we work with Binary I/O because BMP files store image data as raw bytes. The io module allows us to open the file, read its contents, and process it efficiently.
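To make these categories concrete, here is a small sketch that opens the same file in text mode, buffered binary mode, and unbuffered (raw) binary mode, then records the type of object Python hands back. The helper name describe_io_objects and the temporary file are my own placeholders for illustration, not part of the BMP reader.

```python
import io
import os
import tempfile

def describe_io_objects(path):
    """Return the class names of the objects open() yields in each mode."""
    names = {}
    with open(path, "r") as text_file:              # text I/O
        names["text"] = type(text_file).__name__
    with open(path, "rb") as binary_file:           # buffered binary I/O
        names["binary"] = type(binary_file).__name__
    with open(path, "rb", buffering=0) as raw_file:  # raw, unbuffered I/O
        names["raw"] = type(raw_file).__name__
    return names

# Create a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"BM")
    demo_path = tmp.name

io_types = describe_io_objects(demo_path)
print(io_types)
print(io.open is open)  # io.open is an alias for the built-in open
os.remove(demo_path)
```

The binary-mode object is the BufferedReader that the rest of this article works with.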

Opening the BMP File

To work with BMP files, we’ll define a function specifically for reading them. This approach encapsulates the file-opening logic, making the code reusable and easier to maintain. Because BMP files store image data as raw bytes, the function opens the file through the io module in binary mode.

The io.open function, which is an alias for Python’s built-in open, allows us to open files in various modes, such as reading, writing, or appending, and it handles both text and binary data. For our task, we’ll open the file in read-binary mode (rb), which gives us direct access to the raw bytes stored in the BMP file.

def read_bmp(file_path):
    with io.open(file_path, 'rb') as f:
        pass  # Header parsing is added step by step in the sections below

The io.open call returns a BufferedReader object, which buffers data in memory so that binary reads are efficient. Buffering reduces the number of disk read operations, which makes many small reads considerably faster.

The with statement is a context manager that simplifies resource management in Python. When used with io.open, it ensures that the file is automatically closed after the block of code is executed, even if an error occurs. This eliminates the need to manually call f.close() and prevents resource leaks. At this stage, we’re ready to explore the file’s contents and decode its structure.
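The claim that the file is closed even when an error occurs is easy to check. The sketch below uses a temporary file and a deliberately raised RuntimeError, both of which are stand-ins I introduced for the demonstration.

```python
import os
import tempfile

# Create a throwaway file so the sketch does not depend on a real BMP.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"BM" + bytes(12))
    demo_path = tmp.name

handle = None
try:
    with open(demo_path, "rb") as f:
        handle = f                  # keep a reference so we can inspect it later
        f.read(2)
        raise RuntimeError("simulated failure while parsing")
except RuntimeError:
    pass

closed_after_error = handle.closed  # True if the with block closed the file
print(closed_after_error)
os.remove(demo_path)
```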

BMP File Structure

Now that we’ve successfully opened the file, it’s time to explore the structure of a BMP file. Understanding this structure is essential for interpreting and processing the data it contains. A BMP file is organized into three main sections, each with a distinct role:

Bitmap File Header: This is the first section of the BMP file and contains general information about the file. It specifies details such as the total file size, the file type (identified by a magic number), and the offset where the pixel data begins. The file header acts as a guide, pointing to critical parts of the file.
DIB Header: Following the file header is the DIB (Device Independent Bitmap) header, which provides key details about the image itself. This section includes information like the image’s dimensions (width and height), the number of bits per pixel (bit depth), the compression method (if any), and the size of the raw bitmap data. The DIB header ensures that the image data can be interpreted and displayed correctly.
Pixel Data: The final section contains the raw image data, which represents the pixels of the image. This data is stored row by row, with each row (or scanline) typically padded to align with a multiple of 4 bytes. This alignment is crucial for compatibility with memory storage requirements, something we’ll need to account for when processing the data.

Understanding these sections lays the foundation for working with BMP files. With this knowledge, we can begin extracting and manipulating the file contents.
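One way to get a feel for the three sections is to build a tiny BMP by hand. The sketch below assembles an uncompressed 24-bit image in memory with the struct module. The function name make_tiny_bmp, the fill colour, and the resolution value of 2835 pixels per metre (roughly 72 DPI) are assumptions I chose for illustration.

```python
import struct

def make_tiny_bmp(width, height, bgr=(0, 0, 255)):
    """Build an uncompressed 24-bit BMP filled with one colour (B, G, R order)."""
    row_size = (width * 3 + 3) & ~3          # each row is padded to 4 bytes
    padding = bytes(row_size - width * 3)
    pixel_data = (bytes(bgr) * width + padding) * height

    # DIB header (BITMAPINFOHEADER, 40 bytes)
    dib_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height, 1, 24, 0, len(pixel_data), 2835, 2835, 0, 0,
    )

    # Bitmap file header (14 bytes): signature, file size, two reserved fields,
    # and the offset from the start of the file to the pixel data
    offset = 14 + len(dib_header)
    file_header = struct.pack(
        "<2sIHHI", b"BM", offset + len(pixel_data), 0, 0, offset
    )
    return file_header + dib_header + pixel_data

bmp_bytes = make_tiny_bmp(2, 2)
print(len(bmp_bytes))   # 14 + 40 + 2 rows of 8 bytes each
print(bmp_bytes[:2])    # the signature
```

Writing bmp_bytes to a file with a .bmp extension should give a small test image to decode.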

Verifying the File Format

The first two bytes of a BMP file are part of the BITMAPFILEHEADER and serve as a signature to identify the file type. For BMP files, this signature is "BM" in ASCII. Verifying this signature is a crucial first step in ensuring the file is a valid BMP before attempting further processing. Using the BufferedReader object created in the with statement, we can read these two bytes and validate the file format:

def read_bmp(file_path):
    with io.open(file_path, 'rb') as f:
        # Read the BITMAPFILEHEADER
        bfType = f.read(2).decode()
        if bfType != 'BM':
            raise ValueError("Not a valid BMP file.")

The f.read(2) call reads the first two bytes of the BITMAPFILEHEADER. These bytes represent the file's signature and are essential for identifying the file as a BMP. The .decode() method converts the binary data into a human-readable string for easy comparison with "BM", the expected signature for BMP files. If the signature does not match "BM", the file is not a valid BMP. In this case, a ValueError is raised to stop further processing.
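One caveat with decoding: with the default UTF-8 codec, bytes such as 0xFF cannot be decoded, so .decode() raises a UnicodeDecodeError (a subclass of ValueError) before the comparison runs. A possible alternative, sketched below, compares the raw bytes directly. The helper has_bmp_signature is hypothetical, and io.BytesIO stands in for a real file.

```python
import io

def has_bmp_signature(stream):
    """Return True if the next two bytes of a binary stream are b'BM'."""
    return stream.read(2) == b"BM"

bmp_like = io.BytesIO(b"BM\x46\x00\x00\x00")    # starts like a BMP file
jpeg_like = io.BytesIO(b"\xff\xd8\xff\xe0")     # starts like a JPEG file

is_bmp = has_bmp_signature(bmp_like)
is_jpeg_bmp = has_bmp_signature(jpeg_like)
print(is_bmp, is_jpeg_bmp)
```

Either style works for valid BMP files; the byte comparison simply avoids a decoding step that can fail on arbitrary input.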

def read_bmp(file_path):
    with io.open(file_path, 'rb') as f:
        # Read the BITMAPFILEHEADER
        bfType = f.read(2).decode()
        if bfType != 'BM':
            raise ValueError("Not a valid BMP file.")
        bfSize = int.from_bytes(f.read(4), 'little')
        bfReserved1 = f.read(2)
        bfReserved2 = f.read(2)
        bfOffBits = int.from_bytes(f.read(4), 'little')

After verifying the file type, the next step is to read the remaining fields of the BITMAPFILEHEADER to extract metadata about the BMP file. The f.read(4) call reads the next four bytes, which represent the total size of the BMP file, including the headers and pixel data. These bytes are then converted to an integer using the int.from_bytes(..., 'little') method. The 'little' argument specifies that the data is stored in little-endian format (least significant byte first), the standard format for BMP files. Following this, the f.read(2) calls read two 2-byte fields: bfReserved1 and bfReserved2. These fields are reserved by the BMP specification and are typically set to zero. Although they are not used for processing, reading them advances the file pointer so that the next read lands on the correct field.

Finally, the f.read(4) call reads the offset to the pixel data. This value specifies the number of bytes from the beginning of the file to where the actual pixel data starts. It is converted into an integer using the int.from_bytes(..., 'little') method and stored in the bfOffBits variable. This offset is essential for correctly accessing the pixel data later in the program.
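The same 14 bytes can also be decoded in a single step with the standard struct module, which some readers may find easier to follow than five separate reads. This is only an alternative sketch: parse_file_header and the sample bytes are made up for illustration.

```python
import struct

# "<" = little-endian, 2s = 2-byte signature, I = 4-byte unsigned int,
# H = 2-byte unsigned int (the two reserved fields), I = offset to pixel data
FILE_HEADER_FORMAT = "<2sIHHI"

def parse_file_header(header_bytes):
    """Unpack the 14-byte BITMAPFILEHEADER into a dictionary."""
    if len(header_bytes) != struct.calcsize(FILE_HEADER_FORMAT):
        raise ValueError("BITMAPFILEHEADER must be exactly 14 bytes.")
    bf_type, bf_size, reserved1, reserved2, off_bits = struct.unpack(
        FILE_HEADER_FORMAT, header_bytes
    )
    return {
        "type": bf_type,
        "size": bf_size,
        "reserved1": reserved1,
        "reserved2": reserved2,
        "off_bits": off_bits,
    }

# Hypothetical header: a 70-byte file whose pixel data starts at byte 54.
sample = b"BM" + (70).to_bytes(4, "little") + bytes(4) + (54).to_bytes(4, "little")
file_header = parse_file_header(sample)
print(file_header)

# Byte order matters: the same four bytes give very different integers.
little = int.from_bytes(b"\x46\x00\x00\x00", "little")
big = int.from_bytes(b"\x46\x00\x00\x00", "big")
print(little, big)
```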

After we finish reading the BITMAPFILEHEADER, the next step is to process the DIB (Device Independent Bitmap) header. This section is the blueprint of the image, providing all the essential details we need to understand how the image is structured. The very first field in the DIB header is its size, which tells us what kind of header it is and how much information to expect.

        # Read the size of the DIB header
        dib_header_size = int.from_bytes(f.read(4), 'little')

Based on this size, we can determine if the BMP file uses the common BITMAPINFOHEADER, the extended BITMAPV4HEADER, or the advanced BITMAPV5HEADER.
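A lookup table is one compact way to express this decision. The sketch below covers only the three header types discussed in this article. The names DIB_HEADER_NAMES and identify_dib_header are my own, and raising an error for any other size is an assumption about how you might want to handle it.

```python
DIB_HEADER_NAMES = {
    40: "BITMAPINFOHEADER",
    108: "BITMAPV4HEADER",
    124: "BITMAPV5HEADER",
}

def identify_dib_header(size_bytes):
    """Map the 4-byte little-endian size field to a header name."""
    size = int.from_bytes(size_bytes, "little")
    try:
        return DIB_HEADER_NAMES[size]
    except KeyError:
        raise ValueError(f"Unsupported DIB header size: {size}") from None

header_name = identify_dib_header((40).to_bytes(4, "little"))
print(header_name)
```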

If the header size is 40 bytes, we know it’s a BITMAPINFOHEADER, the most widely used format for BMP files. This header includes crucial details:

Width and Height: These fields tell us the dimensions of the image, measured in pixels. The width is the number of pixels horizontally, and the height is the number of pixels vertically. A positive height means the image data is stored starting from the bottom row (bottom-up), while a negative height means it starts from the top row (top-down).
Bit Depth: This tells us how many bits are used to represent each pixel. For example, 24 bits mean the image uses true colour (RGB), with 8 bits for each colour channel: red, green, and blue. If the bit depth is 32, each pixel carries a fourth byte, which is commonly used as a transparency (alpha) channel, although some files leave it unused.
Compression: This field specifies if the image data is compressed. A value of 0 means no compression, which makes it simpler to process because the pixel data is stored exactly as it is.

In addition to these, the BITMAPINFOHEADER has fields for the size of the pixel data, the resolution (in pixels per meter), and details about the colour palette. Even if these fields aren’t always used directly, they are necessary for correctly interpreting the image.
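To show how the sign of the height field carries the row order, here is a sketch that unpacks all 40 bytes of a BITMAPINFOHEADER with struct, reading width and height as signed integers. The sample header (a 4 x 3 top-down image) and the helper parse_info_header are invented for the example.

```python
import struct

# I = unsigned 32-bit, i = signed 32-bit, H = unsigned 16-bit (little-endian)
INFO_HEADER_FORMAT = "<IiiHHIIiiII"
FIELD_NAMES = (
    "biSize", "biWidth", "biHeight", "biPlanes", "biBitCount",
    "biCompression", "biSizeImage", "biXPelsPerMeter", "biYPelsPerMeter",
    "biClrUsed", "biClrImportant",
)

def parse_info_header(header_bytes):
    """Unpack a 40-byte BITMAPINFOHEADER and flag top-down images."""
    values = struct.unpack(INFO_HEADER_FORMAT, header_bytes)
    info = dict(zip(FIELD_NAMES, values))
    info["top_down"] = info["biHeight"] < 0   # negative height = top-down rows
    return info

# Hypothetical header: 4 pixels wide, 3 rows stored top-down, 24-bit, uncompressed.
sample = struct.pack(INFO_HEADER_FORMAT, 40, 4, -3, 1, 24, 0, 36, 2835, 2835, 0, 0)
info = parse_info_header(sample)
print(info["biWidth"], info["biHeight"], info["top_down"])

# Reading the same four bytes as unsigned hides the sign and gives a huge number.
unsigned_height = int.from_bytes(sample[8:12], "little")
print(unsigned_height)
```

This is why the height should be read with signed=True when using int.from_bytes.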

When the DIB header size is 108 bytes, the BMP file uses a BITMAPV4HEADER, which includes everything in the BITMAPINFOHEADER but adds more features for advanced graphics. For example:

Colour Masks: These fields, like the red mask and blue mask, describe how the colour channels (red, green, blue, and alpha) are stored in the pixel data. This allows for more flexibility in how the image data is represented.
Gamma Correction: These fields adjust brightness and colours for more accurate display. This is especially important for high-quality imaging applications.
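As a rough illustration of how colour masks work, the sketch below pulls individual channels out of a packed 32-bit pixel value. The mask values and the sample pixel are hypothetical (a real file supplies its own masks in the header), and extract_channel is a helper name I made up.

```python
def extract_channel(pixel_value, mask):
    """Isolate one channel with its bit mask and shift it down to bit 0."""
    if mask == 0:
        return 0
    shift = (mask & -mask).bit_length() - 1   # position of the lowest set bit
    return (pixel_value & mask) >> shift

# Hypothetical masks for a layout with alpha in the top byte.
RED_MASK = 0x00FF0000
GREEN_MASK = 0x0000FF00
BLUE_MASK = 0x000000FF
ALPHA_MASK = 0xFF000000

pixel = 0x80FF4020   # alpha=0x80, red=0xFF, green=0x40, blue=0x20
channels = tuple(
    extract_channel(pixel, mask)
    for mask in (RED_MASK, GREEN_MASK, BLUE_MASK, ALPHA_MASK)
)
print(channels)
```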

The largest DIB header type, at 124 bytes, is the BITMAPV5HEADER. This header is built for advanced colour management and adds features like embedded colour profiles. These profiles ensure that colours look consistent across different devices, which is critical for professional work in graphic design and photography. The BITMAPV5HEADER also has a field for rendering intent, which tells the program how the colours should be displayed—whether to prioritize accuracy or aesthetics.

Each of these headers plays an important role in decoding the image data correctly. By parsing the DIB header, we get all the information we need to locate and understand the raw pixel data in the final section of the BMP file. This process is like piecing together a map that leads us to the heart of the image. Once we have this information, we’re ready to extract and process the pixel data, turning the bytes stored in the file into a visual image we can see and manipulate.

Extracting Pixel Data

Inside the read_bmp function, there’s another function called read_pixel_data. This nested function is responsible for extracting the raw pixel data from the BMP file and converting it into a structured format. The main purpose of this function is to handle the layout of pixel data, which can vary depending on the image’s width, height, and bit depth.

        def read_pixel_data(biWidth, biHeight, biBitCount):
            # Move file pointer to pixel data
            f.seek(bfOffBits)
            pixel_data = []

            # Calculate bytes per pixel and row padding
            bytes_per_pixel = biBitCount // 8
            row_size = (biWidth * bytes_per_pixel + 3) & ~3

            # abs() covers top-down bitmaps, which store a negative height
            for y in range(abs(biHeight)):
                row = []
                for x in range(biWidth):
                    pixel = f.read(bytes_per_pixel)
                    if biBitCount == 24:
                        b, g, r = pixel[0], pixel[1], pixel[2]
                        row.append((r, g, b))
                    elif biBitCount == 32:
                        b, g, r, a = pixel[0], pixel[1], pixel[2], pixel[3]
                        row.append((r, g, b, a))
                # Skip the padding bytes at the end of the row
                f.read(row_size - biWidth * bytes_per_pixel)
                pixel_data.append(row)

            return pixel_data

The read_pixel_data function starts by moving the file pointer to the position where the pixel data begins, as specified by the bfOffBits value read earlier from the BITMAPFILEHEADER. This ensures the function reads the correct part of the file where the image data is stored. BMP files organize pixel data row by row, and each row is padded so its size is a multiple of 4 bytes. This padding is necessary for memory alignment and must be accounted for when processing the data. The function calculates the bytes per pixel by dividing the bit depth (biBitCount) by 8. For example, in a 24-bit BMP file, each pixel requires 3 bytes, one each for red, green, and blue. This integer division suits the 24-bit and 32-bit images handled here; lower bit depths such as 1 or 4 bits per pixel pack several pixels into one byte and would need different handling.

            # Calculate bytes per pixel and row padding
            bytes_per_pixel = biBitCount // 8

The total size of each row, including padding, is computed using the formula (biWidth * bytes_per_pixel + 3) & ~3, which ensures the row is correctly aligned to a 4-byte boundary.

            row_size = (biWidth * bytes_per_pixel + 3) & ~3
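A few worked values make the rounding easier to see. Adding 3 and then clearing the two lowest bits with & ~3 rounds the byte count up to the next multiple of 4. The helper padded_row_size below is just a wrapper I wrote around the same formula for demonstration.

```python
def padded_row_size(width, bits_per_pixel):
    """Row size in bytes, rounded up to a multiple of 4 (24- and 32-bit images)."""
    bytes_per_pixel = bits_per_pixel // 8
    return (width * bytes_per_pixel + 3) & ~3

rows_24bit = []
for width in (1, 2, 3, 4, 5):
    raw = width * 3                       # bytes of actual pixel data per row
    padded = padded_row_size(width, 24)   # bytes the row occupies in the file
    rows_24bit.append((width, raw, padded, padded - raw))
    print(f"width={width}: {raw} data bytes, {padded} stored, {padded - raw} padding")

# 32-bit rows are already a multiple of 4 bytes, so they never need padding.
print(padded_row_size(5, 32))
```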

With this preparation, the function loops through each row of the image, reading pixel data one pixel at a time. Depending on the bit depth, it extracts the colour values for each pixel. For 24-bit images, it retrieves the blue, green, and red channels (in that order) and stores them as (R, G, B) tuples. For 32-bit images, which include an alpha channel, it stores (R, G, B, A) tuples to account for transparency. After processing all the pixels in a row, the function skips over the padding bytes, if any, and moves to the next row. Each row is stored as a list of pixel tuples, and all rows are collected into a larger list representing the full image. Once all rows have been processed, the structured pixel data is returned, ready for further use in the program.
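One detail worth keeping in mind is that the rows come back in file order. For a bottom-up image (positive height), the first row in the list is the bottom of the picture. If you want the rows in display order, a small helper like the one below could flip them. The name to_top_down and the two-row sample are assumptions made for this sketch, and it expects the height to have been read as a signed value.

```python
def to_top_down(pixel_rows, signed_height):
    """Return rows ordered top-to-bottom, given the signed BMP height field."""
    if signed_height > 0:
        return pixel_rows[::-1]   # bottom-up file: reverse the row order
    return list(pixel_rows)       # top-down file: already in display order

RED = (255, 0, 0)
BLUE = (0, 0, 255)

# File order for a 2 x 2 bottom-up image: the blue bottom row is stored first.
stored_rows = [[BLUE, BLUE], [RED, RED]]
display_rows = to_top_down(stored_rows, 2)
print(display_rows[0])   # the top row of the picture
```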

Conclusion

In conclusion, digital signal processing (DSP) offers a unique blend of mathematical theory and practical application, making it one of the most compelling fields of study for those passionate about applied mathematics. Among its many subfields, image processing stands out for its ability to provide immediate visual feedback, showcasing the real-world impact of mathematical principles. By exploring the BMP file format and implementing a Python-based approach to understanding its structure, I gained invaluable insights into the intricate details of image storage and processing.

This journey emphasized the importance of digging deeper into underlying mechanisms, rather than relying solely on abstractions provided by libraries. From understanding file headers to decoding pixel data, each step reinforced the connection between mathematical concepts, programming techniques, and their practical applications. This exploration not only deepened my appreciation for image processing but also highlighted the broader significance of DSP as a field that bridges the gap between theoretical knowledge and tangible outcomes.

References

Wikipedia: BMP File Format https://en.wikipedia.org/wiki/BMP_file_format
Microsoft Learn: Bitmap Storage https://learn.microsoft.com/en-us/windows/win32/gdi/bitmap-storage
Microsoft Learn: BITMAPFILEHEADER Structure https://learn.microsoft.com/en-us/windows/win32/api/wingdi/ns-wingdi-bitmapfileheader
Microsoft Learn: BITMAPINFOHEADER Structure https://learn.microsoft.com/en-us/windows/win32/api/wingdi/ns-wingdi-bitmapinfoheader
Microsoft Learn: BITMAPV4HEADER Structure https://learn.microsoft.com/en-us/windows/win32/api/wingdi/ns-wingdi-bitmapv4header
Microsoft Learn: BITMAPV5HEADER Structure https://learn.microsoft.com/en-us/windows/win32/api/wingdi/ns-wingdi-bitmapv5header

Appendix

Below is a sample code snippet that demonstrates how to decode a BMP file using vanilla Python:

import io

def read_bmp(file_path):
    with io.open(file_path, 'rb') as f:
        # Read the BITMAPFILEHEADER
        bfType = f.read(2).decode()  # First 2 bytes, should be "BM"
        if bfType != 'BM':
            raise ValueError("Not a valid BMP file.")
        
        bfSize = int.from_bytes(f.read(4), 'little')  # The size, in bytes, of the bitmap file
        bfReserved1 = f.read(2)  # Reserved; must be zero
        bfReserved2 = f.read(2)  # Reserved; must be zero
        bfOffBits = int.from_bytes(f.read(4), 'little')  # Offset to pixel data

        # Read the size of the DIB header
        dib_header_size = int.from_bytes(f.read(4), 'little')

        def read_pixel_data(biWidth, biHeight, biBitCount):
            # Move file pointer to pixel data
            f.seek(bfOffBits)
            pixel_data = []

            # Calculate bytes per pixel and row padding
            bytes_per_pixel = biBitCount // 8
            row_size = (biWidth * bytes_per_pixel + 3) & ~3  # Rows are padded to the nearest 4 bytes

            for y in range(abs(biHeight)):  # Height is negative for top-down bitmaps
                row = []
                for x in range(biWidth):
                    pixel = f.read(bytes_per_pixel)
                    if biBitCount == 24:  # 24-bit BMP (no alpha channel)
                        b, g, r = pixel[0], pixel[1], pixel[2]
                        row.append((r, g, b))  # Store as a tuple of integers
                    elif biBitCount == 32:  # 32-bit BMP (with alpha channel)
                        b, g, r, a = pixel[0], pixel[1], pixel[2], pixel[3]
                        row.append((r, g, b, a))  # Store as a tuple of integers
                f.read(row_size - biWidth * bytes_per_pixel)  # Skip padding
                pixel_data.append(row)

            return pixel_data

        if dib_header_size == 40:
            # BITMAPINFOHEADER
            header_type = "BITMAPINFOHEADER"
            biSize = dib_header_size
            biWidth = int.from_bytes(f.read(4), 'little', signed=True)
            biHeight = int.from_bytes(f.read(4), 'little', signed=True)  # Negative for top-down bitmaps
            biPlanes = int.from_bytes(f.read(2), 'little')
            biBitCount = int.from_bytes(f.read(2), 'little')
            biCompression = int.from_bytes(f.read(4), 'little')
            biSizeImage = int.from_bytes(f.read(4), 'little')
            biXPelsPerMeter = int.from_bytes(f.read(4), 'little')
            biYPelsPerMeter = int.from_bytes(f.read(4), 'little')
            biClrUsed = int.from_bytes(f.read(4), 'little')
            biClrImportant = int.from_bytes(f.read(4), 'little')
            
            return {
                "header_type": header_type,
                "width": biWidth,
                "height": biHeight,
                "planes": biPlanes,
                "bit_count": biBitCount,
                "compression": biCompression,
                "size_image": biSizeImage,
                "x_pels_per_meter": biXPelsPerMeter,
                "y_pels_per_meter": biYPelsPerMeter,
                "clr_used": biClrUsed,
                "clr_important": biClrImportant,
                "pixel_data": read_pixel_data(biWidth, biHeight, biBitCount)
            }

        elif dib_header_size == 108:
            # BITMAPV4HEADER
            header_type = "BITMAPV4HEADER"
            bV4Size = dib_header_size
            bV4Width = int.from_bytes(f.read(4), 'little', signed=True)
            bV4Height = int.from_bytes(f.read(4), 'little', signed=True)  # Negative for top-down bitmaps
            bV4Planes = int.from_bytes(f.read(2), 'little')
            bV4BitCount = int.from_bytes(f.read(2), 'little')
            bV4Compression = int.from_bytes(f.read(4), 'little')
            bV4SizeImage = int.from_bytes(f.read(4), 'little')
            bV4XPelsPerMeter = int.from_bytes(f.read(4), 'little')
            bV4YPelsPerMeter = int.from_bytes(f.read(4), 'little')
            bV4ClrUsed = int.from_bytes(f.read(4), 'little')
            bV4ClrImportant = int.from_bytes(f.read(4), 'little')

            bV4RedMask = int.from_bytes(f.read(4), 'little')
            bV4GreenMask = int.from_bytes(f.read(4), 'little')
            bV4BlueMask = int.from_bytes(f.read(4), 'little')
            bV4AlphaMask = int.from_bytes(f.read(4), 'little')
            bV4CSType = int.from_bytes(f.read(4), 'little')
            bV4Endpoints = f.read(36)  # CIEXYZTRIPLE structure
            bV4GammaRed = int.from_bytes(f.read(4), 'little')
            bV4GammaGreen = int.from_bytes(f.read(4), 'little')
            bV4GammaBlue = int.from_bytes(f.read(4), 'little')

            return {
                "header_type": header_type,
                "width": bV4Width,
                "height": bV4Height,
                "planes": bV4Planes,
                "bit_count": bV4BitCount,
                "compression": bV4Compression,
                "size_image": bV4SizeImage,
                "x_pels_per_meter": bV4XPelsPerMeter,
                "y_pels_per_meter": bV4YPelsPerMeter,
                "clr_used": bV4ClrUsed,
                "clr_important": bV4ClrImportant,
                "red_mask": bV4RedMask,
                "green_mask": bV4GreenMask,
                "blue_mask": bV4BlueMask,
                "alpha_mask": bV4AlphaMask,
                "cstype": bV4CSType,
                "gamma_red": bV4GammaRed,
                "gamma_green": bV4GammaGreen,
                "gamma_blue": bV4GammaBlue,
                "pixel_data": read_pixel_data(bV4Width, bV4Height, bV4BitCount)
            }

        elif dib_header_size == 124:
            # BITMAPV5HEADER
            header_type = "BITMAPV5HEADER"
            bV5Size = dib_header_size
            bV5Width = int.from_bytes(f.read(4), 'little', signed=True)
            bV5Height = int.from_bytes(f.read(4), 'little', signed=True)  # Negative for top-down bitmaps
            bV5Planes = int.from_bytes(f.read(2), 'little')
            bV5BitCount = int.from_bytes(f.read(2), 'little')
            bV5Compression = int.from_bytes(f.read(4), 'little')
            bV5SizeImage = int.from_bytes(f.read(4), 'little')
            bV5XPelsPerMeter = int.from_bytes(f.read(4), 'little')
            bV5YPelsPerMeter = int.from_bytes(f.read(4), 'little')
            bV5ClrUsed = int.from_bytes(f.read(4), 'little')
            bV5ClrImportant = int.from_bytes(f.read(4), 'little')

            bV5RedMask = int.from_bytes(f.read(4), 'little')
            bV5GreenMask = int.from_bytes(f.read(4), 'little')
            bV5BlueMask = int.from_bytes(f.read(4), 'little')
            bV5AlphaMask = int.from_bytes(f.read(4), 'little')
            bV5CSType = int.from_bytes(f.read(4), 'little')
            bV5Endpoints = f.read(36)  # CIEXYZTRIPLE structure
            bV5GammaRed = int.from_bytes(f.read(4), 'little')
            bV5GammaGreen = int.from_bytes(f.read(4), 'little')
            bV5GammaBlue = int.from_bytes(f.read(4), 'little')
            bV5Intent = int.from_bytes(f.read(4), 'little')
            bV5ProfileData = int.from_bytes(f.read(4), 'little')
            bV5ProfileSize = int.from_bytes(f.read(4), 'little')
            bV5Reserved = int.from_bytes(f.read(4), 'little')

            return {
                "header_type": header_type,
                "width": bV5Width,
                "height": bV5Height,
                "planes": bV5Planes,
                "bit_count": bV5BitCount,
                "compression": bV5Compression,
                "size_image": bV5SizeImage,
                "x_pels_per_meter": bV5XPelsPerMeter,
                "y_pels_per_meter": bV5YPelsPerMeter,
                "clr_used": bV5ClrUsed,
                "clr_important": bV5ClrImportant,
                "red_mask": bV5RedMask,
                "green_mask": bV5GreenMask,
                "blue_mask": bV5BlueMask,
                "alpha_mask": bV5AlphaMask,
                "cstype": bV5CSType,
                "gamma_red": bV5GammaRed,
                "gamma_green": bV5GammaGreen,
                "gamma_blue": bV5GammaBlue,
                "intent": bV5Intent,
                "profile_data": bV5ProfileData,
                "profile_size": bV5ProfileSize,
                "reserved": bV5Reserved,
                "pixel_data": read_pixel_data(bV5Width, bV5Height, bV5BitCount)
            }

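        else:
            # Fail loudly instead of returning a string the caller might mistake for image data
            raise ValueError(f"Unknown or unsupported DIB header size: {dib_header_size}")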

# Example usage
bmp = read_bmp('input.bmp')
print(bmp)
