Notes on Data Compression: Part 5 (JPEG model)

Introduction

In this article, the last in this series on data compression, we will look at an implementation, in C++, of a JPEG/JFIF decoder, in order to solidify the concepts discussed in the previous article. It is imperative that the previous article has been read and (largely) understood, as the text here assumes this, does not elaborate on the concepts, and concentrates on the decoder implementation. It is targeted at those who wish to know, in more detail, the practicalities of implementing JPEG applications and are willing to dive deep into source code. If you've made it this far, well done! So let's begin.

The implementation takes the form of a C/C++ model, but with an architecture that is logic-implementation friendly, should that next step be taken in the future: it is based around the DCT algorithm in [3] and is pipelined, just as an intended logic design would be. A C/C++ model is easier and quicker to implement than an RTL logic implementation, and will execute much faster than an RTL simulation, so the model can be tested with a much larger set of test data in a given test period. The model then becomes a specification and test reference for any future logic hardware implementation. As such, it is unlikely to compete on performance with software library implementations of JPEG decoders (such as libjpeg-turbo, for example) designed from the start for performance as software running on a computer. The code, along with this documentation, is meant to be instructive while still performing efficiently. A line-by-line description will not be attempted, and this article is meant to be read in conjunction with the open-source source code on github.

Decoder Software Model Implementation

The software model for the decoder covers all functionality that would also be required in a logic implementation, plus some peripheral code for a command line executable interface, debugging information, and so on. As a whole, the model serially processes an input JPEG file and outputs a 24 bit bitmap file. This basic operation is further broken down into processing the header information, then looping through the 'MCUs', decoding them, placing the bitmap data in a buffer and outputting the results. The core is based around a class 'jfif', with a single public method (jpeg_process_jfif()). The class header is defined in jfif_class.h, and the methods are defined in jfif.cpp. The jfif class inherits from a base class, jfif_idct, which provides the inverse DCT functionality. This is defined in jfif_idct.cpp and jfif_idct.h. The basic calling structure of the code is shown below in the form of pseudo-code. The function names shown match the actual jfif class methods, and the calling hierarchy is as it is in the model.

jpeg_process_jfif()             -- Converts i/p baseline DCT JFIF buffer to 24
                                --   bit bitmap
    jpeg_extract_header()       -- Extracts jpeg header information
                                --   (scan, frame, quant/huffman tables, DRI)
        jpeg_dht()              -- Makes a decode structure from huffman
                                --   table data
    jpeg_bitmap_init()          -- Creates space for appropriately sized
                                --   bitmap and initialises header data

  LOOP for each MCU:            -- jpeg_process_jfif() processes an MCU at a
                                --   time until EOI (or error)
    jpeg_huff_decode()          -- Gets an adjusted Huffman/RLE decoded
                                --   amplitude value and ZRLs, or marker or
                                --   end-of-block
        jpeg_dht_lookup()       -- Does huffman lookup on code and extracts
                                --   amplitude data, or flags a marker
            jpeg_get_bits()     -- Gets top bits from barrel shifter and
                                --   (optionally) removes. Pulls in extra
                                --   input if needed.
            jpeg_amp_adjust()   -- Adjusts amplitude to +/- amplitude value

        <dequantise>            -- De-quantisation done in jpeg_huff_decode
                                --   directly from selected table
    jpeg_idct()                 -- Inverse discrete cosine transform

    jpeg_ycc_to_rgb()           -- Converts YCbCr to RGB on an MCU

    jpeg_bitmap_update()        -- Updates bmp data buff with 8x8 RGB values

  ENDLOOP

  return pointer to bitmap

The top level entry point for JFIF decoding is the jpeg_process_jfif() method. This has a very simple interface. The first argument is simply a pointer to a buffer of uint8 bytes (*ibuf), containing the JFIF data to be decoded. The JFIF data is expected to start in the first byte and continue consecutively for the whole image. No assumptions are made by the jpeg_process_jfif() method about the size of the file, or dimensions of the image etc.—all this is calculated from the header.

The second argument is a pointer to a uint8 pointer (**obuf). The pointer referenced by the passed in pointer will be updated to point to a buffer containing the decoded bitmap data (assuming no errors). The memory required for the bitmap is dynamically allocated by jpeg_process_jfif(), and it is the responsibility of the calling process to free this memory when it is no longer required.

The functions return one of several return codes, defined in jfif.h:

  • JPEG_NO_ERROR : Decode completed without error
  • JPEG_FORMAT_ERROR : A format error was encountered when parsing the header
  • JPEG_MEMORY_ERROR : A memory allocation failure occurred
  • JPEG_UNSUPPORTED_ERROR : A valid, but unsupported, format was encountered
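
To make the interface concrete, a minimal calling sequence is sketched below. It is hypothetical: it assumes only what is described above—a jfif object with a jpeg_process_jfif(uint8_t *ibuf, uint8_t **obuf) method returning one of the JPEG_* codes, an output buffer holding a complete bitmap file (whose size can be read back from the standard BMP file-size field), and malloc-style allocation. The real method may take additional arguments, and jfif_main.c in the repository is the definitive usage example.

    #include <cstdio>
    #include <cstdlib>
    #include <cstdint>
    #include <vector>

    #include "jfif_class.h"

    int main(int argc, char **argv)
    {
        if (argc < 3)
        {
            fprintf(stderr, "Usage: %s <in.jpg> <out.bmp>\n", argv[0]);
            return 1;
        }

        // Read the whole JFIF file into a contiguous byte buffer
        FILE *ifp = fopen(argv[1], "rb");
        if (ifp == NULL)
            return 1;
        fseek(ifp, 0, SEEK_END);
        long isize = ftell(ifp);
        fseek(ifp, 0, SEEK_SET);
        std::vector<uint8_t> ibuf(isize);
        fread(ibuf.data(), 1, isize, ifp);
        fclose(ifp);

        // Decode: obuf is updated to point at a dynamically allocated bitmap,
        // which the caller must free when finished with it
        jfif     decoder;
        uint8_t *obuf   = NULL;
        int      status = decoder.jpeg_process_jfif(ibuf.data(), &obuf);

        if (status != JPEG_NO_ERROR)
        {
            fprintf(stderr, "Decode failed with status %d\n", status);
            return status;
        }

        // A BMP file carries its own total length in bytes 2 to 5 (little endian)
        uint32_t bmpsize = obuf[2] | (obuf[3] << 8) | (obuf[4] << 16) | ((uint32_t)obuf[5] << 24);

        FILE *ofp = fopen(argv[2], "wb");
        fwrite(obuf, 1, bmpsize, ofp);
        fclose(ofp);

        free(obuf);    // assuming malloc based allocation in the model

        return 0;
    }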

Within jpeg_process_jfif(), jpeg_extract_header() is called to parse the JFIF header and return various pointers to the header information, such as the SOS data (scan_header_t *), the SOF0 data (frame_header_t *) and the reset interval. With this data, jpeg_bitmap_init() is called, which allocates a buffer for the bitmap data (bmp_ptr) and generates a bitmap header for a 24 bit RGB image (even if the image is greyscale). It also calculates, from the header data, the number of MCUs to process (total_mcus). The main body of the method is then a while loop that processes MCUs one at a time (and markers) until an EOI marker is encountered, calling jpeg_huff_decode() to return either a marker or an MCU consisting of an 8 by 8 array of adjusted amplitude values (scan_data_ptr). The only markers expected to be returned during scan data processing are RSTn and EOI, otherwise an error is returned. Some checks are done on the sequence of RSTn markers received, though the actual resetting of DC values is handled by lower level functions. The scan_data_ptr data is then inverse discrete cosine transformed (in jpeg_idct()) and, if a colour image, converted to RGB (jpeg_ycc_to_rgb()). The resulting RGB data is then sent to jpeg_bitmap_update() for constructing into the bitmap buffer. When all MCUs are processed, the pointer referenced by obuf is updated to point to the bitmap data, and jpeg_process_jfif() returns with a status.

Header extraction

JFIF header extraction is performed by the jpeg_extract_header() method. This function is based around a single "switch" statement that parses all valid marker types expected before the SOS marker, though it flags an error if any of those markers are from an unsupported file format. The JFIF format mandates some ordering of segments delineated by the markers but, for robustness, the jpeg_extract_header() method only expects the first item to be an SOI marker, and that an SOS marker marks the end of the header information. Any other segment can come in any order, whatever the JFIF specification might say. It is known that not all images follow the JFIF format completely strictly.

All markers fall into one of two categories: those that have no following data (e.g. SOI) and those that do (e.g. APP0). The latter are always followed by 2 bytes of length, which gives the total size of the segment in bytes, including the 2 byte length field itself but excluding the marker. The main markers of interest to the jpeg_extract_header() method are SOI, SOF0, SOS, DHT, DQT, APP0 and APP14. The SOI marker has no information, but marks the start of the image, and is expected first, otherwise an error is returned. The SOF0 segment contains data for the image dimensions, whether the image is colour or greyscale, and a set of values determining sub-sampling (if any), quantisation table selections and component ID—one for each component of a set (see previous article). A pointer to a frame_header_t structure is set to the input buffer location 2 bytes from the SOF0 marker (to skip the length field) and returned in the **fptr argument. Note that all other start of frame markers (SOF1 to SOF15) are for unsupported formats, and the method returns an error if they are encountered.
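
As an illustration of the segment walking this implies, the sketch below shows the general pattern of stepping from marker to marker using the 2 byte, big-endian length fields. It is illustrative only, not the jfif class code, and collapses the per-marker handling into a simple skip.

    #include <cstdint>

    // Illustrative segment walker (not the jfif class code). buf[] holds the raw
    // JFIF data. Each marker is 0xff followed by a type byte; segments that carry
    // data follow the marker with a 2 byte big-endian length that counts itself
    // plus the payload (but not the marker). Returns the offset of the entropy
    // coded data after SOS, or -1 on a format error.
    static int skip_to_sos(const uint8_t *buf, int buflen)
    {
        int idx = 0;

        while (idx < buflen - 3)
        {
            if (buf[idx] != 0xff)
                return -1;                              // lost marker alignment

            uint8_t marker = buf[idx + 1];
            idx += 2;

            if (marker == 0xd8)                         // SOI: no payload
                continue;

            uint16_t seglen = ((uint16_t)buf[idx] << 8) | buf[idx + 1];

            if (marker == 0xda)                         // SOS: header ends here
                return idx + seglen;

            idx += seglen;                              // skip APPn, DQT, DHT, SOFn, DRI, ...
        }

        return -1;
    }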

The start of scan segment (SOS marker) has information for each component, defining a component selector, a DC Huffman table selector and an AC Huffman table selector. Unlike for SOF0, we cannot simply overlay a structure pointer on the input buffer, because the variable length part of the segment (the component specific fields, of which there may be 1 or 3 sets) is followed by another fixed section whose offset depends on the number of component data sets. Therefore some buffer space is allocated, the data is extracted into this buffer, and a pointer of type scan_header_t is returned (*sptr) that points to this extracted data.

The Huffman table segment (DHT) requires a bit more processing, and so a sub-function (jpeg_dht()) is called to process segments of this type, returning a pointer of type DHT_offsets_t which, in turn, is returned by jpeg_extract_header(). The aim of the jpeg_dht() method is to construct a Huffman table with as few bits of information as possible. This information consists of the 16 Ln lengths from the segment (see Figure 7 in the previous article), the offsets within the Vn data where the different bit width codes start, and the end codes for the different bit widths. Note that data for bit width n is placed at index n-1 in the data arrays, so, for example, bit width 3 data is at index 2.

As discussed in the previous article, this is all that is needed to decode the Huffman encoded data. Starting from the smallest bit width, if the input data bits are not less than a width's end code there is no match at that width; the first width for which the input bits are less than the end code gives the number of bits in the code. Doubling the end code of the next smallest bit width gives the start code of the matching width, and subtracting this start code from the input data bits gives the offset into the entries for that bit width, and hence the decoded value. Taking the example from the previous article, we would have a DHT_offsets_t structure filled in as shown:


  * Ln[] (points directly to Ln values in dht buffer, overlaid on the input data
          buffer, buf[])
    - 0, 1, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

  * vmn_offset[n] (array of pointers to dht[] indexes)
    - 0 -> NULL
    - 1 -> dht[0] : 01
    - 2 -> dht[1] : 00, 07, 04, 05
    - 3 -> dht[5] : 21, 31, 32
    - others -> NULL

  * row_break_codes[n]
    - 0 = 0b
    - 1 = 01b
    - 2 = 110b
    - 3 = 1111b
    - others don't care

If the input data was a Huffman encoded run-length, and presented, say, ..010101011, then the first bit (using lsb) is bigger than row_break_codes[0] (for 1 bit width), so this is not a match. The first two bits are 11b, which is bigger than row_break_codes[1], but the first three bits are 011b, which is less than row_break_codes[2], meaning we have a 3 bit code. Taking the end code for 2 bits (row_break_codes[1] = 01b) and doubling it gives 010b, or 2. Subtracting this from the 3 bits of the input gives 011b - 010b = 001b, or 1. The 3 bit vmn_offset pointer (vmn_offset[2]) points to dht[1], and the value at offset 1 from this is 07h, and hence is the decoded value—i.e. the variable length code 011b of the input has a byte value of 07h. The jpeg_dht() function returns a pointer to a DHT_offsets_t structure with the values filled in for the Huffman table, having allocated some memory space for the table. The Tc and Th values are extracted from the input, and the method loops through each bit width (smallest to largest), filling in the table entries based on the Ln lengths and Vn values in the DHT segment.
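
The sketch below illustrates this lookup arithmetic as a hypothetical helper function. It is not the model's jpeg_dht_lookup(), which works on the barrel shifter, reads the bits from the other end, and also handles markers, EOB and ZRL; here the code bits are simply assumed left aligned in a 16 bit value, with array names following the DHT_offsets_t fields described above.

    #include <cstdint>
    #include <cstddef>

    // Hypothetical end-code lookup. 'code' holds the next 16 input bits with the
    // first bit of the Huffman code in bit 15. row_break_codes[n-1] is the end
    // code (one past the last code) for n bit codes, assumed valid up to the
    // longest width in use, and vmn_offset[n-1] points at the first decoded byte
    // value for n bit codes (NULL if there are none of that width).
    static int huff_lookup(uint16_t       code,
                           const uint16_t row_break_codes[16],
                           const uint8_t *vmn_offset[16],
                           int           *code_width)
    {
        uint16_t start_code = 0;

        for (int n = 1; n <= 16; n++)
        {
            uint16_t top_bits = code >> (16 - n);        // first n bits of the input

            if (vmn_offset[n-1] != NULL && top_bits < row_break_codes[n-1])
            {
                *code_width = n;                         // caller removes n bits from the barrel
                return vmn_offset[n-1][top_bits - start_code];
            }

            // Doubling this width's end code gives the next width's start code
            start_code = row_break_codes[n-1] << 1;
        }

        return -1;                                       // no match: a format error
    }

Applied to the worked example, this falls through the 1 and 2 bit widths and matches at 3 bits, returning vmn_offset[2][3 - 2], i.e. 07h for the code 011b.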

The quantisation table segments (DQT) define the quantisation table values, each set tagged with a destination ID that the components select via their quantisation table selectors in the frame header. Following the 2 byte length are one or more 65 byte value sets. The first byte consists of two nibbles specifying the Pq value (precision, always 8 bit for baseline) and the Tq (destination) value. Then follow the 64 bytes for the 8x8 quantisation table values. These are extracted into the appropriate table (indexed by the Tq value) pointed to by *qptr, of type DQT_t. The values are turned into integer values at this point, pre-scaled with factors determined by the Arai et al. method (see [1], [3]), allowing the saving of a multiplication at that later point.

The application sections are really only processed to make some compatibility checks. A JFIF file should have an APP0 segment, and it ought to be the first segment after SOI (but might not be). The first APP0 should begin with the string "JFIF\0", and this is checked. Subsequent APP0 sections, if any, should begin with the string "JFXX\0", and this is also checked. If these criteria are met, then the image is flagged as JFIF (is_JFIF). Some JPEG files that are not JFIF might have an APP14 section, usually associated with Adobe files, and beginning with the string "Adobe". If this is the case, and the file is not flagged as JFIF, then the colour space information is taken from a field in the segment and checked to see whether it indicates RGB (0) or YCbCr (1), and flagged as such (*is_RGB is set TRUE or FALSE accordingly).

All other markers or forms of segments not covered here are treated as errors, and the jpeg_extract_header() returns the appropriate error code. All being well, the method returns a good status and the header information in the buffers.

Huffman Decode

The purpose of the jpeg_huff_decode() method is to return either a marker or a set of luminance and, if a colour image, chrominance MCUs that make up a sub-sampled set—i.e. a YCC set of 3 when no sub-sampling, up to a YYYYCC when 4:2:0 sub-sampling. The method makes use of a sub-function jpeg_dht_lookup() which returns a structure of type rle_amplitude_t, as defined below.

    typedef struct {
        int  marker;         // if non-zero, hit a marker
        bool is_EOB;         // Flag if EOB
        bool is_ZRL;         // Flag if ZRL
        int  ZRL;            // run length of zeros
        int  amplitude;      // adjusted amplitude (if not EOB or ZRL != 16)
    } rle_amplitude_t, *rle_amplitude_pt;

The first field (marker), when non-zero, contains a valid marker value, indicating that a marker was encountered. The next two are flags indicating, respectively, that an EOB code terminated the MCU data early, and that a ZRL code was decoded. The ZRL field gives the run-length of zeros that occurred before the associated amplitude value (which could be 0). The actual amplitude, when not EOB or ZRL, is returned in the final field, amplitude.

The jpeg_huff_decode() method first does some calculations on the header data to determine the expected number of Y and C MCUs (y_arrays, total_arrays), as well as some sub-sampling factors for use later on. The method then loops for as many MCUs as calculated (indexed by array). Within this outer loop, table selectors are calculated based on whether the MCU is for chrominance or luminance, the ID, etc. The MCU buffers (mcu[][]) are cleared to 0 before commencing, to allow for zero run-length skipping and early termination on EOB. The DC value is fetched by calling jpeg_dht_lookup(). This might return a marker, in which case it is processed and jpeg_huff_decode() returns NULL, with the marker information returned in the *marker argument. Only one type of marker is expected to be processed within jpeg_huff_decode()—RSTn (others are handled in lower functions, or passed up to the calling method). This is for resetting the DC values (which are a running difference), and the DC counts (current_dc_value[]) are cleared on reception of this marker. If not a marker, the appropriate DC value is updated by adding the returned amplitude value to a running total, and de-quantised by multiplying by qptr[Tq][0]. The result is placed in the current mcu[] buffer.

After the DC value, the 63 remaining AC values are similarly retrieved, in an inner loop, via jpeg_dht_lookup(), de-quantised and put into the mcu[] buffer. If an AC value has a zero run-length greater than 0, then the MCU index (mdx) is incremented by that run-length first, before updating the buffer with the de-quantised amplitude. During AC element processing, markers may still be returned. A flagged EOB simply sets mdx to 64 so that the next loop iteration falls through, effectively skipping to the end of that MCU's processing. The only other marker valid during AC data is EOI. This acts like an EOB, but the marker is passed up to the calling method (in *marker). Any other marker is an error. Assuming no markers, when the MCU is complete (mdx hitting 64) the outer loop iterates to process the next MCU until all MCUs in the group have finished. The jpeg_huff_decode() method then returns a pointer to the mcu[] buffer, containing the data for the whole MCU set.
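
The coefficient handling just described can be summarised with the following simplified sketch. The names are hypothetical (lookup() stands in for jpeg_dht_lookup(), qtab[] for the selected quantisation table), there is no marker handling or sub-sampling bookkeeping, and the coefficients are still in zig-zag order at this point.

    // rle_amplitude_t is the structure defined above; lookup() is a stand-in
    // for jpeg_dht_lookup() selecting either the DC or AC table
    extern rle_amplitude_t lookup(bool dc_table);

    // Fill one 64 entry MCU from decoded run-length/amplitude values
    void fill_mcu(int mcu[64], int current_dc_value[], int comp, const int qtab[64])
    {
        for (int i = 0; i < 64; i++)
            mcu[i] = 0;                       // cleared so skipped entries stay zero

        // DC: the amplitude is a difference from the component's previous DC value
        rle_amplitude_t dc = lookup(true);
        current_dc_value[comp] += dc.amplitude;
        mcu[0] = current_dc_value[comp] * qtab[0];

        // AC: the 63 remaining coefficients, with zero run-lengths and early EOB
        int mdx = 1;
        while (mdx < 64)
        {
            rle_amplitude_t ac = lookup(false);

            if (ac.is_EOB)
                break;                        // rest of the MCU remains zero

            if (ac.is_ZRL)                    // ZRL: a run of 16 zeros, no amplitude
            {
                mdx += 16;
                continue;
            }

            mdx += ac.ZRL;                    // skip the leading run of zeros
            if (mdx < 64)
                mcu[mdx] = ac.amplitude * qtab[mdx];
            mdx++;
        }
    }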

Huffman lookup

The jpeg_dht_lookup() method referred to in the last section is the procedure that performs the Huffman lookup as described previously, and extracts the following encoded amplitude. As this method is dealing with variable length codes, it works on a barrel shifter variable (jfif_barrel—defined in the jfif class and passed into jpeg_dht_lookup()). On calling the method, matches are made on the barrel shifter bits from the smallest bit width upwards until either a match is made, no match is found (an error) or a marker is found. The code is fetched via the jpeg_get_bits() method, which returns the number of bits requested from the barrel. If there are not enough bits available on the barrel for the requested size, it will fetch more bits from the input and add them to the barrel. If it detects a marker (ffh followed by a value byte), it will return the marker value ORed with ffff0000h to distinguish it from a code. Otherwise the 'n' top bits of the barrel are returned. The bits are removed from the barrel only if the 'remove_bits' argument is set: until a code match is found the bits must not be removed, and only when the bit width of the code has been determined is it safe to update the barrel and delete the bits returned.
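
A simplified sketch of such a bit fetch is shown below. It is not the model's jpeg_get_bits(), and the barrel alignment convention is an assumption, but it shows the essential behaviour: refill the barrel from the input (allowing for the stuffed ffh/00h byte pairs that protect entropy coded data from false markers), return the top bits, and only remove them when asked.

    #include <cstdint>

    // Simplified barrel shifter fetch (not the model's jpeg_get_bits()). The
    // barrel holds up to 32 bits, left justified from bit 31 downwards, with
    // barrel_bits counting how many are valid. Returns the top n bits (n <= 16),
    // or (0xffff0000 | marker) if a marker is met while refilling. In scan data,
    // a genuine ffh data byte is always followed by a stuffed 00h byte, so ffh
    // followed by anything else is a marker.
    static uint32_t get_bits(const uint8_t *buf, int *idx, uint32_t *barrel,
                             int *barrel_bits, int n, bool remove_bits)
    {
        while (*barrel_bits < n)                          // refill a byte at a time
        {
            uint8_t byte = buf[(*idx)++];

            if (byte == 0xff)
            {
                if (buf[*idx] == 0x00)
                    (*idx)++;                             // stuffed byte: keep the ffh data
                else
                    return 0xffff0000 | buf[*idx];        // a real marker
            }

            *barrel      |= (uint32_t)byte << (24 - *barrel_bits);
            *barrel_bits += 8;
        }

        uint32_t bits = *barrel >> (32 - n);              // top n bits of the barrel

        if (remove_bits)                                  // only delete once the code width is known
        {
            *barrel      <<= n;
            *barrel_bits  -= n;
        }

        return bits;
    }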

Once the jpeg_dht_lookup() method has found a matching bit width, it extracts the code (and flags for it to be deleted from the barrel) and does the table lookup to get the byte value. The value is checked for being EOB or ZRL, and the return structure (rle_amplitude_t) flags are updated if so. If not EOB or ZRL, the amplitude bit width is extracted from the decoded value, and the corresponding amplitude bits are extracted from the barrel (via jpeg_get_bits()) and passed to jpeg_amp_adjust(), which decodes the value to undo the encoding as per Table 2 in the previous article. Any zero run-length from the Huffman decoded value is written to the return structure, and this is returned to the calling method.
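
The adjustment itself is the standard JPEG 'extend' operation, sketched below (not necessarily the exact form of jpeg_amp_adjust()): a received value whose top bit is clear represents a negative amplitude and is offset accordingly.

    // The standard JPEG 'extend' adjustment (corresponding to Table 2 of the
    // previous article). 'bits' is the raw nbits-wide amplitude field that
    // follows the Huffman code.
    static int amp_adjust(int bits, int nbits)
    {
        if (nbits == 0)
            return 0;

        // A clear top bit encodes a negative amplitude: for example, with
        // nbits = 3, values 0..3 decode to -7..-4 and values 4..7 to +4..+7
        if (bits < (1 << (nbits - 1)))
            return bits - ((1 << nbits) - 1);

        return bits;
    }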

Inverse DCT

For the inverse DCT, a fast implementation after Arai et al., detailed in [1] and [3], has been developed and tested, and the discussion in this article assumes this method. If JPEG_FAST_INT_IDCT is defined at compile time, this method is selected. In addition, another iDCT method is supplied (jpeg_idct_slow()), which is a simple implementation that follows the equations discussed in the previous article, and is adapted from [2]. For the slow method, a pair of double loops implement a one dimensional iDCT over the rows and then the columns. This method is used if JPEG_FAST_INT_IDCT is not defined at compile time. The method can be full precision, or integer fixed point (when compiled with JPEG_DCT_INTEGER defined).
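
For reference, the separable form used by the slow method looks like the following sketch: a direct implementation of the 1-D equation applied to rows and then columns. It is not the model's exact jpeg_idct_slow() code and omits the fixed point (JPEG_DCT_INTEGER) option.

    #include <cmath>

    // Direct (slow) separable 8x8 inverse DCT: a 1-D pass over rows, then columns
    static void idct_1d(const double in[8], double out[8])
    {
        const double pi = 3.14159265358979323846;

        for (int x = 0; x < 8; x++)
        {
            double sum = 0.0;
            for (int u = 0; u < 8; u++)
            {
                double cu = (u == 0) ? (1.0 / std::sqrt(2.0)) : 1.0;
                sum += (cu / 2.0) * in[u] * std::cos((2*x + 1) * u * pi / 16.0);
            }
            out[x] = sum;
        }
    }

    static void idct_8x8(double block[8][8])
    {
        double tmp[8][8], col_in[8], col_out[8];

        for (int y = 0; y < 8; y++)                       // 1-D iDCT of each row
            idct_1d(block[y], tmp[y]);

        for (int x = 0; x < 8; x++)                       // then of each column
        {
            for (int y = 0; y < 8; y++) col_in[y] = tmp[y][x];
            idct_1d(col_in, col_out);
            for (int y = 0; y < 8; y++) block[y][x] = col_out[y];
        }
    }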

The fast method used ([3]) for a one dimensional transform is pipelined into seven phases and takes the form of that shown in the diagram below:

[Diagram: the seven pipelined phases of the fast 1-D inverse DCT dataflow]

The actual operations for the phases are not shown in the diagram, but the source code comments reference the variables shown above for cross-referencing the phase steps, and the operations for each calculation are given alongside the code that implements them.

Colour Space Conversion and bitmaps

The C/C++ reference model is capable of generating a bitmap output from the JPEG input, converting the YCbCr data to RGB (if not RGB already), and constructing a valid 24-bit RGB bitmap. This functionality is not part of the HDL hardware core implementation, but is added to allow test, comparison and validation in a simple raw standard format. As such, only a brief description will be given here.

The YCbCr conversion to RGB is performed by the jpeg_ycc_to_rgb() method. This takes a full set of Y and C MCUs (potentially sub-sampled) and expands them into a block of RGB triplets—which could be 8x8 if there is no sub-sampling, or 16x8 for 4:2:2, etc. It is passed the final image dimensions, and clips the MCU padded data for non MCU aligned boundaries. At the heart of the method are the conversion functions detailed in the previous article. The calculations are slightly different for full precision versus fixed point integer (if JPEG_DCT_INTEGER is defined), but are essentially the same operation. The jpeg_ycc_to_rgb() method is passed a buffer pointer (*optr) to a block of memory (a 2 dimensional array, big enough for a maximally sub-sampled YCC set), in which to place the RGB data.
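
The conversion equations are the standard JFIF ones given in the previous article; a floating point sketch for a single pixel is shown below (the fixed point path uses scaled integer constants but is otherwise the same operation).

    #include <cstdint>

    // Standard JFIF YCbCr to RGB conversion for a single pixel (floating point
    // form). Y is in 0..255 and Cb/Cr are centred on 128.
    static void ycc_to_rgb(int y, int cb, int cr, uint8_t *r, uint8_t *g, uint8_t *b)
    {
        double rd = y + 1.402    * (cr - 128);
        double gd = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128);
        double bd = y + 1.772    * (cb - 128);

        // Clamp to the valid 8 bit range
        *r = (uint8_t)(rd < 0.0 ? 0.0 : rd > 255.0 ? 255.0 : rd);
        *g = (uint8_t)(gd < 0.0 ? 0.0 : gd > 255.0 ? 255.0 : gd);
        *b = (uint8_t)(bd < 0.0 ? 0.0 : bd > 255.0 ? 255.0 : bd);
    }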

The jpeg_ycc_to_rgb() method only works on the actual image data. Two other functions construct the full bitmap file: jpeg_bitmap_init() and jpeg_bitmap_update(). The first is called after the header extraction routine, once the image dimensions are known. The X and Y dimensions are passed to the method, which allocates memory big enough for the image, including a header, configures the header appropriately, and returns a pointer to the buffer created. jpeg_bitmap_update() is called immediately after jpeg_ycc_to_rgb(), and takes the buffer returned by that method and places the data in the appropriate locations of the bitmap buffer, reversing the image from top-to-bottom to bottom-to-top, as required for bitmaps. It is this bitmap buffer that is returned by the top level jpeg_process_jfif() method.
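
The row reversal and pixel packing follow the normal 24-bit bitmap conventions, illustrated by the sketch below. It is not the model's jpeg_bitmap_update(); it just shows where a single RGB value lands in a bottom-up BMP pixel array.

    #include <cstdint>

    // Place one RGB pixel into a 24-bit bottom-up BMP pixel array. BMP rows run
    // bottom to top, each row is padded to a 4 byte boundary, and the pixel
    // bytes are stored in B, G, R order.
    static void put_pixel(uint8_t *pix, int width, int height,
                          int x, int y, uint8_t r, uint8_t g, uint8_t b)
    {
        int row_bytes = ((width * 3) + 3) & ~3;     // row length padded to 4 bytes
        int offset    = (height - 1 - y) * row_bytes + x * 3;

        pix[offset + 0] = b;
        pix[offset + 1] = g;
        pix[offset + 2] = r;
    }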

The jpeg_process_jfif() method (and the sub-functions described above) is meant to be used as a stand-alone library class, and also has a C linkage function available, jpeg_process_jfif_c(), and can be compiled as such with the provided make scripts. However, a brief mention of the file jfif_main.c ought to be made, which uses jpeg_process_jfif_c() to produce a command line executable for exercising the function for purposes such as verification, generation of bit comparison vectors, etc. Details of the usage of the executable, along with source code download instructions and links, can be found in the repository's README.txt. The main() function in jfif_main.c provides code for a simple command line interface, file input and output, and also a graphical display window for viewing the images (using the GTK+2 library—not available when compiled with MSVC).

Conclusions

We have looked at a C/C++ model of a decoder based on the concepts discussed in the previous article. This has solidified those abstract concepts into a practical implementation, demonstrating how real-world implementations are achieved and providing a case study of one. The source code for the implementation can be found on github. The README.txt in the repository has details of how to install, compile and run the code, though there are both Windows and Linux compiled binaries available in the repository.

This, then, completes the set of articles reviewing foundational data compression concepts, both lossless and lossy. More articles on other subjects are in the pipeline, so keep an eye out for these.

References

  • [1] Pennebaker, W.B. and Mitchell, J.L., 1992, JPEG Still Image Data Compression Standard, 1st Ed., ISBN 0-442-01272-1
  • [2] Nelson, M. and Gailly, J.-L., 1995, The Data Compression Book, 2nd Ed., ISBN 1-558-51434-1
  • [3] Arai, Y., Agui, T. and Nakajima, M., 1988, A Fast DCT-SQ Scheme for Images, Trans. IEICE, E71(11), pp. 1095–1097
