Building a Simple Binary Parser in JavaScript

This is the third part of the series on parser combinators. In this installment, we will implement a simple parser to read the header of an MP3 file. In the previous part, we enhanced the parser library with the ability to verify arbitrary objects. We can take advantage of this to create some binary parser functions and extract the header data from an MP3 file.

Before we begin the implementation, we need to decide what type of object we will accept as input for our binary data. Depending on where the code will run, we can accept either a Buffer or a Uint8Array. I have decided to go with Buffer for no particular reason, even though Buffer is a Node.js API and is not available by default in the browser. Now that this is settled, let's start by implementing the isBinary rule, which verifies that the object we are parsing is indeed a Buffer.
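
The original listing is not reproduced in this text, so here is a minimal sketch of what an isBinary rule could look like. The rule shape used here (a function from a state { input, offset } to a result object with ok, value, and state) is an assumption made for illustration and may differ from the library built earlier in the series.

```javascript
// Assumed rule shape: state -> { ok, value, state } or { ok, error, state }.
// This shape is hypothetical, not the series' actual API.
const isBinary = (state) =>
  Buffer.isBuffer(state.input)
    ? { ok: true, value: null, state }
    : { ok: false, error: "expected a Buffer as input", state };

console.log(isBinary({ input: Buffer.from([0xff]), offset: 0 }).ok); // true
console.log(isBinary({ input: "not binary", offset: 0 }).ok); // false
```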

Next, we will implement the Bit rule, which will consume a single bit and check that we are reading inside the Buffer object. Here's how we can define the Bit rule:
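
As a rough sketch, a bit rule along these lines could be written as follows. It assumes a hypothetical state of the form { input, offset }, where offset counts the bits consumed so far, and it reads the most significant bit of each byte first.

```javascript
// Assumed state: { input: Buffer, offset: number of bits already consumed }.
const bit = (state) => {
  const { input, offset } = state;
  if (!Buffer.isBuffer(input)) {
    return { ok: false, error: "expected a Buffer", state };
  }
  const byteIndex = Math.floor(offset / 8); // byte that holds the bit
  if (byteIndex >= input.length) {
    return { ok: false, error: "unexpected end of input", state };
  }
  const shift = 7 - (offset % 8); // most significant bit comes first
  const value = (input[byteIndex] >> shift) & 1;
  return { ok: true, value, state: { input, offset: offset + 1 } };
};

const first = bit({ input: Buffer.from([0b10100000]), offset: 0 });
const second = bit(first.state);
console.log(first.value, second.value); // 1 0
```

Reading past the last byte fails instead of returning undefined, which is the bounds check the paragraph above describes.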

The Bit rule will serve as the building block for all the other binary-related rules that we will write. Once we have read a sequence of bits, we can combine them into larger units, such as nibbles (4 bits) and bytes (8 bits).
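
One way to build these larger units is a small repetition helper. The sketch below is only illustrative: the times combinator and the state shape { input, offset } are assumptions, not names taken from the series.

```javascript
// Compact bit rule (assumed state: { input: Buffer, offset in bits }).
const bit = ({ input, offset }) => {
  const byteIndex = Math.floor(offset / 8);
  if (byteIndex >= input.length) {
    return { ok: false, error: "unexpected end of input" };
  }
  const value = (input[byteIndex] >> (7 - (offset % 8))) & 1;
  return { ok: true, value, state: { input, offset: offset + 1 } };
};

// Hypothetical combinator: run `rule` exactly n times and collect the values.
const times = (n, rule) => (state) => {
  const values = [];
  let current = state;
  for (let i = 0; i < n; i++) {
    const result = rule(current);
    if (!result.ok) return result;
    values.push(result.value);
    current = result.state;
  }
  return { ok: true, value: values, state: current };
};

const nibble = times(4, bit);
const byte = times(8, bit);

const start = { input: Buffer.from([0xa5]), offset: 0 };
console.log(nibble(start).value); // [ 1, 0, 1, 0 ]
console.log(byte(start).value); // [ 1, 0, 1, 0, 0, 1, 0, 1 ]
```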

In addition to the Bit and Nibble rules, we can define a Uint8 rule. This rule will accept a number as input and return a parser rule that verifies that the byte we read from the buffer is equal to that number.
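
A sketch of such a rule, under the same assumed state shape { input, offset in bits }, might look like this. The readBits helper is hypothetical and only exists to keep the example self-contained.

```javascript
// Hypothetical helper: read `count` bits starting at bit `offset`, MSB first.
// Returns null when the read would go past the end of the buffer.
const readBits = (input, offset, count) => {
  if (offset + count > input.length * 8) return null;
  let value = 0;
  for (let i = 0; i < count; i++) {
    const pos = offset + i;
    const b = (input[Math.floor(pos / 8)] >> (7 - (pos % 8))) & 1;
    value = value * 2 + b;
  }
  return value;
};

// uint8(expected) returns a rule that succeeds only if the next byte equals `expected`.
const uint8 = (expected) => (state) => {
  const actual = readBits(state.input, state.offset, 8);
  if (actual === null) {
    return { ok: false, error: "unexpected end of input", state };
  }
  if (actual !== expected) {
    return { ok: false, error: `expected byte ${expected}, got ${actual}`, state };
  }
  return { ok: true, value: actual, state: { ...state, offset: state.offset + 8 } };
};

const header = { input: Buffer.from([0xff, 0xfb]), offset: 0 };
console.log(uint8(0xff)(header).ok); // true
console.log(uint8(0x00)(header).ok); // false
```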

To combine a sequence of bits into an integer, we can define a bitsToInt function that takes an array of bits and returns the integer they represent. Here's how we can implement this function:

The bitsToInt function converts an array of bits into an integer. We achieve this by reducing the array and applying the binary-to-decimal conversion. We should note that we use the reversed index since the most significant bit comes first.
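
The reduction described above can be sketched like this. The exact body of the original function is not available here, so treat it as one possible implementation.

```javascript
// Each bit contributes bit * 2^(position from the right),
// so the first element of the array is the most significant bit.
const bitsToInt = (bits) =>
  bits.reduce((sum, bit, index) => sum + bit * 2 ** (bits.length - 1 - index), 0);

console.log(bitsToInt([1, 0, 1, 1])); // 11
console.log(bitsToInt([1, 1, 1, 1, 1, 1, 1, 1])); // 255
```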

We can create another version of this function that we will call bitsToIntLe, where the "Le" stands for "little-endian." This version of the function will interpret the least significant bit as coming first, rather than the most significant bit.
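
A possible little-endian variant simply uses the index itself as the exponent, as in this sketch:

```javascript
// The first element of the array is the least significant bit.
const bitsToIntLe = (bits) =>
  bits.reduce((sum, bit, index) => sum + bit * 2 ** index, 0);

console.log(bitsToIntLe([1, 0, 1, 1])); // 13
console.log(bitsToIntLe([1, 0, 0, 0])); // 1
```

The same input [1, 0, 1, 1] gives 11 when read most significant bit first and 13 when read least significant bit first.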

Sometimes, we may need to parse a binary string from the input data. To handle this case, we can define a binStr rule. This rule will accept a string as input and check if the value at the current position matches the expected string. If the value matches, the rule will consume the corresponding bits and return the string.

The binStr rule can be useful when we need to parse headers or other structured data that contain string values.
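
Here is a hedged sketch of a binStr rule. It assumes the hypothetical state shape { input, offset in bits }, requires the current position to be byte-aligned (a simplification), and uses Node's Buffer.byteLength to measure the expected string.

```javascript
const binStr = (expected) => (state) => {
  const { input, offset } = state;
  if (offset % 8 !== 0) {
    // Simplifying assumption: strings always start on a byte boundary.
    return { ok: false, error: "binStr expects a byte-aligned position", state };
  }
  const start = offset / 8;
  const size = Buffer.byteLength(expected, "utf8"); // size in bytes, not characters
  if (start + size > input.length) {
    return { ok: false, error: "unexpected end of input", state };
  }
  const actual = input.toString("utf8", start, start + size);
  if (actual !== expected) {
    return { ok: false, error: `expected "${expected}"`, state };
  }
  return { ok: true, value: actual, state: { input, offset: offset + size * 8 } };
};

const tagged = { input: Buffer.from("ID3abc", "utf8"), offset: 0 };
console.log(binStr("ID3")(tagged).value); // ID3
console.log(binStr("TAG")(tagged).ok); // false
```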

The binStr rule uses the byteSize function to determine the size of the string. This is because not all string characters have a size of one byte, depending on the encoding used. The byteSize function determines the size of a string in bytes by converting it to a byte array and returning its length.

The implementation of the byteSize function is pretty straightforward:
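
A minimal sketch of byteSize, assuming UTF-8 encoding (the original may use a different encoding or API):

```javascript
// Convert the string to bytes and count them.
const byteSize = (str) => Buffer.from(str, "utf8").length;

console.log(byteSize("ID3")); // 3
console.log(byteSize("\u00e9")); // 2 (one character, two bytes in UTF-8)
```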

We are almost ready to start writing our MP3 header reader. Before that, we need to define two more rules: the one rule and the zero rule.

The one rule checks that the value at the current position is 1. If the value is 1, the rule consumes the corresponding bit and returns the bit.

Similarly, the zero rule checks that the value at the current position is 0.
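
Both rules can share one helper. The sketch below uses a hypothetical expectBit function and the assumed state shape { input, offset in bits }; neither name comes from the series.

```javascript
// Compact bit rule, repeated here so the example runs on its own.
const bit = (state) => {
  const { input, offset } = state;
  const byteIndex = Math.floor(offset / 8);
  if (byteIndex >= input.length) {
    return { ok: false, error: "unexpected end of input", state };
  }
  const value = (input[byteIndex] >> (7 - (offset % 8))) & 1;
  return { ok: true, value, state: { input, offset: offset + 1 } };
};

// Hypothetical helper: read one bit and fail if it differs from `expected`.
const expectBit = (expected) => (state) => {
  const result = bit(state);
  if (!result.ok) return result;
  return result.value === expected
    ? result
    : { ok: false, error: `expected bit ${expected}`, state };
};

const one = expectBit(1);
const zero = expectBit(0);

const sample = { input: Buffer.from([0b10000000]), offset: 0 };
console.log(one(sample).ok, zero(sample).ok); // true false
```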

Before we start parsing an MP3 file, it's important to note that this code is for educational purposes only, and we won't be handling ID3 tags. We'll simply skip over them, but we'll still need to read the header to be able to skip them. If anyone ends up parsing ID3 tags, feel free to share it with me.

To begin, we'll take a look at the MP3 specification, which we'll be following in this implementation. Another useful resource is the Wikipedia page for MP3. The layout of the MP3 header is described as follows:


As we go through the specification, we can see that the data we extract will be interpreted differently depending on what has been parsed before. This is where we can use the context rule we wrote in the first part of this series. We can start by creating an mp3Header rule that will verify and parse a header frame.

First, we consume the frame sync, which is 11 bits all set to 1. We use uint8 to check that the first 8 bits equal 255, then check 3 more bits that are all 1, for a total of 11. The sync carries no information, so we discard it.
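
To make the 8 + 3 split concrete, here is the same check written directly against the bytes rather than with parser rules. The hasFrameSync name is made up for this illustration.

```javascript
// The frame sync is the first byte (0xff) plus the top three bits of the second byte.
const hasFrameSync = (buffer, byteOffset = 0) =>
  buffer.length >= byteOffset + 2 &&
  buffer[byteOffset] === 0xff &&
  (buffer[byteOffset + 1] & 0b11100000) === 0b11100000;

console.log(hasFrameSync(Buffer.from([0xff, 0xfb]))); // true
console.log(hasFrameSync(Buffer.from([0xff, 0x1b]))); // false
```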

Next, we get two bits that indicate the version. We use an object to map the number to a string representing the version.
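
For illustration, the version lookup could look like the sketch below. The bit patterns follow the MPEG audio header layout (00 is MPEG 2.5, 01 is reserved, 10 is MPEG 2, 11 is MPEG 1); the names versions and parseVersion, and the exact strings, are assumptions.

```javascript
// Most-significant-bit-first conversion of a bit array to an integer.
const bitsToInt = (bits) => bits.reduce((acc, bit) => acc * 2 + bit, 0);

// Two version bits mapped to a version label.
const versions = { 0: "2.5", 1: "reserved", 2: "2", 3: "1" };

const parseVersion = (bits) => versions[bitsToInt(bits)];

console.log(parseVersion([1, 1])); // 1
console.log(parseVersion([0, 0])); // 2.5
```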

Then we read two bits that represent the layer and we use another map to extract the layer number.
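
A similar sketch works for the layer. Note that the encoding is inverted: 11 means Layer I, 10 means Layer II, 01 means Layer III, and 00 is reserved. The names layers and parseLayer are hypothetical, and null stands in for the reserved value.

```javascript
// Most-significant-bit-first conversion of a bit array to an integer.
const bitsToInt = (bits) => bits.reduce((acc, bit) => acc * 2 + bit, 0);

// Two layer bits mapped to the layer number (null for the reserved pattern).
const layers = { 0: null, 1: 3, 2: 2, 3: 1 };

const parseLayer = (bits) => layers[bitsToInt(bits)];

console.log(parseLayer([0, 1])); // 3
console.log(parseLayer([1, 1])); // 1
```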

Almost all of the maps required to parse the header can be constructed using the information from the linked specification. However, one map was missing from the specification. Here is the missing map:

This map gives the number of samples per frame for each version.
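
The original map is not reproduced here. As a sketch, the commonly cited MPEG audio values can be stored keyed by version and then by layer, since the count differs for Layer III between MPEG 1 and MPEG 2/2.5. The samplesPerFrame and frameDuration names are assumptions.

```javascript
// Samples per frame, keyed by version label and then by layer number.
const samplesPerFrame = {
  "1": { 1: 384, 2: 1152, 3: 1152 },
  "2": { 1: 384, 2: 1152, 3: 576 },
  "2.5": { 1: 384, 2: 1152, 3: 576 },
};

// Duration of one frame in seconds for a given sample rate in Hz.
const frameDuration = (version, layer, sampleRate) =>
  samplesPerFrame[version][layer] / sampleRate;

console.log(samplesPerFrame["1"][3]); // 1152
console.log(frameDuration("1", 3, 44100).toFixed(4)); // 0.0261
```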

Now that we can read the MP3 header, we can write the ID3 parser to skip the ID3 tag if we find one. For those who are wondering, ID3 tags are used to store metadata about the audio, such as the artist's name, title, album art, musical genre, and so on.

The spec for the ID3 tag can be found here. As you can see, the ID3 tag starts with the string "ID3". Luckily, we have the binStr rule to match this part. Then come the version bytes, some flags, and the information we are interested in, which is the size. Unfortunately, the size is encoded as a synchsafe 32-bit integer. This means that the most significant bit of each of its four bytes is always 0, so only 28 bits carry the value. That's all we need to know for the purpose of this post. If you want to learn more, you can take a look at the links provided. To get the actual size, we need to unsynchsafe the number we read before we can use it. Here is the function we will use:
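
One way to write the conversion is sketched below. It takes the four size bytes packed into a single 32-bit number, keeps the low 7 bits of each byte, and packs them together into a 28-bit value. The original function may be written differently.

```javascript
// Keep the low 7 bits of each of the four bytes and concatenate them.
const unsynchsafe = (value) =>
  (((value >>> 24) & 0x7f) << 21) |
  (((value >>> 16) & 0x7f) << 14) |
  (((value >>> 8) & 0x7f) << 7) |
  (value & 0x7f);

console.log(unsynchsafe(0x00000201)); // 257
console.log(unsynchsafe(0x7f7f7f7f)); // 268435455
```

In the first example the bytes are 00 00 02 01, so the result is 2 * 128 + 1 = 257.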

Let's take a look at the parsing code for the ID3v2 tag:
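
The original listing uses the parser rules built so far. As a stand-in, the sketch below reads the 10-byte ID3v2 header directly from the Buffer and returns how many bytes to skip. The id3v2TagLength name is hypothetical, and the optional footer that ID3v2.4 allows is ignored for simplicity.

```javascript
// ID3v2 header layout: "ID3" (3 bytes), version (2 bytes), flags (1 byte), size (4 bytes).
const id3v2TagLength = (buffer) => {
  if (buffer.length < 10 || buffer.toString("latin1", 0, 3) !== "ID3") {
    return 0; // no ID3v2 tag at the start of the data
  }
  // The size is synchsafe: only the low 7 bits of each byte are used.
  const size =
    ((buffer[6] & 0x7f) << 21) |
    ((buffer[7] & 0x7f) << 14) |
    ((buffer[8] & 0x7f) << 7) |
    (buffer[9] & 0x7f);
  return 10 + size; // the stored size does not include the 10-byte header
};

const withTag = Buffer.from([0x49, 0x44, 0x33, 0x04, 0x00, 0x00, 0x00, 0x00, 0x02, 0x01]);
console.log(id3v2TagLength(withTag)); // 267
console.log(id3v2TagLength(Buffer.from([0xff, 0xfb]))); // 0
```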

The code is reasonably readable now. To complete the parsing process, we can write an MP3 parser that combines the mp3Header and ID3 parsers and returns the relevant information.

The code uses a loop to read the MP3 headers and combine the results into a single object. Lookahead is used to prevent the parser from returning an error if there are no more headers to read.
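
The loop-with-lookahead idea can be sketched in isolation. Everything below is illustrative: syncByte stands in for the real header rule, offsets are counted in bytes for brevity, and lookahead and whileAhead are hypothetical names.

```javascript
// Stand-in rule: succeeds when the byte at the current position is 0xff.
const syncByte = (state) =>
  state.offset < state.input.length && state.input[state.offset] === 0xff
    ? { ok: true, value: 0xff, state: { ...state, offset: state.offset + 1 } }
    : { ok: false, error: "expected 0xff", state };

// Run a rule but return the original state, so nothing is consumed.
const lookahead = (rule) => (state) => {
  const result = rule(state);
  return result.ok ? { ...result, state } : result;
};

// Apply `rule` for as long as the lookahead says it would succeed.
// Running out of matches ends the loop instead of producing an error.
const whileAhead = (rule) => (state) => {
  const values = [];
  let current = state;
  while (lookahead(rule)(current).ok) {
    const result = rule(current);
    values.push(result.value);
    current = result.state;
  }
  return { ok: true, value: values, state: current };
};

const result = whileAhead(syncByte)({ input: Buffer.from([0xff, 0xff, 0x00]), offset: 0 });
console.log(result.ok, result.value.length, result.state.offset); // true 2 2
```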

This part of the tutorial provides a basic implementation of a parser, but there are many more advanced parsing techniques and concepts that we haven't covered. Nonetheless, this should give you a good understanding of how binary data can be parsed and utilized.

In the next post, we will delve into a simple implementation of a Lisp interpreter and utilize the parsers we have written to create a reader that will generate an abstract syntax tree (AST) for the interpreter.
