Building a Simple Binary Parser in JavaScript

oussama hamdaoui

Developer of light-invoice.com

Published: March 14, 2023

This is the third part of the series on parser combinators. In this installment, we will implement a simple parser to read the header of an MP3 file. In the previous part, we enhanced the parser library with the ability to verify arbitrary objects. We can take advantage of this to create some binary parser functions and extract the header data from an MP3 file.

Before we begin the implementation, we need to decide what type of object we will accept as input for our binary data. Depending on where the code will be executed, we can accept either a Buffer or a Uint8Array. I have decided to go with the Buffer route for no particular reason, even though Buffer is not implemented by default in the browser. Now that this is settled, let's start by implementing the isBinary rule, which will verify that the object we are parsing is indeed a buffer.
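As a minimal sketch of such a check, assume that a rule is a function taking a state `{ input, offset }` and returning `{ ok, value, state }` or `{ ok, error, state }`. That rule shape is a hypothetical stand-in for illustration; the library from the earlier parts may use a different signature.

```javascript
// Hypothetical rule shape (an assumption, not the series' actual API):
// a rule receives a state and returns a result object.
const isBinary = (state) =>
  Buffer.isBuffer(state.input)
    ? { ok: true, value: state.input, state }
    : { ok: false, error: 'isBinary: expected a Buffer', state };

const accepted = isBinary({ input: Buffer.from([0xff, 0xfb]), offset: 0 });
const rejected = isBinary({ input: 'not binary', offset: 0 });
console.log(accepted.ok, rejected.ok); // true false
```

`Buffer.isBuffer` is part of Node.js; in a browser, a similar check could be written with `input instanceof Uint8Array`.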

Next, we will implement the Bit rule, which will consume a single bit and check that we are reading inside the Buffer object. Here's how we can define the Bit rule:
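One possible shape for this rule is sketched below. It assumes that the state tracks a position as a bit offset (`state.offset`) and that bits are read most significant first within each byte; both choices, along with the result object shape, are assumptions made for illustration.

```javascript
// Sketch of a bit rule. Assumptions: state.offset counts bits from the
// start of the buffer, and bits are read MSB-first within each byte.
const bit = (state) => {
  const { input, offset } = state;
  if (!Buffer.isBuffer(input) || offset < 0 || offset >= input.length * 8) {
    return { ok: false, error: 'bit: reading outside the buffer', state };
  }
  const byte = input[offset >> 3];                 // byte that holds the bit
  const value = (byte >> (7 - (offset & 7))) & 1;  // pick the bit, MSB first
  return { ok: true, value, state: { input, offset: offset + 1 } };
};

const first = bit({ input: Buffer.from([0b10000000]), offset: 0 });
const second = bit(first.state);
console.log(first.value, second.value); // 1 0
```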

The Bit rule will serve as the building block for all the other binary-related rules that we will write. Once we have read a sequence of bits, we can combine them into larger units, such as nibbles (4 bits) and bytes (8 bits).

In addition to the Bit and Nibble rules, we can define a Uint8 rule. This rule will accept a number as input and return a parser rule that verifies that the byte we read from the buffer is equal to that number.
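The following sketch shows one way these rules could be layered on top of a bit rule. The `times` helper, the rule names, and the result shape are hypothetical; the real library most likely provides its own repetition combinator.

```javascript
// Assumed rule shape: state { input, offset-in-bits } -> { ok, value, state }.
const bit = (state) => {
  const { input, offset } = state;
  if (offset >= input.length * 8) return { ok: false, error: 'bit: out of bounds', state };
  const value = (input[offset >> 3] >> (7 - (offset & 7))) & 1;
  return { ok: true, value, state: { input, offset: offset + 1 } };
};

// Hypothetical helper: run `rule` n times and collect the values.
const times = (n, rule) => (state) => {
  const values = [];
  let current = state;
  for (let i = 0; i < n; i++) {
    const result = rule(current);
    if (!result.ok) return { ok: false, error: result.error, state };
    values.push(result.value);
    current = result.state;
  }
  return { ok: true, value: values, state: current };
};

const nibble = times(4, bit); // 4 bits
const byte = times(8, bit);   // 8 bits

// uint8(n) succeeds only if the next byte equals n.
const uint8 = (expected) => (state) => {
  const result = byte(state);
  if (!result.ok) return result;
  const value = result.value.reduce((acc, b) => acc * 2 + b, 0);
  return value === expected
    ? { ok: true, value, state: result.state }
    : { ok: false, error: `uint8: expected ${expected}, got ${value}`, state };
};

const start = { input: Buffer.from([0xff, 0x0a]), offset: 0 };
const matched = uint8(0xff)(start);
const high = nibble(matched.state);
const low = nibble(high.state);
console.log(matched.ok, high.value, low.value); // true [ 0, 0, 0, 0 ] [ 1, 0, 1, 0 ]
```

On failure, each rule returns the state it started from, so a caller can try an alternative without losing its position.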

To combine a sequence of bits into an integer, we can define a bitsToInt function that takes an array of bits and returns the integer they represent. Here's how we can implement this function:

The bitsToInt function converts an array of bits into an integer. We achieve this by reducing the array and applying the binary-to-decimal conversion. We should note that we use the reversed index since the most significant bit comes first.
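A minimal sketch consistent with that description, using `reduce` and the reversed index as the exponent, could look like this (the exact body of the original function is not shown here, so treat it as one plausible version):

```javascript
// Most significant bit first: the bit at index i has weight
// 2 ** (length - 1 - i), which is the "reversed index".
const bitsToInt = (bits) =>
  bits.reduce((sum, bit, i) => sum + bit * 2 ** (bits.length - 1 - i), 0);

console.log(bitsToInt([1, 0, 1, 1])); // 11
console.log(bitsToInt([1, 1, 1, 1, 1, 1, 1, 1])); // 255
```

For `[1, 0, 1, 1]` the sum is 1·8 + 0·4 + 1·2 + 1·1 = 11.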

We can create another version of this function that we will call bitsToIntLe, where the "Le" stands for "little-endian." This version of the function will interpret the least significant bit as coming first, rather than the most significant bit.
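Under the same assumptions, a sketch of the little-endian variant only changes the exponent: the bit at index `i` now has weight `2 ** i`.

```javascript
// Least significant bit first: the bit at index i has weight 2 ** i.
const bitsToIntLe = (bits) =>
  bits.reduce((sum, bit, i) => sum + bit * 2 ** i, 0);

console.log(bitsToIntLe([1, 0, 1, 1])); // 13
console.log(bitsToIntLe([1, 0, 0, 0])); // 1
```

The same bits `[1, 0, 1, 1]` that read as 11 most-significant-first read as 1 + 4 + 8 = 13 here.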

Sometimes, we may need to parse a binary string from the input data. To handle this case, we can define a binStr rule. This rule will accept a string as input and check if the value at the current position matches the expected string. If the value matches, the rule will consume the corresponding bits and return the string.

The binStr rule can be useful when we need to parse headers or other structured data that contain string values.
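Here is a rough sketch of what such a rule might look like. It assumes a state with a bit offset that happens to be byte-aligned when the string is read, UTF-8 encoding, and a hypothetical result shape; the size helper is inlined only to keep the example self-contained.

```javascript
// Size of a string in bytes (UTF-8), inlined for this sketch.
const byteSize = (str) => new TextEncoder().encode(str).length;

// Assumption: state.offset is a bit offset and is byte-aligned here.
const binStr = (expected) => (state) => {
  const { input, offset } = state;
  const start = offset / 8;
  const size = byteSize(expected);
  if (!Number.isInteger(start) || start + size > input.length) {
    return { ok: false, error: `binStr: cannot read "${expected}"`, state };
  }
  const actual = input.toString('utf8', start, start + size);
  return actual === expected
    ? { ok: true, value: expected, state: { input, offset: offset + size * 8 } }
    : { ok: false, error: `binStr: expected "${expected}"`, state };
};

const tag = { input: Buffer.from('ID3\x04', 'latin1'), offset: 0 };
const found = binStr('ID3')(tag);
const missing = binStr('TAG')(tag);
console.log(found.ok, found.state.offset, missing.ok); // true 24 false
```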

The binStr rule uses the byteSize function to determine the size of the string. This is because not all string characters have a size of one byte, depending on the encoding used. The byteSize function determines the size of a string in bytes by converting it to a byte array and returning its length.

The implementation of the byteSize function is pretty straightforward:
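One straightforward way to write it, assuming UTF-8 as the encoding, is to use the standard `TextEncoder`, which returns a `Uint8Array`. The original may use a different conversion (for example `Buffer.byteLength`), so this is only a sketch.

```javascript
// Encode the string to UTF-8 bytes and count them.
const byteSize = (str) => new TextEncoder().encode(str).length;

console.log(byteSize('ID3')); // 3
console.log(byteSize('é'));   // 2 (one character, two bytes in UTF-8)
```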

We are almost ready to start writing our MP3 header reader. Before that, we need to define two more rules: the one rule and the zero rule.

The one rule checks that the value at the current position is 1. If the value is 1, the rule consumes the corresponding bit and returns the bit.

Similarly, the zero rule checks that the value at the current position is 0.
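Both rules can be sketched as a single helper that reads a bit and compares it with an expected value. The helper name `expectBit` and the rule shape are hypothetical, chosen for illustration.

```javascript
// Assumed rule shape: state { input, offset-in-bits } -> { ok, value, state }.
const bit = (state) => {
  const { input, offset } = state;
  if (offset >= input.length * 8) return { ok: false, error: 'bit: out of bounds', state };
  const value = (input[offset >> 3] >> (7 - (offset & 7))) & 1;
  return { ok: true, value, state: { input, offset: offset + 1 } };
};

// Hypothetical helper: succeed only if the next bit equals `expected`.
const expectBit = (expected) => (state) => {
  const result = bit(state);
  if (!result.ok) return result;
  return result.value === expected
    ? result
    : { ok: false, error: `expected bit ${expected}, got ${result.value}`, state };
};

const one = expectBit(1);
const zero = expectBit(0);

const s0 = { input: Buffer.from([0b10000000]), offset: 0 };
const gotOne = one(s0);            // first bit is 1
const gotZero = zero(gotOne.state); // second bit is 0
console.log(gotOne.ok, gotZero.ok, zero(s0).ok); // true true false
```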

Before we start parsing an MP3 file, it's important to note that this code is for educational purposes only, and we won't be handling ID3 tags. We'll simply skip over them, but we'll still need to read the header to be able to skip them. If anyone ends up parsing ID3 tags, feel free to share it with me.

To begin, we'll take a look at the MP3 specification, which we'll be following in this implementation. Another useful resource is the Wikipedia page for MP3. The layout of the MP3 header is described as follows:

As we go through the specification, we can see that the data we extract will be interpreted differently depending on what has been parsed before. This is where we can use the context rule we wrote in the first part of this series. We can start by creating an mp3Header rule that will verify and parse a header frame.

First, we consume the frame sync, which is 11 bits all set to one. We use uint8 to check that the first 8 bits are all ones and then match 3 more bits equal to 1, for a total of 11. The sync carries no information, so we discard it.

Next, we get two bits that indicate the version. We use an object to map the number to a string representing the version.

Then we read two bits that represent the layer and we use another map to extract the layer number.
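To make these first three steps concrete, here is a sketch that decodes the same fields with plain bit operations on the first two bytes, without the combinator library. The lookup values follow the usual MP3 frame header layout; the function and constant names are made up for this example.

```javascript
// 2-bit version field -> version string.
const VERSIONS = { 0: 'MPEG 2.5', 1: 'reserved', 2: 'MPEG 2', 3: 'MPEG 1' };
// 2-bit layer field -> layer number (0 is reserved).
const LAYERS = { 0: null, 1: 3, 2: 2, 3: 1 };

// Hypothetical helper: returns null if the 11-bit frame sync is missing.
const readFrameStart = (buf) => {
  if (buf.length < 2) return null;
  const sync = (buf[0] << 3) | (buf[1] >> 5); // first 11 bits
  if (sync !== 0x7ff) return null;            // all 11 bits must be 1
  const version = VERSIONS[(buf[1] >> 3) & 0b11];
  const layer = LAYERS[(buf[1] >> 1) & 0b11];
  return { version, layer };
};

console.log(readFrameStart(Buffer.from([0xff, 0xfb]))); // { version: 'MPEG 1', layer: 3 }
console.log(readFrameStart(Buffer.from([0x49, 0x44]))); // null
```

The byte `0xfb` is `111 11 01 1` in binary: the last three sync bits, version `11` (MPEG 1), layer `01` (Layer III), and the protection bit.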

Almost all of the maps required to parse the header can be constructed using the information from the linked specification. However, one map was missing from the specification. Here is the missing map:

This maps the version to the number of samples per frame.
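As an illustration, such a table could be written as a nested object keyed by version and then by layer. The numbers below are the commonly cited samples-per-frame values for MPEG audio; the key names and the lookup helper are assumptions, and the original map may be organized differently.

```javascript
// Samples per frame, keyed by version string, then by layer number.
const SAMPLES_PER_FRAME = {
  'MPEG 1': { 1: 384, 2: 1152, 3: 1152 },
  'MPEG 2': { 1: 384, 2: 1152, 3: 576 },
  'MPEG 2.5': { 1: 384, 2: 1152, 3: 576 },
};

// Hypothetical lookup helper: null for unknown or reserved values.
const samplesPerFrame = (version, layer) =>
  SAMPLES_PER_FRAME[version]?.[layer] ?? null;

console.log(samplesPerFrame('MPEG 1', 3)); // 1152
console.log(samplesPerFrame('MPEG 2', 3)); // 576
```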

Now that we can read the MP3 header, we can write the ID3 parser to skip the ID3 tag if we find one. For those who are wondering, ID3 tags store metadata about the audio, such as the artist's name, title, album art, musical genre, and so on.

The spec for the ID3 tag can be found here. As you can see, the ID3 tag starts with the string "ID3". Luckily, we have the binStr rule to match this part. Then we have some flags and the information that we are interested in, which is the size. Unfortunately, the size is encoded as a synchsafe 32-bit integer. This means that the most significant bit of each of its four bytes is always 0, so only 28 bits carry data. That's all we need to know for the purpose of this post. If you want to learn more, you can take a look at the links provided. To get the real size, we need to unsynchsafe the number we read before we can use it. Here is the function we will use:
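A minimal sketch of such a function keeps the low 7 bits of each of the four bytes and packs them together into a 28-bit value. The original implementation is not shown here, so this is one possible version rather than the author's exact code.

```javascript
// Drop the always-zero top bit of each byte and pack the remaining
// 4 x 7 bits into a single 28-bit integer.
const unsynchsafe = (n) =>
  (((n >>> 24) & 0x7f) << 21) |
  (((n >>> 16) & 0x7f) << 14) |
  (((n >>> 8) & 0x7f) << 7) |
  (n & 0x7f);

console.log(unsynchsafe(0x00000201)); // 257
console.log(unsynchsafe(0x7f7f7f7f)); // 268435455
```

For the bytes `00 00 02 01`, the result is 2·128 + 1 = 257, whereas reading them as a plain 32-bit integer would give 513.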

Let's take a look at the parsing code for the ID3v2 tag:
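As a point of comparison, here is a sketch of the same idea written without the combinator library. It reads the 10-byte ID3v2 header directly from a Buffer and computes where the tag ends. The function name and the returned fields are hypothetical, and the optional footer that some ID3v2.4 tags carry is ignored.

```javascript
// Decode a synchsafe 32-bit integer (7 useful bits per byte).
const unsynchsafe = (n) =>
  (((n >>> 24) & 0x7f) << 21) | (((n >>> 16) & 0x7f) << 14) |
  (((n >>> 8) & 0x7f) << 7) | (n & 0x7f);

// Header layout: "ID3", major version, revision, flags, 4-byte synchsafe size.
const parseId3Header = (buf) => {
  if (buf.length < 10 || buf.toString('latin1', 0, 3) !== 'ID3') return null;
  const size = unsynchsafe(buf.readUInt32BE(6)); // size excludes the header
  return {
    major: buf[3],
    revision: buf[4],
    flags: buf[5],
    size,
    end: 10 + size, // offset of the first byte after the tag (footer ignored)
  };
};

const header = Buffer.from([0x49, 0x44, 0x33, 4, 0, 0, 0, 0, 2, 1]);
console.log(parseId3Header(header));
// { major: 4, revision: 0, flags: 0, size: 257, end: 267 }
```

Skipping the tag then amounts to continuing the parse at byte offset `end`.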

The code is reasonably readable now. To complete the parsing process, we can write an MP3 parser that combines the mp3Header and the ID3 parsers and returns the relevant information.

The code uses a loop to read the MP3 headers and combine the results into a single object. We use lookahead so that the parser stops cleanly instead of returning an error when there are no more headers to read.
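The loop-with-lookahead pattern can be sketched on a toy format. In the example below, a "frame" is just a `0xff` marker followed by one payload byte, the state uses a byte offset, and `lookahead` and `whileAhead` are hypothetical combinators; none of this is the article's actual MP3 code.

```javascript
// Toy marker rule: succeeds if the current byte is 0xff.
const marker = (state) =>
  state.input[state.offset] === 0xff
    ? { ok: true, value: 0xff, state: { ...state, offset: state.offset + 1 } }
    : { ok: false, error: 'no marker', state };

// Toy frame rule: marker followed by one payload byte.
const frame = (state) => {
  const head = marker(state);
  if (!head.ok || head.state.offset >= state.input.length) {
    return { ok: false, error: 'truncated frame', state };
  }
  const { input, offset } = head.state;
  return { ok: true, value: input[offset], state: { input, offset: offset + 1 } };
};

// Run a rule without consuming input.
const lookahead = (rule) => (state) => {
  const result = rule(state);
  return result.ok ? { ok: true, value: result.value, state } : result;
};

// Repeat `rule` for as long as `guard` would succeed at the current position.
const whileAhead = (guard, rule) => (state) => {
  const values = [];
  let current = state;
  while (lookahead(guard)(current).ok) {
    const result = rule(current);
    if (!result.ok) return result;
    values.push(result.value);
    current = result.state;
  }
  return { ok: true, value: values, state: current };
};

const frames = whileAhead(marker, frame);
const out = frames({ input: Buffer.from([0xff, 1, 0xff, 2, 0x00]), offset: 0 });
console.log(out.ok, out.value, out.state.offset); // true [ 1, 2 ] 4
```

Because the guard only peeks, the loop ends with a success at the first position where no marker is found, and any trailing data is left unconsumed.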

This part of the tutorial provides a basic implementation of a parser, but there are many more advanced parsing techniques and concepts that we haven't covered. Nonetheless, this should give you a good understanding of how binary data can be parsed and utilized.

In the next post, we will delve into a simple implementation of a Lisp interpreter and utilize the parsers we have written to create a reader that will generate an abstract syntax tree (AST) for the interpreter.
