Uncovering the Depths of NodeJS Internals: From JavaScript to Low-Level C & C++ Implementations

You may already be familiar with the JavaScript engine and its role in runtimes such as NodeJS and web browsers. However, NodeJS has more to offer than just the V8 engine: it can read and write files, connect to databases, make HTTP requests, and more. To truly understand NodeJS, it's crucial to delve into its internal workings.

What Does Node.js Include Inside?

Remember, the V8 engine is what lets NodeJS run JavaScript. But there's much more to NodeJS than the ability to run pure JavaScript: there's an entire runtime outside the browser that lets us do things that aren't part of the JavaScript language out of the box. Let's take a look inside the Node.js runtime.

We've already seen that we have the V8 engine, which is what allows us to run basic JavaScript code.

But Node also includes the Node.js APIs: additional functions that let us talk to the file system to read files (fs), make HTTP requests (http), and look up paths on our computer for files we might want to read and write (path). There's even crypto functionality, with built-in code that helps make our Node programs more secure, for example by encrypting data for us.
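To make this concrete, here is a minimal sketch that touches three of these built-in modules. The file name is made up for illustration, and hashing with SHA-256 stands in for the broader crypto functionality (it is not encryption).

```javascript
// Built-in Node.js API modules: none of these are part of the JavaScript language itself.
const path = require('path');
const os = require('os');
const crypto = require('crypto');

// path: build and inspect file paths (the file name here is hypothetical).
const reportPath = path.join('reports', 'node-strengths.txt');
const fileName = path.basename(reportPath);

// os: ask the operating system about itself.
const platform = os.platform();

// crypto: hash some data with SHA-256 (a stand-in for the crypto features in general).
const digest = crypto.createHash('sha256').update('hello node').digest('hex');

console.log(fileName);      // the file name portion of the path
console.log(platform);      // e.g. 'linux', 'darwin' or 'win32'
console.log(digest.length); // a SHA-256 hex digest is 64 characters long
```

None of these calls would work in plain JavaScript; they exist only because the runtime provides them.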

Now, when V8 sees code that uses these features, it'll say, OK, I need some help to execute this, and it'll call the corresponding functionality in the Node.js APIs. Some of this is written in JavaScript, and some in lower-level languages such as C and C++, which sit much closer to the machine code your hardware runs. And that's where the Node.js bindings come in.

These bindings are what let your JavaScript code call functionality that's implemented in C and C++, such as highly optimized and well-tested code to encrypt your data or to read and write files, regardless of which operating system you're on. So for much of their functionality, the Node APIs call into these C++ Node.js bindings.

Now, the actual implementation of some of this core functionality, like working with files or network connections, lives in what we call libuv.

libuv and V8 are the two most important internal components of Node.js. V8 runs our JavaScript, and libuv deals with input and output (I/O). It's a highly optimized library written in C that handles the I/O tasks NodeJS can delegate to other parts of your operating system.

This is like your boss telling you to create a five-page report on the strengths of NodeJS: he's delegating that task to you, probably because he knows you have the time and expertise to do it.

Example: Let's consider downloading a web page, where the request takes 30 seconds. A download request is input to the web server that asks for output in return: the server's response.

So we'll make the request in JavaScript using a function from the http module, one of the Node.js APIs. Through the Node.js bindings, NodeJS passes that task on to libuv, which tells your operating system, hey, there's a task of downloading the contents of a web page. The operating system then goes off and performs that task. We've delegated the task to the operating system, so our Node program, i.e. our JavaScript, doesn't have to wait around for the response. This is called asynchronous I/O, or asynchronous input/output. Nearly all of the code we write in Node involves some form of asynchronous I/O, including the functionality provided by libuv.
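Here is a small sketch of that "doesn't have to wait around" behavior. To keep it runnable without a network connection, it reads a scratch file instead of downloading a web page; the file name and the events array are made up for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, created only so the example has something to read.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'async-io-'));
const file = path.join(dir, 'page.html');
fs.writeFileSync(file, '<h1>Hello</h1>');

const events = [];

// Wrap the callback API in a promise so we can tell when the read has finished.
const done = new Promise((resolve, reject) => {
  events.push('request started');
  fs.readFile(file, 'utf8', (err, contents) => {
    if (err) return reject(err);
    events.push('response received');
    resolve(contents);
  });
});

// This line runs before the callback above: JavaScript did not wait for the read.
events.push('doing other work');

done.then((contents) => {
  console.log(events.join(' -> ')); // request started -> doing other work -> response received
  console.log(contents);
  fs.rmSync(dir, { recursive: true, force: true });
});
```

The order of the events is the point: the "other work" is recorded while the read is still in flight.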

That functionality includes working with your file system and making requests over the network, whether you're on Windows, macOS, or Linux. Part of the challenge of working with all of these operating systems, particularly in lower-level languages like C and C++, is that each of them accomplishes the same thing differently. The way you open a file on Windows versus macOS is actually quite different, even though you're accomplishing the same thing at the end of the day.

And that's where the beauty of libuv shines: it abstracts away all of these platform-specific ways of reading a file, for example, so that Node works on any system, on any platform. The functionality is implemented in libuv and exposed to the higher-level layers, the bindings and the Node.js APIs, which translate it back to our JavaScript.
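From the JavaScript side, that abstraction looks something like the sketch below: the same few lines should run unchanged on Windows, macOS, and Linux. The directory prefix and file name are made up for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// os.tmpdir() and path.join() hide platform differences such as
// the location of the temp folder and '/' versus '\' as the separator.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'portable-'));
const file = path.join(dir, 'hello.txt');

// These two calls are written once and work on every supported platform.
fs.writeFileSync(file, 'same code everywhere');
const text = fs.readFileSync(file, 'utf8');

console.log(process.platform, text);

// Clean up the scratch directory.
fs.rmSync(dir, { recursive: true, force: true });
```

Nothing in this code checks which operating system it is running on; that decision is made further down the stack.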

We've talked about things like implementation and bindings, but what exactly this means for us and how each piece connects to each other probably doesn't seem fully clear to you yet. We've been talking about the internals of Node, but let's actually apply this theory.

Node Internals Deep Dive

We've been talking about how the internals of NodeJS are structured. Now it's time to look at the code behind NodeJS. We saw that our JavaScript code, executed by the V8 engine, will sometimes make use of the built-in Node.js APIs. When we call one of the functions in these APIs, the call goes through the Node.js bindings and is then handled by libuv in some way.

So let's explore exactly what's going on when our code goes through that flow. The official Node.js project is hosted on GitHub. Since it's open source, we can look around the code and figure out how it works, learn why things are done the way they are, and even potentially contribute new features.

We have a few folders in the Node.js project. The two that are most interesting to us are -

  1. lib: It has the JavaScript side of our Node APIs, i.e. the modules in the Node.js documentation. Remember how we had access to all these globals and APIs that Node provides for us? Like the console object, the http module, the os module to get information about our operating system, the path module to work with paths on our hard drive, or the process object to get information about the process and the arguments that were passed into Node. Each of these has a corresponding file in the lib folder.
  2. src: This folder contains the C++ side with our low-level Node API bindings. It's the connection between the JavaScript world and the C++ world.
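As a quick check from inside a running program, the sketch below asks Node which built-in modules it ships with and confirms that the ones mentioned above are among them. The `mentioned` list is just the names from this article.

```javascript
// The 'module' built-in exposes the names of all modules compiled into Node.
const { builtinModules } = require('module');

// Modules named in the text above.
const mentioned = ['fs', 'http', 'os', 'path'];
const found = mentioned.filter((name) => builtinModules.includes(name));

console.log(found);

// console and process are globals, so they need no require() at all.
console.log(typeof console.log, Array.isArray(process.argv));
```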

Example: Say we want to open a file. We know that's part of the file system module, and if we look at the documentation, we'll see the fs.open function there. It lets us pass in a path and some additional arguments.

But where does this live? What's it actually doing? How does it work on both Windows and Mac OS? Well, we can investigate in our source code. We know in the lib folder there's going to need to be a file for our fs module because that's the module that the open function is part of. So open the file with the filename -

  • fs.js

Since it has over 2000 lines, I'm going to search for the open function. Eventually, you'll find the function declaration of open.

In that function we can follow along with the logic. We can see that validating the path is the first step. Then, depending on which arguments were passed, the code takes a few different branches to handle the different options.

But the most interesting piece is the call to binding.open. This is where our NodeJS internal bindings come in. This function converts values between the JavaScript in our fs module and C++, allowing Node to call into that optimized lower-level C++ code. So we're calling the open function available in our bindings.
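To show the shape of that flow (validate, fill in defaults, then delegate to the binding), here is a deliberately simplified sketch. It is not the code from fs.js: `fakeBinding` is a hypothetical stand-in written in plain JavaScript, and the file descriptor 3 it hands back is invented.

```javascript
// Hypothetical stand-in for the C++ side. In Node.js the real object comes from
// the internal bindings; this fake just records its arguments.
const fakeBinding = {
  calls: [],
  open(path, flags, mode, callback) {
    this.calls.push({ path, flags, mode });
    // Pretend the operating system handed back file descriptor 3.
    setImmediate(callback, null, 3);
  },
};

// Simplified wrapper in the spirit of fs.open: validate, apply defaults, delegate.
function open(path, flags, mode, callback) {
  if (typeof path !== 'string') {
    throw new TypeError('path must be a string');
  }
  if (typeof flags === 'function') {
    // Called as open(path, callback).
    callback = flags;
    flags = 'r';
    mode = 0o666;
  } else if (typeof mode === 'function') {
    // Called as open(path, flags, callback).
    callback = mode;
    mode = 0o666;
  }
  fakeBinding.open(path, flags, mode, callback);
}

open('/tmp/example.txt', (err, fd) => console.log('fd from fake binding:', fd));
```

The wrapper does the argument juggling in JavaScript, and only the final, normalized call crosses over to the other side.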

We can now see the full implementation of this open function by going to that src folder. Inside there, find and open a file with the filename -

  • node_file.cc

Here, .cc is one of the two most common file extensions for C++ source files; the other you might see is .cpp. The .h files hold the headers. Usually, we want to look in the .cc file for the bulk of the code. Inside, you'll see thousands of lines of code that might look a little bit scary. Don't worry, we just need to focus on the Initialize function and a bunch of SetMethod calls.

These calls associate function names on the JavaScript side with C++ functions in this file. So remember, we were calling fs.open through open (small o), which is going to call Open (capital O), a function written in C++.

So in this file, I'll search for Open (capital O). One of the results is the C++ implementation of our Open function.

We can look through this code and see that it's doing some validation on our arguments. But the interesting bit is that uv_fs_open is referenced in both an AsyncCall and a SyncCall.

This uv_fs_open is actually a function inside libuv, and we're saying it can run asynchronously or synchronously. What we can learn from this example is that names starting with "uv_" refer to libuv.
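The same split is visible from JavaScript, where fs offers both a synchronous and an asynchronous way to open a file. A minimal sketch, using a made-up scratch file:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file so both calls have something to open.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sync-async-'));
const file = path.join(dir, 'data.txt');
fs.writeFileSync(file, 'data');

// Synchronous: blocks until the file is open and returns the descriptor directly.
const syncFd = fs.openSync(file, 'r');
fs.closeSync(syncFd);

// Asynchronous: returns immediately; the descriptor arrives in the callback later.
const asyncFd = new Promise((resolve, reject) => {
  fs.open(file, 'r', (err, fd) => {
    if (err) return reject(err);
    fs.closeSync(fd);
    resolve(fd);
  });
});

asyncFd.then((fd) => {
  console.log(typeof syncFd, typeof fd); // both are plain numbers
  fs.rmSync(dir, { recursive: true, force: true });
});
```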

This is getting a bit deep. How far does the rabbit hole go? Where is our binding.open functionality actually implemented? Let's complete this deep dive into the internals of node by going all the way into the Libuv source code. After this, you'll be able to say you went from the highest level JavaScript to the lowest level internal workings of NodeJS.

Libuv Internals Deep Dive

We're about to venture somewhere that very few Node developers ever go, which gives us an extra advantage when it comes to knowing exactly what Node is doing.

First, let's go to the open-source GitHub repo of libuv. The bulk of the C code that makes up libuv is in its src folder. Remember, we're exploring what happens between our JavaScript and the final result when we run fs.open to open a file. We've seen the call go through the Node.js APIs in fs.js and then through the Node.js bindings. Now let's find it in libuv.

Now in our src folder, you can see that there are two main folders -

  • unix - a family of operating systems that includes both Linux and macOS, where things tend to work in similar ways. This is because both Linux and macOS follow the design of the original Unix operating system.
  • win - for Windows.

Let's continue with the low-level implementation of fs.open. First, we need to find where the open functionality lives.

Inside unix: Here the open functionality is in the fs.c file. Going into the file, we see another few thousand lines of code. So search for "uv_fs_open"; one of the results is the definition of uv_fs_open.

But this doesn't seem to be doing any opening of files. It's really just calling INIT and passing in OPEN, which is the way libuv sets up calls like this one. The good news is that we don't have to search far for the actual implementation. It's in a function called uv__fs_open.

This is where the hard work of actually opening the file, calling down into the operating system and doing file system operations, is done. This is the actual open function for Unix operating systems: it makes a system call down into your operating system, which does whatever it needs and then gives you a handle on that file, a way of working with the file and reading and writing to it. Finally, the result returns all the way back up into our JavaScript.
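Back in JavaScript, that handle shows up as a file descriptor, a plain number we pass to other fs functions. The sketch below, with a made-up scratch file, writes through a descriptor and reads the same bytes back.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch location for the demo file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fd-demo-'));
const file = path.join(dir, 'handle.txt');

// 'w+' creates the file if needed and opens it for reading and writing.
const fd = fs.openSync(file, 'w+');

// Write through the descriptor, starting at the beginning of the file.
fs.writeSync(fd, 'libuv');

// Read the same bytes back through the same descriptor, from position 0.
const buffer = Buffer.alloc(5);
const bytesRead = fs.readSync(fd, buffer, 0, buffer.length, 0);

// Release the descriptor and remove the scratch directory.
fs.closeSync(fd);
fs.rmSync(dir, { recursive: true, force: true });

console.log(bytesRead, buffer.toString('utf8')); // 5 libuv
```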

Inside win: Similarly, there's a file with the same filename, fs.c. So let's open it and search for "uv_fs_open", the function that's called from our Node.js API bindings.

Again, we have this INIT pattern, where the actual function that opens the file lives close by. That function is "fs__open".

This is where the good stuff happens on Windows. And we see those DWORDs, which are data types that Windows uses. The function is actually well over 100 lines of code, maybe 200 or 300.

We only get to the actual call that opens the file on Windows far down in the function. That call is CreateFileW, which gives us the same kind of file handle we just talked about on Unix. And there's more after that.

The _open_osfhandle function transfers ownership of the file handle we got from creating the file to the fd (file descriptor) variable, which is what we then use in C to work directly on the file.

And finally, at the bottom of this super long function, we set a result with that file descriptor and return it back up through our code.

Well, things are just a little bit more involved in Windows land. But all this hard work makes opening a file compatible between Windows and Unix, so that we can work with files in the same way regardless of which operating system we're on. This is one of the benefits that working with libuv gives us as Node developers.
