Deep Dive Into Node.js Module Architecture

Originally published by me on Medium on Apr 6, 2019

In this article, I will tell you the story of Node.js modules that few Node developers know in detail. The terms module and package will be used interchangeably because they represent the same concept: a piece of functionality.

Node’s module management system encourages the development of code bases that grow and remain maintainable. The idea behind the module system is to put in place a rich ecosystem of clearly defined packages with consistent interfaces that are easy to combine, delivered via npm. A module can be a single file or an entire directory.

Node developers will find many packages that represent pieces of ready-made functionality they need in their projects. These packages can be rapidly composed into larger modules while staying consistent and predictable. Node's simple and scalable module architecture has let the Node ecosystem grow rapidly and ubiquitously.

This article will dive deep into the details of how Node understands modules and module paths, how modules are defined, how to use modules in the npm package repository, and how to create and share new npm modules. We will also have a look at the future of Node's modules, i.e. ES modules.

The purpose is to make it easier to shape your application structure by understanding the behind-the-scenes mechanisms of module creation and use. This will also make your apps more readable by other developers.



Loading and Using Core Node Modules (Node's Standard Library)

Node comes out-of-the-box with modules that give access to the core functionalities available. These core modules constitute the standard library of the Node platform. The Node designers made an effort to limit the growth of the standard library because they believe modules should be developed in “userland” i.e. “by developers, for developers”. The core modules are defined within Node’s source and are located in the lib/ folder. Core modules are always preferentially loaded if their identifier is passed to require().

As of the time of this writing, here are the main (not all) modules of the Node standard library:

Network and I/O

  • DNS (offline [operating system facilities] name resolution and online connection to a DNS server to perform name resolution)
  • FileSystem (API for interacting with the file system in a manner closely modeled around standard POSIX functions)
  • HTTP (HTTP server and client)
  • HTTPS (HTTPS is the HTTP protocol over TLS/SSL)
  • HTTP/2 (implementation of the HTTP/2 protocol)
  • Net (asynchronous network API for creating stream-based TCP or IPC [inter-process communication] servers and clients)
  • Readline (interface for reading data from a Readable stream (such as process.stdin) one line at a time)
  • REPL (Read-Eval-Print-Loop (REPL) implementation that is available both as a standalone program and for inclusion in other applications)
  • TLS/SSL (implementation of the Transport Layer Security (TLS) and Secure Socket Layer (SSL) protocols that is built on top of OpenSSL)
  • TTY (Text terminal — not usable directly in most cases)
  • UDP/Datagram (implementation of UDP Datagram sockets)

Strings and Buffers

  • Buffer (interaction with octet streams in TCP streams, file system operations, and other contexts)
  • Path (utilities for working with file and directory paths)
  • QueryString (utilities for parsing and formatting URL query strings)
  • StringDecoder (API for decoding Buffer objects into strings in a manner that preserves encoded multi-byte UTF-8 and UTF-16 characters)
  • Url (utilities for URL resolution and parsing)

Utilities

  • Assert (set of assertion tests)
  • Console (simple debugging console similar to the console mechanism provided by web browsers)
  • Utilities (useful helper methods originally developed for Node's internal APIs)
  • VM (APIs for compiling and running code within V8 Virtual Machine contexts)

Encryption and Compression

  • Crypto (cryptographic functionality that includes a set of wrappers for OpenSSL’s hash, HMAC, cipher, decipher, sign, and verify functions)
  • ZLIB (compression functionality implemented using Gzip and Deflate/Inflate)

Environment

  • Module (in the Node module system, each file is treated as a separate module)
  • OS (operating system-related utility methods)
  • Process (global object that provides information about, and control over, the current Node process [current running instance of your Node app])
  • V8 (exposes V8 JavaScript engine APIs)

Events, Streams and Multi-Threading

  • Child Processes (provides the ability to spawn child processes — processes created from the main parent Node process)
  • Cluster (easy creation of child processes that all share server ports)
  • Events (creation of event emitters, core concept of Node’s asynchronous event-driven architecture)
  • Stream (base API that makes it easy to build objects that implement the stream interface)
  • Worker Threads (use of threads that execute JavaScript code in parallel)

Modules are loaded via the require function, which accepts the module name or path as a single argument. The module system itself is implemented in the module core module, available via require('module').



The module object

A Node module is basically a JavaScript file. In order to use code from one file in another, you assign it to the exports property of the module object, which starts out as an empty object. Whatever you put on that exports object becomes available outside the file by calling the require() function where needed. Here's an example:

// getUsers.js
const fetch = require("node-fetch");
const fetchUsers = async () => {
  const res = await fetch("https://jsonplaceholder.typicode.com/users");
  const users = await res.json();
  // the return of async functions is wrapped into a promise
  return users;
};
module.exports = fetchUsers(); // exports the promise returned by fetchUsers

then

//index.js
const users = require("./getUsers.js");
(async () => console.log(await users))();        

As you can see, the promise returned by the fetchUsers method is assigned to the exports property of the module object. This is how you tell Node that you want to export code: by assigning it to the module.exports object.

Node can then load that code via the require method in the current file. Anything on the module.exports object will be available outside of the file where it is defined, but only if you require that module. This is in essence the MODULE PATTERN, which consists of encapsulating code and making it available only where needed (no namespace collisions).

As you know, the .js extension is not necessary when importing modules. We will get more into module path resolution in a later section.



Modules, exports, and module.exports (The Module Wrapper)

As a Node developer, you have seen cases where the code is exported by using the module.exports object or the exports object directly.

module.exports.users = getUsers();
exports.users = getUsers();        

What is the difference then? Simply put, there is no difference, because in the end module.exports and exports both point to the same object in memory. The exports variant is just an alias for module.exports. These references to the exports object are made available thanks to the module wrapper. Before executing a module, Node wraps its content with a function that gives the developer access to the module object.

The module wrapper as defined in Node source code (on Github) looks like this:

Module.wrap = function(script) {
    return Module.wrapper[0] + script + Module.wrapper[1];
};

Module.wrapper = [
    '(function (exports, require, module, __filename, __dirname) { ',
    '\n});'
];        

The module wrapper precludes namespace collisions with the properties on Node’s global object because the variables defined in a module (script) will be scoped to the wrapper function (basic JavaScript function scoping rules). So, each file (module) executed by Node will be wrapped with the module wrapper and give the developer access to the following “globals”:

  • exports (alias for module.exports)
  • require (to import outside code)
  • module (the module object)
  • __filename (the current file/module’s absolute filename)
  • __dirname (the current file/module’s absolute directory path)

Be careful when using the exports alias. Only define properties on this object; do not assign to the variable itself, because that would replace the reference to module.exports.

exports = getUsers(); // bad        

The above code will not export anything because the reference to module.exports has been replaced by the return of the execution of the getUsers function. The proper syntax to keep the reference to module.exports is:

exports.users = getUsers(); //  == module.exports.users = getUsers()        



Modules and caching

Have you ever wondered what happens when you export a constructor function and construct objects in different files? Do they share data?

The answer is no, because these objects are created in different files even if they are constructed from the same constructor function. But what if you need to apply the SINGLETON PATTERN and have only one unique object, no matter how many times the constructor is called?

In standard JavaScript, you would define a static property (shared across all instances) on your constructor that checks whether an instance was already created:

class Toto {
  constructor() {
    // Toto.instance is a static property shared across all instances
    if (Toto.instance) return Toto.instance;
    Toto.instance = this;
  }
  // ...
}
export default Toto;
// or directly export the singleton:
// export default new Toto();        

Node Modules are cached after the first time they are loaded. This means that every call to require('toto') will get exactly the same object returned, if it would resolve to the same file. Modules are cached based on their resolved filename, resolved relative to the calling module.

Provided require.cache is not modified, multiple calls to require('toto') will not cause the module code to be executed multiple times. This is an important feature. With it, "partially done" objects can be returned, thus allowing transitive dependencies to be loaded even when they would cause cycles.

To have a module execute code multiple times, export a function and call that function where needed, this time without any singleton check.

Be careful with caching (as always with caching matters): modules are cached based on their resolved filename, which depends on the calling module and on how the operating system resolves file paths (Linux, Windows, etc.). On case-insensitive file systems, for example, require('./toto') and require('./TOTO') may point to the same file yet be treated as two different cache entries.

The module object itself contains several useful readable properties:

  • module.filename: The name of the file defining this module.
  • module.loaded: Whether the module has finished loading (boolean true if loaded).
  • module.parent: The module that required the current one, if any.
  • module.children: The modules required by the current one, if any.

You can determine whether a module is being executed directly via node myModule.js or via require('./myModule.js') by checking:

require.main === module        

This is useful to know if the module is the entry point of your app.



How Node handles module paths

Here is how Node resolves modules when you require them:

require(X) from module at path Y
1. If X is a core module,
   a. return the core module
   b. STOP
2. If X begins with '/'
   a. set Y to be the filesystem root
3. If X begins with './' or '/' or '../'
   a. LOAD_AS_FILE(Y + X)
   b. LOAD_AS_DIRECTORY(Y + X)
4. LOAD_NODE_MODULES(X, dirname(Y))
5. THROW "not found"        

Load as file algorithm:

LOAD_AS_FILE(X)
1. If X is a file, load X as JavaScript text.  STOP
2. If X.js is a file, load X.js as JavaScript text.  STOP
3. If X.json is a file, parse X.json to a JavaScript Object.  STOP
4. If X.node is a file, load X.node as binary addon.  STOP        

Load index algorithm:

LOAD_INDEX(X)
1. If X/index.js is a file, load X/index.js as JavaScript text.  STOP
2. If X/index.json is a file, parse X/index.json to a JavaScript object. STOP
3. If X/index.node is a file, load X/index.node as binary addon.  STOP        

Load as directory algorithm:

LOAD_AS_DIRECTORY(X)
1. If X/package.json is a file,
   a. Parse X/package.json, and look for "main" field.
   b. let M = X + (json main field)
   c. LOAD_AS_FILE(M)
   d. LOAD_INDEX(M)
2. LOAD_INDEX(X)        

Load node_modules algorithm:

LOAD_NODE_MODULES(X, START)
1. let DIRS = NODE_MODULES_PATHS(START)
2. for each DIR in DIRS:
   a. LOAD_AS_FILE(DIR/X)
   b. LOAD_AS_DIRECTORY(DIR/X)        

Node_modules paths algorithm:

NODE_MODULES_PATHS(START)
1. let PARTS = path split(START)
2. let I = count of PARTS - 1
3. let DIRS = [GLOBAL_FOLDERS]
4. while I >= 0,
   a. if PARTS[I] = "node_modules" CONTINUE
   b. DIR = path join(PARTS[0 .. I] + "node_modules")
   c. DIRS = DIRS + DIR
   d. let I = I - 1
5. return DIRS        

File paths can be absolute or relative. In order to use relative local paths, you need to prepend the path with ./, otherwise Node's lookup will check for core modules or modules in node_modules.

As seen in the preceding pseudocode, this node_modules lookup ascends the directory tree beginning from the resolved path of the calling module or file. For example, if the file at /home/$USER/project.js called require('totoModule'), Node would seek in this order:

/home/$USER/node_modules/totoModule.js
/home/node_modules/totoModule.js
/node_modules/totoModule.js        

Organizing your files and/or modules into directories is always a good idea. Usefully, Node allows modules to be referenced through their containing folder in two ways. Given a directory, Node will first try to find a package.json file in that directory, then fall back to looking for an index.js file. We will discuss the use of package.json files in the next section. Here, we simply need to point out that if require is passed the ./myModule directory, it will look for ./myModule/index.js.

If you’ve set the NODE_PATH environment variable, then Node will use that path information to do further searches if a requested module is not found via normal channels. For historical reasons, $HOME/.node_modules, $HOME/.node_libraries, and $PREFIX/lib/node will also be searched. $HOME represents a user’s home directory, and $PREFIX will normally be the location Node was installed to.



Creating a package file

Modules may be contained within a folder. Especially if you are developing a module designed to be shared with other developers, you should organize that module within its own folder, and create a package.json file within that folder.

The package.json file describes a module, usefully documenting the module’s name, version number, dependencies, etc. It must exist if you want to publish your package with npm.

In the terminal, type:

npm help json        

to display detailed documentation for all available package.json fields, or visit the package.json documentation. A package.json file must conform to the JSON specification to be valid.

Adding scripts to package.json

npm can also be used as a build tool. The scripts field in your package file allows you to set various build directives executed around certain npm commands. For example, you might want to minify your JavaScript code, or execute shell commands to build dependencies your module needs, whenever npm install is executed.

To clarify: when you run the start directive (npm start), npm will automatically run the prestart and poststart directives around it.

The available directives are as listed:

  • prepublish | publish | postpublish: Run by the npm publish command as well as on local npm install without any arguments.
  • prepublishOnly: Run before published only on the npm publish command.
  • prepare: Run before the package is published and on npm install without any arguments. Run after prepublish, but before prepublishOnly.
  • prepack: Run before a tarball is packed via npm pack or npm publish, and when installing git dependencies.
  • postpack: Run after a tarball has been generated and moved to its final location.
  • preinstall | install | postinstall: Run by the npm install command.
  • preuninstall | uninstall | postuninstall: Run by the npm uninstall command.
  • preversion | version | postversion: Run by the npm version command.
  • preshrinkwrap | shrinkwrap | postshrinkwrap: Run by the npm shrinkwrap command (creates a package-lock.json).
  • pretest | test | posttest: Run by the npm test command.
  • prestop | stop | poststop: Run by the npm stop command.
  • prestart | start | poststart: Run by the npm start command.
  • prerestart | restart | postrestart: Run by the npm restart command. Note that npm restart will run the stop and start scripts if no restart script is provided.

It should be clear that pre- commands will run before, and post- commands will run after their primary command (such as publish) is executed.
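For example, a hypothetical package.json could hook scripts around start like this (the helper script paths are made up):

```json
{
  "scripts": {
    "prestart": "node scripts/check-env.js",
    "start": "node index.js",
    "poststart": "node scripts/warm-cache.js"
  }
}
```

Running npm start then executes the three directives in order: prestart, start, poststart.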

npm as a build system using custom scripts

You are not limited to the predefined script directives. Extending the scripts collection in a package file is a very common practice. Consider the following script definition:

"dev": "NODE_ENV=development node --inspect --expose-gc index.js"        

When this command is run via npm run dev, it starts the Node app in debug mode (--inspect) and exposes the garbage collector so that we can track its impact on our application's performance.

In many cases you can completely replace more complex build systems like Gulp or Webpack with npm scripts. For example you might want to build your TypeScript application for deployment in a script:

"scripts" : {
    "build": "tsc -p tsconfig.build.json"
}        

Of course, you can make your script more complex by combining commands with && between them.

Additionally, npm scripts are running on the host system in the context of npm, so you are able to execute system commands (mkdir, mv, cp, etc.) and address locally installed modules.

You can combine other scripts into one directive:

"prod": "npm run build && npm run deploy"        

Any number of steps can be chained in this manner. You might add tests, run a file watcher, and so forth.

Lastly, you can define custom directives for different contexts in the following way (by convention):

"scripts" : {
  "start:dev" : "NODE_ENV=dev node main.js",
  "start:prod" : "NODE_ENV=prod node main.js",
  "start:staging" : "NODE_ENV=staging node main.js"
}        

Registering package dependencies

It is likely that a given module will itself depend on other modules. These dependencies are declared within a package.json file using four related properties:

dependencies: The core dependencies of your module should reside here.

devDependencies: You may depend on some modules, while developing your module, that are not necessary to those who will use it. Typically, test suites are included here. This will save some space for those using your module.

bundledDependencies: Node is changing rapidly, as are npm packages. You may want to lock a certain bundle of dependencies into a single bundled file and have those published with your package, so that they will not change via the normal npm update process.

optionalDependencies: Contains modules that are optional. If these modules cannot be found or installed, the build process will not stop (as it will with other dependency load failures). You can then check for this module’s existence in your application code.
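Put together, a package.json might declare all four kinds of dependencies (the package names and version numbers here are purely illustrative):

```json
{
  "dependencies": { "express": "4.16.0" },
  "devDependencies": { "mocha": "^6.2.0" },
  "optionalDependencies": { "fsevents": "^1.2.0" },
  "bundledDependencies": ["my-internal-helper"]
}
```

Note that bundledDependencies is an array of package names rather than a name-to-version map, since the bundled versions ship inside your published tarball.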

Dependencies are normally defined with an npm package name, followed by versioning information:

"dependencies" : {
 "express" : "4.16.0"
}        

However, they can also point to a tarball (.tar.gz file):

"toto" : "https://toto.com/toto.tar.gz"        

You can point to a GitHub repository:

"nestjs": "git://github.com/nestjs/nest.git#master"        

They can even use the GitHub shortcut:

"nestjs": "nestjs/nest"        

These GitHub paths are also available to npm install, for example,

npm install nestjs/nest        

Additionally, in cases where only those with proper authentication are able to install a module, the following format can be used to source secure repositories:

"dependencies": {
 "a-private-repo":
 "git+ssh://git@github.com:<USER>/<REPO>.git#master"
}        

Publishing and managing npm packages

When you install Node, npm is also installed, and it functions as the primary package manager for the Node community. In order to publish to npm, you will need to create a user. You can do that with npm adduser, which will trigger a series of shell prompts requesting your name, email, and password. You may then use this command on multiple machines to authorize the same user account.

To reset your npm password, visit https://npmjs.org/forgot.

Once you have authenticated with npm, you will be able to publish your packages using the npm publish command. The easiest way is to run this command from within your package folder. You can also target another folder for publishing (remembering that a package.json file must exist in that folder).

You can also publish a gzipped tar archive (.tar.gz file) containing a properly configured package folder.

To remove a package, use

npm unpublish <name>[@<version>]        

Note that once a package is published, other developers may come to depend on it. For this reason, it is strongly discouraged to remove packages that others are using. If you want to discourage the use of a version, use

npm deprecate <name>[@<version>] <message>        

To further assist collaboration, npm allows multiple owners to be set for a package:

  • npm owner ls <package name>: Lists the users with access to a module
  • npm owner add <user> <package name>: The added owner will have full access, including the ability to modify the package and add other owners
  • npm owner rm <user> <package name>: Removes an owner and immediately revokes all privileges

All owners have equal privileges — special access controls are unavailable, such as being able to give write but not delete access (part of the open source philosophy).

Global installs and binaries

Some Node modules are useful as command-line programs. Rather than typing > node module.js to run a program in the terminal, you can simply type > module in the console and have the program execute. In other words, you can run a module as an executable file installed on the system PATH and therefore accessible from anywhere. There are two ways to achieve this using npm.

The first and simplest way is to install a package using the -g (global) argument as follows:

 npm install -g module        

Another way to ensure global access is by setting the package.json’s bin property:

"name": "myModule",
 "bin" : {
  "myModule" : "./path/to/module"
 }        

Please make sure that the file(s) referenced in bin start with #!/usr/bin/env node, otherwise the scripts will be started without the node executable!

When this module is installed (run npm i -g from within the package folder, without a module identifier), myModule will be understood as a global CLI command. Any number of such programs may be mapped to bin. As a shortcut, a single program can be mapped, as shown:

"name": "myModule",
"bin" : "./path/to/module"        

In this case, the name of the package itself (myModule) will be understood as the active command.

Other repositories

Node modules are often stored in version control systems, allowing developers to manage package code. The repository field of the package.json is mostly used to point developers to remote repositories. Consider the following example:

"repository" : {
 "type" : "git",
 "url" : "https://github.com/nestjs/nest.git"
}        

Similarly, you might want to point users to where bug reports should be filed using the bugs field:

"bugs": {
 "url": "https://github.com/<USER>/<PROJECT>/issues"
}        

Lockfiles

Ultimately, npm install is a command that takes a package.json and builds a node_modules folder from it. However, does it always produce the same one? The answer is sometimes, and we will cover the details in a bit.

If you’ve made a new project, or updated npm (npm i -g npm@latest), you may have noticed a new file alongside the familiar package.json: the package-lock.json.

Inside, the contents look like this:

{
 "name": "project",
 "version": "0.0.0",
 "lockfileVersion": 1,
 "requires": true,
 "dependencies": {
  "@babel/code-frame": {
   "version": "7.0.0",
   "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.0.0.tgz",
   "integrity": "sha512-OfC2uemaknXr87bdLUkWog7nYuliM9Ij5HUcajsVcMCpQrcLmtxRbVFTIqmcSkSeYRBFBRxs2FiUqFJDLdiebA==",
   "dev": true,
   "requires": {
    "@babel/highlight": "^7.0.0"
   }
  },
  ...
 }
}        

Here are the npm packages your project depends upon. Dependencies of dependencies are nested appropriately.

The real usefulness beyond package.json is delivered by the resolved and integrity fields. Here, you can see the exact file npm downloaded and unzipped to create the corresponding folder within node_modules, and, even more importantly, the cryptographically secure hash digest of that file (even a single-bit difference produces a different hash).

With package-lock.json, you can now get an exact and reproducible node_modules folder, i.e. deterministic app dependencies. Committed into source control, it lets you see when a dependent module version has changed right in a diff during code review. Also, with hashes everywhere, you can be more certain that the code your application depends upon has not been tampered with (no malware added).

In day-to-day work, you can pretty much ignore package-lock.json. To explain how, and why, it helps to look at what npm actually does with the file:

When npm finds a newer version of a package, it’ll download it and update your node_modules folder. It will also update package-lock.json, with the new version number and the new hash.

If there’s a newer version of a package your project depends upon, you probably want npm install to give you the most recent one (fixed bugs).

However, what if you want npm to install a specific version? What if you want it to get the modules with exactly specific versions and hashes? The way to do this lies not in package-lock.json, but back in package.json, and deals with semantic version numbers. Take a look at these three:

1.2.3
~1.2.3
^1.2.3        

1.2.3 means exactly that version, nothing earlier and nothing later. Use npm i --save-exact <MODULE> to pin the exact current version so that later runs of npm i will not move past it.

~1.2.3 matches that version or any more recent patch release. The ~ (tilde) means that you only accept patch updates (last digit) when running npm i. You could also use 1.2 or 1.2.x to limit the scope of the updates.

^1.2.3 will bring in that version or something more recent, but stay within major version 1. The ^ (caret) means you will receive new features (middle digit) and patches, but nothing backward incompatible. You could also just write 1 or "1.x".

Caret is the default when you run npm i <MODULE>. It makes sense, as a change to the first number indicates a major version that might break compatibility with previous versions, in turn potentially breaking your existing code.

Far beyond these three common examples, there’s a whole language of comparators, operators, identifiers, tags, and ranges possible with semantic versioning and supported by npm (see the npm semver docs).

As a best practice, avoid manipulating package.json manually and use the available npm directives and options.



ECMAScript Modules

After years of waiting, ES modules will probably be a stable feature starting with Node v14.x. As of Node v13.x, you just need to add the type property and set it to module in the package.json:

{
  ...
  "type": "module"
}        

By doing so, all .js files will be considered to use ES modules. You will no longer be able to use the CommonJS require/module.exports syntax in them. You will need to use the .cjs file extension to keep traditional imports in a folder where the nearest package.json is set to handle ES modules. But there are quirks and limitations. Check the official documentation for up-to-date info.

Frankly, if you want well-supported ES6 modules in Node, I recommend using TypeScript. For that, you will need to install it:

npm i -g typescript        

Then initialize TypeScript in your project root directory:

tsc --init        

You can now write your Node app in .ts files that will be compiled to native JavaScript executable by Node. I direct you to the TypeScript site for more.

For example, in TypeScript you can import core modules with ES syntax:

import * as crypto from "crypto";
import * as fs from "fs";
import * as path from "path";         

Alternatively, you could use the esm library, and the above imports would look like this:

import crypto from "crypto";
import fs from "fs";
import path from "path";        

But the esm loader is awkward when other tools need to hook into your code and modify the internal behavior of modules. For example, it makes testing more complex if you write your tests with ES modules under Mocha (forget Jest…). You will first need to load the esm module before anything else runs, with configuration like:

{
  "scripts": {
    "start": "node -r esm .",
    "test": "mocha ./tests/*"
  },
  ...
  "mocha": {
    "require": [
      "esm"
    ]
  }
}        

This makes automation more complex (or not feasible) because not all tools will accept that you load esm first (Jest…).



Even better, you can choose to use Nest.js which is an opinionated Node framework that takes full advantage of TypeScript and modern JavaScript.

For even more details, go to the official Node documentation on modules and the npm documentation.

There you have it! You now know what most Node developers don’t about Node modules.

Congratulations and thanks for reading and please share if it was useful. Now, go be a great Node developer!

My Packt courses:

Hands-On Web Development with TypeScript and Nest.js [Video]

RESTful Web API Design with Node.js 12 [Video]
