Foundations of Tree Shaking
Benjamin Cloughessy
Application Developer @ Threat Tec | People-First Professional | TS - JS - Python - React - Node - Azure
A rabbit hole for the curious developer
Note: Being neither a mathematician nor a tree-shaking expert, I have written this article with a reasonable amount of speculation where exact processes were unknown to me.
The reader might be interested to know that - only recently - I discovered I didn't actually know what a graph was. Not completely, anyway.
Or, I suppose, you might be saying to yourself: "I'm actually not that interested."
Totally fine. The list of things I don't know is quite long, and if you took notice every time I learned something new, I'm not sure you would have time for anything else.
The point is, I was recently learning about graphs, while also working on optimizing the bundle size for a Microsoft pcf control, and it lead me down a really fun exploration of JavaScript module bundlers and tree-shaking.
I had a hard time finding any single resource that dug into the general steps, mechanics, and mathematics of the tree-shaking process in a way that ?????? ???????????????? someone like me could understand, so here we are.
Hope you enjoy it.
Intro - Graphs
We'll jump right in with a fly-by of the relevant topics in graph theory. I ask forgiveness from the mathematicians for any lack of nuance in the definitions - my focus is simple practicality.
A graph in graph theory is way of describing relationships. For certain types of data, it is a more efficient storage method, and it also gives us access to several useful mathematical tools and algorithms.
In a graph, nodes (1,2,3,4,5,6) are entities connected by relationships called edges:
A directed acyclic graph is a directed graph (the edges have a direction) with no closed loops:
Intro - Modules
Hang on to the graph concepts, we'll come back to them soon. For now, let's briefly mention JavaScript modules. The reader is likely familiar with ES2015 import/export statements which allow us to split JavaScript code into (hopefully) small exportable and importable modules. Unless the imports are done dynamically, then all the information regarding what exports are available and what imports are made is static and available at build time.
Intro - Bundlers and Tree-Shaking
While modules are great for developer experience and productivity, our web applications need a single JavaScript bundle at runtime. Bundlers (Webpack, Rollup, Parcel, etc..) try to give us the best of both worlds on this one.
Modules in, bundle out - so to speak.
That bundle can be enormous, and sometimes it's because ?????? ????????????? ??????? ?????????? we're including unused modules in our bundle. Tree-shaking is the process of finding and removing unused code from our modules before bundling.
Tree Shaking - Building the Dependency Graph
The first step in the tree-shaking process is to build the dependency graph - a directed acyclic graph of all the modules connected to our project.
From here on out I will use a simplified example made with Flourish - an actual example can be far more complex (imagine all the modules and downstream dependencies in even a small node_modules folder)!
Disconnected Nodes
We start by laying out our modules as disconnected nodes. We will imagine using a simple TypeScript/React project (if such a thing exists) with no external libraries:
Relationships
Now we define the entry point(s) for our application. An entry point is a starting place for determining what code is reachable in our code base. We'll use a single entry point - Index.tsx.
If we were using Webpack the config file might look something like this:
领英推荐
module.exports = {
entry: {
main: './Index.tsx',
},
};
We're going to turn this lazy sack of nodes into a useful dependency graph! Remember how the imports and exports of our ES2015 modules are static and available at build time? With this, we can give each node metadata about exactly what it is importing/exporting and make our graph look a bit more... graphy.
If node B imports anything from node A, we will draw an arrow from node A to node B. The direction of the arrow might seem a bit backwards, so it can be helpful to remember that the arrow points towards a dependency.
Here's what that might look like in our example:
Now we're getting somewhere!
We can easily spot that the Footer component isn't actually being imported into the rest of the project. We'll be able to eliminate it from the bundle in a bit.
If we could see all the metadata, we would also be able to visualize the specific import/export data for each edge. Although Index.tsx and Login.tsx are both importing something from utils.ts, it might have some unused exports left over. We'll be able to eliminate them soon.
Graph Traversal - Topological Sorting
There is a way of sorting graphs, called the topological sort, which is only valid for directed acyclic graphs. It will transform our jumbled network of nodes into an ordered array of nodes where each node appears before all the nodes that it point to.
There are two prominent algorithms for traversing a directed acyclic graph and performing the topological sort, namely: depth-first search and Kahn's algorithm.
The basic idea of Kahn's algorithm is to look for a node with no arrows pointing towards it. That node is said to have an indegree of 0. In the case of dependencies, this node would represent a module upon which nothing else depends. If nothing depends on the node, we add it to the list, 'remove' the node, and repeat.
The depth-first search algorithm starts with an arbitrary unvisited node in the graph. It marks the current node as visited and recursively follows the outgoing edges (dependencies) to visit all its unvisited neighboring nodes. The algorithm adds the current node to the topological order only after visiting all its neighbors (post-order traversal). The result is a reverse topological order, but we can flip it when we're done to get the final topological order.
Regardless of the algorithm used, we end up with a list off all modules used in the project, ordered such that every module appears before its dependencies. Any modules that didn't make it into this list shouldn't make it into our final bundle.
Given our example project, and the entry point of Index.tsx, a potential topological ordering might be:
[Index.tsx, indexStyles.css, Login.tsx, utils.ts, loginStyles.css]
Import/Export Analysis
At this point, we should stop to appreciate that we've removed any unused modules from our bundle. The next helpful step will be to analyze the specific imports and exports to trim down any modules with unused exports.
Because each node can carry metadata about what it can export, and each edge can carry metadata about what is being imported, our graph already has all the information necessary to trim unused exports off our modules.
If we were to consider our example utils.ts file, I imagine it would look something like this:
{
"utils.ts": {
"exports": [
"getData()", "postData()", "transformData()", "authenticateUser()"
],
"imports": []
},
"Login.tsx": {
"exports": [
"Login.tsx"
],
"imports": [
"authenticateUser()", "loginStyles.css"
]
},
"Index.tsx": {
"exports": [
"Index.tsx"
],
"imports": [
"getData()", "indexStyles.css", "Login.tsx"
]
}
}
By comparing the exports of utils.ts against the imports listed in other nodes, we know we can safely remove both the postData and transformData function exports - further reducing our bundle size.
Practical Implications
I certainly didn't explore this topic with any goal other than to satisfy my curiosity, but what was learned reinforces and explains some of the best practices we are already familiar with:
When writing code to be imported/exported:
When importing code:
And that's a wrap! I hope you enjoyed reading.
If you learned anything with me, or if you have any correction suggestions, I would love to hear from you.
You can check out the dynamic version of our graph here.