How To Optimize Full Stack JavaScript Applications And Deal With a Huge Database (MongoDB)
Sandip Das
Senior Cloud, DevOps, MLOps & ML Platform Engineer | Heading Cloud, DevOps & MLOps for start-ups | AWS Container Hero | Educator | Mentor | Teaching Cloud, DevOps & Programming in Simple Way
Over the last couple of years working as a full stack JavaScript application developer, I have faced many issues: some small and easy, some medium, and some big and complex. Solving them has taught me a lot. One of the biggest challenges is optimizing a full stack JavaScript application, where JavaScript runs on both the front end and the back end at the same time, while also handling a large database (MongoDB). Below is a brief summary of my story in making that happen.
Here's The Issue:
Loading any page of the web application took a long time: first the browser loaded the HTML page, then the application would request several other JavaScript and CSS files. Loading data from the server was also slow, because there was too much data to show and some complex calculations were running on the fly at the back end.
Clearly this was not acceptable, so I set about doing everything I could think of to make things as fast as possible.
The browser UI and JavaScript code share a single processing thread. It doesn’t matter whether the browser needs to respond to its own menu click, render an HTML page or execute your Ajax call — every event is added to a single queue. When the browser becomes idle, the next item on its to-do list is retrieved and executed.
In reality, no modern browser operates on a single thread. As an extreme example, IE9 and Chrome start a new OS process for every tab. However, there is still a single event queue per viewed page and only one task can be completed at a time. This is absolutely necessary because the browser or your JavaScript can alter the rendered HTML before, during or after it’s downloaded.
Understandably, the browser must limit the time it takes for your JavaScript code to run. If a script takes too long, it will lock the application and potentially cause OS instability. It’s the reason you’ll see the dreaded “Unresponsive Script” alert:
But how does a browser determine when a script has run for too long? As you’d expect, the top 5 vendors implement different techniques and limits…
Internet Explorer
IE limits JavaScript execution to 5 million statements.
Firefox
Firefox uses a timed limit of 10 seconds.
Safari
Safari uses a timed limit of 5 seconds.
Chrome
Chrome does not limit execution but detects when the browser crashes or becomes unresponsive.
Opera
Opera does not implement a limit and will execute JavaScript indefinitely. However, the browser will not cause system instability — you can continue to open other tabs or close the page executing the code.
Several of the browsers allow you to configure the execution limit parameters, but that’s not something I’d personally recommend. I won’t publish the details here because someone, somewhere will use it as a “fix” for their unresponsive page! Google it if you like, but tweaking browser settings for badly-behaved code does not address the root of the problem.
Solutions:
So how can we prevent JavaScript execution alerts? The best solution is to avoid long-running client-side tasks. Ideally, no event handler should take longer than a few dozen milliseconds. Intensive processing jobs should normally be handled by the server and retrieved with a page refresh or an Ajax call.
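For illustration, here is a minimal sketch (the endpoint and helper names are assumptions, not from the original app) of keeping the heavy work on the server and only fetching the finished result with Ajax:

```javascript
// Hypothetical example: the server has already done the expensive calculation;
// the browser just asks for the result, so no event handler runs for long.
var xhr = new XMLHttpRequest();
xhr.open('GET', '/api/report?range=30d');          // assumed endpoint
xhr.onload = function () {
  if (xhr.status === 200) {
    renderReport(JSON.parse(xhr.responseText));    // renderReport is an assumed helper
  }
};
xhr.send();
```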
Go Parallelize
In order to render the HTML page for any web app, the node.js application needs to retrieve a lot of data for the application in question.
At minimum this means it needs to retrieve the data from the user’s current browsing session to check that they’re logged in, and it needs to pull in data about the user (e.g. the user’s name, which sites they have access to, their API key and the parameters of their subscription) and about the site in question for the app (site name, unique token etc).
In order to retrieve this data, the application needed to make several calls to internal API functions, many of which could take a second or more to complete. Each request was made by a separate Express middleware, which meant they were running in series: each request would wait for the previous one to complete before starting.
Since node.js is perfectly suited to running multiple asynchronous functions in parallel, and since a lot of these internal API requests didn’t depend on each other, it made sense to parallelize them — fire off all the requests at once and then continue once they’ve all completed. I achieved this with the aid of the (incredibly useful) async module.
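As a rough illustration of the idea (the function and module names below are assumptions, not the real code), async.parallel lets several independent lookups run at once inside a single Express middleware:

```javascript
// Minimal sketch: fire off independent internal API calls together and
// continue only once they have all completed.
const async = require('async');

function loadPageData(req, res, next) {
  async.parallel({
    session: (cb) => sessionStore.get(req.sessionId, cb), // assumed helpers
    user:    (cb) => accountApi.getUser(req.userId, cb),
    site:    (cb) => accountApi.getSite(req.siteId, cb)
  }, (err, results) => {
    if (err) return next(err);
    // Total time is roughly that of the slowest call, not the sum of all three.
    res.locals.session = results.session;
    res.locals.user = results.user;
    res.locals.site = results.site;
    next();
  });
}
```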
Use Cache
Sometimes, even after parallelizing all of our internal data fetching, loading the web app was still pretty slow. The reason was that the application was not only fetching all this data for the initial page load, it was also fetching it for a lot of subsequent JavaScript requests.
The solution is to cache any data that has already been fetched and is unlikely to change very often.
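A minimal sketch of that idea, assuming a simple in-memory cache with a time-to-live (the real implementation may differ):

```javascript
// Cache slowly-changing data so repeat requests skip the internal API call.
const cache = new Map();

function getCached(key, ttlMs, fetchFn, callback) {
  const hit = cache.get(key);
  if (hit && Date.now() - hit.time < ttlMs) {
    return callback(null, hit.value);              // serve from cache
  }
  fetchFn((err, value) => {                        // otherwise do the real fetch
    if (err) return callback(err);
    cache.set(key, { value: value, time: Date.now() });
    callback(null, value);
  });
}

// Example usage (assumed names): cache a user's subscription details for 5 minutes.
// getCached('user:' + userId, 5 * 60 * 1000,
//           (cb) => accountApi.getUser(userId, cb), done);
```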
Loading JS and CSS
The front-end of the web application has a lot of interconnected components. The JavaScript for the application falls into a few smaller parts: the main part is the application core, plus the JS widgets, each of which has its own code (something that React.js now simplifies).
I bundled all our libraries into the main application bundle, and all of the JavaScript widget code into a secondary bundle which was loaded dynamically.
One way around this problem would be to break each individual component into its own file and include them all individually — that way any files that don’t get changed frequently can sit in the browser’s HTTP cache and not be requested. The problem with this, though, is that there would be a lot of files, some of them incredibly small. And (especially on mobile browsers), the overhead of loading that many individual resources vastly outweighs the extra overhead we had before of re-downloading unchanged content.
I used basket.js to load JavaScript from localStorage, but you can use require.js or browserify too if you don't want to use localStorage. The approach uses a combination of server-side script concatenation and localStorage for caching. In a nutshell, the page includes a lightweight loader script, which figures out which JS and CSS it has already cached and which needs to be fetched. The loader then requests all the resources it needs from the server in one request, and saves all the resources into localStorage under individual keys. This gives us a great compromise between cutting down the number of HTTP requests while still being able to maintain cacheability, and not re-downloading code unnecessarily when it hasn’t changed. Additionally, after running a few benchmarks, we found that localStorage is (sometimes) actually faster than the native HTTP cache, especially on mobile browsers.
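For reference, a minimal basket.js sketch of what such a loader can look like (the file names, cache keys and version tokens are assumptions):

```javascript
// basket.require fetches each script once, stores it in localStorage under its
// key, and on later visits executes it straight from localStorage.
basket.require(
  { url: '/js/app-core.js', key: 'app-core', unique: APP_CORE_VERSION },  // assumed version token
  { url: '/js/widgets.js',  key: 'widgets',  unique: WIDGETS_VERSION }
).then(function () {
  App.init();   // assumed application entry point
});
```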
Along with this, I also switched all of our static (JS and CSS) asset loading to be served through CloudFront, Amazon Web Service’s content delivery network. This means content is served from the nearest possible geographic location to the user.
I also found some optimizations to prevent loading or storing duplicate code. By de-duplicating the caching and requests based on a digest of each resource’s contents, we were able to cut out unnecessary requests and storage.
With these intelligent changes to resource loading, we were able to cut down the total number of HTTP requests necessary to render the app to one (just the page itself), which meant that for users quickly switching between the web app for different sites, each page would load within a few seconds.
But I always believe we could do even better: there are now tools like gulp and grunt to minify JS and CSS so that they load faster.
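As a sketch of that kind of build step (task and path names are assumptions), a gulp pipeline can concatenate and minify the scripts before they are served:

```javascript
// gulpfile.js: bundle and minify the front-end JavaScript.
const gulp = require('gulp');
const concat = require('gulp-concat');
const uglify = require('gulp-uglify');

gulp.task('scripts', function () {
  return gulp.src('src/js/**/*.js')
    .pipe(concat('app.min.js'))    // one bundle keeps HTTP requests down
    .pipe(uglify())                // minified payload downloads and parses faster
    .pipe(gulp.dest('public/js'));
});
```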
Directly Fetching Data
All the user, site and subscription data described in the first two steps was being fetched via a secure internal HTTP API to our internal account system. We were able to cut out the internal HTTP component completely by including a node module directly in the application and querying our databases directly. This gave us much finer-grained control over exactly what data we were fetching, as well as eliminating a huge amount of overhead.
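A minimal sketch of the direct-access approach, assuming the official Node.js MongoDB driver and made-up database, collection and field names:

```javascript
// Query the accounts database directly instead of going through the internal
// HTTP API, fetching only the fields the page actually needs.
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://accounts-db:27017');  // assumed connection string

async function getUser(userId) {
  await client.connect();   // no-op if already connected
  return client.db('accounts').collection('users').findOne(
    { _id: userId },
    { projection: { name: 1, apiKey: 1, subscription: 1, sites: 1 } }
  );
}
```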
Give More Importance to the Client Side
Thanks to all the changes I have made up to this point, all that was different between the apps for different sites was a config object passed to the loader on initialization. It didn’t make sense, therefore, to reload the entire page when simply switching between sites or between Now and Trends, if all of the important resources had already been loaded. With a little bit of rearranging of the config object, we were able to include all of the data necessary to load any of the web pages accessible to the user. Throw in some HTML5 History with pushState and the popstate event, and we’re now able to switch between sites or pages without making a single HTTP request or even fetching scripts out of the localStorage cache. This means that switching between pages now takes a couple of hundred milliseconds, rather than several seconds.
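A rough sketch of that switching logic (renderApp and configForSite are assumed helpers standing in for the real code):

```javascript
// Switch sites entirely on the client: update the URL with pushState and
// re-render from the config that is already in memory.
function switchToSite(siteId) {
  history.pushState({ siteId: siteId }, '', '/sites/' + siteId);
  renderApp(configForSite(siteId));   // no HTTP request, no script re-fetch
}

// Handle back/forward so the browser buttons restore the right view.
window.addEventListener('popstate', function (event) {
  if (event.state && event.state.siteId) {
    renderApp(configForSite(event.state.siteId));
  }
});
```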
So far all this has been about reducing load times and getting to a usable web app in the shortest time possible. But we’ve also done a lot to optimise the application itself to make sure it’s as fast as possible. In summary:
Avoid Big Complex Libraries — for example, jQuery UI is great for flexibility and working around all manner of browser quirks, but we don’t support a lot of the older browsers so the code bloat is unnecessary. We were able to replace our entire usage of jQuery UI with some clever thinking and 100-or-so lines of concise JS (we also take advantage of things like HTML5’s native drag-and-drop).
Check Weak Spots In Popular Libraries — for example, I use moment with moment-timezone for a lot of our date and time handling. However, moment-timezone is woefully inefficient (especially on mobile) if you’re using it a lot. With a little bit of hacking we added a few optimizations of our own and made it much better for our use case.
Don't Use Slow Animations — a lot of studies have been posted about this in the past, and it really makes a difference. Simply reducing some CSS transition times from 500ms to 250ms, and cutting others out entirely, made the whole app UI feel snappier and more responsive.
Visual Feedback — one of the big things I found when using Trends was that switching between time frames just felt slow. It took under a second, but because there was a noticeable delay between clicking on the timeframe selector and anything actually happening, things felt broken. Fetching new data from our API is always going to take some time; it’s not going to be instant. So instead I added a loading spinner to each widget. Nothing is actually any faster, but the whole experience feels more responsive. There is immediate visual feedback when you click the button, so you know it’s working properly.
Use Flat design For Steady Performance — it may well just be a design trend, but cutting out superficial CSS gradients and box shadows does wonders for render performance. If the browser doesn’t have to use CPU power to render all these fancy CSS effects, you get an instant boost to render performance.
Even after all these optimizations and tweaks, I am well aware that there’s still plenty of room for improvement, especially on mobile, where CPU power, memory, rendering performance, latency and bandwidth are all significantly more limited than they are on the desktop. So I didn't stop after making the above improvements; there is much more to do, and I have kept improving.
Now let's talk about the back-end side. In the back end, most operations go through the database, and for full stack JavaScript applications that database is most likely MongoDB.
I’ve been using MongoDB in production since mid-2013 and have learned a lot over the years about scaling the database. I do run multiple MongoDB clusters but the one storing the historical data does the most throughput and is the one I shall focus on in this article, going through some of the things we’ve done to scale it.
Use Dedicated Hardware, and SSDs
All my MongoDB instances run on dedicated servers. I’ve had bad experiences with virtualisation because I sometimes have no control over the host, and databases need guaranteed performance from disk I/O. When running on shared storage (e.g., a SAN) this is difficult to achieve unless you can get guaranteed throughput from things like AWS’s Provisioned IOPS on EBS (which are backed by SSDs).
MongoDB doesn’t really have many bottlenecks when it comes to CPU, because CPU-bound operations are rare (usually things like building indexes), but what really causes problems is CPU steal - when other guests on the host are competing for the CPU resources.
The way we can combat these problems is to eliminate the possibility of CPU steal and noisy neighbours by moving onto dedicated hardware. And we can avoid problems with shared storage by deploying the dbpath onto locally mounted SSDs.
Use Multiple Databases To Benefit From Improved Concurrency
Running the dbpath on an SSD is a good first step but you can get better performance by splitting your data across multiple databases, and putting each database on a separate SSD with the journal on another.
Locking in MongoDB is managed at the database level so moving collections into their own databases helps spread things out - mostly important for scaling writes when you are also trying to read data. If you keep databases on the same disk you’ll start hitting the throughput limitations of the disk itself. This is improved by putting each database on its own SSD by using the directoryperdb option. SSDs help by significantly alleviating i/o latency, which is related to the number of IOPS and the latency for each operation, particularly when doing random reads/writes. This is even more visible for Windows environments where the memory mapped data files are flushed serially and synchronously. Again, SSDs help with this.
The journal is always within a directory so you can mount this onto its own SSD as a first step. All writes go via the journal and are later flushed to disk so if your write concern is configured to return when the write is successfully written to the journal, making those writes faster by using an SSD will improve query times. Even so, enabling the directoryperdb option gives you the flexibility to optimise for different goals (e.g., put some databases on SSDs and some on other types of disk, or EBS PIOPS volumes, if you want to save cost).
It’s worth noting that filesystem based snapshots where MongoDB is still running are no longer possible if you move the journal to a different disk (and so different filesystem). You would instead need to shut down MongoDB (to prevent further writes) then take the snapshot from all volumes.
Use Hash-based Sharding For Uniform Distribution
Every item we monitor (e.g., a server) has a unique MongoID and we use this as the shard key for storing the metrics data.
The query index is on the item ID (e.g. the server ID), the metric type (e.g. load average) and the time range; but because every query always has the item ID, it makes it a good shard key. That said, it is important to ensure that there aren’t large numbers of documents under a single item ID because this can lead to jumbo chunks which cannot be migrated. Jumbo chunks arise from failed splits where they’re already over the chunk size but cannot be split any further.
To ensure that the shard chunks are always evenly distributed, we’re using the hashed shard key functionality in MongoDB 2.4. Hashed shard keys are often a good choice for ensuring uniform distribution, but if you end up not using the hashed field in your queries, you could actually hurt performance because then a non-targeted scatter/gather query has to be used.
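In mongo-shell terms, the setup looks roughly like this (the database, collection and field names are assumptions):

```javascript
// Shard the metrics collection on a hashed item ID so chunks spread evenly
// across shards; queries that include itemId are still routed to one shard.
sh.enableSharding("metrics");
sh.shardCollection("metrics.values", { itemId: "hashed" });
```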
Let MongoDB Delete Data With TTL Indexes
The majority of our users are only interested in the highest resolution data for a short period and more general trends over longer periods, so over time we average the time series data we collect then delete the original values. We actually insert the data twice - once as the actual value and once as part of a sum/count to allow us to calculate the average when we pull the data out later. Depending on the query time range we either read the average or the true values - if the query range is too long then we risk returning too many data points to be plotted. This method also avoids any batch processing so we can provide all the data in real time rather than waiting for a calculation to catch up at some point in the future.
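A minimal sketch of that double write (collection and field names are assumptions): store the raw point, and increment a sum/count bucket so the average can be computed at read time without any batch job:

```javascript
// Raw value at full resolution.
db.metrics_raw.insert({ itemId: itemId, type: "loadavg", t: now, value: value });

// Pre-aggregated bucket (here one per hour); the average at read time is sum / count.
db.metrics_hourly.update(
  { itemId: itemId, type: "loadavg", hour: hourStart },
  { $inc: { sum: value, count: 1 } },
  { upsert: true }
);
```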
Removal of the data after a period of time is done by using a TTL index. This is set based on surveying our customers to understand how long they want the high resolution data for. Using the TTL index to delete the data is much more efficient than doing our own batch removes and means we can rely on MongoDB to purge the data at the right time.
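Creating such an index is a one-liner in the shell; the field name and retention period below are assumptions:

```javascript
// TTL index: MongoDB's background thread removes documents once their
// timestamp (a date field) is older than expireAfterSeconds.
db.metrics_raw.ensureIndex(
  { t: 1 },
  { expireAfterSeconds: 60 * 60 * 24 * 8 }   // keep high-resolution data for ~8 days
);
```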
Inserting and deleting a lot of data can have implications for data fragmentation, but using a TTL index helps because it automatically activates PowerOf2Sizes for the collection, making disk usage more efficient. As of MongoDB 2.6, this storage option becomes the default.
Take Care Over Query And Schema Design
The biggest hit on performance I have seen is when documents grow, particularly when you are doing huge numbers of updates. If the document size increases after it has been written then the entire document has to be read and rewritten to another part of the data file with the indexes updated to point to the new location, which takes significantly more time than simply updating the existing document.
As such, it’s important to design your schema and queries to avoid this, and to use the right modifiers to minimise what has to be transmitted over the network and then applied as an update to the document. A good example of what you shouldn’t do when updating documents is to read the document into your application, update the document, then write it back to the database. Instead, use the appropriate update operators - such as $set, $unset and $inc - to modify documents directly.
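For example (collection and field names are assumptions), a targeted update touches only the fields that change:

```javascript
// Send just the modifiers over the wire instead of rewriting the whole document.
db.servers.update(
  { _id: serverId },
  {
    $set:   { "alerts.cpu.threshold": 90 },  // change one nested field
    $inc:   { checkCount: 1 },               // bump a counter in place
    $unset: { legacyField: "" }              // drop a field that is no longer used
  }
);
```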
Consider Network Throughput & Number Of Packets
Assuming 100Mbps networking is sufficient is likely to cause you problems, perhaps not during normal operations, but probably when you have some unusual event like needing to resync a secondary replica set member.
When cloning the database, MongoDB is going to use as much network capacity as it can to transfer the data over as quickly as possible before the oplog rolls over. If you’re doing 50-60Mbps of normal network traffic, there isn’t much spare capacity on a 100Mbps connection so that resync is going to be held up by hitting the throughput limits.
Also keep an eye on the number of packets being transmitted over the network - it’s not just the raw throughput that is important. A huge number of packets can overwhelm low quality network equipment - a problem we saw several years ago at our previous hosting provider. This will show up as packet loss and be very difficult to diagnose.
Finally
Optimizing and scaling an application is an incremental process - there’s rarely one thing that will give you a big win. All of these tweaks and optimisations together help the application load quickly and make back-end operations much faster.
Ultimately, all this ensures that our clients get an excellent product, and behind the scenes we know that data is being written quickly and safely, and that we can scale as we continue to grow.
Thanks for reading!
About the Author
Sandip Das is a tech start-up adviser who also works with multiple international IT firms and tech entrepreneurs as an individual IT consultant, senior web application developer and JavaScript architect, contributing as a team member, helping with development and with IT decision-making. His desire is to help both tech entrepreneurs and their teams build awesome web-based products, make teams more knowledgeable, and add new ideas that give a "wow" factor to the product.