The Fundamentals Of Scaling Applications
Awais Kamran
Engineering Lead @ Conrad Labs | AWS Certified Solution Architect | Full Stack Developer | Machine Learning Enthusiast | Tech Speaker
The Scale Cube
Scaling applications at large is an ever-evolving practice as the technological landscape continues to grow. The eventual goal is for applications to accommodate incoming traffic by increasing availability and reducing latency.
This article discusses the three essential aspects of achieving that goal through the scale cube, a principle introduced in the book "The Art of Scalability" by Martin L. Abbott and Michael T. Fisher. The scale cube (as depicted in the image above) comprises three axes. The x-axis denotes the instances of your application, i.e. the number of processes upon which the application is running. The y-axis denotes microservices: the decomposition of the main application into loosely coupled services that operate independently, each backed by one or more databases. The z-axis represents partitioning, where the database is split into multiple databases with multiple copies that may or may not sync, either to distribute the database load or to persist data independently.
The scale cube represents a bigger picture that software architects have to take into account while designing systems, depending on their use case, which is always subject to change. The purpose of the pictorial representation is to help you through your system design interviews and to summarize the important aspects that may come in handy in your next project. Let's go through each of the axes in detail.
Instances
Before jumping into creating multiple instances, it's important to discuss the difference between horizontal and vertical scaling. The picture below depicts the difference between the aforementioned terms. [ Image Source - https://www.geeksforgeeks.org ]
Vertical Scaling
Also known as scaling up
Back in the day, when virtual machines were rented out to users, vertical scaling was the option that took quite a toll when high incoming traffic arrived. Bumping up the hardware has always been expensive, and it required manual monitoring and a lot of calls to your service provider. Often, the added hardware would sit underutilized for long durations, with the application experiencing only brief bursts of traffic within dedicated time slots.
Horizontal Scaling
Also known as scaling out
With the emergence of the cloud, it became more common to replicate your application process into multiple instances in order to distribute the application's traffic load, with or without a load balancer. The following example explains this a bit further: let's say we have a Node process, app.js, which sends out a payload comprising the port number, process ID, and a simple message.
const http = require('http')

// Read the port from the command line, falling back to 3000
const port = parseInt(process.argv[2] || 3000)

const server = http.createServer((req, res) => {
  // Respond with the port and process ID so we can tell instances apart
  const payload = JSON.stringify({
    port,
    processID: process.pid,
    message: "Hello Human!"
  })
  res.writeHead(200, { 'Content-Type': 'application/json' })
  res.end(payload)
})

server.listen(port)
console.log(`Server running on port ${port}`)
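If you want to try it out, you can start an instance on a port of your choosing and hit it with curl (the port here is just an example):

> node app.js 3001
> curl http://localhost:3001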
Now, let's create another file, index.js, which will fork the above process into multiple processes. Consider these processes as multiple instances, the only difference being that they run on one machine instead of multiple separate machines.
const { fork } = require('child_process')

// Launch two copies of app.js, each on its own port
const processes = [
  fork('./app', ['3001']),
  fork('./app', ['3002']),
]

console.log(`Forked ${processes.length} processes`)
Now, if you run node index.js, you will observe that the application is running on two separate ports, 3001 and 3002, each with its own independent memory. This example deliberately omits a load balancer.
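To get a feel for what a load balancer actually does, here is a minimal, hypothetical round-robin balancer sitting in front of the two forked processes; the file name lb.js and the port choices are assumptions for illustration, not part of the original example.

// lb.js - a minimal round-robin load balancer sketch, illustrative only
const http = require('http')

// The ports our forked app.js instances listen on
const targets = [3001, 3002]
let next = 0

const balancer = http.createServer((clientReq, clientRes) => {
  // Pick the next backend in round-robin order
  const port = targets[next]
  next = (next + 1) % targets.length

  // Forward the request to the chosen backend and stream the response back
  const proxyReq = http.request(
    { host: 'localhost', port, path: clientReq.url, method: clientReq.method, headers: clientReq.headers },
    (proxyRes) => {
      clientRes.writeHead(proxyRes.statusCode, proxyRes.headers)
      proxyRes.pipe(clientRes)
    }
  )
  clientReq.pipe(proxyReq)
})

balancer.listen(3000)
console.log('Load balancer listening on port 3000')

In production you would reach for a battle-tested tool instead of rolling your own. pm2 acts as an advanced Node process manager for production applications, and acts as a load balancer as well for multiple processes. You can run the following in your root to install it: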
> npm install -g pm2
Once it is installed, we can use pm2 to create multiple instances for us instead of forking the processes from index.js. It's important to point out that forking launched the processes on separate ports, but pm2 launches the processes under the same port because it builds on Node's cluster module underneath. Cluster mode launches processes in a master-slave arrangement under the same port, where the master is responsible for spawning the processes and the slaves act as distributed child processes.
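To make the master-slave idea concrete, here is a minimal sketch of Node's built-in cluster module, which is roughly what pm2's cluster mode builds on; the file name cluster-demo.js is an assumption for illustration.

// cluster-demo.js - illustrative only
const cluster = require('cluster')
const http = require('http')

if (cluster.isMaster) {
  // The master process only spawns workers; it serves no traffic itself
  for (let i = 0; i < 3; i++) cluster.fork()
} else {
  // Every worker listens on the same port; the master distributes
  // incoming connections among them
  http.createServer((req, res) => {
    res.end(`Handled by worker ${process.pid}\n`)
  }).listen(3000)
}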
With pm2, the following command will create three independent processes of our application.
> pm2 start app.js -i 3
...
You can run the following command to inspect your running processes along with their memory and CPU usage:
> pm2 list
...
To load test your application, let's install the loadtest npm package to demonstrate that:
> npm install -g loadtest
...
...
> loadtest -n 4000 http://localhost:3000
...
...
The last command listed above fires 4000 requests at port 3000, and pm2 distributes the load amongst the three instances created above. To visualize that, you can run the following command, which lets you check the logs for each separate instance; you will observe that each instance served requests, showing that the load was indeed distributed across the three processes.
> pm2 logs
The essence of the above demonstration was to explain horizontal scaling and to show how a single process can be run as multiple instances, managed by a tool like pm2 that load balances and orchestrates those processes.
Microservices
The x-axis caters to scaling out the process, either on the same machine or on different machines, but a challenge arises when you have multiple high-traffic modules that still perform badly during traffic bursts even after scaling out. The question here is not "how much should you scale out?" but rather "how far should you scale out?"
Sometimes, it is important to recognize that one monolithic application can contain multiple high-traffic sub-modules. For example, consider an e-commerce application with several modules, among them a payment module and an inventory module.
In theory, the application can have more modules besides these. Scaling out the whole application and distributing identical processes across instances might not be efficient, because the aforementioned modules require a stronger engine than the rest. Hence, the idea of microservices is to identify such modules and treat them independently. Instead of launching seven instances of the main module, it is wiser to launch two instances of the payment module, three instances of the inventory module, and run the other modules within the remaining instances. This distributes the load in a better manner, since we give more capacity to the high-traffic modules, and we can always monitor our modular services and change the distribution as the incoming traffic changes; a sketch of such an uneven distribution follows below.
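Continuing with the pm2 tooling from earlier, one way to express this uneven distribution is a pm2 ecosystem file; the module names and script paths here are assumptions for illustration, not part of the original example.

// ecosystem.config.js - hypothetical module paths, illustrative only
module.exports = {
  apps: [
    // High-traffic modules get more instances than the rest
    { name: 'payments',  script: './payments/app.js',  instances: 2, exec_mode: 'cluster' },
    { name: 'inventory', script: './inventory/app.js', instances: 3, exec_mode: 'cluster' },
    { name: 'catalog',   script: './catalog/app.js',   instances: 1, exec_mode: 'cluster' },
  ],
}

> pm2 start ecosystem.config.js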
Cloud platforms today help achieve this through automated scaling for each service; you can set up configurations based on several attributes, such as CPU usage and traffic. The question raised above was "how far should you scale out?" To answer it, you should identify beforehand the use cases that have the tendency to clog your application and derail its performance.
Microservices leverage the independence of modules not only in terms of hardware but also in terms of team allocation. Once the sub-modules have been segregated, one can easily drop out of the monolith architecture: we can have separate teams and codebases for the segregated modules, each operated and deployed independently while exposing ways to interact with the other services within the application's ecosystem. Another advantage of such segregated services is that you no longer have a single point of failure. Say a developer accidentally deploys a release with a crucial bug in the inventory service; only that particular service will be affected, while all the other services keep working as before.
By adopting just the x-axis and y-axis of the scale cube, you can already achieve a significant degree of scalability. To enhance it further, we have to understand the partitioning described by the z-axis.
Partitioning
Also known as sharding
Partitioning is mostly considered in terms of storage rather than processes, but it matters here because it completes the scalability picture. The question may arise: if partitioning concerns a separate entity within the architectural paradigm, why include it in the scale cube?
To answer this, consider partitioning in terms of microservices, where multiple services can share one or more databases, and partitioning can be vertical or horizontal, just like instances. To explain it further, let's take the example of a single large table.
Vertical sharding can be achieved column-wise (similar to normalization), while horizontal sharding is done by segregating the rows and storing them across multiple databases through a sharding function.
The sharding (or hash) function returns a value that determines where the data lives among the shards. The image above shows an example of horizontal sharding: the partition keys are stored in separate databases, which helps reduce read operations on any one database and distributes the I/O load across the two database shards. [ Image Source - https://aws.amazon.com/blogs/database ]
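To make this concrete, here is a minimal sketch of a hash-based sharding function in Node; the key format and shard count are assumptions for illustration.

// shard.js - illustrative only
const crypto = require('crypto')

const SHARD_COUNT = 2

// Map a partition key (e.g. a user ID) to one of the shards
function shardFor(partitionKey) {
  const hash = crypto.createHash('md5').update(String(partitionKey)).digest()
  // Interpret the first four bytes of the digest as an unsigned integer
  return hash.readUInt32BE(0) % SHARD_COUNT
}

console.log(shardFor('user-42')) // prints 0 or 1 - the shard this row lives in

The same key always hashes to the same shard, so reads know exactly which database to query without scanning the others.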
Sharding, just like the other two axes, needs to be scenario-based. Let's take another example, Netflix: would it help Netflix to have one database, or multiple database shards representing regions?
A particular user is shown suggestions as per their region, and it is more efficient in terms of latency and I/O operations to fetch results from a region-based shard than to query one global database or a set of master-slave replicas.
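A hypothetical sketch of such region-based routing might look like the following; the region names and connection strings are entirely made up.

// region-routing.js - illustrative only, all names are hypothetical
const shardsByRegion = {
  'us-east': 'postgres://db-us.example.com/recommendations',
  'eu-west': 'postgres://db-eu.example.com/recommendations',
}

// Route each user to the shard for their home region
function connectionStringFor(user) {
  return shardsByRegion[user.region]
}

console.log(connectionStringFor({ region: 'eu-west' }))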
Conclusion
This article provides a brief overview of the initial options one might consider while thinking about scaling. It also serves as a starting point for preparing for your system design interview, in terms of getting accustomed to formal terminologies and concepts. There is more to each of the discussed axes, but hopefully this read gives you some clarity and lays a foundation to explore scalability concepts in detail.