Demystifying Latency: a critical aspect of Data-Intensive Scalable Architectures
Nagendra Sharma
Leader Google Cloud Platform(GCP) | Applications, Data & Cloud Innovation | GenAI & Advanced Analytics Expertise | Driving Scalable Tech Solutions
It is not pragmatic to design and build systems with zero latency; even the human brain needs a pause to 'think.' Let's start with three essential questions: What is latency? How is it measured? And how does understanding latency help us design composable applications across two broad categories, Online/Real-Time and Batch/Offline-Data applications?
How latency is calculated and measured, and which factors contribute to high latency: the multiple layers (full stack) of application design
The figure below shows the different layers of a simple web application design:
Here is the headroom equation, which is very useful for capacity estimation, measuring latency, and projecting headroom over a period of time. Moreover, this equation encourages the project team to continuously optimize the performance of the system in production.
- Headroom = (Ideal Usage Percentage × Maximum Capacity) − Current Usage − Σ [Growth(t) − Optimization(Projects(t))]
- *Headroom Time = Headroom ÷ [Growth(t) − Optimization(Projects(t))]
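As a worked example, the headroom equation can be evaluated with a few lines of Python. All of the capacity numbers below (requests per second, monthly growth, optimization gains) are hypothetical values chosen only to illustrate the arithmetic:

```python
# Hypothetical capacity figures for illustration only; the formula follows
# the headroom equation above.
ideal_usage_pct = 0.8        # keep utilization at or below 80% of max capacity
max_capacity = 10_000        # requests/sec the system can sustain
current_usage = 5_000        # requests/sec served today

# Projected monthly demand growth and optimization gains (requests/sec)
# over a 12-month planning window -- both assumed constant here.
growth = [200] * 12          # +200 req/s of new demand each month
optimization = [50] * 12     # -50 req/s reclaimed by tuning each month

# Headroom = (ideal% x max capacity) - current usage - sum(growth - optimization)
headroom = ideal_usage_pct * max_capacity - current_usage - sum(
    g - o for g, o in zip(growth, optimization)
)

# Headroom Time = headroom / (growth - optimization) per month
net_monthly_burn = growth[0] - optimization[0]
headroom_time_months = headroom / net_monthly_burn

print(headroom)              # remaining safe capacity in requests/sec -> 1200.0
print(headroom_time_months)  # months until headroom is exhausted -> 8.0
```

With these assumed numbers, 1,200 req/s of safe capacity remains and it will be consumed in roughly eight months, which is exactly the kind of seasonality projection the equation is meant to drive.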
To design data-intensive applications and platforms, I want to explicitly call out the core principle: avoid the latency of reading data from and writing data to disk. After the hard drive, the SSD (solid-state drive) is an amicable solution, but it has limitations (high cost and limited scalability). Many processes and methods are used by NoSQL database engines, notably sharding (horizontal scaling) and caching (key-value pairs). Caching and sharding are unique concepts that evolve data-access patterns. To dive deeper, 'consistent hashing' overcomes many traditional horizontal-scaling challenges: what if we need to add more nodes, or a couple of nodes become inactive in the DB cluster?
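A minimal sketch of consistent hashing helps make this concrete. The class below is illustrative, not any particular database's implementation; node names like "db1" and the replica count are arbitrary assumptions. The key property is that removing (or adding) one node only remaps the keys that were on that node, leaving the rest of the cluster untouched:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map any string onto the ring's numeric space via MD5 (choice is arbitrary).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Toy consistent-hash ring with virtual nodes (replicas)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    def add(self, node: str):
        # Each physical node gets many virtual points for even key spread.
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        # A key belongs to the first ring point clockwise from its hash.
        h = _hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

With a ring of three nodes, removing one node leaves every key that lived on the other two nodes exactly where it was; a naive `hash(key) % N` scheme would instead reshuffle almost everything.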
3-D Scalable Solution: the Scale Cube
The figure below illustrates the scalability rules as a 3-D cube: the X-axis scales by cloning (horizontal duplication), the Y-axis by splitting out different functions or services, and the Z-axis by partitioning data (sharding).
'Fractal Tree Indexing' over the B-tree data structure: a solution for reducing I/O ops (disk seeks)
Here comes the fantastic concept of 'Fractal Tree Indexing' from Tokutek, an alternative to the traditional B-tree algorithm for reading and writing data at leaf nodes (on disk). I like this algorithm for its remarkable real-world adoption: TokuDB, a storage engine for MySQL, is built using fractal tree indexing, and TokuMX, a MongoDB distribution, uses fractal tree indexing as its DB engine as well.
Before diving deep, let's start with the B-tree data structure used to store large blocks of data (RAM/disk). Data beyond RAM/main memory needs to be written to disk as B-tree leaf nodes. The fractal tree is like a B-tree in implementation; additionally, each internal node of a fractal tree index has a 'buffer' that accumulates pending writes and flushes them down the tree in batches.
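The buffering idea can be sketched in a few lines. This is a toy model, not Tokutek's implementation: buffer sizes, routing, and the "flush counts as one disk write" accounting are all simplifying assumptions. The point it demonstrates is that writes accumulate in an internal node's buffer and move down the tree in batches, so one flush (one simulated disk seek) carries many rows:

```python
# Toy sketch of the buffered-insert idea behind fractal tree indexing.
# Writes land in a node's in-memory buffer and are flushed to children
# in batches, so each simulated disk write moves many rows, not one.
class BufferedNode:
    def __init__(self, is_leaf=False, buffer_size=4):
        self.is_leaf = is_leaf
        self.buffer_size = buffer_size
        self.buffer = []    # pending (key, value) messages
        self.keys = []      # leaf payload (stands in for a disk block)
        self.children = []  # (pivot, child) pairs for internal nodes
        self.flushes = 0    # counts simulated batched disk writes

    def insert(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_size:
            self._flush()

    def _flush(self):
        self.flushes += 1   # one batched "disk seek" for many messages
        if self.is_leaf:
            self.keys.extend(self.buffer)
        else:
            for key, value in self.buffer:
                self._route(key).insert(key, value)
        self.buffer = []

    def _route(self, key):
        # Send each message to the child whose key range covers it.
        for pivot, child in self.children:
            if key < pivot:
                return child
        return self.children[-1][1]

# Demo: a root with two leaves; 8 single-row inserts trigger only
# 2 batched flushes at the root instead of 8 individual writes.
leaf_lo, leaf_hi = BufferedNode(is_leaf=True), BufferedNode(is_leaf=True)
root = BufferedNode()
root.children = [(50, leaf_lo), (float("inf"), leaf_hi)]
for k in range(8):
    root.insert(k, f"row-{k}")
print(root.flushes)  # -> 2
```

In a B-tree, each of those eight inserts could touch a leaf on disk individually; here the buffer amortizes them into two batched flushes, which is the mechanism the next paragraph refers to.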
The above explains how fractal tree indexing can reduce disk I/O operations; 'disk seeks' are one of the critical contributors to latency.
Conclusion: Global, scalable, data-intensive designs sometimes accept added latency to gain consistency and availability. Per the CAP (Consistency, Availability, and Partition tolerance) theorem, at most two of the three properties can be fully achieved, so we must compromise or trade off on the remaining one. By reducing disk seeks (I/O), using edge and CDN solutions, applying caching across layers with a sound cache-eviction policy (to maximize cache hits), and choosing the right sharding approach, one can build a low-latency solution. It is essential to calculate headroom* for each component of the architected system so that the seasonality (peak/headroom time) trend can be projected for a website or service. The key factors that add latency should be identified, and related solutions derived to minimize it.
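On the cache-eviction point: a common eviction policy is LRU (least recently used), which keeps hot keys cached and sustains a high hit rate. The sketch below is illustrative and not tied to any specific cache product; the capacity and keys are arbitrary:

```python
from collections import OrderedDict

# Minimal LRU eviction sketch: when the cache is full, the entry that
# was used longest ago is evicted, so frequently read keys stay cached.
class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None                      # cache miss
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]               # cache hit

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict least recently used
```

For example, with capacity 2, putting "a" and "b", reading "a", then putting "c" evicts "b" (the least recently used key) while "a" survives.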
References: The Art of Scalability