Demystifying Latency: a critical aspect of Data-Intensive Scalable Architectures

It is not pragmatic to design and build systems with zero latency; even the human brain needs a pause to 'think.' Let's start with three essential questions: What is latency? How is it measured? And how does understanding latency help us design composable applications in two broad categories, Online (Real-time) and Batch (Offline-Data)?

  • Latency: Generally, latency is one of the opportunities that drives enormous improvements in scalable architecture design. It is defined as the delay between a request/action and the response (webpage/mobile app) to that action. In computer networking terms, it is often measured as the total round-trip time a data packet takes to travel.
  • Online (Real-time) vs. Batch (offline-data) application categorization: In my view, the acceptable latency threshold (in milliseconds) is the critical factor in categorizing an application as Real-time or Batch. (A latency-measurement sketch follows this list.)
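To make the definition concrete, here is a minimal sketch of measuring round-trip latency in Python. The endpoint URL and sample count are illustrative placeholders, not from the article:

```python
import time
import urllib.request

def measure_latency_ms(url: str, samples: int = 5) -> float:
    """Return the average round-trip latency in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()           # high-resolution timer
        urllib.request.urlopen(url).read()    # request + full response = one round trip
        timings.append((time.perf_counter() - start) * 1000)
    return sum(timings) / len(timings)

# Example with a hypothetical endpoint:
# print(f"avg latency: {measure_latency_ms('https://example.com'):.1f} ms")
```

If the average sits in the tens of milliseconds, the service can plausibly serve real-time traffic; if it drifts into seconds, the workload belongs in the batch/offline category.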

How is latency calculated/measured, and what factors contribute to high latency? Multiple layers (full stack) of application design:

  • Lack of edge computing: compute, DB, and storage are not hosted in the same availability zone (AZ), or compute/storage is dispersed across multiple data centers (in the on-premise case).
  • Frontend development (Code)

  • Static content (Gzip compression)
  • Choosing the React or Angular JavaScript framework/library according to the SPA (single-page application) requirements of the mobile or web app.
  • Backend development (code): function points, repetitive loops, and wrappers (e.g., a microservice fronting legacy code).
  • Network bandwidth and low processing power: a slower network, low-powered compute (worker nodes), heavy I/O ops, and the lack of an intelligent load balancer (ALB/ELB).
  • Middleware and security: rate limiting/DDoS protection, message transport.
  • Databases: SQL and NoSQL

  • Sharding/horizontal scaling (NoSQL)
  • Indexing for RDBMS (see the indexing sketch after this list)
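To make the indexing point concrete, here is a minimal, self-contained sketch using SQLite from Python's standard library (the table and column names are invented for illustration). It shows how an index turns a full table scan into an index search, which is exactly the kind of latency win the bullet above refers to:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(100_000)],
)

query = "SELECT * FROM orders WHERE customer_id = 42"

# Without an index, SQLite must scan every row.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())   # -> SCAN orders

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index, SQLite searches the index B-tree instead.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())   # -> SEARCH ... USING INDEX
```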


The figure below explains the different layers of a simple web application design:

[Figure: layers of a simple web application design]

  • How many steps does a simple ping of a URL perform in the background for us: L1 cache reference, L2 cache reference, branch mispredict, main memory reference, mutex lock/unlock, compress bytes, transport bytes over the network, read bytes sequentially from memory, round trip within the datacenter, disk seeks (read and write data), read bytes sequentially from the network, read bytes sequentially from disk, send packet/round trip. (Approximate figures are sketched after this list.)
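For rough orders of magnitude, the widely circulated "Latency Numbers Every Programmer Should Know" figures (commonly attributed to Jeff Dean) put those steps in perspective. These are approximate and hardware-dependent; treat them as relative comparisons, not measurements:

```python
# Approximate, order-of-magnitude latencies in nanoseconds, after the widely
# cited "Latency Numbers Every Programmer Should Know". Real values vary.
LATENCY_NS = {
    "L1 cache reference":                      0.5,
    "branch mispredict":                       5,
    "L2 cache reference":                      7,
    "mutex lock/unlock":                       25,
    "main memory reference":                   100,
    "compress 1 KB (fast compressor)":         3_000,
    "send 1 KB over 1 Gbps network":           10_000,
    "read 1 MB sequentially from memory":      250_000,
    "round trip within same datacenter":       500_000,
    "disk seek":                               10_000_000,
    "read 1 MB sequentially from network":     10_000_000,
    "read 1 MB sequentially from disk":        20_000_000,
    "send packet CA -> Netherlands -> CA":     150_000_000,
}
```

The spread matters more than any single number: a disk seek costs roughly 100,000 L1 cache references, which is why the rest of this article keeps returning to avoiding disk I/O.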
  • What other pragmatic factors and solutions need to be considered:
  • Sending data packets between data centers located far from each other generally takes longer and gives a slower response; sharing data/files/content globally is expensive (CDNs - content delivery networks, horizontal scaling).
  • Reading and writing data to disks (I/O ops): disk seeks are slower than memory (avoid disk seeks by caching as much as possible; a caching sketch follows this list).
  • Network bandwidth (compression algorithms, vertical scaling).
  • Compute (GPU over CPU for worker nodes when online/ETL processing speed matters to the business).
  • Choosing read and write data access patterns wisely: 'writes' are costlier than 'reads' (prefer DB engines with a Fractal Tree design over a B-Tree).
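As a sketch of the "cache to avoid disk seeks" advice above, here is an in-memory cache placed in front of a slow read path. The file path and function are illustrative assumptions, not part of the original article:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)   # in-memory cache with least-recently-used eviction
def load_profile(user_id: int) -> bytes:
    # Illustrative slow path: each cache miss costs a disk seek + read.
    with open(f"/data/profiles/{user_id}.json", "rb") as f:
        return f.read()

# The first call for a given user_id hits the disk; repeated calls are
# served from memory at main-memory latency instead of disk latency.
```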


Here is the headroom equation, which is very useful for capacity estimation, measuring latency, and projecting headroom over a period of time. Moreover, this equation encourages the project team to continuously optimize the performance of the system in production.

Headroom = (Ideal Usage Percentage × Maximum Capacity) − Current Usage − Σ[Growth(t) − Optimization(Projects(t))]

*Headroom Time = Headroom / [Growth(t) − Optimization(Projects(t))]
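A minimal worked example of the headroom equation (all numbers are invented for illustration; Growth and Optimization are per-period projections):

```python
ideal_usage_pct = 0.5       # never plan to run above 50% of maximum capacity
max_capacity    = 10_000    # requests/sec the system can sustain
current_usage   = 3_000     # requests/sec today

growth       = [200, 220, 240, 260]   # projected added load per period
optimization = [50, 50, 100, 100]     # projected load removed by perf projects

headroom = (ideal_usage_pct * max_capacity) - current_usage \
           - sum(g - o for g, o in zip(growth, optimization))

net_growth_per_period = (sum(growth) - sum(optimization)) / len(growth)
headroom_time = headroom / net_growth_per_period   # periods until headroom runs out

print(headroom)        # 5000 - 3000 - 620 = 1380 requests/sec of headroom
print(headroom_time)   # ~8.9 periods before the system needs more capacity
```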


To design data-intensive applications and platforms, I want to explicitly explain the core principle: avoid the latency of reading data from and writing data to disk. After the hard drive, the SSD (solid-state disk) is an amicable solution, but it has limitations (high cost and limited scalability). Many processes and methods are used by NoSQL database engines, notably sharding (horizontal scaling) and caching (key-value pairs). Caching and sharding are distinct concepts that revolve around data access patterns. To dive deeper, 'consistent hashing' overcomes the biggest traditional horizontal-scaling challenges: what if we need to add more nodes, or a couple of nodes become inactive in the DB cluster? (A minimal sketch follows.)
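Here is a minimal consistent-hash ring sketch (node names and the virtual-node count are arbitrary choices for illustration). Because each node owns many small slices of the ring, adding or removing one node remaps only a proportional fraction of keys instead of reshuffling everything:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map any string onto a fixed numeric ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node appears 'vnodes' times on the ring, which
        # smooths the key distribution across nodes.
        self._ring = sorted((_hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["db1", "db2", "db3"])
print(ring.node_for("user:42"))
# Adding "db4" later would move only roughly 1/4 of the keys; with naive
# modulo hashing (hash(key) % N), almost every key would move.
```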


3-D Scalable Solution- Cube:


The figure below explains the scalability rules in 3D:

  1. X-Axis: horizontal scaling, read replicas, services, and data replication. This is the cheapest solution.
  2. Y-Axis: scaling is done by splitting services, functions, and methods. It is costlier than the X-axis because compute/storage is scaled vertically to achieve high performance and fast response (CPU to GPU, SSD storage).
  3. Z-Axis: it keeps the customer's location and requests as the center point. It is the costliest solution. (A routing sketch follows this list.)
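A toy sketch of how X-axis and Z-axis routing differ in code. The replica names, shard count, and customer-id scheme are invented for illustration:

```python
import itertools

# X-axis: identical clones behind a balancer; any replica can serve any read.
read_replicas = ["replica-1", "replica-2", "replica-3"]
_round_robin = itertools.cycle(read_replicas)

def route_x_axis() -> str:
    return next(_round_robin)

# Z-axis: requests are pinned to the shard that owns that customer's data,
# keeping the customer (and their data locality) as the center point.
def route_z_axis(customer_id: int, shards: int = 4) -> str:
    return f"shard-{customer_id % shards}"

print(route_x_axis())        # replica-1, then replica-2, ...
print(route_z_axis(42))      # always shard-2 for customer 42
```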



'Fractal Tree Indexing' over the B-Tree data structure: a solution for reducing I/O ops (disk seeks):


Here comes the fantastic concept of 'Fractal Tree Indexing' from Tokutek, an alternative to the traditional B-Tree algorithm for reading and writing data at the leaf nodes (on disk). I like this algorithm for two remarkable aspects:

  1. The write-latency problem is solved.
  2. Data consistency is preserved (as soon as data is written, the read data access pattern will be consistent).

TokuDB, a storage engine for MySQL, is built using fractal tree indexing. TokuMX, Tokutek's distribution of MongoDB, also uses fractal tree indexing.

Before diving deep, let's start with the B-Tree data structure used to store large data blocks (RAM/disk). Beyond RAM/main memory, data needs to be written to disk at the B-tree leaf nodes. The Fractal Tree is like a B-tree in implementation; additionally, each internal node of a fractal tree index has a 'buffer.'

  • Write DataOps: upsert (update/insert) messages are temporarily stored in these buffers. Once a node's buffer is full, its contents are flushed to the child node's buffer; eventually the buffered data is written to the leaf nodes (disk), so there is no per-write disk latency. During a power outage or failure event, the buffers' data is serialized to disk, so messages in an internal node's buffer are not lost.
  • Read DataOps: data consistency is also maintained during read operations. Like a B-tree, the fractal tree indexing algorithm follows the same query path from the root node to the leaf node, checking pending buffer messages along the way; hence, each read, write, or update query sees the current data state. (A toy sketch follows this list.)
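A toy illustration of the buffered-write idea, not TokuDB's actual implementation: each internal node queues messages and flushes a full buffer one level down, so one flush amortizes many writes, and reads check buffers on the root-to-leaf path so they still see fresh data. (Real fractal trees route by key range; the hash here only keeps the sketch short.)

```python
class Node:
    BUFFER_SIZE = 4   # tiny for illustration; real buffers are block-sized

    def __init__(self, children=None):
        self.buffer = []           # pending upsert messages (internal nodes)
        self.children = children   # None marks a leaf
        self.leaf_data = {}        # stands in for data "on disk" at a leaf

    def upsert(self, key, value):
        if self.children is None:
            self.leaf_data[key] = value        # leaf: the actual disk write
            return
        self.buffer.append((key, value))       # internal node: just buffer it
        if len(self.buffer) >= self.BUFFER_SIZE:
            self.flush()

    def flush(self):
        # One bulk move per flush instead of one disk touch per write.
        for key, value in self.buffer:
            child = self.children[hash(key) % len(self.children)]
            child.upsert(key, value)
        self.buffer.clear()

    def get(self, key):
        # Reads check pending messages on the path, so they see fresh writes.
        for k, v in reversed(self.buffer):
            if k == key:
                return v
        if self.children is None:
            return self.leaf_data.get(key)
        return self.children[hash(key) % len(self.children)].get(key)

root = Node(children=[Node(), Node()])
for i in range(10):
    root.upsert(f"k{i}", i)
print(root.get("k9"))   # 9, whether it is still buffered or already at a leaf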


The above explains how fractal tree indexing can reduce disk I/O operations; 'disk seeks' are one of latency's critical aspects.


Conclusion: Global and scalable data-intensive designs sometimes add latency as the price of consistency and availability. Per the CAP (Consistency, Availability, and Partition tolerance) theorem, at most two of the three dimensions can be achieved, so we must compromise or trade off on one aspect. By reducing disk seeks (I/O), deploying edge and CDN solutions, applying caching across layers, having a cache-eviction policy (to raise the cache hit rate), and choosing the right sharding approach, one can build a low-latency solution. It is essential to calculate headroom* for each component of the architected system so that the seasonality (peak/headroom time) trend can be projected for a website or service. The key factors that add latency should be identified, and related solutions should be derived to minimize it.


References: The Art of Scalability (Martin L. Abbott & Michael T. Fisher)
