Techniques to Scale a Database
Vijaya Lakshmi V
SWE at Target || Ex-Intern, Ignite+ Trainee at Target || Open for collaboration
Database Scaling:
With millions of users accessing a database and expecting results in milliseconds, it is crucial that the database scales to serve these requests simultaneously without performance degradation.
We can scale a database in two ways:
Vertical Scaling: Vertical scaling (aka scaling up) refers to adding more resources (CPU, memory, disk space) to an existing database server.
Horizontal Scaling: Horizontal scaling (aka scaling out) refers to adding additional database servers to the system to cope with new demands.
There is a limit on how far we can scale vertically due to hardware and cost limitations, so beyond a certain point horizontal scaling is preferred when handling millions of users.
For a more detailed explanation of scaling, check out my article on scalability.
Techniques to scale a database:
Master-Master Replication:
In master-master replication there are two active databases (master nodes) which operate in parallel, synchronizing write operations between them.
Read and write load is distributed uniformly across both nodes. However, write operations need to be carefully synchronized so that the system doesn't end up in an inconsistent state.
This kind of system handles failover smoothly: if either master fails, the system continues to operate on the remaining one.
There is a potential for loss of data if a master fails before newly written data can be replicated to the other master.
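As a minimal sketch of the conflict-resolution problem described above, two active masters can tag each write with a logical version number and resolve replication conflicts with a last-writer-wins rule. The class and method names here are illustrative, not from any particular database:

```python
class MasterNode:
    """One of two active masters. Each write carries a logical version so
    that replication conflicts can be resolved with last-writer-wins."""

    def __init__(self, name):
        self.name = name
        self.data = {}   # key -> (version, value)
        self.clock = 0   # logical clock, advanced on every local write
        self.peer = None # the other master

    def write(self, key, value):
        self.clock += 1
        self.data[key] = (self.clock, value)
        if self.peer:                               # simplified synchronous replication
            self.peer.replicate(key, self.clock, value)

    def replicate(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]: # last writer wins
            self.data[key] = (version, value)
        self.clock = max(self.clock, version)       # keep clocks in step

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

# Both masters accept writes; replication keeps them consistent.
a, b = MasterNode("A"), MasterNode("B")
a.peer, b.peer = b, a
a.write("k", "from-A")   # replicated to B
b.write("k", "from-B")   # newer version, wins on both nodes
```

Real systems use more robust schemes (vector clocks, consensus protocols), but the shape of the problem is the same: concurrent writes to both masters must converge to one value.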
Master-Slave Replication:
Master-slave architectures are used to help stabilize a system.
In master-slave replication the master is the authoritative data store, while each slave holds a replica of the master's data.
The slave databases also serve as backups for the master database.
All writes and updates are handled by the master, while reads are distributed among the slaves; leveraging slaves for read operations drastically improves the performance of the system.
The system continues to operate even when the master goes down, as one of the slaves is promoted to be the new master.
All writes go through the single master, which can degrade performance under heavy simultaneous writes. There is also a potential for loss of data if the master fails before newly written data can be replicated to the other nodes.
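The read/write routing described above can be sketched in a few lines. The class name and synchronous replication are simplifications for illustration; real systems usually replicate asynchronously:

```python
import random

class ReplicatedDatabase:
    """Routes writes to the master and spreads reads across replicas."""

    def __init__(self, replica_count=2):
        self.master = {}                         # authoritative copy
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, key, value):
        self.master[key] = value
        for replica in self.replicas:            # simplified synchronous replication
            replica[key] = value

    def read(self, key):
        # Distribute reads across replicas to offload the master.
        return random.choice(self.replicas).get(key)

db = ReplicatedDatabase()
db.write("user:1", "Vijaya")
```

Because every read is served by a replica, adding replicas increases read capacity without touching the master, which is exactly why this pattern suits read-heavy systems.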
Federation:
Federation, or functional partitioning, is the process of splitting the database based on functionality.
For example, instead of having one monolithic database for our application, we can have three databases each dedicated to a different function, say payments, orders, and profiles, resulting in less read and write traffic on each database and hence better overall performance.
Latency also improves due to parallel read-write operations.
Federation isn't effective if the application has a huge number of functions (or tables), and since most operations require accumulating data from two or more of them, cross-database operations are costly and grow complex over time.
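A federated setup boils down to a router that sends each query to the database owning that function. The function names mirror the article's example; the dicts are stand-ins for connections to separate database servers:

```python
class FederatedRouter:
    """Routes each query to the database dedicated to one business function."""

    def __init__(self):
        # In practice each entry would be a connection to a separate
        # database server; plain dicts stand in for them here.
        self.databases = {"payments": {}, "orders": {}, "profiles": {}}

    def write(self, function, key, value):
        self.databases[function][key] = value

    def read(self, function, key):
        return self.databases[function].get(key)

router = FederatedRouter()
router.write("orders", "order:7", {"total": 250})
```

Note that a query touching two functions (say, an order plus its payment) now needs two round trips and application-side joining, which is the cost the article warns about.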
Sharding:
Database sharding is the process of storing a large database across multiple machines.
A single database server can store and process only a limited amount of data. Database sharding overcomes this limitation by splitting data into smaller chunks (a.k.a. partitions or shards) and storing them across several database servers.
Similar to federation, sharding distributes traffic: the load on each database is reduced, less replication is needed, latency improves drastically, and cache hits increase.
Writes and reads can be performed in parallel, thus improving throughput.
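A common way to assign rows to shards is to hash a shard key and take the result modulo the shard count, so every key deterministically maps to one server. This is a sketch with dicts standing in for shard servers; production systems typically use consistent hashing to avoid remapping every key when shards are added:

```python
import hashlib

SHARD_COUNT = 4
shards = [{} for _ in range(SHARD_COUNT)]  # stand-ins for shard servers

def shard_for(key):
    # Hash the shard key so rows spread evenly across shards,
    # and so the same key always lands on the same shard.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % SHARD_COUNT

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)

put("user:42", "Asha")
```

Reads and writes for different keys hit different shards, which is what lets them proceed in parallel.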
Bonus Read:
A system in which read operations heavily outnumber write operations (say 100:1) is a read-heavy system. Often, read operations require accumulated results from two or more tables, for which join operations are needed.
However, join operations may be costly, complex, and time-consuming, thereby degrading the performance of the database. This can be mitigated with either:
Caching:
Caching the results of the most common queries in the hope of reducing join operations.
But before adding a cache to the system, a careful analysis of the following questions is required:
1. How often does the data change?
2. Will out-of-date data really hurt?
3. How much data are we talking about? (Cache size is limited, so we may have to come up with the most suitable cache-eviction policies for our system)
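The three questions above map directly onto two cache parameters: a time-to-live bounds how stale data can get (questions 1 and 2), and a maximum size with an eviction policy bounds memory (question 3). A minimal sketch, combining a TTL with least-recently-used eviction:

```python
import time
from collections import OrderedDict

class TTLCache:
    """Small LRU cache with expiry: `ttl` bounds staleness,
    `max_size` bounds memory via least-recently-used eviction."""

    def __init__(self, max_size=128, ttl=60.0):
        self.max_size, self.ttl = max_size, ttl
        self.entries = OrderedDict()   # key -> (expires_at, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None or entry[0] < time.monotonic():
            self.entries.pop(key, None)
            return None                # miss: caller falls back to the database
        self.entries.move_to_end(key)  # mark as recently used
        return entry[1]

    def put(self, key, value):
        self.entries[key] = (time.monotonic() + self.ttl, value)
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least recently used

cache = TTLCache(max_size=2, ttl=60.0)
cache.put("a", 1)
cache.put("b", 2)
```

In practice a dedicated cache such as Redis or Memcached plays this role, but the same two knobs (TTL and eviction policy) are what you tune.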
Denormalization:
Denormalization is the technique of combining data from two or more tables into one big table. (It's a way to optimize databases for reads.)
This definitely introduces redundancy into the data.
Denormalization attempts to improve read performance at the expense of write performance.
Advantages:
- Faster reads, since commonly joined data already sits together in one table.
- Simpler read queries with fewer or no joins.
Disadvantages:
- Redundant data increases storage, and writes become slower and more complex, since every duplicated copy must be updated.
- Risk of inconsistency if the duplicated copies drift out of sync.
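As a concrete sketch using SQLite, a denormalized `order_summary` table duplicates the customer name so that listing orders needs no join. The table and column names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized schema: reading an order with its customer name needs a join.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

# Denormalized table: customer_name is copied in, trading redundancy
# (it must be updated in two places) for join-free reads.
conn.execute("""CREATE TABLE order_summary
                (order_id INTEGER, customer_name TEXT, total REAL)""")

conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (10, 1, 99.0)")
conn.execute("INSERT INTO order_summary VALUES (10, 'Asha', 99.0)")

# Read path: a single-table lookup, no join required.
row = conn.execute(
    "SELECT customer_name, total FROM order_summary WHERE order_id = 10"
).fetchone()
```

The write-side cost is visible too: renaming customer 1 now requires updating both `customers` and every matching row in `order_summary`.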
For a more detailed explanation of denormalization, check out my post Normalization vs Denormalization.
Follow Vijaya Lakshmi V for more such articles, posts, and blogs.