Foundations Of Highly Available System Design - Data Replication And Replication Strategies
What is data replication?
Database replication is the process of copying data from a source (leader) database server to one or more target (follower) database servers. Data is copied or streamed frequently from one server to another so that all users have access to synchronized data.
Let's take the single-leader, multiple-follower replication model as an example: writes go to the leader, and the leader copies each change to its followers.
Why do we need Data Replication?
Suppose your main database server runs in the USA and a user in India wants to read data. Every request has to travel to the USA and back, which takes a long time. Instead, you can keep a follower replica in India, i.e. keep the data geographically close to its users, so that reads return faster and latency drops.
4. Support for real-time analytics - because data replication can run as a continuous, near-real-time process, it lets businesses get immediate insights from their data. A simple example is populating a dashboard. A more advanced example is using data replication to pull user behavioral data from various data sources into analytical data stores, then running a predictive model that provides real-time personalized recommendations to improve the customer experience.
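The latency argument above (serve users in India from a follower in India) can be sketched as a small routing function. This is only an illustration: the replica names and latency figures below are hypothetical, not measurements, and a real system would usually rely on DNS or a load balancer instead.

```python
# Hypothetical replicas and assumed round-trip latencies in milliseconds.
REPLICA_LATENCY_MS = {
    "india": {"leader-usa": 220, "follower-india": 15},
    "usa": {"leader-usa": 10, "follower-india": 230},
}


def pick_read_replica(user_region: str) -> str:
    """Return the replica with the lowest assumed latency for this region."""
    latencies = REPLICA_LATENCY_MS[user_region]
    return min(latencies, key=latencies.get)


def pick_write_target() -> str:
    """In a single-leader model, every write goes to the leader."""
    return "leader-usa"


india_reads_from = pick_read_replica("india")
usa_reads_from = pick_read_replica("usa")
```

Reads go to whichever copy is closest, while writes still travel to the leader, which is why the consistency question in the next section arises.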
What is the most challenging part of Data replication?
Maintaining consistency (the same copy of the data) between the leader and its followers is the challenging problem here. Various consistency models address it, each with its own trade-offs.
What is the trade-off here?
Here the system offers strong consistency: a read request always returns the latest data. Availability goes down, however, because keeping every copy in sync increases the waiting time.
Example - Banking applications where consistency of transactions is really important.
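A minimal sketch of this trade-off, assuming strong consistency is implemented as synchronous replication (the leader waits for every follower before answering). The class and method names are hypothetical, and the followers are in-memory objects, not real database servers.

```python
class Follower:
    """In-memory stand-in for a follower database (hypothetical)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class SyncLeader:
    """Leader that acknowledges a write only after all followers applied it."""

    def __init__(self, followers):
        self.data = {}
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value
        # The caller waits here until every follower has the new value.
        # In a real deployment this is the network round trip that adds latency.
        for follower in self.followers:
            follower.apply(key, value)
        return "OK"


followers = [Follower(), Follower()]
leader = SyncLeader(followers)
response = leader.write("balance", 100)
balances = [f.read("balance") for f in followers]
```

Once `write` returns, a read from any follower sees the new balance. The cost is that the write is as slow as the slowest follower.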
What is the trade-off here?
Here the write request is performed on the leader and an OK response is returned immediately, which gives high availability. If a read request then goes to a follower, it may return stale data, so data consistency is lower.
Note - Given enough time, the data eventually becomes consistent across all followers in the system, so beyond a certain point the leader and followers are back in sync.
Example - Like and view counts in applications such as YouTube, Twitter, and Instagram, where availability is more important than immediately returning the exact number of views and likes.
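The stale-read behaviour described above can be sketched with asynchronous replication. This is a simplified, single-process illustration with hypothetical names: the "background" replication step is an explicit method call, not a real thread or network stream.

```python
from collections import deque


class Follower:
    """In-memory stand-in for a follower database (hypothetical)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class AsyncLeader:
    """Leader that acknowledges immediately and replicates later."""

    def __init__(self, followers):
        self.data = {}
        self.followers = followers
        self.pending = deque()  # writes not yet sent to followers

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "OK"  # returned before any follower has the value

    def replicate_pending(self):
        # Stands in for the background replication process.
        while self.pending:
            key, value = self.pending.popleft()
            for follower in self.followers:
                follower.apply(key, value)


follower = Follower()
leader = AsyncLeader([follower])
leader.write("likes", 42)
stale = follower.read("likes")   # follower has not caught up yet
leader.replicate_pending()
fresh = follower.read("likes")   # follower is now in sync
```

Between the write and the replication step the follower has no value for `likes`. After replication it returns 42, which is the "eventually consistent" part.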
This hybrid model provides a fair balance of availability and consistency: its wait time falls between that of strong consistency and an immediate response, and its data consistency is better than that of eventual consistency.
If you found this article useful and want more like it, subscribe to the newsletter and follow Kartik Sapra.
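The article does not spell out how the hybrid model works at this point. One common reading, assumed here, is semi-synchronous replication: the leader waits for a fixed number of followers and updates the rest later. The sketch below uses hypothetical names, and `sync_count` is an illustrative parameter.

```python
class Follower:
    """In-memory stand-in for a follower database (hypothetical)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class HybridLeader:
    """Waits for `sync_count` followers; the rest catch up later (assumption)."""

    def __init__(self, followers, sync_count):
        if not 0 <= sync_count <= len(followers):
            raise ValueError("sync_count must be between 0 and len(followers)")
        self.data = {}
        self.sync_followers = followers[:sync_count]
        self.async_followers = followers[sync_count:]
        self.pending = []  # writes not yet sent to the async followers

    def write(self, key, value):
        self.data[key] = value
        # Wait only for the synchronous followers before acknowledging.
        for follower in self.sync_followers:
            follower.apply(key, value)
        self.pending.append((key, value))
        return "OK"

    def replicate_pending(self):
        # Stands in for background replication to the remaining followers.
        for key, value in self.pending:
            for follower in self.async_followers:
                follower.apply(key, value)
        self.pending.clear()


a, b, c = Follower(), Follower(), Follower()
leader = HybridLeader([a, b, c], sync_count=1)
leader.write("views", 7)
before = [f.read("views") for f in (a, b, c)]
leader.replicate_pending()
after = [f.read("views") for f in (a, b, c)]
```

With `sync_count=1`, one follower is always up to date when the write returns, while the other two may lag briefly. Raising `sync_count` moves the behaviour toward strong consistency, and lowering it moves toward eventual consistency.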
Cheers
Kartik Sapra