Foundations Of Highly Available System Design - Data Replication And Replication Strategies

Kartik S.

SDE @Amazon | GSoC @RedHat | Open Source and Coding Mentor |Ex @Nagarro|Ex @Coding Blocks|System Design Content Creator|20k+ linkedin followers|3 million views|open for collaborations

Published: March 3, 2023

What is data replication?

Database replication is the process of copying data from a source (leader) database server to one or more target (follower) database servers. Data is copied or streamed frequently from one database server to another so that all users have access to synchronized data.

Figure: Leader-follower data replication

Let's take the single-leader, multiple-follower replication model as an example.

In this model, write requests are performed on the leader, while read requests can be served by the leader or by any follower.
The leader's data is then synced to the followers to keep the leader and followers consistent, according to one of the consistency models discussed below.
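The read and write paths of this model can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each server is an in-memory dictionary, and the names `Node`, `Cluster` and `node_index` are made up for this example rather than taken from any real database.

```python
class Node:
    """One database server holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """A single leader with a fixed number of followers."""

    def __init__(self, follower_count):
        self.leader = Node("leader")
        self.followers = [Node(f"follower-{i}") for i in range(follower_count)]

    def write(self, key, value):
        # Every write goes through the leader first.
        self.leader.data[key] = value
        # The leader then copies the change to each follower.
        for follower in self.followers:
            follower.data[key] = value

    def read(self, key, node_index=None):
        # A read is served by the leader (node_index=None) or by a follower.
        node = self.leader if node_index is None else self.followers[node_index]
        return node.data.get(key)


cluster = Cluster(follower_count=2)
cluster.write("user:1", "Asha")
from_leader = cluster.read("user:1")
from_follower = cluster.read("user:1", node_index=1)
print(from_leader, from_follower)
```

Here the copy to the followers happens inside `write` for simplicity. When exactly that copy happens relative to the client's OK response is what separates the consistency models.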

Why do we need Data Replication?

1. High availability, fault tolerance and disaster recovery - If a system has no replication and its only database instance suffers an outage, the system becomes inaccessible. With multiple copies, if one instance fails another can take its place, which keeps the system available.
2. Even load distribution and increased read throughput - Most systems are read-heavy, which means they receive more read requests than write requests. Replication lets the read load be spread across several follower instances, which increases the read throughput of the system.
3. Reduced latency - Let's take an example to understand this advantage.

Suppose your main database server runs in the USA and a user in India wants to read data. Every request has to travel a long distance, which takes time. Instead, you can keep a follower replica in India, i.e. keep the data geographically close to its users, which returns results faster and reduces latency.

4. Support for real-time analytics - Because data replication is a continuous, real-time process, it lets businesses get immediate insights from their data. A simple example is populating a dashboard. A more advanced example is using data replication to pull user behavioral data from various data sources into analytical data stores, then running a predictive model to provide real-time personalized recommendations that improve the customer experience.
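The first two benefits, failover and read load distribution, can be illustrated with a small read router. This is a hedged sketch, not how any particular load balancer works: `ReadRouter`, `mark_down` and the follower names `f1` to `f3` are assumptions made up for the example, and health is tracked by hand instead of by real health checks.

```python
import itertools


class ReadRouter:
    """Spreads reads over followers in round-robin order, skipping failed ones."""

    def __init__(self, followers):
        self.healthy = set(followers)
        self._cycle = itertools.cycle(followers)
        self._count = len(followers)

    def mark_down(self, name):
        # In a real system a health check would call this; here it is manual.
        self.healthy.discard(name)

    def pick(self):
        # Try each follower at most once per call.
        for _ in range(self._count):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy follower available")


router = ReadRouter(["f1", "f2", "f3"])
first_round = [router.pick() for _ in range(3)]    # load is spread evenly
router.mark_down("f2")                             # simulate an outage
after_failure = [router.pick() for _ in range(4)]  # reads keep being served
print(first_round, after_failure)
```

With all three followers healthy, each receives one of the three reads. After `f2` goes down, the remaining followers take over its share and no read fails.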

What is the most challenging part of Data replication?

Keeping the data consistent (the same copy) across the leader and followers is the challenging problem here. There are several consistency models for this, each with its own trade-offs.

Strong Consistency/Synchronous Replication - With strong consistency, writes made on the leader are applied to the followers before the write is acknowledged, which guarantees that the followers hold up-to-date data. The request is marked complete only once the write has completed on all followers; until then the client has to wait for the OK response.

What is the trade-off here?

The system offers high consistency: any read request returns the latest data. Availability goes down, however, because every write has to wait for all followers, which increases the waiting time, and a slow or unreachable follower can hold up the write.

Example - Banking applications where consistency of transactions is really important.
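A minimal sketch of synchronous replication, assuming in-memory followers that either acknowledge a write or are unreachable. The names `Follower`, `SyncLeader` and the `up` flag are hypothetical, and the sketch deliberately leaves out the retry or rollback logic that a real database would need when a follower does not acknowledge.

```python
class Follower:
    """An in-memory follower that can be switched off to simulate an outage."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def apply(self, key, value):
        # Returns True as an acknowledgement, False if the follower is down.
        if not self.up:
            return False
        self.data[key] = value
        return True


class SyncLeader:
    """Acknowledges a write only after every follower has applied it."""

    def __init__(self, followers):
        self.data = {}
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value
        acks = [follower.apply(key, value) for follower in self.followers]
        # Simplification: no retry and no rollback of partial writes.
        return "OK" if all(acks) else "FAILED"


followers = [Follower("f1"), Follower("f2")]
leader = SyncLeader(followers)
result_ok = leader.write("balance:42", 100)       # both followers acknowledge
followers[1].up = False                           # one follower becomes unreachable
result_blocked = leader.write("balance:42", 80)   # the write cannot complete
print(result_ok, result_blocked)
```

The second write shows the trade-off: one unreachable follower is enough to stop the client from getting an OK, which is the price paid for never serving stale data from an acknowledged write.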

Eventual Consistency/Asynchronous Replication - The weakest form of consistency. With eventual consistency, changes made to the data on the leader may not be immediately reflected on the followers.

What is the trade-off here?

The write is performed on the leader and the OK response is returned immediately, which offers high availability. However, a read served by a follower may return stale data, which lowers data consistency.

Note - Given enough time, and provided no new writes arrive, the data eventually becomes consistent across all followers, so after some point the leader and followers are in sync.

Example - Like and view counts in applications such as YouTube, Twitter and Instagram, where availability is more important than immediately returning the exact count of views and likes.
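The stale read and the eventual convergence described above can be sketched as follows. This is an illustration under stated assumptions: `AsyncLeader`, `pending` and `replicate` are made-up names, followers are plain dictionaries, and the background replication step is triggered by hand instead of running on its own thread or process.

```python
from collections import deque


class AsyncLeader:
    """Acknowledges writes immediately and ships them to followers later."""

    def __init__(self, follower_count):
        self.data = {}
        self.followers = [{} for _ in range(follower_count)]
        self.pending = deque()  # changes not yet sent to the followers

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "OK"  # returned before any follower has seen the change

    def replicate(self):
        # Stands in for the background replication process.
        while self.pending:
            key, value = self.pending.popleft()
            for follower in self.followers:
                follower[key] = value


leader = AsyncLeader(follower_count=2)
leader.write("video:7:likes", 10)
leader.replicate()
leader.write("video:7:likes", 11)            # acknowledged, not yet replicated
stale = leader.followers[0]["video:7:likes"]  # follower still has the old count
leader.replicate()
fresh = leader.followers[0]["video:7:likes"]  # follower has caught up
print(stale, fresh)
```

Between the second write and the second `replicate` call, a follower read returns the old like count. Once replication has run and no further writes arrive, every follower agrees with the leader.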

Hybrid Consistency/Synchronous-Asynchronous Replication - A hybrid of the two consistency models above, in which some of the followers are kept in sync with the leader's writes (preferably those located in the same data center) and the others are left to be synced eventually.

This hybrid model provides a fair balance of availability and consistency: writes wait less than under strong consistency, and reads are more consistent than under eventual consistency.
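A sketch of the hybrid model, assuming one follower in the same data center that is updated synchronously and one remote follower that is updated later. `HybridLeader`, `drain_backlog` and the follower names `same-dc` and `remote-dc` are hypothetical names chosen for this illustration.

```python
class HybridLeader:
    """Updates some followers before acknowledging and the rest afterwards."""

    def __init__(self, sync_names, async_names):
        self.data = {}
        self.sync_followers = {name: {} for name in sync_names}
        self.async_followers = {name: {} for name in async_names}
        self.backlog = []  # changes still owed to the asynchronous followers

    def write(self, key, value):
        self.data[key] = value
        # Synchronous followers are updated before the OK is returned.
        for store in self.sync_followers.values():
            store[key] = value
        # Asynchronous followers are updated later from the backlog.
        self.backlog.append((key, value))
        return "OK"

    def drain_backlog(self):
        # Stands in for the background replication process.
        for key, value in self.backlog:
            for store in self.async_followers.values():
                store[key] = value
        self.backlog.clear()


leader = HybridLeader(sync_names=["same-dc"], async_names=["remote-dc"])
leader.write("order:9", "paid")
sync_view = leader.sync_followers["same-dc"].get("order:9")
async_view_before = leader.async_followers["remote-dc"].get("order:9")
leader.drain_backlog()
async_view_after = leader.async_followers["remote-dc"].get("order:9")
print(sync_view, async_view_before, async_view_after)
```

The client waits only for the nearby follower, so the write is faster than waiting for every replica, while at least one follower is guaranteed to hold the latest data.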

If you find the article useful and want more such articles, then subscribe to the newsletter and follow Kartik Sapra.

Cheers

Kartik Sapra

#systemdesign #systemarchitecture #techinterview #faang #maang #google #microsoft #amazon #instagram #socialemedia
