登录查看更多内容

Distributing SQL Databases Globally

Kaivalya Apte

The GeekNarrator Podcast | Staff Engineer | Follow me for #distributedsystems #databases #interviewing #softwareengineering

发布日期: 2023年2月11日

+ 关注

There are several reasons, why you would want to distributed your database.

Keep your data closer to your customers. Lower network latency.
Compliance with policies such as GDPR. Some countries have restrictions around movement and storage of data outside their geographic boundaries.
Higher availability and fault tolerance. Such that an incident in one geographical location does not impact customers residing in other part of the world.

Distributing a stateless application is easy, but distribution of your state (data layer) is challenging. There are several ways you can approach distributing your database. Each way has its own pros and cons and understanding them is critical. In this article lets look at some ways you can Distribute your database across several geographical location.

Lets take a Pizza delivery Service as an example.

Approach 1: Scaling and Distributing with read replicas

No alt text provided for this image — Scale with read replicas

In this approach you have a single primary database and several read replicas.
The primary database is responsible for serving all the writes (globally). As you can see in the diagram above, all the "place order" requests are still going to the primary database.
The read replicas are globally distributed and each replica serves "reads" for a specific region. In the above ex. DE, Indian and UK are different regions for example.
These read replicas are simple to spin up and can significantly improve the read latency, because they are closer to the customer in a specific region.

However there are some limitations and challenges:

Writes are still slow and are served by a single node. So if this node goes down, your customer can't place any order.
Read replicas are kept in sync with the primary using async replication strategies. Async means they will have some lag.
For ex. your customer from India places and order. This order is stored on the primary node. However the read replica in India does not have the information yet, because replication is slower. Now if the customer wants to look at their order, it is possible that they might NOT see the order, which is a terrible UX.
Stale reads is a huge problem for many applications, so we need to be careful.
Your orders are still going outside the country, which might have some compliance complications.

Advantages of this approach:

Easy setup, and easy to spin up new replicas.
Easy to understand
Works nicely for read heavy systems where strong read consistency isn't a critical requirement.

Approach 2: Deploying separate database per region

This approach is pretty straightforward, you just deploy a new database (shard) which serves a single region.

So as you can see in the above diagram, each region India, Germany and UK has their own database instance. Customers from respective regions are served from the local (to the country) database.

So when a customer creates an order, you route their request to a region specific app server which talks to a region specific database and everything is cleanly separated.

This is a simple model to understand and gives you an ability to scale each database (vertically, or with read replicas) separately as per needs. For ex: if your Pizza is more popular in India you can vertically scale the Indian database and other databases need no changes.

It is simple approach, but comes with some limitations and problems:

领英推荐

How do indexes make databases read faster?

Arpit Bhayani 2 年前

Partitioning Schemes in Databases Part-1 | Primary…

Saurav Prateek 2 年前

7 Proven Techniques to Optimize Database Performance…

Saurabh Kumar Verma 5 个月前

This is costly and takes heavy lifting whenever you want to launch your app in a new region.
You no longer have a single database for your entire application, which means you cannot query them all together for running some analytical queries. You need a different solution that gives you federated access to all the databases.

Approach 3: Using Natively Distributed Database (Cloud)

There are databases solutions available that support Geo-Distribution natively. So it makes a lot of sense to leverage their capabilities for a Geo-Distributed application. You just need to model your data keeping in mind the distribution of data. So as you can see in the above diagram, you have your Pizza orders table.

Each row (order) has a region column (it can be anything like country code, location, country name etc)
Each region maps to a database partition which is deployed in that region.
So orders from India are routed to the Database partition deployed in India. DE goes to DE partition and similarly UK goes to the UK partition.
This is still a single database with partitions serving reads and writes for a specific region.

How is it different?

Because the database support distribution natively YOU as an application developer OR infra person don't need to worry about the distribution, deployment, query latency, unified access to all regions and so on.
The heavy lifting is done by the database and you can focus on your area of expertise which is innovating in Pizzas and giving your customers an amazing experience.

There are many Distributed SQL databases available in the market, but the leading ones are Google Spanner, Yugabyte DB and CockroachDB.

Here is a sneak peak of how your Distributed Application would look like:

To know more about Designing Geo Distributed applications you can head over to the podcast I did with Denis Magda from Yugabyte, where we have discussed everything you need to know about the various strategies.

I hope you enjoyed learning the article. Stay tuned! Subscribe to the new letter and The GeekNarrator youtube channel.

Cheers,

The GeekNarrator

5,612 位关注者

Andrei Lepikhov

Ph.D | PostgreSQL Contributor | Software Developer | Tech Blogger

1 年

Good for reading. Keeping silent on distributed processing and load balancing, the author just tells us that this profit can’t be achieved, doesn’t it?

Minh Hoang

?? Head of Quality Engineer at GiaoHangNhanh (GHN) | ?? Quality Transformation | ?? Foster Quality Culture | ?? Provide Testing Solution

2 年

Phu Nguyen Khoa Nguyen Tr.

Kyle Bustard

Founder & Software Engineer | AI & Web App Development

2 年

Terrific post! For some reason, I had a silly image come to mind about distributed pizza gone wrong ??: Pepperoni being sent to one region, cheese to another, sauce to another, etc...

1 次回应

Denis Magda

Software Engineering and Developer Relations, MLP Launchpad

2 年

Kaivalya Apte you nailed it! I couldn’t put it better. Exceptional summary of our conversation ?? Btw, if anyone is interested to learn more about high-availability of geo-apps, then welcome to a SpringOne stream next week: https://tanzu-dev-portal.netlify.app/developer/tv/golden-path/5/ And I remember that I still owe you an episode about indexes! On my to-do list.

2 次回应

Ananda Kumar Santhinathan

Data enthusiast

2 年

Excellent, Kaivalya Apte. The post reminds me of the traditional techniques such as mirroring, snap-shotting, log-shipping that were used for the same purposes: high-availability, serving faster reads for reporting applications, etc. All these involved a lot of manual work which the modern databases on the cloud have done away with. In Approach #2, where we have a separate database for each region, the other downside is 'making changes'. To keep the structure, schema of all databases in sync is a challenge. Keeping them different to accommodate localization is yet another challenge too. ????

2 次回应

查看更多评论

要查看或添加评论，请登录

Kaivalya Apte的更多文章

Crash course on JVM Memory Management:

2025年1月13日

Crash course on JVM Memory Management:

JVM provides automatic memory management. In languages like C, developers manage memory explicitly using functions like…
Cassandra 5.0 : ACID Transactions, Vector Search and much more

2023年5月18日

Cassandra 5.0 : ACID Transactions, Vector Search and much more

Introduction In a recent podcast discussion with Patrick McFadin, VP of Developer Relations at DataStax, we delved into…

5 条评论
Capture the Change, you want to see in the database.

2023年3月11日

Capture the Change, you want to see in the database.

Change Data Capture As the name suggests, it is about capturing changes to your data. By capturing, I mean reacting to…

14 条评论
Be Resilient - Humans and Servers

2023年3月4日

Be Resilient - Humans and Servers

What is Resiliency? Ability to recover quickly to a “normal” working state from degradations/problems/failures. Like in…

4 条评论
Functional Programming on the JVM

2023年2月25日

Functional Programming on the JVM

Functional Programming is a programming paradigm which involves breaking down a "giant" functionality into smaller…

3 条评论
Why is DynamoDB AWSome?

2023年2月2日

Why is DynamoDB AWSome?

What is DynamoDB? A cloud NoSQL database service that guarantees consistent performance at any scale. Consistent…

4 条评论
Designing Instagram, Linkedin, Facebook like applications

2023年1月29日

Designing Instagram, Linkedin, Facebook like applications

Hey Everyone, Welcome to the first article of The GeekNarrator newsletter, I am excited to start this newsletter along…
5 things I learned from Hack-Week

2018年9月9日

5 things I learned from Hack-Week

Last few days we spent on hacking things and building something interesting, useful and which could not be done as part…
Readable Code : Just like a fairy tale

2017年5月31日

Readable Code : Just like a fairy tale

Most software today is very much like an Egyptian pyramid with millions of bricks piled on top of each other, with no…

7 条评论
Work smart(less), achieve big. After all you are an engineer.

2016年7月14日

Work smart(less), achieve big. After all you are an engineer.

In the current world, where IT industry is booming, people often miss the bigger picture.Everyone is running, to…

See all articles

Distributing SQL Databases Globally

Kaivalya Apte

The GeekNarrator Podcast | Staff Engineer | Follow me for #distributedsystems #databases #interviewing #softwareengineering

Approach 1: Scaling and Distributing with read replicas

Approach 2: Deploying separate database per region

领英推荐

Approach 3: Using Natively Distributed Database (Cloud)

The GeekNarrator

5,612 位关注者

Kaivalya Apte的更多文章

社区洞察

其他会员也浏览了

Difference between local and global indexes in DynamoDB

MariaDB Administration Cheat Sheet: Practical Use Cases and Examples

Boost Your SQL Skills With Essential Aggregate Functions In Azure

Triggers in SQL database to extend its declarative data logic (but not business logic)

An Approach to Database Fine-Grained Access Controls

The Hidden Challenge of PostgreSQL Partitioning

Understanding Database Sharding

Mastering DynamoDB Denormalization Techniques

Managing Performance in Azure SQL: Strategies for Handling Skewed Indexes

Approach 1: Scaling and Distributing with read replicas

Approach 2: Deploying separate database per region

领英推荐

Approach 3: Using Natively Distributed Database (Cloud)

The GeekNarrator

5,612 位关注者

Kaivalya Apte的更多文章

Crash course on JVM Memory Management:

Cassandra 5.0 : ACID Transactions, Vector Search and much more

Capture the Change, you want to see in the database.

Be Resilient - Humans and Servers

Functional Programming on the JVM

Why is DynamoDB AWSome?

Designing Instagram, Linkedin, Facebook like applications

5 things I learned from Hack-Week

Readable Code : Just like a fairy tale

Work smart(less), achieve big. After all you are an engineer.

社区洞察

其他会员也浏览了

Difference between local and global indexes in DynamoDB

MariaDB Administration Cheat Sheet: Practical Use Cases and Examples

Boost Your SQL Skills With Essential Aggregate Functions In Azure

Triggers in SQL database to extend its declarative data logic (but not business logic)

An Approach to Database Fine-Grained Access Controls

The Hidden Challenge of PostgreSQL Partitioning

Understanding Database Sharding

Mastering DynamoDB Denormalization Techniques

Managing Performance in Azure SQL: Strategies for Handling Skewed Indexes