Distributing SQL Databases Globally

Distributing SQL Databases Globally

There are several reasons, why you would want to distributed your database.

  • Keep your data closer to your customers. Lower network latency.
  • Compliance with policies such as GDPR. Some countries have restrictions around movement and storage of data outside their geographic boundaries.
  • Higher availability and fault tolerance. Such that an incident in one geographical location does not impact customers residing in other part of the world.

Distributing a stateless application is easy, but distribution of your state (data layer) is challenging. There are several ways you can approach distributing your database. Each way has its own pros and cons and understanding them is critical. In this article lets look at some ways you can Distribute your database across several geographical location.

Lets take a Pizza delivery Service as an example.

Approach 1: Scaling and Distributing with read replicas

No alt text provided for this image
Scale with read replicas

  • In this approach you have a single primary database and several read replicas.
  • The primary database is responsible for serving all the writes (globally). As you can see in the diagram above, all the "place order" requests are still going to the primary database.
  • The read replicas are globally distributed and each replica serves "reads" for a specific region. In the above ex. DE, Indian and UK are different regions for example.
  • These read replicas are simple to spin up and can significantly improve the read latency, because they are closer to the customer in a specific region.

However there are some limitations and challenges:

  • Writes are still slow and are served by a single node. So if this node goes down, your customer can't place any order.
  • Read replicas are kept in sync with the primary using async replication strategies. Async means they will have some lag.
  • For ex. your customer from India places and order. This order is stored on the primary node. However the read replica in India does not have the information yet, because replication is slower. Now if the customer wants to look at their order, it is possible that they might NOT see the order, which is a terrible UX.
  • Stale reads is a huge problem for many applications, so we need to be careful.
  • Your orders are still going outside the country, which might have some compliance complications.

Advantages of this approach:

  • Easy setup, and easy to spin up new replicas.
  • Easy to understand
  • Works nicely for read heavy systems where strong read consistency isn't a critical requirement.

Approach 2: Deploying separate database per region

No alt text provided for this image
Separate database per country

This approach is pretty straightforward, you just deploy a new database (shard) which serves a single region.

So as you can see in the above diagram, each region India, Germany and UK has their own database instance. Customers from respective regions are served from the local (to the country) database.

So when a customer creates an order, you route their request to a region specific app server which talks to a region specific database and everything is cleanly separated.

This is a simple model to understand and gives you an ability to scale each database (vertically, or with read replicas) separately as per needs. For ex: if your Pizza is more popular in India you can vertically scale the Indian database and other databases need no changes.

It is simple approach, but comes with some limitations and problems:

  • This is costly and takes heavy lifting whenever you want to launch your app in a new region.
  • You no longer have a single database for your entire application, which means you cannot query them all together for running some analytical queries. You need a different solution that gives you federated access to all the databases.

Approach 3: Using Natively Distributed Database (Cloud)

No alt text provided for this image
Natively Distributed Database

There are databases solutions available that support Geo-Distribution natively. So it makes a lot of sense to leverage their capabilities for a Geo-Distributed application. You just need to model your data keeping in mind the distribution of data. So as you can see in the above diagram, you have your Pizza orders table.

  • Each row (order) has a region column (it can be anything like country code, location, country name etc)
  • Each region maps to a database partition which is deployed in that region.
  • So orders from India are routed to the Database partition deployed in India. DE goes to DE partition and similarly UK goes to the UK partition.
  • This is still a single database with partitions serving reads and writes for a specific region.

How is it different?

  • Because the database support distribution natively YOU as an application developer OR infra person don't need to worry about the distribution, deployment, query latency, unified access to all regions and so on.
  • The heavy lifting is done by the database and you can focus on your area of expertise which is innovating in Pizzas and giving your customers an amazing experience.

There are many Distributed SQL databases available in the market, but the leading ones are Google Spanner, Yugabyte DB and CockroachDB.

Here is a sneak peak of how your Distributed Application would look like:

No alt text provided for this image

To know more about Designing Geo Distributed applications you can head over to the podcast I did with Denis Magda from Yugabyte, where we have discussed everything you need to know about the various strategies.

I hope you enjoyed learning the article. Stay tuned! Subscribe to the new letter and The GeekNarrator youtube channel.


Cheers,

The GeekNarrator


Andrei Lepikhov

Ph.D | PostgreSQL Contributor | Software Developer | Tech Blogger

1 年

Good for reading. Keeping silent on distributed processing and load balancing, the author just tells us that this profit can’t be achieved, doesn’t it?

回复
Minh Hoang

?? Head of Quality Engineer at GiaoHangNhanh (GHN) | ?? Quality Transformation | ?? Foster Quality Culture | ?? Provide Testing Solution

2 年

Phu Nguyen Khoa Nguyen Tr.

回复
Kyle Bustard

Founder & Software Engineer | AI & Web App Development

2 年

Terrific post! For some reason, I had a silly image come to mind about distributed pizza gone wrong ??: Pepperoni being sent to one region, cheese to another, sauce to another, etc...

Denis Magda

Software Engineering and Developer Relations, MLP Launchpad

2 年

Kaivalya Apte you nailed it! I couldn’t put it better. Exceptional summary of our conversation ?? Btw, if anyone is interested to learn more about high-availability of geo-apps, then welcome to a SpringOne stream next week: https://tanzu-dev-portal.netlify.app/developer/tv/golden-path/5/ And I remember that I still owe you an episode about indexes! On my to-do list.

Excellent, Kaivalya Apte. The post reminds me of the traditional techniques such as mirroring, snap-shotting, log-shipping that were used for the same purposes: high-availability, serving faster reads for reporting applications, etc. All these involved a lot of manual work which the modern databases on the cloud have done away with. In Approach #2, where we have a separate database for each region, the other downside is 'making changes'. To keep the structure, schema of all databases in sync is a challenge. Keeping them different to accommodate localization is yet another challenge too. ????

要查看或添加评论,请登录

Kaivalya Apte的更多文章

  • Crash course on JVM Memory Management:

    Crash course on JVM Memory Management:

    JVM provides automatic memory management. In languages like C, developers manage memory explicitly using functions like…

  • Cassandra 5.0 : ACID Transactions, Vector Search and much more

    Cassandra 5.0 : ACID Transactions, Vector Search and much more

    Introduction In a recent podcast discussion with Patrick McFadin, VP of Developer Relations at DataStax, we delved into…

    5 条评论
  • Capture the Change, you want to see in the database.

    Capture the Change, you want to see in the database.

    Change Data Capture As the name suggests, it is about capturing changes to your data. By capturing, I mean reacting to…

    14 条评论
  • Be Resilient - Humans and Servers

    Be Resilient - Humans and Servers

    What is Resiliency? Ability to recover quickly to a “normal” working state from degradations/problems/failures. Like in…

    4 条评论
  • Functional Programming on the JVM

    Functional Programming on the JVM

    Functional Programming is a programming paradigm which involves breaking down a "giant" functionality into smaller…

    3 条评论
  • Why is DynamoDB AWSome?

    Why is DynamoDB AWSome?

    What is DynamoDB? A cloud NoSQL database service that guarantees consistent performance at any scale. Consistent…

    4 条评论
  • Designing Instagram, Linkedin, Facebook like applications

    Designing Instagram, Linkedin, Facebook like applications

    Hey Everyone, Welcome to the first article of The GeekNarrator newsletter, I am excited to start this newsletter along…

  • 5 things I learned from Hack-Week

    5 things I learned from Hack-Week

    Last few days we spent on hacking things and building something interesting, useful and which could not be done as part…

  • Readable Code : Just like a fairy tale

    Readable Code : Just like a fairy tale

    Most software today is very much like an Egyptian pyramid with millions of bricks piled on top of each other, with no…

    7 条评论
  • Work smart(less), achieve big. After all you are an engineer.

    Work smart(less), achieve big. After all you are an engineer.

    In the current world, where IT industry is booming, people often miss the bigger picture.Everyone is running, to…

社区洞察

其他会员也浏览了