Scaling from zero to millions of users - Database replication
In the previous chapter, we talked a lot about load balancers, their strategies, and how a load balancer (LB) lets us add more server nodes.
In this chapter, we are getting back to databases, specifically database replication. With it, we're looking to bring better performance, reliability and availability to our solution. And, as a trade-off, some more complexity. Sorry about that.
Let’s go for it!
--
From where we stopped, this is the current state of our application (Image 1).
This architecture relies on a single database server. With three application servers now in place, that database is both a single point of failure and a potential bottleneck, and it needs to be addressed.
So, the big question here is: What is Database Replication?
Before moving on, I want to let you know that the words "master" and "slave" are neither ideal nor appreciated by the author. They're just the terms commonly used across the industry. For my own peace of mind, I'll call them main and replicas.
Imagine making copies of your important files and storing them in different places. That's essentially what database replication does. It creates copies of your database and keeps them synchronized.
A common way to do this is what the industry calls a "master/slave" setup. Think of the main (the "master") as the original database. It's the only one that accepts changes (like adding new data, updating existing data, or deleting data). The replicas (the "slaves") are copies of the main. They only allow reading data.
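To make the idea concrete, here is a toy, in-memory sketch of that setup. The `Main` and `Replica` classes and their method names are made up for this illustration, and changes are copied to the replicas immediately, which is a simplification: real databases usually ship a log of changes, often with some delay.

```python
class ReadOnlyError(Exception):
    """Raised when a client tries to write to a replica."""


class Replica:
    """Hypothetical read-only copy of the main database."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def write(self, key, value):
        # Replicas never accept changes coming from clients.
        raise ReadOnlyError("replicas are read-only; send writes to the main")

    def apply_change(self, key, value):
        # Called only by the main when it forwards a change.
        # A value of None means "this key was deleted".
        if value is None:
            self._data.pop(key, None)
        else:
            self._data[key] = value


class Main:
    """Hypothetical main database: the only node that accepts changes."""

    def __init__(self, replicas):
        self._data = {}
        self._replicas = list(replicas)

    def read(self, key):
        return self._data.get(key)

    def write(self, key, value):
        self._data[key] = value
        # Assumption: changes are forwarded right away (synchronously).
        for replica in self._replicas:
            replica.apply_change(key, value)

    def delete(self, key):
        self._data.pop(key, None)
        for replica in self._replicas:
            replica.apply_change(key, None)


replicas = [Replica(), Replica()]
main = Main(replicas)
main.write("user:1", "Ana")
print(replicas[0].read("user:1"))  # prints "Ana"
```

The write goes to the main only, and the same value can then be read from any replica.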
Most of the time, people read data much more often than they change it. So, it's common to have more replica databases than main databases. (Image 2)
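Since reads are the bulk of the traffic, the application needs a rule for where each operation goes. Below is a minimal sketch of one possible rule: writes go to the main, and reads take turns across the replicas (round-robin). The `Router` class, the operation names and the server names are assumptions for this example, not part of any specific database driver.

```python
import itertools

# Assumption: these are the operations that change data.
WRITE_OPERATIONS = {"insert", "update", "delete"}


class Router:
    """Hypothetical router that decides which database serves an operation."""

    def __init__(self, main, replicas):
        self.main = main
        # cycle() walks the replicas forever: 1, 2, 3, 1, 2, 3, ...
        self._replica_cycle = itertools.cycle(replicas)

    def route(self, operation):
        if operation in WRITE_OPERATIONS:
            return self.main  # only the main accepts changes
        return next(self._replica_cycle)  # reads are spread over the replicas


router = Router("main", ["replica-1", "replica-2", "replica-3"])
print(router.route("update"))  # prints "main"
print(router.route("select"))  # prints "replica-1"
print(router.route("select"))  # prints "replica-2"
```

Round-robin is just the simplest choice here. Any of the load balancing strategies from the previous chapter could be used to pick the replica instead.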
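So, why can database replication be helpful? It brings the three benefits mentioned at the start: better performance (reads are spread across the replicas), reliability and availability (the data lives in more than one place).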
Getting back to reliability, what would happen if one of the databases went offline?
Remember how load balancers help keep your website running even if a server goes down? Database replication does something similar for your data. Take another look at Image 2.
For a replica failure: If you have multiple replicas, the system simply uses the other healthy ones. If you only have one replica, the system can temporarily read directly from the main. Then, a new replica is created to replace the broken one.
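The replica failure rule can be sketched in a few lines. This is only an illustration: `pick_read_target` and the `is_healthy` check are hypothetical names, and how health is actually detected (heartbeats, timeouts, monitoring) is left out.

```python
def pick_read_target(main, replicas, is_healthy):
    """Return the server that should answer the next read.

    is_healthy is a function (an assumption of this sketch) that tells
    whether a given server is currently reachable.
    """
    healthy = [replica for replica in replicas if is_healthy(replica)]
    if healthy:
        # For simplicity we take the first healthy replica; a real setup
        # would spread reads across all the healthy ones.
        return healthy[0]
    # No replica left: temporarily read directly from the main.
    return main


down = {"replica-1"}
target = pick_read_target(
    "main", ["replica-1", "replica-2"], lambda server: server not in down
)
print(target)  # prints "replica-2"
```

If every replica is down, the same function returns the main, which keeps reads working until a new replica is created.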
For a main database failure: One of the replicas is promoted to become the new main. This is a bit more complicated because the replica might not have the latest changes. Some extra steps might be needed to make sure everything is up-to-date. There are more advanced ways to handle this, but they are more complex and we won't cover them here.
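One simple way to picture the promotion is to pick the replica that has applied the most changes, and check how far behind the old main it still is. The sketch below assumes every change has a sequential number and that the last number written by the main is known (for example from monitoring). Both are assumptions of this example, and `ReplicaStatus` and `choose_new_main` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    name: str
    last_applied: int  # number of the last change this replica has applied


def choose_new_main(replicas, main_last_change):
    """Pick the most up-to-date replica and count the changes it is missing."""
    if not replicas:
        raise ValueError("no replica available to promote")
    candidate = max(replicas, key=lambda replica: replica.last_applied)
    # Changes the old main had accepted but the candidate never received.
    missing = max(0, main_last_change - candidate.last_applied)
    return candidate, missing


statuses = [ReplicaStatus("replica-1", 120), ReplicaStatus("replica-2", 118)]
new_main, missing = choose_new_main(statuses, main_last_change=125)
print(new_main.name, missing)  # prints "replica-1 5"
```

Those missing changes are the "extra steps" part: they have to be recovered somehow before the new main can be considered fully up-to-date.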
So, this is how all of this fits together (Image 3):
Sounds better this way, huh?
But we can get faster. In the next chapter, we'll talk about using a cache to store frequently used data and using a Content Delivery Network (CDN) to deliver static files like images and videos.
See you in a few days!
Previous chapter: