Design A Highly Available PostgreSQL Cluster With Patroni In GCP
Patroni is a modern HA solution for PostgreSQL with cloud-native features and advanced options for failover and failback. PostgreSQL is one of the world's top open-source databases, but its biggest gap is that it has no built-in automatic failover. I still remember my early days fighting with repmgr for HA, when it was very hard to add a failed master back to the cluster. Later, pg_rewind made that process much simpler. The world is moving quickly to the cloud, so vintage HA solutions like DRBD, Corosync + Pacemaker, repmgr, and a few other technologies are showing their age.
Problems with traditional replication:
- The default replication mechanism does not handle failover on its own.
- Disk-based replication can sometimes end in data corruption.
- External failover tools need extra effort to keep them up, running, and monitored.
- Automatically adding a failed node back to the cluster is a nightmare (though scripting geeks can still pull it off).
- Handling split-brain situations is left entirely to you.
The Bot approach:
I learned this concept from a conference talk by Zalando. PostgreSQL takes care of its own processes, but we need something to monitor the PostgreSQL service and its replication status and keep that state in a distributed configuration store (DCS) such as etcd, ZooKeeper, or Consul. PostgreSQL can't talk to a DCS directly, so a small agent, the bot, does it on its behalf: if the master goes down, the bot elects a new master, and when the old master comes back up, the bot adds it back to the cluster. Here the bot is Patroni. Patroni is the successor of Compose's Governor.
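As a rough sketch of how this looks in practice, here is a minimal per-node Patroni configuration wired to an etcd DCS. All host names, IP addresses, paths, and passwords below are placeholders for illustration; check the exact keys against the Patroni documentation for your version.

```yaml
# Minimal Patroni node configuration (illustrative sketch, placeholder values)
scope: pg-ha-cluster            # cluster name, shared by every node in the cluster
name: pg-node-1                 # unique name of this node

etcd:
  host: 10.128.0.10:2379        # the DCS where Patroni keeps the leader key and cluster state

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.128.0.11:8008

bootstrap:
  dcs:
    ttl: 30                     # if the leader key is not refreshed within this TTL, a new election starts
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true       # rejoin the failed master with pg_rewind instead of a full rebuild

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.128.0.11:5432
  data_dir: /var/lib/postgresql/12/main
  authentication:
    replication:
      username: replicator
      password: change-me
    superuser:
      username: postgres
      password: change-me
```

Each node runs the same file with its own name and addresses; every Patroni agent then competes for the leader key in etcd, and whichever node holds it runs PostgreSQL as the master.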
The Patroni:
- It is developed by Zalando.
- It is completely Python based and 100% open source.
- It maintains the cluster state in a DCS (etcd, ZooKeeper, or Consul).
- A REST API exposes the state of each node, which is handy for health checks and load balancers (see the configuration sketches after this list).
- While adding a node to the replication, you can use a custom method to achieve this (Barman, WAL-E, or your own scripts), as shown in the second sketch below.
- You can prevent certain nodes (for example, ones you want only for reporting) from ever becoming the master.
- You can decide, while adding a new node to the replication, which existing node it should sync its data from.
- Hooks — trigger actions after bootstrap, or when PostgreSQL starts, stops, or fails over.
- Manual switchover during a maintenance window, and integration with HAProxy.
- And a lot more features.
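To make the REST API and the node-tagging features concrete, here is a hedged sketch of the per-node settings for a reporting-only replica. The tag names (nofailover, noloadbalance, clonefrom, nosync) are real Patroni tags; the values and addresses are only examples.

```yaml
# Per-node settings for a reporting-only replica (illustrative sketch)
restapi:
  listen: 0.0.0.0:8008              # GET / on this port returns the node's role and state as JSON
  connect_address: 10.128.0.12:8008

tags:
  nofailover: true                  # never promote this node to master
  noloadbalance: true               # keep it out of the read load balancer
  clonefrom: true                   # let new replicas copy their initial data from this node
  nosync: true                      # never pick it as a synchronous standby
```

HAProxy typically relies on the same REST API: a GET on the /master endpoint returns 200 only on the current leader, so the proxy always routes write traffic to the right node.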
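Below is a similar sketch for the custom replica-creation methods and hooks. WAL-E and pg_basebackup are replica-creation methods Patroni knows about, but the keys under wal_e and the callback script path are assumptions for illustration; verify them against your Patroni version before relying on them.

```yaml
postgresql:
  # Try restoring a new replica from the WAL-E backup first, fall back to pg_basebackup
  create_replica_methods:
    - wal_e
    - basebackup
  wal_e:
    command: patroni_wale_restore            # restore helper shipped with Patroni
    envdir: /etc/wal-e/env                   # WAL-E environment (bucket prefix, credentials)
    no_master: 1                             # allow building a replica even when no master is running
  callbacks:
    # /usr/local/bin/patroni_callback.sh is a hypothetical script; Patroni passes it
    # the action, the node's role, and the cluster name as arguments
    on_start: /usr/local/bin/patroni_callback.sh
    on_stop: /usr/local/bin/patroni_callback.sh
    on_role_change: /usr/local/bin/patroni_callback.sh   # e.g. move a virtual IP or update DNS
```

For planned maintenance, the switchover itself is a one-liner with patronictl (patronictl switchover <cluster-name>) or a POST to the REST API's /switchover endpoint.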
You can read more about the complete solution and implementation on my blog.