Achieving High Availability in PostgreSQL
Avinash Vallarapu (Avi)
Technology Leader | Open Source Databases | AI | MLOPS
High availability (HA) is one of the important requirement for database systems, especially in environments where downtime can impact customer experience or result in significant operational or financial loss. PostgreSQL is undoubtedly a popular (db-engine popularity ranking) and robust database loved by developers (stack overflow survey) and small to large scale Enterprises across the planet. There could be a misconception that Open-Source databases do not support HA which is proven wrong by its built-in features and its ecosystem of Open Source HA tools. In this article, we shall explore the various strategies and tools to achieve high availability in PostgreSQL eliminating the need for commercial tools.
What is High Availability ?
High availability refers to a system's ability to remain accessible and functional even during failures or maintenance operations. In database systems, achieving HA involves ensuring minimal downtime and maintaining data consistency in the event of a server crashe, network outage, or a hardware failure.
Replication is the first step for High Availability
Replication is a core component of high availability in PostgreSQL. It not only involves maintaining copies of the same data on one or more standby (or replica) servers, but also ensures that the database changes are continuously applied to the standby servers with zero or minimal lag.
Replication methods supported by PostgreSQL
PostgreSQL supports multiple replication methods.
Streaming replication
Streaming replication is a real-time replication mechanism where the primary database continuously streams its Write-Ahead Logs (WAL) to one or more standby servers. WAL segments in PostgreSQL can be related to Redo logs in Oracle and Transaction logs in SQL Server.
There are 2 simple techniques within Streaming replication, to ensure minimal or zero data loss.
The PostgreSQL primary server does not wait for standby servers to confirm receiving the data, offering better performance but with a risk of some data loss in the event of a failure. This may be related to the "Maximum Performance" mode in Oracle.
The PostgreSQL primary server waits for at least one standby to confirm receiving or applying the data, ensuring zero data loss. However, a client is sent a success message for its transaction only when one of the standby sends the acknoledgement. This, this may introduce some latency. We can relate this technique to the "Maximum Availability" mode in Oracle.
领英推荐
Logical Replication
Logical replication provides fine-grained control over data replication by replicating specific (but not all) tables or a subset of data. It is ideal for use cases like upgrading PostgreSQL versions with minimal downtime or replicating data from multiple OLTP databases to an OLAP database.
What replication method is better for High Availability ?
Streaming replication is generally the most fastest and robust method for High Availability. This is because, it involves a byte-by-byte replication at the storage layer. Whereas the Logical replication involves decoding the WAL segments and sending over the changes to be applied on the Subscriber (or replica).
If you are a first-time PostgreSQL user or someone truly worried about HA, do not think of anything else other than Streaming replication.
How about Automatic Failover ? How can we ensure several nines of availability ?
There can be several reasons for a database outage. However, enabling the ability to immediately failover the database and keep it up and running, is the primary demand from Infrastruture teams. The availability is measured in the form of nine of availability. The concept of nines of availability is based on Service Level Agreements (SLAs). It indicates how much time a service is expected to be up and running without unplanned outages.
In commercial databases, we may need special licenses aka Enterprise License to achieve this functionality of HA with automatic failover.
Do not worry. You got it covered. Following are some of the Open-Source tools supporting Automatic failover for PostgreSQL.
While there are a number of Open-Source tools supporting automatic failover for PostgreSQL, before you explore anything else, try Patroni. Patroni is a cluster management tool that uses etcd, ZooKeeper, or Consul for distributed consensus. It monitors the PostgreSQL instances and automatically promotes a new primary in case of failure.
Conclusion
PostgreSQL includes several built-in functionalities to support Enterprise requirements such as High Availability. If you are starting to think of enabling HA, start from Streaming replication, which is the fundamental requirement behind most of these Open-Source HA tools supporting automatic failover in PostgreSQL. At the same time, tools like Patroni, or others are only famous orchestrations or templates providing proven solutions for automatic failover. You can always build your custom scripts to orchestrate the automatic failover from your application. In case you are looking to simplify migrations to PostgreSQL, please explore Ora2Pg, authored and maintained by HexaCluster.