PostgreSQL Replication: A Detailed Guide
Thiago Azadinho - MBA/OCP/OCE/MCSE
DBA / DBRE / DATABASE ADMINISTRATOR / DEVOPS
As any website owner will tell you, data loss and downtime, even in small doses, can be catastrophic. They can strike the unprepared at any time, reducing productivity, accessibility, and confidence in your product.
In order to protect the integrity of your website, it is essential to have safeguards against the possibility of downtime or data loss.
That's where data replication comes in.
Data replication is an automated backup process in which your data is continuously copied from your main database to a remote location for safekeeping. It is an integral technology for any website or application that runs a database server. You can also use the replicated database to serve read-only SQL queries, allowing more processes to run within the system.
Configuring replication between two databases provides fault tolerance against unexpected setbacks. It is considered the best strategy to achieve high availability during disasters.
In this article, we will dive into the different strategies that can be implemented by backend developers for smooth PostgreSQL replication.
What is PostgreSQL replication?
PostgreSQL replication is defined as the process of copying data from one PostgreSQL database server to another server. The source database server is also known as the “primary” server, while the database server that receives the copied data is known as the “replica” server.
The PostgreSQL database follows a simple replication model, where all writes go to a primary node. The primary node can then apply these changes and transmit them to the secondary node.
What is Automatic Failover?
Failover is a method of recovering data if the primary server fails for any reason. As long as you've configured PostgreSQL to manage your physical streaming replication, you – and your users – are protected from downtime due to a primary server snafu.
Note that the failover process can take some time to configure and initiate. PostgreSQL has no built-in tools for monitoring and detecting server failures, so you'll need to get creative.
Fortunately, you don't need to depend on PostgreSQL alone for failover. There are dedicated tools that provide automatic failover and automatic switching to the standby, reducing database downtime.
By configuring failover replication, you ensure high availability by ensuring that standbys are available if the primary server ever fails.
Benefits of using PostgreSQL replication
Here are some of the key benefits of leveraging PostgreSQL replication:
- Fault tolerance and high availability: if the primary server fails, an up-to-date standby can take over.
- Read scaling: read-only queries can be load-balanced across replicas, offloading the primary.
- Data migration: replication can move data between database servers or hardware with minimal downtime.
- Testing and analytics: long-running or analytical queries can run against a replica without affecting production.
- Offsite protection: a replica in another location guards against site-level disasters.
How PostgreSQL replication works
In general, people believe that when you are working with a primary and secondary architecture, there is only one way to configure backups and replication. PostgreSQL implementations, however, can follow any of these three methods:
Method 1: Streaming
Streaming replication, also known as WAL replication, can be configured seamlessly after installing PostgreSQL on all servers. This approach is based on moving WAL files from the primary database to the target database.
You can implement streaming replication through a primary-secondary configuration. The primary server is the main instance and handles the primary database and all its operations. The secondary server acts as a supplementary instance and applies all changes made to the primary database to itself, generating an identical copy in the process. The primary is the read/write server, while the secondary is read-only.
For this method, you need to configure both the primary node and the standby node. The following sections will elucidate the steps involved in setting them up with ease.
Configuring the Primary Node
You can configure the primary node for streaming replication by performing the following steps:
Step 1: Initialize the database
To initialize the database, you can use the initdb utility. You can then create a new user with replication privileges using the following command (note that the username is an SQL identifier and is not quoted, while the password is a string literal):
CREATE USER example_username REPLICATION LOGIN ENCRYPTED PASSWORD 'example_password';
Substitute your own username and password. The REPLICATION keyword grants the user the necessary privileges. An example query would look something like this:
CREATE USER rep_user REPLICATION LOGIN ENCRYPTED PASSWORD 'rep_pass';
Step 2: Configure Streaming properties
Next, you can configure the streaming properties in the PostgreSQL configuration file (postgresql.conf), which can be modified as follows:
wal_level = logical
wal_log_hints = on
max_wal_senders = 8
max_wal_size = 1GB
hot_standby = on
Here's a little background on the parameters used in the previous snippet:
- wal_level: the amount of information written to the WAL. 'logical' records everything needed for both physical and logical replication ('replica' is sufficient for streaming replication alone).
- wal_log_hints: writes full page hints to the WAL, which is required by tools such as pg_rewind to resynchronize a diverged server.
- max_wal_senders: the maximum number of concurrent connections from standby servers or backup clients.
- max_wal_size: the size at which WAL growth triggers a checkpoint.
- hot_standby: allows read-only connections on the standby while it is in recovery.
Step 3: Creating new entry
After you have modified the parameters in the postgresql.conf file, add a new replication entry in the pg_hba.conf file so the servers can establish a connection to each other for replication.
You can usually find this file in the PostgreSQL data directory. Use an entry of the following form:
host    replication    rep_user    <IP_address>/32    md5
With this entry in place, the primary server allows a user named rep_user to connect and act as a standby server, using the specified IP address for replication. For example:
host replication rep_user 192.168.0.22/32 md5
Configuring the standby node
To configure the standby node for streaming replication, follow these steps:
Step 1: Back up the primary node
To configure the standby node, use the pg_basebackup utility to generate a backup of the primary node. This backup serves as the starting point for the standby. You can use this utility with the following syntax:
pg_basebackup -h <primary_host> -D <data_directory> -X stream -c fast -U rep_user -W
The parameters used in the syntax above are as follows:
- -h: the hostname or IP address of the primary server.
- -D: the directory in which to write the output (the standby's data directory).
- -X stream: includes the WAL generated during the backup by streaming it.
- -c fast: requests a fast checkpoint so the backup starts immediately.
- -U: the replication user to connect as.
- -W: forces a password prompt.
Step 2: Prepare the replication configuration file
Next, verify that the replication configuration file exists. If it doesn't, create it as recovery.conf.
This file must be created in the data directory of your PostgreSQL installation. You can generate it automatically by passing the -R option to pg_basebackup.
The recovery.conf file must contain the following commands:
standby_mode = 'on'
primary_conninfo = 'host=<primary_host> port=<postgres_port> user=<replication_user> password=<password> application_name=<host_name>'
recovery_target_timeline = 'latest'
The parameters used in the above commands are as follows:
- standby_mode: starts the server as a standby that keeps applying WAL as it arrives.
- primary_conninfo: the connection string the standby uses to reach the primary.
- recovery_target_timeline: 'latest' makes the standby follow the primary onto a new timeline, for example after a failover.
To configure a connection, you need to provide the username, IP address, and password as values for the primary_conninfo parameter. For example:
primary_conninfo = 'host=192.168.0.26 port=5432 user=rep_user password=rep_pass'
Step 3: Restart the secondary server
Finally, you can restart the secondary server to complete the configuration process.
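Once the secondary is back up, you can confirm that streaming is active. A minimal sketch, assuming the rep_user setup from the steps above (run the first command on the standby, the second on the primary):

```shell
# On the standby: restart PostgreSQL so it picks up recovery.conf
sudo systemctl restart postgresql

# On the primary: the standby should now appear as a streaming client
sudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
```

A row with state = 'streaming' indicates the standby is receiving WAL in real time.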
However, streaming replication comes with several challenges, such as:
- WAL segments on the primary can be recycled before a lagging standby has received them, unless replication slots or WAL retention settings are configured.
- A standby that falls too far behind must be rebuilt from a fresh base backup.
- PostgreSQL provides no built-in failure detection or automatic failover, so external tooling is needed.
- Under heavy write load, replication lag can grow and read replicas may serve stale data.
Method 2: Replicated Block Device
The replicated block device method relies on disk mirroring (also known as volume replication). In this approach, changes are written to a persistent volume that is synchronously mirrored to another volume.
The additional benefit of this method is its compatibility and data durability in cloud environments with all relational databases, including PostgreSQL, MySQL, and SQL Server , to name a few.
However, the disk mirroring approach to PostgreSQL replication requires you to replicate both the WAL log and the table data. Since each write to the database now has to go through the network synchronously, you cannot afford to lose a single byte as this could leave your database in a corrupt state.
This method is typically leveraged using Azure PostgreSQL and Amazon RDS.
Method 3: WAL
WAL consists of segment files (16 MB by default). Each segment has one or more records. A log sequence number (LSN) is a pointer to a record in the WAL, telling you the position/location where the record was saved in the log file.
A standby server uses WAL segments – also known as XLOGS in PostgreSQL terminology – to continually replicate changes from its primary server. You can use write-ahead logging to ensure durability and atomicity in a DBMS by serializing chunks of byte-array data (each with a unique LSN) to stable storage before they are applied to a database.
Applying a mutation to a database can lead to multiple file system operations. A pertinent question that arises is how a database can ensure atomicity in the event of a server failure due to a power outage while it is in the middle of a file system update. When a database starts, it initiates an initialization or replay process that can read the available WAL segments and compare them to the LSN stored in each data page (each data page is marked with the LSN of the last WAL record it affects the page).
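You can inspect LSNs directly from SQL. A hedged sketch using built-in functions available since PostgreSQL 10 (the literal LSN in the last query is purely illustrative):

```sql
-- On the primary: the current WAL write position
SELECT pg_current_wal_lsn();

-- On a standby: the last WAL positions received and replayed
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();

-- Difference between two LSNs, in bytes (useful for measuring lag)
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/3000060');
```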
Log shipping-based replication (block level)
Streaming replication refines the log shipping process. Instead of waiting for a WAL file to be filled, records are sent as they are created, thus decreasing replication delay.
Streaming replication also outperforms log shipping because the standby server connects to the primary server over the network, leveraging a replication protocol. The primary server can then send WAL records directly over this connection without having to rely on scripts provided by the end user.
Log shipping-based replication (file level)
Log shipping is defined as copying log files to another PostgreSQL server to generate another standby server by replaying WAL files. This server is configured to work in recovery mode, and its sole purpose is to apply any new WAL files as they appear.
This secondary server then becomes a hot backup of the primary PostgreSQL server. It can also be configured to be a read replica, where it can offer read-only queries, also called hot standby.
Continuous WAL Archiving
Copying WAL files as they are created to any location other than the pg_wal subdirectory, in order to archive them, is known as WAL archiving. PostgreSQL will call a user-provided script (the archive_command) each time a WAL file is completed.
The script can use the scp command to copy the file to one or more locations, such as an NFS mount. Once archived, WAL segment files can be used to recover the database to any point in time.
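In postgresql.conf, this looks roughly like the following sketch; the backup host and archive path are placeholders, not values from this article:

```
# postgresql.conf on the primary
archive_mode = on
archive_command = 'scp %p backup_host:/var/lib/pgarchive/%f'
# %p expands to the path of the WAL file to archive, %f to its file name
```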
Other log-shipping-based configurations are possible as well.
WAL Streaming Protocol Details
A process known as the WAL receiver, running on the standby server, takes advantage of the connection details provided in the recovery.conf primary_conninfo parameter and connects to the primary server leveraging a TCP/IP connection.
To start streaming replication, the frontend can send the replication parameter within the initialization message. A boolean value of true, yes, 1, or ON lets the backend know that it needs to enter physical replication walsender mode.
The WAL sender is another process, running on the primary server, that is in charge of sending WAL records to the standby server as they are generated. The WAL receiver saves these records to the standby's own WAL segment files, as if they had been created by local client activity.
Once the WAL records arrive in the WAL segment files, the standby server continues to constantly replay the WAL so that the primary and standby are up to date.
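On the standby, the WAL receiver can be observed directly. A sketch for PostgreSQL 10 (the received_lsn column was renamed in later releases):

```sql
-- Status of the WAL receiver process on the standby
SELECT status, received_lsn, conninfo FROM pg_stat_wal_receiver;

-- Confirm the server is in recovery, i.e. acting as a standby
SELECT pg_is_in_recovery();
```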
Elements of PostgreSQL replication
In this section, you will gain a deeper understanding of the commonly used models (single-master and multi-master replication), types (physical and logical replication), and modes (synchronous and asynchronous) of PostgreSQL replication.
PostgreSQL database replication models
Scalability – the database's ability to store and process more data – can be achieved vertically or horizontally. Vertical scaling adds more resources/hardware to existing nodes, while horizontal scaling adds more nodes. PostgreSQL replication is an example of horizontal scaling, which is much more difficult to implement than vertical scaling. We can achieve horizontal scalability mainly through single-master replication (SMR) and multi-master replication (MMR).
Single-master replication allows data to be modified only on a single node, and these modifications are replicated to one or more other nodes. Replicated tables in the replica database are not allowed to accept any changes except those from the primary server; even if they did, the changes would not be replicated back to the primary.
Most of the time, SMR is sufficient for the application because it is less complicated to configure and manage, with no chance of conflicts. Single-master replication is also unidirectional, since replication data flows in one direction only: from the primary database to the replica database.
In some cases, SMR alone may not be enough, and you may need to implement MMR. MMR allows more than one node to act as the primary node. Changes to table rows in more than one designated primary database are replicated to their counterpart tables in all other primary databases. In this model, conflict resolution schemes are often employed to avoid problems such as duplication of primary keys.
There are some advantages to using MMR, namely:
- Availability: if one primary node fails, writes can continue on the others.
- Locality: geographically distributed applications can write to the nearest node, reducing latency.
- Load distribution: write traffic can be spread across several nodes.
However, the downside of implementing MMR is its complexity and difficulty in resolving conflicts.
Various forks and extensions provide MMR solutions, since PostgreSQL does not support it natively. These solutions can be open source, free, or paid. One such extension is Bi-Directional Replication (BDR), which is asynchronous and based on PostgreSQL's logical decoding feature.
Because BDR replays transactions on other nodes, a replay operation may fail if there is a conflict between the incoming transaction and a transaction already committed on the receiving node.
Types of PostgreSQL replication
There are two types of PostgreSQL replication: logical and physical replication.
For example, the simple logical operation initdb maps to the physical operation of creating a base directory for a cluster. Likewise, the simple logical operation CREATE DATABASE maps to the physical operation of creating a subdirectory within that base directory.
Physical replication generally deals with files and directories. It doesn't know what these files and directories represent. These methods are used to maintain a complete copy of a single cluster's entire data, typically on another machine, and are done at the file system level or disk level and use exact block addresses.
Logical replication is a way of reproducing data entities and their modifications, based on their replication identity (usually a primary key). Unlike physical replication, it deals with databases, tables, and DML operations and is done at the database cluster level. It uses a publish - subscribe model where one or more subscribers subscribe to one or more publications on a publisher node .
The replication process begins by taking a snapshot of the data in the publisher's database and then copying it to the subscriber. Subscribers pull data from the publications they subscribe to and can re-publish the data later to enable cascading replication or more complex configurations. The subscriber applies data in the same order as the publisher so that transactional consistency is guaranteed for publications within a single subscription, also known as transactional replication.
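The publish–subscribe model translates into two SQL statements. A minimal sketch — the publication, subscription, database, and table names are hypothetical, and the connection string reuses the example credentials from earlier:

```sql
-- On the publisher (requires wal_level = logical):
CREATE PUBLICATION my_pub FOR TABLE customers, orders;

-- On the subscriber (the tables must already exist with a matching schema):
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=192.168.0.26 port=5432 dbname=appdb user=rep_user password=rep_pass'
    PUBLICATION my_pub;
```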
Typical use cases for logical replication are:
- Replicating between different major versions of PostgreSQL, or between different platforms.
- Replicating only a subset of a database's tables, or sending incremental changes as they occur.
- Consolidating multiple databases into a single one, for example for analytical purposes.
- Giving different groups of users access to replicated data.
The subscriber database behaves in the same way as any other PostgreSQL instance and can itself act as a publisher for other databases by defining its own publications.
When the subscriber is treated as read-only by the application, there will be no conflicts from a single subscription. On the other hand, if other writes are made to the same set of tables, either by an application or by other subscribers, conflicts can arise.
PostgreSQL supports both mechanisms simultaneously. Logical replication allows fine-grained control over both data replication and security.
Replication modes
There are mainly two modes of PostgreSQL replication: synchronous and asynchronous. Synchronous replication allows data to be written to the primary and secondary server at the same time, while asynchronous replication ensures that data is first written to the host and then copied to the secondary server.
In synchronous replication mode, transactions on the primary database are considered complete only when these changes have been replicated to all replicas. The replica servers must all be available at all times for transactions to complete on the primary database. Synchronous mode of replication is used in high-end transactional environments with immediate failover requirements.
In asynchronous mode, transactions on the primary server can be declared complete when changes have been made only on the primary server. These changes are then replicated to replicas later in time. Replica servers can remain out of sync for a certain duration, called replication lag. In the event of a failure, data loss may occur, but the overhead provided by asynchronous replication is small, so it is acceptable in most cases (does not overload the host). Failover from the primary database to the secondary database takes longer than synchronous replication.
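Choosing between the two modes comes down to a couple of settings in postgresql.conf. A sketch, where the standby names are hypothetical and must match the application_name each standby sets in its primary_conninfo:

```
# postgresql.conf on the primary
synchronous_commit = on
# Wait for at least one of the listed standbys to confirm each commit:
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'
# Leave synchronous_standby_names empty for purely asynchronous replication
```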
How to configure PostgreSQL replication
For this section, we will be demonstrating how to configure the PostgreSQL replication process on a Linux operating system. For this example, we will be using Ubuntu 18.04 LTS and PostgreSQL 10.
Important
Make sure you have installed Ubuntu Linux server. The steps to install PostgreSQL on a Linux server are mentioned below, followed by configuring replication on the primary and secondary server.
Let's get to what matters!
Installation
You will start by installing PostgreSQL on Linux with these steps:
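On Ubuntu 18.04, the installation itself can be sketched as follows (run on both servers):

```shell
# Install PostgreSQL from the Ubuntu repositories
sudo apt update
sudo apt install -y postgresql postgresql-contrib

# Confirm the service is running
sudo systemctl status postgresql
```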
PostgreSQL installation is mandatory for both primary and secondary servers before starting the PostgreSQL replication process.
Once you have configured PostgreSQL on both servers, you can move on to configuring primary and secondary server replication.
Configuring replication on the primary server
Perform these steps on the primary server once you have installed PostgreSQL on both the primary and secondary servers: create the replication user, set the streaming parameters in postgresql.conf, and add the standby's entry to pg_hba.conf, as shown earlier. Then restart PostgreSQL:
systemctl restart postgresql
You have now completed the configuration on the primary server.
Configuring replication on the secondary server
Follow these steps to configure replication on the secondary server: stop PostgreSQL, take a base backup of the primary with pg_basebackup into the standby's data directory, and create the recovery.conf file described earlier. Then start PostgreSQL:
systemctl start postgresql
Testing your configuration
Now that we have completed the steps, let's test the replication process and observe the secondary server database. To do this, we create a table on the primary server and see if it is reflected on the secondary server.
Let's do it.
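The test can be sketched as two SQL sessions; the table and column names are assumptions chosen to match the sample output that follows:

```sql
-- On the primary:
CREATE TABLE test_table (websites text);
INSERT INTO test_table VALUES ('section.com'), ('google.com'), ('github.com');

-- On the secondary (read-only), the rows should appear moments later:
SELECT * FROM test_table;
```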
This is the test table output:
| websites |
-------------------
| section.com |
| google.com |
| github.com |
--------------------
You should be able to observe the same data as on the primary server.
If you see the above, then you have successfully carried out the replication process!
What are the manual failover steps in PostgreSQL?
Let's review the steps for a manual PostgreSQL failover:
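A manual failover can be sketched as follows; the commands assume the default Ubuntu layout for PostgreSQL 10:

```shell
# 1. Stop the failed primary (if it is still reachable)
sudo systemctl stop postgresql

# 2. On the standby: promote it to become the new primary
sudo pg_ctlcluster 10 main promote

# 3. Verify the promoted server has left recovery mode...
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
# ...then try an INSERT on any table to confirm it is now read/write
```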
If the insert works well, then the standby, previously a read-only server, has been promoted to be the new primary server.
How to automate failover in PostgreSQL
Configuring automatic failover is easy.
You will need the EDB PostgreSQL failover manager (EFM). After downloading and installing EFM on each primary and standby node, you can create an EFM Cluster, which consists of a primary node, one or more standby nodes, and an optional witness node that confirms assertions in case of failure.
EFM continuously monitors system health and sends email alerts based on system events. When a failure occurs, it automatically switches to the most up-to-date standby and reconfigures all other standby servers to recognize the new primary node.
It also reconfigures load balancers (such as Pgpool-II) and prevents “split-brain” scenarios, in which two nodes each believe they are the primary.
Summary
Given the ever-growing volume of data, scalability and security have become two of the most important criteria in database management, especially in transactional environments. While we can improve vertical scalability by adding more resources/hardware to existing nodes, this is not always possible, often due to the cost or limitations of adding new hardware.
Therefore, horizontal scalability is necessary, which means adding more nodes to existing network nodes rather than improving the functionality of existing nodes. This is where PostgreSQL replication comes into play.
In this article, we discussed the types of PostgreSQL replication, its benefits, replication modes, installation, and failover, as well as the difference between SMR and MMR. Now let's hear from you.
Which do you usually implement? Which database feature is most important to you and why? We'd love to read your thoughts! Share them in the comments section below.