Enhancing Data Resilience in Azure Database for PostgreSQL Flexible Server: Exploring High Availability Options
Introduction
I recently spoke with a client who was worried about data loss in their Azure Database for PostgreSQL Flexible Server. They had read the following statement in the documentation:
"Transaction log backups happen at varied frequencies, depending on the workload and when the WAL file is filled and ready to be archived. In general, the delay (RPO) can be up to 5 minutes."
They were concerned that they might lose up to five minutes of data if something went wrong. I wanted to help them understand that Azure offers several high availability options to reduce or avoid data loss.
In this article, I'll explain these options to help others who might have the same concerns.
What Are the High Availability Options?
Azure provides several ways to keep your PostgreSQL database available and to minimize data loss. Let's look at them one by one.
Zone-Redundant High Availability
How Availability Zone HA Works
Availability zones are designed so that if one zone is affected by a local failure, regional services, capacity, and high availability are supported by the remaining two zones.
Azure keeps a standby replica that is continuously synchronized with the primary server.
If the primary server fails, the system automatically switches to the standby server. This usually happens quickly, without much downtime.
One of the best ways to protect your database is by using zone-redundant high availability within the same Azure region.
Same Zone vs. Zone-Redundant Configurations
- Same Zone Deployment:
A standby replica server is automatically provisioned and managed in the same availability zone with similar compute, storage, and network configuration as the primary server. Data from the primary server is replicated to the standby replica in synchronous mode. In the event of any disruption to the primary server, the server is automatically failed over to the standby replica.
This setup has low latency but doesn't protect against failures in that zone.
- Zone-Redundant Deployment:
The primary and standby servers are in different availability zones within the same region with automatic failover capability. You can choose the region and the availability zones for both primary and standby servers.
This protects against failures in a single zone. Using zone redundancy greatly improves your system's resilience.
Impact on RPO
The Recovery Point Objective (RPO) is the maximum acceptable amount of data loss measured in time. With both same-zone and zone-redundant high availability, the RPO is expected to be zero (no data loss), because every change is replicated synchronously to the standby.
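To make the difference between synchronous and asynchronous replication concrete, here is a toy Python model. It is an illustration under simplifying assumptions, not how Azure implements replication: in synchronous mode a commit is acknowledged only once the standby has it, while in asynchronous mode changes are shipped in batches (every `ship_every` commits, a hypothetical parameter), so whatever is still queued when the primary fails is lost.

```python
def simulate_failover(transactions, synchronous, ship_every=3):
    """Toy model: commit `transactions`, then lose the primary.

    Returns the acknowledged transactions that never reached the standby.
    """
    standby = []       # what the standby has received
    pending = []       # async only: changes waiting to be shipped
    acknowledged = []  # what the application was told is committed

    for tx in transactions:
        if synchronous:
            # Synchronous: the standby must have the change before we acknowledge.
            standby.append(tx)
        else:
            # Asynchronous: acknowledge now, ship to the standby later in batches.
            pending.append(tx)
            if len(pending) == ship_every:
                standby.extend(pending)
                pending.clear()
        acknowledged.append(tx)

    # The primary fails here; anything still in `pending` never reaches the standby.
    return [tx for tx in acknowledged if tx not in standby]


if __name__ == "__main__":
    txs = ["t1", "t2", "t3", "t4", "t5"]
    print(simulate_failover(txs, synchronous=True))   # []
    print(simulate_failover(txs, synchronous=False))  # ['t4', 't5']
```

The synchronous run loses nothing, which is why the RPO of built-in high availability is zero; the asynchronous run loses the last two acknowledged commits, which is the same trade-off that applies to the asynchronous read replicas described next.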
Read Replicas
Another option is to use read replicas to improve data availability and reduce the load on your primary server.
The read replica feature allows you to replicate data from an Azure Database for PostgreSQL flexible server instance to a read-only replica.
Setting Up Read Replicas
You can create up to five read-only replicas of your primary server.
When a read replica is created, it inherits specific server configurations from the primary server. These configurations can be changed either during the replica's creation or after it has been set up.
These replicas can be in the same region or in different regions.
Asynchronous Replication
Unlike the built-in zonal high availability, read replicas are updated asynchronously using the PostgreSQL engine's native physical replication, which means there is a small delay between the primary server and the replicas.
Impact on RPO
The RPO depends on the replication lag. In practice, this lag is usually a few seconds to a minute.
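In PostgreSQL you can measure this lag yourself. On the primary, the pg_stat_replication view exposes each replica's replay_lsn, pg_current_wal_lsn() returns the primary's current WAL position, and pg_wal_lsn_diff() returns the difference between two positions in bytes. The Python sketch below reproduces that subtraction so the LSN format is clear: in a value like 16/B374D848, the part before the slash holds the high 32 bits and the part after it the low 32 bits, both in hexadecimal. The LSN values in the example are invented for illustration.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica has not replayed yet (same idea as pg_wal_lsn_diff)."""
    return parse_lsn(primary_lsn) - parse_lsn(replica_replay_lsn)


if __name__ == "__main__":
    # Hypothetical values: pg_current_wal_lsn() on the primary and a replica's replay_lsn.
    print(replication_lag_bytes("16/B374D848", "16/B3740000"))  # 55368
```

A byte count tells you how much WAL is outstanding; converting it to seconds of data at risk depends on how fast your workload generates WAL.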
Manual Failover
If the primary server fails, you need to manually promote a read replica to become the new primary server.
Cross-Region Replication with Failover
If you are concerned about failures in an entire region, you can use cross-region replication.
Cross-Region Read Replicas
By setting up read replicas in different regions, you can protect your database against regional outages.
This is useful for disaster recovery scenarios.
Manual Failover Process
In case of a regional failure, you need to manually promote the cross-region read replica to become the primary server.
Impact on RPO
The RPO depends on the replication lag and the network latency between regions.
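Because the lag fluctuates with load and network conditions, it is worth judging it against your RPO by its worst case rather than its average. The sketch below assumes you already collect periodic lag samples in seconds from your monitoring (how you collect them is up to you); the function name, the sample values, and the 5-second target are hypothetical. It reports a nearest-rank 95th percentile and the maximum, and flags whether the maximum stays within the target.

```python
def summarize_lag(samples_seconds, rpo_target_seconds):
    """Summarize replication-lag samples against an RPO target.

    The worst observed lag, not the average, bounds how much data a
    failover at an unlucky moment could lose.
    """
    if not samples_seconds:
        raise ValueError("need at least one lag sample")
    ordered = sorted(samples_seconds)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # nearest-rank 95th percentile, integer ceil(0.95 * n)
    return {
        "p95": ordered[rank - 1],
        "max": ordered[-1],
        "meets_target": ordered[-1] <= rpo_target_seconds,
    }


if __name__ == "__main__":
    samples = [0.5] * 19 + [12.0]  # one short spike among otherwise low lag
    print(summarize_lag(samples, rpo_target_seconds=5))
    # {'p95': 0.5, 'max': 12.0, 'meets_target': False}
```

Here the percentile looks healthy while a single spike breaks the target, which is exactly the kind of moment when a regional failover would lose the most data.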
Backup and Restore
Finally, you can use backup and restore features as an additional layer of protection.
Automated Backups
Azure Database for PostgreSQL flexible server takes snapshot backups of data files and stores them securely in redundant storage. The server also backs up transaction logs when the write-ahead log (WAL) file is ready to be archived.
Multiple copies of your backups are stored in locally redundant, zone-redundant, or geo-redundant backup storage to help protect your data from planned and unplanned events.
Important: After a server is provisioned, you can't change the backup storage redundancy option.
You can restore your database to any point in time within the retention period.
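As a small illustration of what "within the retention period" means, the sketch below checks whether a requested restore time falls inside the window that ends now and starts retention_days earlier. The function names are hypothetical, retention_days stands for whatever retention you configured on the server, and the check deliberately ignores the transaction-log backup delay described under Limitations in Terms of RPO.

```python
from datetime import datetime, timedelta, timezone


def restore_window(now, retention_days):
    """Return the (earliest, latest) times a point-in-time restore could target."""
    return now - timedelta(days=retention_days), now


def can_restore_to(target, now, retention_days):
    """True if `target` lies inside the retention window ending at `now`."""
    earliest, latest = restore_window(now, retention_days)
    return earliest <= target <= latest


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(can_restore_to(now - timedelta(days=3), now, retention_days=7))   # True
    print(can_restore_to(now - timedelta(days=10), now, retention_days=7))  # False
```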
A common question is whether these backups can be exported. They can't: the backup files can't be exported or used to create servers outside Azure Database for PostgreSQL flexible server.
Limitations in Terms of RPO
The RPO can be up to 5 minutes because of how transaction log backups work.
This method is mainly for recovering from data corruption or accidental deletions, not for minimizing downtime during failures.
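The "up to 5 minutes" figure follows from when WAL files get archived. In PostgreSQL, a WAL segment (16 MB by default) is archived once it is full, and the archive_timeout setting can force a switch to a new segment after a time limit, so the unarchived window is roughly whichever comes first. The sketch below models that; the 5-minute cap is an assumption taken from the documentation quote in the introduction, not a documented Azure setting, and the model ignores the time the archive copy itself takes.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # PostgreSQL's default WAL segment size (16 MB)
ASSUMED_CAP_SECONDS = 5 * 60          # assumption: the "up to 5 minutes" from the docs


def worst_case_unarchived_seconds(wal_bytes_per_second,
                                  segment_bytes=WAL_SEGMENT_BYTES,
                                  cap_seconds=ASSUMED_CAP_SECONDS):
    """Approximate how many seconds of writes may exist only in an unarchived WAL file."""
    if wal_bytes_per_second <= 0:
        raise ValueError("WAL generation rate must be positive")
    seconds_to_fill = segment_bytes / wal_bytes_per_second
    # Whichever happens first: the segment fills up, or the time cap forces a switch.
    return min(seconds_to_fill, cap_seconds)


if __name__ == "__main__":
    print(worst_case_unarchived_seconds(1024 * 1024))  # busy server, 1 MB/s: 16.0
    print(worst_case_unarchived_seconds(10 * 1024))    # quiet server, 10 KB/s: 300
```

Under this model, a busy server archives its WAL every few seconds, while on a lightly loaded server the window approaches the full 5 minutes, although fewer writes fall inside it.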
Summary of High Availability Options
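Here is a comparison of the different options:
- Same-zone high availability: synchronous replication, automatic failover, RPO of zero. Does not protect against a zone failure.
- Zone-redundant high availability: synchronous replication, automatic failover, RPO of zero. Protects against a single zone failure.
- Read replicas: asynchronous replication, manual promotion, RPO equal to the replication lag.
- Cross-region read replicas: asynchronous replication, manual promotion, RPO equal to the replication lag, which also depends on network latency between regions. Protects against a regional outage.
- Backup and restore: point-in-time restore within the retention period, RPO of up to 5 minutes. Intended for recovering from data corruption or accidental deletions.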
Conclusion
While the default transaction log backups might allow for up to five minutes of data loss, high availability features can reduce this to a few seconds or eliminate it entirely. Zone-redundant high availability provides automatic failover within the same region with no data loss. Read replicas can also help, especially when set up across regions, although they require manual failover.
Recommendations:
For Critical Applications:
- Use zone-redundant high availability to minimize data loss and downtime.
For Disaster Recovery:
- Set up cross-region read replicas. Be prepared to manually promote the replica if needed.
For Data Protection:
- Continue using Azure's backup and restore features to protect against data corruption and accidental deletions.
By combining these options, you can create a solution that fits your needs and reduces the risk of data loss.
Also implement retry policies in your application: even an automatic failover briefly interrupts existing connections.
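A common pattern is to retry transient connection errors with exponential backoff and random jitter, so the application rides out the brief interruption of a failover. The Python sketch below is generic: TransientError is a hypothetical placeholder for your driver's connection-failure exception (with psycopg, for example, you would catch psycopg.OperationalError), and the attempt count and delays are illustrative defaults, not recommended values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a driver's 'connection lost' error."""


def with_retries(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                 sleep=time.sleep, rng=random.random):
    """Call `operation()`; on TransientError, back off exponentially and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff capped at max_delay, with "full jitter" so that many
            # clients reconnecting after a failover don't all retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * rng())
```

Only retry work that is safe to run twice, for example by wrapping a whole unit of work (reconnect, begin, run statements, commit) in `operation`, because a connection can drop after the server committed but before the client received the acknowledgment. The `sleep` and `rng` parameters exist so the backoff can be tested without real waiting.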
If you have any questions or experiences to share, please leave a comment below. Let's help each other make the most of Azure's features!