Enhancing Data Resilience in Azure Database for PostgreSQL Flexible Server: Exploring High Availability Options

Introduction

I recently spoke with a client who was worried about data loss in their Azure Database for PostgreSQL Flexible Server. They read a statement in the documentation that said:

"Transaction log backups happen at varied frequencies, depending on the workload and when the WAL file is filled and ready to be archived. In general, the delay (RPO) can be up to 5 minutes."

They were concerned that they might lose up to five minutes of data if something went wrong. I wanted to help them understand that Azure offers several high availability options to reduce or even avoid data loss.

In this article, I'll explain these options to help others who might have the same concerns.

What Are the High Availability Options?

Azure provides several ways to keep your PostgreSQL database available and minimize downtime. Let's look at them one by one.

Zone-Redundant High Availability

How Availability Zone HA Works

Availability zones are designed so that if one zone is affected by a local failure, the remaining two zones continue to support regional services, capacity, and high availability.

Azure keeps a standby replica that is continuously synchronized with the primary server.

If the primary server fails, the system automatically switches to the standby server. This usually happens quickly, without much downtime.

One of the best ways to protect your database is by using zone-redundant high availability within the same Azure region.

Same Zone vs. Zone-Redundant Configurations

- Same Zone Deployment:

A standby replica server is automatically provisioned and managed in the same availability zone with similar compute, storage, and network configuration as the primary server. Data from the primary server is replicated to the standby replica in synchronous mode. In the event of any disruption to the primary server, the server is automatically failed over to the standby replica.

This setup has low latency but doesn't protect against failures in that zone.

- Zone-Redundant Deployment:

The primary and standby servers are in different availability zones within the same region with automatic failover capability. You can choose the region and the availability zones for both primary and standby servers.

This protects against failures in a single zone. Using zone redundancy greatly improves your system's resilience.

Impact on RPO

The Recovery Point Objective (RPO) is the maximum acceptable amount of data loss measured in time. With same zone and zone-redundant high availability, the RPO is expected to be zero (no data loss).
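To see why synchronous replication gives an expected RPO of zero, here is a toy model in Python. It is only an illustration under simplifying assumptions, not how Azure implements replication: the `simulate` function and its `lag` parameter are hypothetical names, and transactions are plain integers.

```python
from typing import Iterable, List


def simulate(commits: Iterable[int], synchronous: bool, lag: int = 2) -> List[int]:
    """Return the acknowledged transactions that are missing on the standby
    if the primary fails right after the last commit."""
    acknowledged: List[int] = []  # commits the client was told succeeded
    standby: List[int] = []       # commits already present on the standby
    in_flight: List[int] = []     # commits sent but not yet on the standby

    for txn in commits:
        if synchronous:
            # The commit is acknowledged only once the standby has it.
            standby.append(txn)
            acknowledged.append(txn)
        else:
            # The commit is acknowledged first; the standby trails by `lag`.
            acknowledged.append(txn)
            in_flight.append(txn)
            while len(in_flight) > lag:
                standby.append(in_flight.pop(0))

    on_standby = set(standby)
    return [txn for txn in acknowledged if txn not in on_standby]


print(simulate(range(1, 11), synchronous=True))          # nothing is lost
print(simulate(range(1, 11), synchronous=False, lag=2))  # the last two commits are lost
```

In the synchronous case every acknowledged commit already exists on the standby, so a failover loses nothing. In the asynchronous case, whatever is still in flight at the moment of failure is lost.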

Documentation Links

High Availability in Azure Database for PostgreSQL Flexible Server

Configure Zone-Redundant High Availability

Read Replicas

Another option is to use read replicas to improve data availability and reduce the load on your primary server.

The read replica feature allows you to replicate data from an Azure Database for PostgreSQL flexible server instance to a read-only replica.

Setting Up Read Replicas

You can create up to five read-only replicas of your primary server.

When a read replica is created, it inherits specific server configurations from the primary server. These configurations can be changed either during the replica's creation or after it has been set up.

These replicas can be in the same region or in different regions.

Asynchronous Replication

Unlike the built-in zonal high availability, read replicas are updated asynchronously, using the PostgreSQL engine's native physical replication technology. This means there is a small delay between the primary server and the replicas.

Impact on RPO

The RPO depends on the replication lag. In practice, this lag is usually a few seconds to a minute.
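One way to reason about this lag is in bytes of write-ahead log. PostgreSQL reports WAL positions as log sequence numbers (LSNs) such as `16/B374D848`: two hexadecimal numbers that together form a 64-bit position. On the primary, `pg_current_wal_lsn()` returns the current write position, and the `pg_stat_replication` view exposes each replica's `replay_lsn`. The Python sketch below only does the arithmetic. The helper names and the sample LSN values are made up for illustration, and I am assuming you have already fetched the two LSNs yourself.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica has not replayed yet (never negative)."""
    return max(0, lsn_to_int(primary_lsn) - lsn_to_int(replica_replay_lsn))


# Sample values, invented for this example.
print(lag_bytes("0/3000060", "0/3000000"))  # 96
```

A lag in bytes is not a lag in time: how many seconds of data those bytes represent depends on how fast your workload writes.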

Manual Failover

If the primary server fails, you need to manually promote a read replica to become the new primary server.

Documentation Links

Read Replicas in Azure Database for PostgreSQL Flexible Server

Configure Replication

Cross-Region Replication with Failover

If you are concerned about failures in an entire region, you can use cross-region replication.

Cross-Region Read Replicas

By setting up read replicas in different regions, you can protect your database against regional outages.

This is useful for disaster recovery scenarios.

Manual Failover Process

In case of a regional failure, you need to manually promote the cross-region read replica to become the primary server.

Impact on RPO

The RPO depends on the replication lag and the network latency between regions.

Documentation Links

Disaster Recovery Strategies

Backup and Restore

Finally, you can use backup and restore features as an additional layer of protection.

Automated Backups

Azure Database for PostgreSQL flexible server takes snapshot backups of data files and stores them securely in redundant storage. The server also backs up transaction logs when the write-ahead log (WAL) file is ready to be archived.

Multiple copies of your backups are stored in locally redundant, zone-redundant, or geo-redundant backup storage to help protect your data from planned and unplanned events.

Important: After a server is provisioned, you can't change the backup storage redundancy option.

You can restore your database to any point in time within the retention period.

A common question is: Can we export these backups? These backup files can't be exported or used to create servers outside Azure Database for PostgreSQL flexible server.

Limitations in Terms of RPO

The RPO can be up to 5 minutes because of how transaction log backups work.

This method is mainly for recovering from data corruption or accidental deletions, not for minimizing downtime during failures.
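To make the restore window concrete, here is a small Python sketch of the arithmetic. It is a simplification under stated assumptions: the function names are hypothetical, the 7-day retention in the example is just a sample value, and the "latest" bound is a conservative estimate derived from the 5-minute figure quoted above. In practice the earliest restore point also depends on when the first backup was taken.

```python
from datetime import datetime, timedelta, timezone


def restore_window(now, retention_days, max_log_delay=timedelta(minutes=5)):
    """Return (earliest, latest) restore points one can conservatively rely on."""
    earliest = now - timedelta(days=retention_days)
    latest = now - max_log_delay  # newer changes may not be archived yet
    return earliest, latest


def can_restore_to(target, now, retention_days):
    """True if `target` falls inside the conservative restore window."""
    earliest, latest = restore_window(now, retention_days)
    return earliest <= target <= latest


now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(hours=1), now, retention_days=7))    # True
print(can_restore_to(now - timedelta(minutes=1), now, retention_days=7))  # False
```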

Documentation Links

Backup and Restore Overview

Geo-Redundant Backups and Disaster Recovery

Perform a Point-in-Time Restore

Summary of High Availability Options

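Here is a comparison of the different options:

- Same-zone high availability: synchronous standby in the same availability zone, automatic failover, expected RPO of zero. Doesn't protect against a failure of that zone.

- Zone-redundant high availability: standby in a different availability zone within the same region, automatic failover, expected RPO of zero.

- Read replicas (same region or cross-region): asynchronous replication, manual promotion, RPO equal to the replication lag (usually a few seconds to a minute).

- Backup and restore: point-in-time restore within the retention period, RPO of up to 5 minutes.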


Conclusion

While relying on transaction log backups alone might allow for up to five minutes of data loss, the high availability features can reduce this to a few seconds or avoid it altogether. Zone-redundant high availability provides automatic failover within the same region with an expected RPO of zero. Read replicas can also help, especially when set up across regions, although they replicate asynchronously and require manual failover.

Recommendations:

For Critical Applications:

- Use zone-redundant high availability to minimize data loss and downtime.

For Disaster Recovery:

- Set up cross-region read replicas. Be prepared to manually promote the replica if needed.

For Data Protection:

- Continue using Azure's backup and restore features to protect against data corruption and accidental deletions.

By combining these options, you can create a solution that fits your needs and reduces the risk of data loss.

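Whichever option you choose, implement retry policies in your application: a failover interrupts existing connections, and the application needs to reconnect and retry once the new primary is available.
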
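As an illustration, here is a minimal retry helper in Python with exponential backoff and jitter. Treat it as a sketch rather than a recommended implementation: `with_retries` and `TransientError` are hypothetical names, and the default delays are arbitrary sample values. With a real driver you would pass that driver's connection error class in `retry_on` (for example `psycopg2.OperationalError` if you use psycopg2).

```python
import random
import time


class TransientError(Exception):
    """Placeholder for a driver's 'connection lost' error (hypothetical)."""


def with_retries(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                 retry_on=(TransientError,), sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Exponential backoff: 0.5s, 1s, 2s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out clients that all reconnect at once.
            sleep(delay + random.uniform(0, delay / 2))


# Usage: an operation that fails twice, as it might during a failover.
attempts = {"count": 0}

def flaky_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("connection reset")
    return "ok"

print(with_retries(flaky_query, sleep=lambda seconds: None))  # ok
```

Only retry operations that are safe to repeat, or wrap them in a transaction, so that a retry after a failover doesn't apply the same change twice.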
Additional Resources

Azure Database for PostgreSQL Flexible Server Documentation

Concepts of Reliability

If you have any questions or experiences to share, please leave a comment below. Let's help each other make the most of Azure's features!
