Disaster Recovery for Azure SQL DB
Suman Kalyan Biswas
Senior Data Architect/Chief Platform Engineer at Direct Supply
#azuresql #disasterrecovery #databasebackup #Auto-failovergroups #unitedwholesalemortgage
The term?disaster recovery?is used to describe the activities that needs to be done to restore the database in the event of a catastrophic failure. In case of Azure SQL DB we are talking about a region level failure where the Availability Zones within the Azure region are not operational. Datacenter and Availability zone level failures within an Azure Region is handled through High Availability architecture which aims for no data loss and faster recovery.
More details on High Availability setup in Azure SQL:
In case of Disaster recovery there is potential data loss but the goal of this architecture is to recover from region level disaster or to undo some bad updates done by users or applications to the database.
In this article we will talk about the below 2 common DR strategies:
1) Auto Failover Groups
2) Database Backups and Long Term Retention
Auto Failover Groups
The auto-failover groups feature allows you to manage the replication and failover of some or all databases on a?logical server?to another region. You can initiate a geo-failover manually or you can delegate it to the Azure service based on a user-defined policy. The latter option allows you to automatically recover multiple related databases in a secondary region after a catastrophic failure in the primary region. Typically, these are outages that cannot be automatically mitigated by the built-in high availability infrastructure.
You can choose to failover to the secondary manually. By default, a failover group is configured with an automatic failover policy. The system triggers a geo-failover after the failure is detected and the grace period has expired. In both these approaches there is a potential 5seconds data loss
?By configuring?GracePeriodWithDataLossHours, you can control how long the system waits before initiating a forced failover, which may result in data loss. Because verification of the scale of the outage and how quickly it can be mitigated involves human actions, the grace period cannot be set below one hour.
Endpoint redirection
Auto-failover groups provide read-write and read-only listener end-points that remain unchanged during geo-failovers. This means you do not have to change the connection string for your application after a geo-failover, because connections are automatically routed to the current primary.
领英推è
Another very similar approach is using Geo-replication. For Geo-replication you have to update the connection strings manually post failover but ?geo-replication supports multiple RO targets including in the same region, while failover group supports only two SQL instances in different regions, in which one is RW and another is RO
Database Backups and Long Term Retention
Azure SQL Database creates:
- Full backups?every week.
- Differential backups?every 12 or 24 hours.
- Transaction log backups?approximately every 10 minutes.
The exact frequency of transaction log backups is based on the compute size and the amount of database activity
For backup Storage Redundancy we have?Locally redundant storage (LRS), Zone-redundant storage (ZRS) and Geo-redundant storage (GRS) options. GRS?Copies your backups synchronously three times within a single physical location in the primary region by using LRS. Then it copies your data asynchronously three times to a single physical location in the?paired?secondary region
The point-in-time restore functionality allow you to restore your database to a state at a defined date and time. By default you can do this for a 7 day window but you can extend this functionality to 35 days
Retention: Database Backups are retained for 7 days by default. You can enable Long Term retention for up to 520 weeks for compliance reasons.
More details regarding this topic at: