Automating Azure PostgreSQL Point-in-Time Recovery with Terraform
Chandan Bilvaraj
Managing Terraform state carefully is crucial when using the Point-in-Time Recovery (PITR) feature of Azure Database for PostgreSQL Flexible Server. Done well, it keeps infrastructure as code (IaC) practices consistent and reliable, even during disaster recovery.
Understanding Azure PostgreSQL Flexible Server PITR
Azure's PITR feature restores a PostgreSQL server to a specific point in time, which is invaluable after accidental data deletion or corruption. However, it comes with a key constraint: a restore always creates a new server rather than rolling back the existing one.
Challenges with Terraform State Management
Terraform maintains a state file that records the last known status of the resources it manages. A server created via PITR outside Terraform does not appear in that state, so the configuration, the state file, and the actual infrastructure can drift apart.
Proposed Solution
To align Terraform state management with Azure's PITR process, the approach described here performs the restore through Terraform itself, using variables and dynamic, conditional resource management. It follows immutable infrastructure principles and avoids direct state manipulation, which makes it suitable for highly regulated environments.
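Key Variables:
restore_mode: toggles between regular server management and performing a PITR.
post_restore_mode: enables post-restore configuration, such as HA and other inherited settings, on the restored server.
restore_timestamp: the recovery point in UTC, for example 2025-01-01T23:42:42.8258553Z.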
Operational Workflow:
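The variables in variables.tf suggest a two-phase workflow. The following is a minimal sketch of that flow as inferred from the configuration in the code reference, not a confirmed procedure. The file names restore.tfvars and post_restore.tfvars are hypothetical, and the timestamp is the example value from the variable description.

```hcl
# restore.tfvars (hypothetical file name)
# Phase 1: create the restored server from the source server's backups.
restore_mode      = true
post_restore_mode = false
restore_timestamp = "2025-01-01T23:42:42.8258553Z"
```

```hcl
# post_restore.tfvars (hypothetical file name)
# Phase 2: manage the restored server as a regular server and add HA,
# server parameters, and databases.
restore_mode      = true
post_restore_mode = true
restore_timestamp = "2025-01-01T23:42:42.8258553Z"
```

Each phase would be applied with something like terraform apply -var-file=restore.tfvars, with the remaining required variables (server name, credentials, and so on) supplied separately. Note that restore_mode stays true in the second phase: the restored server is declared with count = var.restore_mode ? 1 : 0, so setting it back to false would plan the server for destruction.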
Terraform Code Reference
variables.tf (Defines the necessary variables)
variable "restore_mode" {
  description = "Toggle between regular server management and performing a PITR"
  type        = bool
  default     = false
}

variable "post_restore_mode" {
  description = "Enable post-restore configurations such as HA and other inherited settings for the restored server"
  type        = bool
  default     = false
}

variable "restore_timestamp" {
  description = "Specific recovery point in UTC, e.g., '2025-01-01T23:42:42.8258553Z'"
  type        = string
  default     = ""
}

variable "restored_psql_server_name" {
  description = "Name of the restored PostgreSQL server"
  type        = string
}

variable "psql_admin_username" {
  description = "Administrator username for PostgreSQL"
  type        = string
}

variable "psql_admin_pwd" {
  description = "Administrator password for PostgreSQL"
  type        = string
  sensitive   = true # keep the password out of plan and apply output
}

variable "allow_current_ip_firewall_rule_name" {
  description = "Name of the firewall rule to allow current IP"
  type        = string
}

variable "test_db_name" {
  description = "Name of the test database to be created"
  type        = string
}
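Because restore_timestamp is a free-form string, a malformed value would only surface when the Azure API rejects it. As a hedged sketch, the declaration could be replaced (not duplicated) by a variant with a validation block. The validation block is an addition for illustration and is not part of the original configuration. It only checks that the value parses as an RFC 3339 timestamp, not that it falls inside the backup retention window.

```hcl
variable "restore_timestamp" {
  description = "Specific recovery point in UTC, e.g., '2025-01-01T23:42:42.8258553Z'"
  type        = string
  default     = ""

  validation {
    # Accept the empty default, or any value that timeadd() can parse as an
    # RFC 3339 timestamp. can() turns a parse error into false.
    condition     = var.restore_timestamp == "" || can(timeadd(var.restore_timestamp, "0s"))
    error_message = "The restore_timestamp value must be empty or an RFC 3339 timestamp such as 2025-01-01T23:42:42Z."
  }
}
```

With this in place, a typo in the timestamp fails at plan time instead of partway through a recovery.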
main.tf (Main Terraform Configuration)
resource "azurerm_postgresql_flexible_server" "psql_restore" {
  # The restored server exists only while restore_mode is enabled.
  count = var.restore_mode ? 1 : 0

  name                = var.restored_psql_server_name
  resource_group_name = azurerm_resource_group.psql_test_resource_group.name
  location            = azurerm_resource_group.psql_test_resource_group.location

  # Restore phase: create from the source server's backups.
  # Post-restore phase: treat the server as a regular one.
  create_mode                       = var.post_restore_mode ? "Default" : "PointInTimeRestore"
  source_server_id                  = var.post_restore_mode ? null : azurerm_postgresql_flexible_server.psql_flexible.id
  point_in_time_restore_time_in_utc = var.restore_timestamp

  delegated_subnet_id = azurerm_subnet.psql_subnet.id
  private_dns_zone_id = azurerm_private_dns_zone.psql_private_dns.id

  # Left unset (null) during the restore; set explicitly in post-restore mode.
  backup_retention_days  = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.backup_retention_days : null
  version                = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.version : null
  administrator_login    = var.post_restore_mode ? var.psql_admin_username : null
  administrator_password = var.post_restore_mode ? var.psql_admin_pwd : null
  storage_mb             = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.storage_mb : null
  storage_tier           = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.storage_tier : null
  sku_name               = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.sku_name : null

  auto_grow_enabled             = true
  public_network_access_enabled = false
  zone                          = "2"

  # High availability is added only in post-restore mode.
  dynamic "high_availability" {
    for_each = var.post_restore_mode ? [1] : []
    content {
      mode                      = "ZoneRedundant"
      standby_availability_zone = "1" # written as a string, matching zone above
    }
  }

  depends_on = [
    azurerm_virtual_network.vnet,
    azurerm_subnet.psql_subnet
  ]

  tags = {
    environment = "development"
  }

  lifecycle {
    # The restore inputs matter only at creation time. Ignoring them keeps
    # the switch to post_restore_mode from producing a diff on these fields.
    ignore_changes = [
      create_mode,
      source_server_id,
      point_in_time_restore_time_in_utc
    ]
  }
}
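Since psql_restore is declared with count, it is a list that is empty whenever restore_mode is false, so anything that consumes the restored server has to handle both cases. The following is a minimal sketch of how its address could be exposed. The output names are hypothetical and are not part of the original configuration; one() is a built-in Terraform function that returns null for an empty list and the single element otherwise.

```hcl
# Hypothetical outputs for the restored server.
output "restored_server_id" {
  description = "Resource ID of the restored server, or null when restore_mode is false"
  value       = one(azurerm_postgresql_flexible_server.psql_restore[*].id)
}

output "restored_server_fqdn" {
  description = "FQDN of the restored server, or null when restore_mode is false"
  value       = one(azurerm_postgresql_flexible_server.psql_restore[*].fqdn)
}
```

The splat expression with one() avoids the index error that psql_restore[0] would raise when no restored server exists.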
# The resources below are created only in post-restore mode, once the
# restored server exists.

# Note: firewall rules are a public-access feature; verify that this rule is
# accepted for a server deployed into a delegated subnet.
resource "azurerm_postgresql_flexible_server_firewall_rule" "allow_current_ip_restored" {
  count            = var.post_restore_mode ? 1 : 0
  name             = var.allow_current_ip_firewall_rule_name
  server_id        = azurerm_postgresql_flexible_server.psql_restore[0].id
  start_ip_address = "91.89.45.13"
  end_ip_address   = "91.89.45.13"

  depends_on = [
    azurerm_postgresql_flexible_server.psql_restore
  ]
}

resource "azurerm_postgresql_flexible_server_configuration" "custom_param_restored" {
  count     = var.post_restore_mode ? 1 : 0
  name      = "logfiles.retention_days"
  value     = "4"
  server_id = azurerm_postgresql_flexible_server.psql_restore[0].id

  depends_on = [
    azurerm_postgresql_flexible_server.psql_restore
  ]
}

resource "azurerm_postgresql_flexible_server_database" "additional_db_restored" {
  count     = var.post_restore_mode ? 1 : 0
  name      = var.test_db_name
  server_id = azurerm_postgresql_flexible_server.psql_restore[0].id
  collation = "en_US.utf8"
  charset   = "utf8"

  depends_on = [
    azurerm_postgresql_flexible_server.psql_restore
  ]
}
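provider "azurerm" {
  features {}

  # Placeholders only. The provider can also read these values from the
  # ARM_SUBSCRIPTION_ID, ARM_TENANT_ID, ARM_CLIENT_ID, and ARM_CLIENT_SECRET
  # environment variables, which keeps secrets out of the configuration.
  subscription_id = "<SUBSCRIPTION_ID>"
  tenant_id       = "<TENANT_ID>"
  client_id       = "<CLIENT_ID>"
  client_secret   = "<CLIENT_SECRET>"
}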
Summary
Managing Terraform state effectively during PITR operations on Azure PostgreSQL Flexible Server requires a thoughtful approach that combines automation with immutable infrastructure principles. By driving the restore through dynamic variables and conditional resource management rather than manual state edits, DevOps engineers can keep their infrastructure as code practices consistent and reliable, even during complex disaster recovery scenarios.
Following these practices helps ensure a robust and efficient recovery process.