Automating Azure PostgreSQL Point-in-Time Recovery with Terraform

Managing Terraform state effectively is crucial when using the Point-in-Time Recovery (PITR) feature of Azure Database for PostgreSQL Flexible Server. Done well, it keeps infrastructure as code (IaC) practices consistent and reliable, even during disaster recovery scenarios.

Understanding Azure PostgreSQL Flexible Server PITR

Azure's PITR feature allows for the restoration of a PostgreSQL server to a specific point in time, which is invaluable in cases of accidental data deletion or corruption. However, this functionality comes with certain constraints:

  • New Server Creation: PITR necessitates the creation of a new server with a unique name; in-place restoration of the original server isn't supported.
  • Inherited Configurations: The restored server adopts attributes from the original, including pricing tier, compute generation, backup retention, and networking settings.
  • High Availability (HA): Even if the original server was configured with HA, the restored server will not have HA enabled by default.
  • Immutable Networking Settings: During the restore process, networking configurations cannot be altered; switching between private and public access modes isn't permitted.


Challenges with Terraform State Management

Terraform maintains a state file that maps the resources in its configuration to the real infrastructure it last observed. When a new server is created via PITR outside of Terraform, for example through the Azure portal or CLI, Terraform is unaware of it, leading to potential discrepancies:

  • State Drift: Terraform's state may not reflect the newly restored server, causing inconsistencies during future deployments.
  • Resource Conflicts: Managing both the original and restored servers can lead to conflicts, especially if Terraform attempts to recreate or modify resources unaware of the PITR process.


Proposed Solution

To harmonize Terraform state management with Azure's PITR process, consider the following architectural approach:

  1. Automated State Import: After initiating a PITR and creating a new server, automate the import of the restored server into Terraform's state using the terraform import command. This action ensures that Terraform recognizes the new server as part of its managed infrastructure.
  2. Dynamic Configuration with Variables: Utilize input variables to manage server names and configurations dynamically. This strategy allows for flexibility without altering the core Terraform configuration, adhering to immutable infrastructure principles.
  3. Conditional Resource Handling: Implement conditional logic within Terraform configurations to manage resources based on the environment (e.g., production, disaster recovery). This method ensures that Terraform applies the appropriate settings without manual intervention.
  4. State File Segmentation: Consider maintaining separate state files for different environments or scenarios. This practice can prevent conflicts and maintain clarity between production and disaster recovery resources.
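
As a rough illustration of step 1, a server that was restored outside of Terraform can be brought under management with a config-driven import block instead of running terraform import by hand. This is a minimal sketch, assuming Terraform 1.5 or later, and assuming restore_mode is set to true so that the target resource instance from main.tf exists in the configuration. The resource ID is a placeholder and must be replaced with the real ID of the restored server.

```hcl
# Config-driven import (assumes Terraform >= 1.5).
import {
  # Target address from main.tf; the [0] index is needed because the
  # resource uses count.
  to = azurerm_postgresql_flexible_server.psql_restore[0]

  # Placeholder resource ID: replace every <...> segment with real values.
  id = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.DBforPostgreSQL/flexibleServers/<RESTORED_SERVER_NAME>"
}
```

Running terraform plan afterwards should show the import and any differences between the configuration and the restored server. The import block can be removed once the apply has succeeded.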

The implementation described in the rest of this article drives the PITR workflow entirely through input variables and conditional resources. Because the restored server is created by Terraform itself, it is tracked in state from the start, so no manual state manipulation is required. This keeps the approach aligned with immutable infrastructure principles and makes it suitable for highly regulated environments.

Key Variables:

  1. restore_mode: Toggles between standard server management and initiating a PITR.
  2. post_restore_mode: Enables configurations such as High Availability (HA) and other settings for the restored server after PITR.
  3. restore_timestamp: Specifies the exact recovery point, e.g., "2025-01-01T23:42:42.8258553Z".

Operational Workflow:

  • Dynamic Resource Management: The solution employs conditional logic to manage the creation and import of resources based on the restore_mode and post_restore_mode variables. This ensures that the restored server is appropriately configured without manual state manipulation.
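  • Sequential Terraform Applications: Because HA cannot be enabled during the initial restore, the process requires two terraform apply runs. The first, with restore_mode = true and post_restore_mode = false, performs the PITR and creates the new server. The second, with both variables set to true, configures HA and the other post-restore settings (firewall rule, server parameter, and database).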
  • Consistent Configuration: The restored server is defined in Terraform to mirror the original server's configuration, ensuring consistency. However, the code can be adjusted to apply different configurations if necessary.
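
The two-phase workflow above can be sketched as a pair of variable files. The file names and the server name are hypothetical, and the remaining required variables (psql_admin_username, psql_admin_pwd, allow_current_ip_firewall_rule_name, test_db_name) still have to be supplied by some other means.

```hcl
# restore.tfvars (hypothetical file name)
# Phase 1: create the new server from the backup of the original.
restore_mode              = true
post_restore_mode         = false
restore_timestamp         = "2025-01-01T23:42:42.8258553Z"
restored_psql_server_name = "psql-restored-example" # hypothetical name

# post-restore.tfvars (hypothetical file name)
# Phase 2: identical to the file above except for this one value, which
# switches on HA, the firewall rule, the server parameter, and the database:
#   post_restore_mode = true
```

Each phase is then applied with the standard -var-file option, for example terraform apply -var-file=restore.tfvars followed by terraform apply -var-file=post-restore.tfvars.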


Terraform Code Reference

variables.tf (Defines the necessary variables)

variable "restore_mode" {
  description = "Toggle between regular server management and performing a PITR"
  type        = bool
  default     = false
}

variable "post_restore_mode" {
  description = "Enable post-restore configurations such as HA and other inherited settings for the restored server"
  type        = bool
  default     = false
}

variable "restore_timestamp" {
  description = "Specific recovery point in UTC, e.g., '2025-01-01T23:42:42.8258553Z'"
  type        = string
  default     = ""
}

variable "restored_psql_server_name" {
  description = "Name of the restored PostgreSQL server"
  type        = string
}

variable "psql_admin_username" {
  description = "Administrator username for PostgreSQL"
  type        = string
}

variable "psql_admin_pwd" {
  description = "Administrator password for PostgreSQL"
  type        = string
  sensitive   = true # Redacts the value in plan and apply output
}

variable "allow_current_ip_firewall_rule_name" {
  description = "Name of the firewall rule to allow current IP"
  type        = string
}

variable "test_db_name" {
  description = "Name of the test database to be created"
  type        = string
}
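
A malformed timestamp would otherwise only surface as an error from Azure during the apply. As a hedged sketch, the restore_timestamp declaration above could be replaced by the variant below, which adds a validation block. It assumes that timeadd() accepts the fractional-second form used in the example value, since it parses RFC 3339 timestamps.

```hcl
# Variant of the restore_timestamp declaration with input validation.
# Use it in place of the original declaration, not alongside it.
variable "restore_timestamp" {
  description = "Specific recovery point in UTC, e.g., '2025-01-01T23:42:42.8258553Z'"
  type        = string
  default     = ""

  validation {
    # An empty value is allowed so that normal (non-restore) runs need no timestamp.
    # timeadd() fails on input that is not RFC 3339, and can() turns that
    # failure into false instead of an error.
    condition     = var.restore_timestamp == "" || can(timeadd(var.restore_timestamp, "0s"))
    error_message = "The restore_timestamp value must be empty or an RFC 3339 UTC timestamp such as 2025-01-01T23:42:42Z."
  }
}
```

This only checks the format. Whether the timestamp falls inside the backup retention window of the source server is still decided by Azure at restore time.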

main.tf (Main Terraform Configuration)

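# Restored server: created via PITR on the first apply (restore_mode = true),
# then managed like a regular server once post_restore_mode = true.
resource "azurerm_postgresql_flexible_server" "psql_restore" {
  count               = var.restore_mode ? 1 : 0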
  name                = var.restored_psql_server_name
  resource_group_name = azurerm_resource_group.psql_test_resource_group.name
  location            = azurerm_resource_group.psql_test_resource_group.location

  create_mode                       = var.post_restore_mode ? "Default" : "PointInTimeRestore"
  source_server_id                  = var.post_restore_mode ? null : azurerm_postgresql_flexible_server.psql_flexible.id
  point_in_time_restore_time_in_utc = var.restore_timestamp

  delegated_subnet_id = azurerm_subnet.psql_subnet.id
  private_dns_zone_id = azurerm_private_dns_zone.psql_private_dns.id

  backup_retention_days  = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.backup_retention_days : null
  version                = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.version : null
  administrator_login    = var.post_restore_mode ? var.psql_admin_username : null
  administrator_password = var.post_restore_mode ? var.psql_admin_pwd : null

  storage_mb        = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.storage_mb : null
  storage_tier      = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.storage_tier : null
  sku_name          = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.sku_name : null
  auto_grow_enabled = true

  public_network_access_enabled = false
  zone                          = "2"

  dynamic "high_availability" {
    for_each = var.post_restore_mode ? [1] : []
    content {
      mode                      = "ZoneRedundant"
      standby_availability_zone = "1"
    }
  }

  depends_on = [
    azurerm_virtual_network.vnet,
    azurerm_subnet.psql_subnet
  ]

  tags = {
    environment = "development"
  }

  lifecycle {
    ignore_changes = [
      create_mode,
      source_server_id,
      point_in_time_restore_time_in_utc
    ]
  }
}

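# Created only when both flags are set: psql_restore[0] does not exist while
# restore_mode is false, so indexing it would fail.
resource "azurerm_postgresql_flexible_server_firewall_rule" "allow_current_ip_restored" {
  count            = (var.restore_mode && var.post_restore_mode) ? 1 : 0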
  name             = var.allow_current_ip_firewall_rule_name
  server_id        = azurerm_postgresql_flexible_server.psql_restore[0].id
  start_ip_address = "91.89.45.13"
  end_ip_address   = "91.89.45.13"

  depends_on = [
    azurerm_postgresql_flexible_server.psql_restore
  ]
}

# Requires both flags, because it references psql_restore[0].
resource "azurerm_postgresql_flexible_server_configuration" "custom_param_restored" {
  count     = (var.restore_mode && var.post_restore_mode) ? 1 : 0
  name      = "logfiles.retention_days"
  value     = "4"
  server_id = azurerm_postgresql_flexible_server.psql_restore[0].id

  depends_on = [
    azurerm_postgresql_flexible_server.psql_restore
  ]
}

# Requires both flags, because it references psql_restore[0].
resource "azurerm_postgresql_flexible_server_database" "additional_db_restored" {
  count     = (var.restore_mode && var.post_restore_mode) ? 1 : 0
  name      = var.test_db_name
  server_id = azurerm_postgresql_flexible_server.psql_restore[0].id
  collation = "en_US.utf8"
  charset   = "utf8"

  depends_on = [
    azurerm_postgresql_flexible_server.psql_restore
  ]
}
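
Because the restored server is declared with count, other parts of the configuration cannot assume that psql_restore[0] exists. The following outputs are a small sketch of one way to expose its attributes safely. The output names are hypothetical, and the one() function requires Terraform 0.15 or later.

```hcl
# one() returns the single element of the splat list, or null when the
# list is empty (that is, when restore_mode is false).
output "restored_server_id" {
  description = "Resource ID of the restored server, or null when no restore is active"
  value       = one(azurerm_postgresql_flexible_server.psql_restore[*].id)
}

output "restored_server_fqdn" {
  description = "FQDN of the restored server, or null when no restore is active"
  value       = one(azurerm_postgresql_flexible_server.psql_restore[*].fqdn)
}
```

After the first apply, terraform output restored_server_fqdn can be used to point a test client at the restored server before enabling the post-restore settings.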

provider.tf

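provider "azurerm" {
  features {}

  # Placeholders only: do not commit real credentials. The azurerm provider
  # can also read these values from the ARM_SUBSCRIPTION_ID, ARM_TENANT_ID,
  # ARM_CLIENT_ID, and ARM_CLIENT_SECRET environment variables, in which case
  # the arguments below can be omitted.
  subscription_id = "<SUBSCRIPTION_ID>"
  tenant_id       = "<TENANT_ID>"
  client_id       = "<CLIENT_ID>"
  client_secret   = "<CLIENT_SECRET>"
}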


Summary

Effectively managing Terraform state during Azure PostgreSQL Flexible Server PITR operations requires a thoughtful approach that combines automation with immutable infrastructure principles. By implementing automated state imports, using dynamic variables, applying conditional resource management, and segmenting state files, organizations can enable DevOps engineers to maintain consistency and reliability in their infrastructure as code practices, even during complex disaster recovery scenarios.

The following practices help keep the recovery process robust and efficient:

  • Ensure backups are enabled with an appropriate retention period to meet recovery objectives.
  • Use Terraform to manage the creation and restoration of PostgreSQL servers. This ensures consistency and repeatability.
  • Avoid hardcoding sensitive information like passwords in Terraform configurations. Use tools like Azure Key Vault or environment variables.
  • Define the restored server in Terraform to inherit the original server’s configuration, including version, storage, and performance settings.
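
As an illustration of the Key Vault recommendation, the administrator password could be read from an existing vault instead of being passed in as a variable. This is a minimal sketch: the vault name and secret name are hypothetical, and the vault is assumed to live in the same resource group as the database servers.

```hcl
# Look up an existing Key Vault (hypothetical name).
data "azurerm_key_vault" "psql_secrets" {
  name                = "kv-psql-example"
  resource_group_name = azurerm_resource_group.psql_test_resource_group.name
}

# Read the administrator password from a secret in that vault (hypothetical name).
data "azurerm_key_vault_secret" "psql_admin_pwd" {
  name         = "psql-admin-password"
  key_vault_id = data.azurerm_key_vault.psql_secrets.id
}

# Usage inside the server resource, replacing var.psql_admin_pwd:
#   administrator_password = data.azurerm_key_vault_secret.psql_admin_pwd.value
```

The secret value is still written to the Terraform state file, so the state backend needs to be access-controlled and encrypted regardless of where the password originates.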
