Automating Azure PostgreSQL Point-in-Time Recovery with Terraform

Chandan Bilvaraj

Engineer Digital Innovator | Embracing the Future of Technology with Creativity and Curiosity | Driving Change in the Tech World

Published: January 6, 2025

Managing Terraform state effectively is crucial when utilizing Azure Database for PostgreSQL Flexible Server's Point-in-Time Recovery (PITR) feature. This process ensures that infrastructure as code (IaC) practices remain consistent and reliable, even during disaster recovery scenarios.

Understanding Azure PostgreSQL Flexible Server PITR

Azure's PITR feature allows for the restoration of a PostgreSQL server to a specific point in time, which is invaluable in cases of accidental data deletion or corruption. However, this functionality comes with certain constraints:

New Server Creation: PITR necessitates the creation of a new server with a unique name; in-place restoration of the original server isn't supported.
Inherited Configurations: The restored server adopts attributes from the original, including pricing tier, compute generation, backup retention, and networking settings.
High Availability (HA): Even if the original server was configured with HA, the restored server will not have HA enabled by default.
Immutable Networking Settings: During the restore process, networking configurations cannot be altered; switching between private and public access modes isn't permitted.

Challenges with Terraform State Management

Terraform maintains a state file that records the last known state of the resources it manages. When a new server is created via PITR outside of Terraform (for example, through the Azure portal or CLI), Terraform has no record of it, leading to potential discrepancies:

State Drift: Terraform's state may not reflect the newly restored server, causing inconsistencies during future deployments.
Resource Conflicts: Managing both the original and restored servers can lead to conflicts, especially if Terraform attempts to recreate or modify resources unaware of the PITR process.

Proposed Solution

To harmonize Terraform state management with Azure's PITR process, consider the following architectural approach:

Automated State Import: After initiating a PITR and creating a new server, automate the import of the restored server into Terraform's state using the terraform import command. This action ensures that Terraform recognizes the new server as part of its managed infrastructure.
Dynamic Configuration with Variables: Utilize input variables to manage server names and configurations dynamically. This strategy allows for flexibility without altering the core Terraform configuration, adhering to immutable infrastructure principles.
Conditional Resource Handling: Implement conditional logic within Terraform configurations to manage resources based on the environment (e.g., production, disaster recovery). This method ensures that Terraform applies the appropriate settings without manual intervention.
State File Segmentation: Consider maintaining separate state files for different environments or scenarios. This practice can prevent conflicts and maintain clarity between production and disaster recovery resources.
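As a minimal sketch of the state import step, assuming Terraform 1.5 or later and a server that was restored outside Terraform, a declarative import block can stand in for a hand-run terraform import command. The resource address matches the psql_restore resource defined later in this article; the resource ID is a placeholder, and restore_mode must be true so that the [0] instance exists in the configuration.

```hcl
# Sketch only: adopt an already-restored server into state on the next apply.
# Requires Terraform >= 1.5. The ID segments in angle brackets are placeholders.
import {
  to = azurerm_postgresql_flexible_server.psql_restore[0]
  id = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.DBforPostgreSQL/flexibleServers/<RESTORED_SERVER_NAME>"
}
```

The import block can be removed once the apply has succeeded. It is not needed when Terraform itself creates the restored server, as in the variable-driven approach described in this article.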

To address the challenges of managing Terraform state during Azure PostgreSQL Flexible Server's Point-in-Time Recovery (PITR), a solution has been developed that utilizes variables and dynamic resource management. This approach aligns with immutable infrastructure principles and avoids direct state manipulation, making it suitable for highly regulated environments.

Key Variables:

restore_mode: Toggles between standard server management and initiating a PITR.
post_restore_mode: Enables configurations such as High Availability (HA) and other settings for the restored server after PITR.
restore_timestamp: Specifies the exact recovery point, e.g., "2025-01-01T23:42:42.8258553Z".
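Because a malformed recovery point would only surface once Azure rejects the request, one option is to validate restore_timestamp at plan time. The following is a sketch of a variant of the variable definition (it would replace, not accompany, the one in variables.tf). The regular expression is an assumption about the accepted shape, namely a UTC timestamp with optional fractional seconds, and is not taken from the provider documentation.

```hcl
variable "restore_timestamp" {
  description = "Specific recovery point in UTC, e.g., '2025-01-01T23:42:42.8258553Z'"
  type        = string
  default     = ""

  validation {
    # Accept the empty default, or a UTC timestamp with optional fractional seconds.
    condition = (
      var.restore_timestamp == "" ||
      can(regex("^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?Z$", var.restore_timestamp))
    )
    error_message = "The restore_timestamp value must be empty or a UTC timestamp such as 2025-01-01T23:42:42.8258553Z."
  }
}
```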

Operational Workflow:

Dynamic Resource Management: The solution employs conditional logic to manage the creation and import of resources based on the restore_mode and post_restore_mode variables. This ensures that the restored server is appropriately configured without manual state manipulation.
Sequential Terraform Applications: Due to the inability to enable HA during the initial restore, the process requires two terraform apply executions. The first applies the PITR, creating the new server, and the second configures HA and other inherited settings.
Consistent Configuration: The restored server is defined in Terraform to mirror the original server's configuration, ensuring consistency. However, the code can be adjusted to apply different configurations if necessary.
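The two passes could be driven by two variable files, one per terraform apply. This is a sketch under stated assumptions: the file names restore.tfvars and post_restore.tfvars are hypothetical, the two fragments belong in separate files, and the remaining required variables (server name, credentials, and so on) still need to be supplied by other means.

```hcl
# --- restore.tfvars (hypothetical name): first apply, performs the PITR ---
restore_mode      = true
post_restore_mode = false
restore_timestamp = "2025-01-01T23:42:42.8258553Z"

# --- post_restore.tfvars (hypothetical name): second apply, enables HA ---
# Keep this in a separate file; it is shown here only for comparison.
# restore_mode      = true
# post_restore_mode = true
# restore_timestamp = "2025-01-01T23:42:42.8258553Z"
```

Each file would be passed with the -var-file option of terraform apply. Keeping restore_mode set to true in the second pass matters, because the restored server resource is only declared while that variable is true.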

Terraform Code Reference

variables.tf (Defines the necessary variables)

variable "restore_mode" {
  description = "Toggle between regular server management and performing a PITR"
  type        = bool
  default     = false
}

variable "post_restore_mode" {
  description = "Enable post-restore configurations such as HA and other inherited settings for the restored server"
  type        = bool
  default     = false
}

variable "restore_timestamp" {
  description = "Specific recovery point in UTC, e.g., '2025-01-01T23:42:42.8258553Z'"
  type        = string
  default     = ""
}

variable "restored_psql_server_name" {
  description = "Name of the restored PostgreSQL server"
  type        = string
}

variable "psql_admin_username" {
  description = "Administrator username for PostgreSQL"
  type        = string
}

variable "psql_admin_pwd" {
  description = "Administrator password for PostgreSQL"
  type        = string
  sensitive   = true # keep the value out of plan and apply output
}

variable "allow_current_ip_firewall_rule_name" {
  description = "Name of the firewall rule to allow current IP"
  type        = string
}

variable "test_db_name" {
  description = "Name of the test database to be created"
  type        = string
}

main.tf (Main Terraform Configuration)

resource "azurerm_postgresql_flexible_server" "psql_restore" {
  count               = var.restore_mode ? 1 : 0
  name                = var.restored_psql_server_name
  resource_group_name = azurerm_resource_group.psql_test_resource_group.name
  location            = azurerm_resource_group.psql_test_resource_group.location

  create_mode                       = var.post_restore_mode ? "Default" : "PointInTimeRestore"
  source_server_id                  = var.post_restore_mode ? null : azurerm_postgresql_flexible_server.psql_flexible.id
  point_in_time_restore_time_in_utc = var.restore_timestamp

  delegated_subnet_id = azurerm_subnet.psql_subnet.id
  private_dns_zone_id = azurerm_private_dns_zone.psql_private_dns.id

  backup_retention_days  = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.backup_retention_days : null
  version                = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.version : null
  administrator_login    = var.post_restore_mode ? var.psql_admin_username : null
  administrator_password = var.post_restore_mode ? var.psql_admin_pwd : null

  storage_mb        = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.storage_mb : null
  storage_tier      = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.storage_tier : null
  sku_name          = var.post_restore_mode ? azurerm_postgresql_flexible_server.psql_flexible.sku_name : null
  auto_grow_enabled = true

  public_network_access_enabled = false
  zone                          = "2"

  dynamic "high_availability" {
    for_each = var.post_restore_mode ? [1] : []
    content {
      mode                      = "ZoneRedundant"
      standby_availability_zone = 1
    }
  }

  depends_on = [
    azurerm_virtual_network.vnet,
    azurerm_subnet.psql_subnet
  ]

  tags = {
    environment = "development"
  }

  lifecycle {
    ignore_changes = [
      create_mode,
      source_server_id,
      point_in_time_restore_time_in_utc
    ]
  }
}

resource "azurerm_postgresql_flexible_server_firewall_rule" "allow_current_ip_restored" {
  count            = var.post_restore_mode ? 1 : 0
  name             = var.allow_current_ip_firewall_rule_name
  server_id        = azurerm_postgresql_flexible_server.psql_restore[0].id
  start_ip_address = "91.89.45.13"
  end_ip_address   = "91.89.45.13"

  depends_on = [
    azurerm_postgresql_flexible_server.psql_restore
  ]
}

resource "azurerm_postgresql_flexible_server_configuration" "custom_param_restored" {
  count     = var.post_restore_mode ? 1 : 0
  name      = "logfiles.retention_days"
  value     = "4"
  server_id = azurerm_postgresql_flexible_server.psql_restore[0].id

  depends_on = [
    azurerm_postgresql_flexible_server.psql_restore
  ]
}

resource "azurerm_postgresql_flexible_server_database" "additional_db_restored" {
  count     = var.post_restore_mode ? 1 : 0
  name      = var.test_db_name
  server_id = azurerm_postgresql_flexible_server.psql_restore[0].id
  collation = "en_US.utf8"
  charset   = "utf8"

  depends_on = [
    azurerm_postgresql_flexible_server.psql_restore
  ]
}
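Since the restored server only exists while restore_mode is true, anything that reads its attributes has to tolerate an empty resource list. As an illustrative sketch (the output name is hypothetical and not part of the original configuration), an output can expose the restored server's FQDN and fall back to null when no restore is in progress:

```hcl
# Sketch only: one() returns the single element of the list, or null if it is empty.
output "restored_psql_server_fqdn" {
  description = "FQDN of the restored server, or null when restore_mode is false"
  value       = one(azurerm_postgresql_flexible_server.psql_restore[*].fqdn)
}
```

The splat expression with one() avoids the index error that psql_restore[0] would raise when count is 0.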

provider.tf

provider "azurerm" {
  features {}

  subscription_id = "<SUBSCRIPTION_ID>"
  tenant_id       = "<TENANT_ID>"
  client_id       = "<CLIENT_ID>"
  client_secret   = "<CLIENT_SECRET>"
}

Summary

Effectively managing Terraform state during Azure PostgreSQL Flexible Server's PITR operations requires a thoughtful approach that integrates automation and adherence to immutable infrastructure principles. By implementing automated state imports, utilizing dynamic variables, applying conditional resource management, and segmenting state files, organizations can enable DevOps engineers to maintain consistency and reliability in their infrastructure as code practices, even during complex disaster recovery scenarios.

As a general practice, follow these guidelines to ensure a robust and efficient recovery process:

Ensure backups are enabled with an appropriate retention period to meet recovery objectives
Use Terraform to manage the creation and restoration of PostgreSQL servers. This ensures consistency and repeatability
Avoid hardcoding sensitive information like passwords in Terraform configurations. Use tools like Azure Key Vault or environment variables
Define the restored server in Terraform to inherit the original server’s configuration, including version, storage, and performance settings
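For the point about sensitive information, the administrator password could be read from Azure Key Vault instead of being passed in as a variable. This is a minimal sketch: the vault name, resource group, and secret name are placeholders, and the data source labels are hypothetical.

```hcl
# Sketch only: look up an existing Key Vault and read one secret from it.
data "azurerm_key_vault" "psql_secrets" {
  name                = "<KEY_VAULT_NAME>"
  resource_group_name = "<KEY_VAULT_RESOURCE_GROUP>"
}

data "azurerm_key_vault_secret" "psql_admin_pwd" {
  name         = "<SECRET_NAME>"
  key_vault_id = data.azurerm_key_vault.psql_secrets.id
}

# In the server resource, the password argument would then read:
#   administrator_password = data.azurerm_key_vault_secret.psql_admin_pwd.value
```

The secret value is still written to the Terraform state file, so the state backend needs the same access controls as the vault itself.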
