How to Implement the 4 Golden Signals Alerts in New Relic Using Terraform

Elison G.

Senior Site Reliability Engineer(SRE) | Certified Cloud Engineer | DevOps Engineer | 3x NSE Fortinet | 1x GCP | CKAD(In Progress)

Published: March 6, 2025

Monitoring is essential to maintaining the reliability and performance of modern applications. The Four Golden Signals (Latency, Traffic, Errors, and Saturation) are the core metrics that Google's Site Reliability Engineering (SRE) practice recommends for monitoring system health.

Using New Relic for monitoring and Terraform for Infrastructure as Code (IaC), we can automate the deployment of alerts based on these four signals, ensuring proactive issue detection and faster resolution.

This article will guide you through implementing New Relic alerts for the Four Golden Signals using Terraform.

Prerequisites

Before you begin, ensure you have:

A New Relic account and API key
Terraform installed (>=1.0.0)
The New Relic Terraform provider configured

If you haven’t configured Terraform with New Relic before, create a file called provider.tf and add the following:

terraform {
  required_providers {
    newrelic = {
      source  = "newrelic/newrelic"
      version = "~> 2.0"
    }
  }
}

provider "newrelic" {
  account_id = var.newrelic_account_id
  api_key    = var.newrelic_api_key
  region     = "US" 
}

Define the required variables in variables.tf:

variable "newrelic_account_id" {}
variable "newrelic_api_key" {}

And in terraform.tfvars:

newrelic_account_id = "YOUR_NEW_RELIC_ACCOUNT_ID"
newrelic_api_key    = "YOUR_NEW_RELIC_API_KEY"
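The bare variable declarations above accept any value. As a minimal sketch, assuming Terraform 0.14 or later (covered by the >=1.0.0 prerequisite), the same two variables can be declared with explicit types and with the API key marked as sensitive. The descriptions are illustrative wording, not taken from New Relic's documentation.

```hcl
# variables.tf (alternative sketch with explicit types)
variable "newrelic_account_id" {
  type        = number
  description = "Numeric New Relic account ID"
}

variable "newrelic_api_key" {
  type        = string
  description = "New Relic API key used by the provider"
  sensitive   = true # redacts the value in plan and apply output
}
```

Terraform also reads variables from environment variables named TF_VAR_<name>, so the key can be supplied as TF_VAR_newrelic_api_key instead of being committed in terraform.tfvars.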

Step 1: Create an Alert Policy

New Relic requires an alert policy to group related alerts. Let’s create one for Golden Signals Alerts in alerts.tf:

resource "newrelic_alert_policy" "golden_signals" {
  name = "Golden Signals Alerts"
  incident_preference = "PER_POLICY"
}

With incident_preference set to PER_POLICY, New Relic keeps a single open incident for the whole policy, so violations from any of its conditions are grouped into that one incident.
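PER_POLICY is one of three values the incident_preference argument accepts. The sketch below uses a hypothetical second policy (its resource name and display name are made up for illustration) to show where the other values would go if you prefer finer-grained incidents.

```hcl
# Hypothetical policy, shown only to illustrate the other preference values
resource "newrelic_alert_policy" "golden_signals_per_condition" {
  name = "Golden Signals Alerts (per condition)"

  # PER_POLICY:               one open incident for the whole policy
  # PER_CONDITION:            one open incident per alert condition
  # PER_CONDITION_AND_TARGET: one open incident per condition and entity
  incident_preference = "PER_CONDITION"
}
```

Per-policy grouping keeps notification volume low, while per-condition grouping makes it easier to see which signal is failing.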

Step 2: Create Alert Conditions for the Four Golden Signals

1. Latency (Response Time)

Latency measures how long requests take to complete. We can monitor response times using an APM metric condition.

resource "newrelic_alert_condition" "latency" {
  policy_id       = newrelic_alert_policy.golden_signals.id
  name            = "High Response Time"
  type            = "apm_app_metric"
  entities        = ["YOUR_APPLICATION_ID"]
  metric          = "response_time_web"
  condition_scope = "application"

  term {
    duration      = 5
    operator      = "above"
    priority      = "critical"
    threshold     = 2 # response time thresholds are expressed in seconds
    time_function = "all"
  }
}

This alert will trigger if the average response time exceeds 2 seconds for 5 minutes.
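Hard-coding the application ID works, but the ID can also be looked up by name. The following is a sketch, assuming the provider's newrelic_entity data source and an APM application named "my-app" (a placeholder). The resource name latency_by_name is hypothetical, and the policy reference comes from Step 1.

```hcl
# Look up the APM application by its display name ("my-app" is a placeholder)
data "newrelic_entity" "app" {
  name   = "my-app"
  type   = "APPLICATION"
  domain = "APM"
}

# Same latency condition, but targeting the looked-up application
resource "newrelic_alert_condition" "latency_by_name" {
  policy_id       = newrelic_alert_policy.golden_signals.id
  name            = "High Response Time (by name)"
  type            = "apm_app_metric"
  entities        = [data.newrelic_entity.app.application_id]
  metric          = "response_time_web"
  condition_scope = "application"

  term {
    duration      = 5
    operator      = "above"
    priority      = "critical"
    threshold     = 2 # seconds
    time_function = "all"
  }
}
```

The same data source can feed the entities argument of the traffic and error conditions, so the application is named in one place.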

2. Traffic (Request Throughput)

Traffic measures the number of incoming requests per minute (RPM).

resource "newrelic_alert_condition" "traffic" {
  policy_id  = newrelic_alert_policy.golden_signals.id
  name       = "Low Traffic"
  type       = "apm_app_metric"
  entities   = ["YOUR_APPLICATION_ID"]
  metric     = "throughput_web"
  condition_scope = "application"
  
  term {
    duration      = 5
    operator      = "below"
    priority      = "critical"
    threshold     = 10  # Alert if traffic drops below 10 RPM
    time_function = "all"
  }
}

This alerts us if the application receives fewer than 10 requests per minute for 5 minutes, which often signals an outage upstream or a routing problem.

3. Errors (Error Rate)

Monitoring error rates helps detect increasing failures in your application.

resource "newrelic_alert_condition" "errors" {
  policy_id  = newrelic_alert_policy.golden_signals.id
  name       = "High Error Rate"
  type       = "apm_app_metric"
  entities   = ["YOUR_APPLICATION_ID"]
  metric     = "error_percentage"
  condition_scope = "application"
  
  term {
    duration      = 5
    operator      = "above"
    priority      = "critical"
    threshold     = 5  # Alert if errors exceed 5%
    time_function = "all"
  }
}

This alert triggers if the error rate goes beyond 5% for 5 minutes.
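A condition can carry more than one term block, which allows an early warning before the critical threshold is reached. The sketch below assumes a 2% warning level; that number and the resource name errors_with_warning are illustrative choices, not values from this guide.

```hcl
# Error-rate condition with a warning term ahead of the critical term
resource "newrelic_alert_condition" "errors_with_warning" {
  policy_id       = newrelic_alert_policy.golden_signals.id
  name            = "High Error Rate (warning and critical)"
  type            = "apm_app_metric"
  entities        = ["YOUR_APPLICATION_ID"]
  metric          = "error_percentage"
  condition_scope = "application"

  term {
    duration      = 5
    operator      = "above"
    priority      = "warning"
    threshold     = 2 # assumed early-warning level, in percent
    time_function = "all"
  }

  term {
    duration      = 5
    operator      = "above"
    priority      = "critical"
    threshold     = 5 # same critical level as above, in percent
    time_function = "all"
  }
}
```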

4. Saturation (CPU Utilization)

Saturation refers to resource exhaustion, often represented by CPU or memory usage.

resource "newrelic_infra_alert_condition" "saturation" {
  policy_id  = newrelic_alert_policy.golden_signals.id
  name       = "High CPU Utilization"
  type       = "infra_metric"
  event      = "SystemSample"
  select     = "cpuPercent"
  comparison = "above"
  # Infrastructure conditions select hosts with a where filter rather than entity IDs
  where      = "(hostname = 'YOUR_HOSTNAME')"

  critical {
    duration      = 5
    value         = 90 # Alert if CPU usage exceeds 90%
    time_function = "all"
  }
}

This alert will trigger if CPU usage exceeds 90% for 5 minutes.
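Since saturation can also show up as memory pressure, a second condition can watch memory in the same way. This is a sketch, assuming the provider's newrelic_infra_alert_condition resource and the memoryUsedPercent attribute of SystemSample. The 90% threshold simply mirrors the CPU example, and the resource name is hypothetical.

```hcl
# Memory-based saturation alert; with no where filter it applies to all hosts
resource "newrelic_infra_alert_condition" "memory_saturation" {
  policy_id  = newrelic_alert_policy.golden_signals.id
  name       = "High Memory Utilization"
  type       = "infra_metric"
  event      = "SystemSample"
  select     = "memoryUsedPercent"
  comparison = "above"

  critical {
    duration      = 5
    value         = 90 # assumed threshold, in percent
    time_function = "all"
  }
}
```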

Step 3: Configure Alert Notifications

To receive notifications, we must configure a notification channel, such as Slack, email, or PagerDuty. Here’s how to set up a Slack notification channel:

# newrelic_alert_channel is the resource that accepts a config block
# and can be attached to a policy with newrelic_alert_policy_channel
resource "newrelic_alert_channel" "slack" {
  name = "Slack Alerts"
  type = "slack"

  config {
    url = "YOUR_SLACK_WEBHOOK_URL"
  }
}

Now, link it to our alert policy:

resource "newrelic_alert_policy_channel" "golden_signals_slack" {
  policy_id   = newrelic_alert_policy.golden_signals.id
  channel_ids = [newrelic_alert_channel.slack.id]
}

Step 4: Deploy the Configuration

Once the configuration is complete, apply it using Terraform:

terraform init
terraform plan
terraform apply

Terraform will create the alerts in New Relic, ensuring automatic monitoring of the 4 Golden Signals.
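To confirm what was created, an output can expose the policy ID after the apply finishes. This is a small optional sketch; the output name golden_signals_policy_id is an arbitrary choice.

```hcl
# outputs.tf: print the alert policy ID at the end of terraform apply
output "golden_signals_policy_id" {
  description = "ID of the Golden Signals alert policy"
  value       = newrelic_alert_policy.golden_signals.id
}
```

Running terraform output golden_signals_policy_id afterwards prints the same value, which is handy when looking up the policy in the New Relic UI.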

Conclusion

By implementing New Relic alerts with Terraform, you can proactively monitor application health based on the Four Golden Signals:

Latency - Detects slow response times
Traffic - Monitors request throughput
Errors - Alerts on increased error rates
Saturation - Tracks high CPU usage

Using Infrastructure as Code (IaC) ensures that your alerting setup is consistent, repeatable, and version-controlled.

Start monitoring your application effectively with New Relic and Terraform today!

Spread the Knowledge!

If you found this guide helpful, repost it to help others learn how to automate New Relic alerts with Terraform!

Let's empower more developers and SREs to build reliable, well-monitored systems, one alert at a time! #DevOps #Terraform #NewRelic
