Top 20 Azure Resources to Monitor for Optimal Performance and Troubleshooting

Paul Nichols

Expert Azure Certified Senior Integration Specialist with 25+ years of experience delivering secure, efficient solutions for government and enterprise clients

Published: April 21, 2023

Introduction

In the past, I was tasked with setting up a monitoring strategy for our Azure infrastructure. However, I struggled to find comprehensive information on the internet about the specifics, such as the most important logs to monitor and sensible thresholds for alerts. In this article, I'll share my findings and insights about the top Azure resources you should monitor, along with valuable logs to forward to Log Analytics, metric alerts to set up, and scheduled query alerts when necessary. For dynamic threshold alerts, you can start with low sensitivity to minimize false alarms, while static alerts will have recommended thresholds.

As I mentioned, this information was hard to find, so part of my reason for sharing it is to gather other people's thoughts on the strategy. If you disagree with any of the thresholds, please let me know. Thanks.
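To make the sensitivity idea concrete, here is a minimal Python sketch. It is not Azure Monitor's actual dynamic-threshold algorithm; it simply assumes a band of the historical mean plus or minus k standard deviations, with hypothetical multipliers per sensitivity level, to show why a low sensitivity setting raises fewer alerts.

```python
from statistics import mean, stdev

# Hypothetical band widths in standard deviations; not Azure's real values.
BAND_WIDTH = {"high": 1.5, "medium": 2.5, "low": 3.5}


def dynamic_bounds(history, sensitivity="low"):
    """Return (lower, upper) bounds derived from historical samples."""
    centre = mean(history)
    spread = stdev(history)
    k = BAND_WIDTH[sensitivity]
    return centre - k * spread, centre + k * spread


def breaches(value, history, sensitivity="low"):
    """True if value falls outside the band for the given sensitivity."""
    lower, upper = dynamic_bounds(history, sensitivity)
    return value < lower or value > upper


# Made-up CPU percentages for illustration.
cpu_history = [40, 42, 38, 41, 39, 43, 40, 41]
print(breaches(45, cpu_history, "high"))  # True: the tight band is exceeded
print(breaches(45, cpu_history, "low"))   # False: the loose band absorbs it
```

The same reading of 45% trips the tight band but not the loose one, which is the trade-off behind starting at low sensitivity and tightening later.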

Resources

Virtual Machines

Logs to Forward to Log Analytics:

VMGuestAgent: Guest agent logs from the virtual machine.
VMWindowsEvent: Windows event logs from the virtual machine (for Windows VMs only).
VMLinuxSyslog: Linux syslog data from the virtual machine (for Linux VMs only).

Metric Alerts:

Percentage CPU (Dynamic, Low sensitivity): Warning, monitor every 5 minutes, Alert if over the threshold for 10 minutes.
Available Memory Bytes (Dynamic, Low sensitivity): Warning, monitor every 5 minutes, Alert if below the threshold for 10 minutes.
Disk Write Bytes/sec (Dynamic, Low sensitivity): Warning, monitor every 5 minutes, Alert if over the threshold for 10 minutes.

Scheduled Query Alerts: None
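The "monitor every 5 minutes" and "for 10 minutes" settings can be captured as data before they go into a template or script. The sketch below is one way to do that in Python. The MetricAlert class and iso_minutes helper are hypothetical names; the evaluationFrequency and windowSize keys and the ISO 8601 duration format follow the Azure Monitor metric alert schema as I understand it, and I am assuming "for 10 minutes" maps to the evaluation window.

```python
from dataclasses import dataclass


def iso_minutes(minutes):
    """Format whole minutes as an ISO 8601 duration, e.g. 5 -> 'PT5M'."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    hours, rest = divmod(minutes, 60)
    text = "PT"
    if hours:
        text += f"{hours}H"
    if rest:
        text += f"{rest}M"
    return text


@dataclass(frozen=True)
class MetricAlert:
    metric: str
    operator: str            # "GreaterThan" or "LessThan"
    frequency_minutes: int   # how often the rule is evaluated
    window_minutes: int      # how much data each evaluation looks at
    sensitivity: str = "Low"

    def schedule(self):
        """Return the timing settings as ISO 8601 duration strings."""
        return {
            "evaluationFrequency": iso_minutes(self.frequency_minutes),
            "windowSize": iso_minutes(self.window_minutes),
        }


VM_ALERTS = [
    MetricAlert("Percentage CPU", "GreaterThan", 5, 10),
    # Assumption: for available memory, a low value is the problem.
    MetricAlert("Available Memory Bytes", "LessThan", 5, 10),
    MetricAlert("Disk Write Bytes/sec", "GreaterThan", 5, 10),
]

for alert in VM_ALERTS:
    print(alert.metric, alert.operator, alert.schedule())
```

Keeping the rules in one list like this makes it easier to review thresholds in one place and to generate the real alert definitions from them.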

Azure App Services

Logs to Forward to Log Analytics:

AppServiceConsoleLogs: Console logs from the App Service.
AppServiceHTTPLogs: HTTP logs from the App Service.
AppServiceEnvironmentPlatformLogs: Environment platform logs from the App Service.
AppServiceAuditLogs: Audit logs from the App Service.
AppServiceIPSecAuditLogs: IPSec audit logs from the App Service.

Metric Alerts:

Http5xx (Static, > 1% error rate): Monitor every 5 minutes, Alert if over 1% error rate for 10 minutes.
Percentage CPU (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
Memory usage (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.

Scheduled Query Alerts:

Azure App Service performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)
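Http5xx is reported as a count, so a 1% error-rate rule needs the total request count as well. The following Python sketch shows the arithmetic under the assumption that each 5-minute evaluation yields a pair of (5xx count, total requests), and that "for 10 minutes" means every evaluation in the window must breach. The function names are illustrative only.

```python
def error_rate_percent(errors, total):
    """Percentage of requests that failed; 0.0 when there was no traffic."""
    if total == 0:
        return 0.0
    return 100.0 * errors / total


def should_alert(buckets, threshold_percent=1.0):
    """buckets: list of (http5xx_count, total_requests), one per evaluation.

    Alert only if every bucket in the window exceeds the threshold.
    """
    return bool(buckets) and all(
        error_rate_percent(errors, total) > threshold_percent
        for errors, total in buckets
    )


# Two 5-minute buckets covering a 10-minute window (made-up numbers).
print(should_alert([(12, 1000), (15, 900)]))  # True: 1.2% and about 1.67%
print(should_alert([(12, 1000), (5, 900)]))   # False: second bucket is about 0.56%
```

Returning 0.0 for zero traffic avoids a division error and keeps quiet periods from raising alerts.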

Azure API Management

Logs to Forward to Log Analytics:

ApiManagementGatewayLogs: Gateway logs from Azure API Management.

Metric Alerts:

TotalRequests (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
RequestDuration (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
ErrorRate (Static, > 1% error rate): Monitor every 5 minutes, Alert if over 1% error rate for 10 minutes.

Scheduled Query Alerts:

Azure API Management performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)

Azure Logic Apps

Logs to Forward to Log Analytics:

WorkflowRuntimeLogs: Runtime logs from Azure Logic Apps.

Metric Alerts:

RunsFailed (Static, > 10 failures): Monitor every 5 minutes, Alert if over 10 failures for 10 minutes.
TriggersFailed (Static, > 10 failures): Monitor every 5 minutes, Alert if over 10 failures for 10 minutes.
ActionsFailed (Static, > 10 failures): Monitor every 5 minutes, Alert if over 10 failures for 10 minutes.

Scheduled Query Alerts:

Azure Logic Apps performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)
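A static count threshold such as "over 10 failures for 10 minutes" can be read in more than one way. The sketch below assumes it means the total number of failures across the last 10 minutes exceeds 10, using a per-minute series of failure counts. The function names and sample data are hypothetical.

```python
def window_total(counts_per_minute, window_minutes):
    """Sum the most recent window_minutes entries of a per-minute series."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return sum(counts_per_minute[-window_minutes:])


def failures_exceeded(counts_per_minute, threshold=10, window_minutes=10):
    """True if total failures in the window are strictly above the threshold."""
    return window_total(counts_per_minute, window_minutes) > threshold


# Failed runs per minute, oldest first (made-up data).
runs_failed = [0, 0, 1, 0, 3, 2, 0, 4, 1, 0, 1, 0]
print(window_total(runs_failed, 10))  # 12
print(failures_exceeded(runs_failed))  # True
```

If your workflows fail in short bursts, a total over the window catches them, whereas a rule requiring every minute to breach would not.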

Azure Service Bus

Logs to Forward to Log Analytics:

ServiceBusLogs: Azure Service Bus logs.

Metric Alerts:

ActiveMessages (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
DeadletteredMessages (Static, > 10 messages): Monitor every 5 minutes, Alert if over 10 messages for 10 minutes.
ServerErrors (Static, > 1% error rate): Monitor every 5 minutes, Alert if over 1% error rate for 10 minutes.

Scheduled Query Alerts:

Azure Service Bus performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)

Azure SQL Databases

Logs to Forward to Log Analytics:

SQLInsights: SQL Server logs.
SqlSecurityAuditEvents: Azure SQL Audit logs.

Metric Alerts:

DtuConsumptionPercent (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
Deadlocks (Static, > 0 deadlocks): Monitor every 5 minutes, Alert if deadlock detected.
LongQueries (Static, > 5 minutes): Monitor every 5 minutes, Alert if queries run longer than 5 minutes.

Scheduled Query Alerts:

Azure SQL Database performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)

Azure Storage Accounts

Logs to Forward to Log Analytics:

StorageRead
StorageWrite
StorageDelete

Metric Alerts:

Transactions (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
Egress (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
Ingress (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
Latency (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes

Scheduled Query Alerts:

Azure Storage Account access failure alerts (Monitor every 15 minutes, Alert if multiple failures detected)

Azure Cosmos DB

Logs to Forward to Log Analytics:

DataPlaneRequests

Metric Alerts:

TotalRequests (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
Http4xx (Static, > 1% error rate): Monitor every 5 minutes, Alert if over 1% error rate for 10 minutes
Http5xx (Static, > 1% error rate): Monitor every 5 minutes, Alert if over 1% error rate for 10 minutes
ConsumedWritePercentage (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
ConsumedReadPercentage (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes

Scheduled Query Alerts:

Azure Cosmos DB performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)

Azure Functions

Logs to Forward to Log Analytics:

AppServiceConsoleLogs
AppServiceHTTPLogs

Metric Alerts:

FunctionExecutionCount (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
FunctionExecutionErrors (Static, > 10 errors): Monitor every 5 minutes, Alert if over 10 errors for 10 minutes
FunctionExecutionUnits (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes

Scheduled Query Alerts:

Azure Functions performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)

Azure Kubernetes Service (AKS)

Logs to Forward to Log Analytics:

kube-apiserver
kube-controller-manager
kube-scheduler
kubelet
kube-proxy
kube-audit
container-logs

Metric Alerts:

kube_node_status_condition (Static, unhealthy nodes): Monitor every 5 minutes, Alert if any unhealthy nodes detected
container_cpu_usage_seconds_total (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
container_memory_usage_bytes (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes

Scheduled Query Alerts:

Azure Kubernetes Service performance alerts (Monitor every 5 minutes, Alert if multiple performance issues are detected)
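One detail worth noting about container_cpu_usage_seconds_total: it is a cumulative counter, so the raw value only ever grows and is not useful to threshold directly. What you usually alert on is its rate of change. The Python sketch below shows that conversion under simple assumptions: samples are (timestamp, counter value) pairs, and a drop in the counter is treated as a container restart. The function name is hypothetical.

```python
def cpu_cores_used(samples):
    """Average CPU cores used between the first and last sample.

    samples: list of (unix_time_seconds, cumulative_cpu_seconds), oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    consumed = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. restart); count from zero.
        consumed += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return consumed / elapsed


steady = [(0, 100.0), (60, 130.0), (120, 160.0)]
print(cpu_cores_used(steady))  # 0.5 cores on average
```

The dynamic threshold then applies to this rate rather than to the ever-increasing counter.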

Azure Event Hubs

Logs to Forward to Log Analytics:

ArchiveLog
OperationalLogs

Metric Alerts:

IncomingMessages (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
OutgoingMessages (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
ThrottledRequests (Static, > 10 throttled requests): Monitor every 5 minutes, Alert if over 10 throttled requests for 10 minutes

Scheduled Query Alerts:

Azure Event Hubs performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)

Azure Redis Cache

Logs to Forward to Log Analytics:

CacheDiagnostics

Metric Alerts:

CacheHits (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
CacheMisses (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
UsedMemory (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
CacheEvictions (Static, > 10 evictions): Monitor every 5 minutes, Alert if over 10 evictions for 10 minutes

Scheduled Query Alerts:

Azure Redis Cache performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)
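CacheHits and CacheMisses are easier to interpret together than separately. As a small illustration, this Python sketch derives a hit ratio from the two counts. The function name and the sample numbers are made up.

```python
def hit_ratio(hits, misses):
    """Fraction of lookups served from cache, or None if there were none."""
    lookups = hits + misses
    if lookups == 0:
        return None
    return hits / lookups


print(hit_ratio(900, 100))  # 0.9
print(hit_ratio(0, 0))      # None: no traffic, so no ratio to judge
```

A rising miss count alongside a falling ratio is a stronger signal than a rise in misses alone, which may just reflect more traffic.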

Azure Virtual Networks

Logs to Forward to Log Analytics:

NetworkSecurityGroupEvent
NetworkSecurityGroupRuleCounter
NetworkWatcherFlowLog

Metric Alerts:

BytesInTotal (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
BytesOutTotal (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
PacketsInDropped (Static, > 100 dropped packets): Monitor every 5 minutes, Alert if over 100 dropped packets for 10 minutes
PacketsOutDropped (Static, > 100 dropped packets): Monitor every 5 minutes, Alert if over 100 dropped packets for 10 minutes

Scheduled Query Alerts:

Azure Virtual Network connectivity alerts (Monitor every 5 minutes, Alert if multiple connectivity issues detected)

Azure Active Directory

Logs to Forward to Log Analytics:

SignInLogs
AuditLogs

Metric Alerts:

FailedSignInAttempts (Static, > 10 failures): Monitor every 5 minutes, Alert if over 10 failures for 10 minutes
UnusualSignInActivities (Static, activities detected): Monitor every 5 minutes, Alert if unusual activities detected
UserAccountLockouts (Static, > 5 lockouts): Monitor every 5 minutes, Alert if over 5 lockouts for 10 minutes

Scheduled Query Alerts:

Azure Active Directory security alerts (Monitor every 5 minutes, Alert if multiple security issues detected)
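Counting failures "for 10 minutes" amounts to a rolling window over event timestamps. The sketch below shows one way to keep such a count in Python. The class name is hypothetical and timestamps are plain seconds; in practice the events would come from the sign-in logs.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events that occurred within the last window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()

    def add(self, timestamp):
        """Record an event and return how many events are in the window."""
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that have aged out of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events)


failed_sign_ins = SlidingWindowCounter(window_seconds=600)
for t in range(0, 330, 30):  # one failure every 30 seconds, 11 in total
    count = failed_sign_ins.add(t)
print(count > 10)  # True: the 11th failure crosses the threshold of 10
```

Events must be added in time order for the eviction step to work as written.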

Azure Load Balancer

Logs to Forward to Log Analytics:

LoadBalancerAlert
LoadBalancerProbeHealthStatus

Metric Alerts:

DataPathBytesInTotal (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
DataPathBytesOutTotal (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
BackendConnectionErrors (Static, > 10 errors): Monitor every 5 minutes, Alert if over 10 errors for 10 minutes
HealthProbeStatus (Static, unhealthy backend pool members): Monitor every 5 minutes, Alert if any unhealthy backend pool members detected

Scheduled Query Alerts:

Azure Load Balancer performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)

Azure CDN

Logs to Forward to Log Analytics:

CDNAccessLogs

Metric Alerts:

CacheHitRatio (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if under the threshold for 30 minutes
RequestCount (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
TotalLatency (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes

Scheduled Query Alerts:

Azure CDN performance alerts (Monitor every 15 minutes, Alert if multiple performance issues are detected)

Azure VPN Gateway

Logs to Forward to Log Analytics:

VPN_Gateway_IKE_Logs
VPN_Gateway_IKE_Errors
VPN_Gateway_P2S_Logs

Metric Alerts:

TunnelStatus (Static, down tunnels): Monitor every 5 minutes, Alert if any down tunnels detected
BytesIn (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
BytesOut (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes

Scheduled Query Alerts:

Azure VPN Gateway connectivity alerts (Monitor every 5 minutes, Alert if multiple connectivity issues detected)

Azure Batch

Logs to Forward to Log Analytics:

BatchNodeLogs

Metric Alerts:

TaskFailures (Static, > 10 failures): Monitor every 5 minutes, Alert if over 10 failures for 10 minutes
TaskExecutionTime (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
NodeFailures (Static, > 1 failure): Monitor every 5 minutes, Alert if over 1 failure for 10 minutes

Scheduled Query Alerts:

Azure Batch performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)

Azure Data Factory

Logs to Forward to Log Analytics:

ADFActivityRun
ADFPipelineRun

Metric Alerts:

FailedPipelineRuns (Static, > 5 failures): Monitor every 15 minutes, Alert if over 5 failures for 30 minutes
PipelineDuration (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
FailedActivityRuns (Static, > 10 failures): Monitor every 15 minutes, Alert if over 10 failures for 30 minutes

Scheduled Query Alerts:

Azure Data Factory performance alerts (Monitor every 15 minutes, Alert if multiple performance issues are detected)

Azure Data Lake Storage

Logs to Forward to Log Analytics:

StorageRead
StorageWrite
StorageDelete

Metric Alerts:

Egress (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
Ingress (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
Transactions (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes

Scheduled Query Alerts:

Azure Data Lake Storage access failure alerts (Monitor every 15 minutes, Alert if multiple failures detected)

Conclusion

In conclusion, setting up a comprehensive Azure monitoring strategy is essential for maintaining optimal performance and swiftly addressing any issues that arise. By focusing on the top 20 Azure resources outlined in this article, you'll be better equipped to proactively manage your infrastructure and minimize downtime. Additionally, leveraging Log Analytics and carefully configuring alert thresholds will enable you to stay informed without being flooded with notifications.


A special shoutout to my AI-powered writing buddy, GPT-Chat4, who assisted in crafting this article. Together, we're committed to bringing you the best in Azure news and coding tips! Thanks, GPT-Chat4
