Top 20 Azure Resources to Monitor for Optimal Performance and Troubleshooting

Introduction

In the past, I was tasked with setting up a monitoring strategy for our Azure infrastructure, but I struggled to find comprehensive guidance online on the specifics, such as which logs matter most and what alert thresholds are sensible. In this article, I share what I found: the top Azure resources to monitor, the logs worth forwarding to Log Analytics, the metric alerts to set up, and scheduled query alerts where they are needed. For dynamic threshold alerts, start with low sensitivity to minimize false alarms; for static alerts, I give a recommended threshold.

Because this information was hard to find, part of my reason for sharing it is to gather other people's thoughts and opinions on the strategy. If you disagree with any of the thresholds, please let me know. Thanks.
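To make the difference between the two alert types concrete, here is a toy Python sketch. It is not Azure Monitor's actual algorithm: the mean-plus-standard-deviation band, the BAND_WIDTH mapping, and the sample CPU values are all assumptions chosen only to show why a lower sensitivity produces fewer alerts.

```python
from statistics import mean, stdev

# Hypothetical mapping from sensitivity to band width (in standard deviations).
# Illustration only; Azure Monitor's dynamic threshold model is not this simple.
BAND_WIDTH = {"low": 3.0, "medium": 2.0, "high": 1.0}


def static_breach(value: float, threshold: float) -> bool:
    """A static alert fires whenever the value exceeds a fixed number."""
    return value > threshold


def dynamic_breach(value: float, history: list[float], sensitivity: str = "low") -> bool:
    """A toy dynamic alert fires when the value rises above a band learned from history."""
    upper = mean(history) + BAND_WIDTH[sensitivity] * stdev(history)
    return value > upper


# Made-up CPU percentages standing in for recent metric history.
history = [40, 42, 38, 41, 39, 43, 40, 41]

high_fires = dynamic_breach(44, history, "high")  # narrow band
low_fires = dynamic_breach(44, history, "low")    # wide band
static_fires = static_breach(44, 80)              # fixed 80% threshold
```

With this history, a reading of 44 is outside the narrow "high" band but inside the wide "low" band, which is the behaviour that makes low sensitivity a quieter starting point.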


Resources

Virtual Machines

Logs to Forward to Log Analytics:

  • VMGuestAgent: Guest agent logs from the virtual machine.
  • VMWindowsEvent: Windows event logs from the virtual machine (for Windows VMs only).
  • VMLinuxSyslog: Linux syslog data from the virtual machine (for Linux VMs only).

Metric Alerts:

  • Percentage CPU (Dynamic, Low sensitivity): Warning, monitor every 5 minutes, Alert if over the threshold for 10 minutes.
  • Available Memory Bytes (Dynamic, Low sensitivity): Warning, monitor every 5 minutes, Alert if below the threshold for 10 minutes.
  • Disk Write Bytes/sec (Dynamic, Low sensitivity): Warning, monitor every 5 minutes, Alert if over the threshold for 10 minutes.

Scheduled Query Alerts: None
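As a rough sketch of how the Virtual Machine alerts above could be written down as configuration, the Python below builds a dictionary shaped like a Microsoft.Insights/metricAlerts resource. The field names follow that schema as I understand it and should be checked against the current API version. Mapping "every 5 minutes, for 10 minutes" to two failing 5-minute periods is one possible reading, and the rule names are hypothetical. Scopes and action groups are left out.

```python
import json


def dynamic_metric_alert(name: str, metric_name: str, operator: str = "GreaterThan") -> dict:
    """Build a metric-alert-style payload for a low-sensitivity dynamic threshold."""
    return {
        "name": name,  # hypothetical rule name
        "properties": {
            "severity": 2,  # 2 corresponds to Warning
            "enabled": True,
            "evaluationFrequency": "PT5M",  # monitor every 5 minutes
            "windowSize": "PT5M",
            "criteria": {
                "odata.type": "Microsoft.Azure.Monitor.MultipleResourceMultipleMetricCriteria",
                "allOf": [
                    {
                        "criterionType": "DynamicThresholdCriterion",
                        "name": "criterion1",
                        "metricName": metric_name,
                        "operator": operator,
                        "alertSensitivity": "Low",
                        "timeAggregation": "Average",
                        # Assumed reading of "for 10 minutes": 2 of 2 periods must fail.
                        "failingPeriods": {
                            "numberOfEvaluationPeriods": 2,
                            "minFailingPeriodsToAlert": 2,
                        },
                    }
                ],
            },
        },
    }


cpu_alert = dynamic_metric_alert("vm-cpu-warning", "Percentage CPU")
# Available memory is a problem when it drops, so the operator is reversed.
memory_alert = dynamic_metric_alert("vm-memory-warning", "Available Memory Bytes", "LessThan")

print(json.dumps(cpu_alert, indent=2))
```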


Azure App Services

Logs to Forward to Log Analytics:

  • AppServiceConsoleLogs: Console logs from the App Service.
  • AppServiceHTTPLogs: HTTP logs from the App Service.
  • AppServiceEnvironmentPlatformLogs: Environment platform logs from the App Service.
  • AppServiceAuditLogs: Audit logs from the App Service.
  • AppServiceIPSecAuditLogs: IPSec audit logs from the App Service.
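Forwarding the categories above to Log Analytics is normally done with a diagnostic setting on the resource. The sketch below only builds the request body in Python; the workspace resource ID is a placeholder, the function name is hypothetical, and the property names (workspaceId, logs, category, enabled) reflect the diagnostic settings schema as I understand it. Some categories apply only to certain resource types, so check the list for your resource before enabling them.

```python
def diagnostic_setting(workspace_id: str, categories: list[str]) -> dict:
    """Build a diagnostic-settings-style body that enables the given log categories."""
    return {
        "properties": {
            "workspaceId": workspace_id,
            "logs": [{"category": c, "enabled": True} for c in categories],
        }
    }


# Placeholder resource ID; substitute your own Log Analytics workspace.
WORKSPACE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
)

app_service_setting = diagnostic_setting(
    WORKSPACE_ID,
    [
        "AppServiceConsoleLogs",
        "AppServiceHTTPLogs",
        "AppServiceAuditLogs",
        "AppServiceIPSecAuditLogs",
    ],
)
```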

Metric Alerts:

  • Http5xx (Static, > 1% error rate): Monitor every 5 minutes, Alert if over 1% error rate for 10 minutes.
  • Percentage CPU (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
  • Memory usage (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
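One detail worth spelling out: Http5xx is reported as a count, so a "1% error rate" has to be derived by dividing it by the total request count over the same window. A minimal sketch of that arithmetic follows, assuming you have already retrieved both counts; the function names and sample numbers are made up.

```python
def error_rate_percent(http5xx: int, requests: int) -> float:
    """Return the share of requests that failed with a 5xx status, as a percentage."""
    if requests == 0:
        return 0.0  # no traffic means nothing to alert on
    return 100.0 * http5xx / requests


def breaches_error_rate(http5xx: int, requests: int, threshold_percent: float = 1.0) -> bool:
    """True when the 5xx error rate is above the static threshold."""
    return error_rate_percent(http5xx, requests) > threshold_percent


# Made-up counts for one 10-minute window.
quiet_window = breaches_error_rate(http5xx=5, requests=1000)   # 0.5%
noisy_window = breaches_error_rate(http5xx=12, requests=1000)  # 1.2%
```

Using a rate rather than a raw count keeps the alert meaningful as traffic grows, at the cost of being noisy when request volume is very low.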

Scheduled Query Alerts:

  • Azure App Service performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure API Management

Logs to Forward to Log Analytics:

  • ApiManagementGatewayLogs: Gateway logs from Azure API Management.

Metric Alerts:

  • TotalRequests (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
  • RequestDuration (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
  • ErrorRate (Static, > 1% error rate): Monitor every 5 minutes, Alert if over 1% error rate for 10 minutes.

Scheduled Query Alerts:

  • Azure API Management performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure Logic Apps

Logs to Forward to Log Analytics:

  • WorkflowRuntimeLogs: Runtime logs from Azure Logic Apps.

Metric Alerts:

  • RunsFailed (Static, > 10 failures): Monitor every 5 minutes, Alert if over 10 failures for 10 minutes.
  • TriggersFailed (Static, > 10 failures): Monitor every 5 minutes, Alert if over 10 failures for 10 minutes.
  • ActionsFailed (Static, > 10 failures): Monitor every 5 minutes, Alert if over 10 failures for 10 minutes.
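To show how "monitor every 5 minutes, alert if over 10 failures for 10 minutes" can be evaluated, here is a small Python sketch. It takes one interpretation: a sample arrives every 5 minutes, and the rule fires when the failures summed over the last two samples (10 minutes) exceed 10. The class name and the sample counts are hypothetical.

```python
from collections import deque


class WindowedCountAlert:
    """Fire when the sum of the most recent samples exceeds a static threshold."""

    def __init__(self, threshold: int = 10, window_samples: int = 2) -> None:
        self.threshold = threshold
        # Two 5-minute samples cover the 10-minute lookback window.
        self.samples: deque[int] = deque(maxlen=window_samples)

    def observe(self, failures: int) -> bool:
        """Record one 5-minute failure count and report whether the rule fires."""
        self.samples.append(failures)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and sum(self.samples) > self.threshold


alert = WindowedCountAlert()
# Made-up RunsFailed counts, one per 5-minute evaluation.
results = [alert.observe(n) for n in [3, 4, 8, 1, 0]]
print(results)
```

Only the third evaluation fires here, because 4 + 8 is the only pair of adjacent samples that sums to more than 10.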

Scheduled Query Alerts:

  • Azure Logic Apps performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure Service Bus

Logs to Forward to Log Analytics:

  • ServiceBusLogs: Azure Service Bus logs.

Metric Alerts:

  • ActiveMessages (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
  • DeadletteredMessages (Static, > 10 messages): Monitor every 5 minutes, Alert if over 10 messages for 10 minutes.
  • ServerErrors (Static, > 1% error rate): Monitor every 5 minutes, Alert if over 1% error rate for 10 minutes.

Scheduled Query Alerts:

  • Azure Service Bus performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure SQL Databases

Logs to Forward to Log Analytics:

  • SQLInsights: SQL Server logs.
  • SqlSecurityAuditEvents: Azure SQL Audit logs.

Metric Alerts:

  • DtuConsumptionPercent (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes.
  • Deadlocks (Static, > 0 deadlocks): Monitor every 5 minutes, Alert if deadlock detected.
  • LongQueries (Static, > 5 minutes): Monitor every 5 minutes, Alert if queries run longer than 5 minutes.

Scheduled Query Alerts:

  • Azure SQL Database performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure Storage Accounts

Logs to Forward to Log Analytics:

  • StorageRead
  • StorageWrite
  • StorageDelete

Metric Alerts:

  • Transactions (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
  • Egress (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
  • Ingress (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
  • Latency (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes

Scheduled Query Alerts:

  • Azure Storage Account access failure alerts (Monitor every 15 minutes, Alert if multiple failures detected)
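A scheduled query alert like the one above boils down to counting matching log records in a lookback window. The Python sketch below mimics that logic on in-memory records rather than running a real Log Analytics query. The record shape (time, status_code), the rule that a status of 400 or higher counts as a failure, and the choice of 3 as the meaning of "multiple" are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def count_recent_failures(records: list[dict], now: datetime,
                          window: timedelta = timedelta(minutes=15)) -> int:
    """Count records inside the lookback window whose status code signals failure."""
    cutoff = now - window
    return sum(
        1 for r in records
        if r["time"] >= cutoff and r["status_code"] >= 400
    )


def should_alert(records: list[dict], now: datetime, min_failures: int = 3) -> bool:
    """Assumed rule: alert when at least min_failures failures fall in the window."""
    return count_recent_failures(records, now) >= min_failures


now = datetime(2023, 5, 1, 12, 0)
# Made-up storage access records.
records = [
    {"time": now - timedelta(minutes=40), "status_code": 403},  # too old
    {"time": now - timedelta(minutes=12), "status_code": 403},
    {"time": now - timedelta(minutes=9), "status_code": 200},   # success
    {"time": now - timedelta(minutes=5), "status_code": 404},
    {"time": now - timedelta(minutes=1), "status_code": 500},
]

recent_failures = count_recent_failures(records, now)
fires = should_alert(records, now)
```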


Azure Cosmos DB

Logs to Forward to Log Analytics:

  • DataPlaneRequests

Metric Alerts:

  • TotalRequests (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • Http4xx (Static, > 1% error rate): Monitor every 5 minutes, Alert if over 1% error rate for 10 minutes
  • Http5xx (Static, > 1% error rate): Monitor every 5 minutes, Alert if over 1% error rate for 10 minutes
  • ConsumedWritePercentage (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • ConsumedReadPercentage (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes

Scheduled Query Alerts:

  • Azure Cosmos DB performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure Functions

Logs to Forward to Log Analytics:

  • AppServiceConsoleLogs
  • AppServiceHTTPLogs

Metric Alerts:

  • FunctionExecutionCount (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • FunctionExecutionErrors (Static, > 10 errors): Monitor every 5 minutes, Alert if over 10 errors for 10 minutes
  • FunctionExecutionUnits (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes

Scheduled Query Alerts:

  • Azure Functions performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure Kubernetes Service (AKS)

Logs to Forward to Log Analytics:

  • kube-apiserver
  • kube-controller-manager
  • kube-scheduler
  • kubelet
  • kube-proxy
  • kube-audit
  • container-logs

Metric Alerts:

  • kube_node_status_condition (Static, unhealthy nodes): Monitor every 5 minutes, Alert if any unhealthy nodes detected
  • container_cpu_usage_seconds_total (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • container_memory_usage_bytes (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes

Scheduled Query Alerts:

  • Azure Kubernetes Service performance alerts (Monitor every 5 minutes, Alert if multiple performance issues are detected)


Azure Event Hubs

Logs to Forward to Log Analytics:

  • ArchiveLog
  • OperationalLogs

Metric Alerts:

  • IncomingMessages (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • OutgoingMessages (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • ThrottledRequests (Static, > 10 throttled requests): Monitor every 5 minutes, Alert if over 10 throttled requests for 10 minutes

Scheduled Query Alerts:

  • Azure Event Hubs performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure Redis Cache

Logs to Forward to Log Analytics:

  • CacheDiagnostics

Metric Alerts:

  • CacheHits (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • CacheMisses (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • UsedMemory (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • CacheEvictions (Static, > 10 evictions): Monitor every 5 minutes, Alert if over 10 evictions for 10 minutes
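Hits and misses are often easier to interpret together as a hit ratio, since raw counts rise and fall with traffic. This is a derived figure rather than one of the metrics listed above, and the function name and sample numbers below are made up; treat it as an optional way to read the two counters side by side.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Return hits as a fraction of all lookups, or 0.0 when there were none."""
    lookups = hits + misses
    if lookups == 0:
        return 0.0
    return hits / lookups


# Made-up counts for one evaluation window.
healthy = cache_hit_ratio(hits=900, misses=100)
degraded = cache_hit_ratio(hits=300, misses=700)
```

A falling ratio alongside rising evictions would suggest the cache is too small for the working set, which neither counter shows clearly on its own.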

Scheduled Query Alerts:

  • Azure Redis Cache performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure Virtual Networks

Logs to Forward to Log Analytics:

  • NetworkSecurityGroupEvent
  • NetworkSecurityGroupRuleCounter
  • NetworkWatcherFlowLog

Metric Alerts:

  • BytesInTotal (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • BytesOutTotal (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • PacketsInDropped (Static, > 100 dropped packets): Monitor every 5 minutes, Alert if over 100 dropped packets for 10 minutes
  • PacketsOutDropped (Static, > 100 dropped packets): Monitor every 5 minutes, Alert if over 100 dropped packets for 10 minutes

Scheduled Query Alerts:

  • Azure Virtual Network connectivity alerts (Monitor every 5 minutes, Alert if multiple connectivity issues detected)


Azure Active Directory

Logs to Forward to Log Analytics:

  • SignInLogs
  • AuditLogs

Metric Alerts:

  • FailedSignInAttempts (Static, > 10 failures): Monitor every 5 minutes, Alert if over 10 failures for 10 minutes
  • UnusualSignInActivities (Static, activities detected): Monitor every 5 minutes, Alert if unusual activities detected
  • UserAccountLockouts (Static, > 5 lockouts): Monitor every 5 minutes, Alert if over 5 lockouts for 10 minutes

Scheduled Query Alerts:

  • Azure Active Directory security alerts (Monitor every 5 minutes, Alert if multiple security issues detected)


Azure Load Balancer

Logs to Forward to Log Analytics:

  • LoadBalancerAlert
  • LoadBalancerProbeHealthStatus

Metric Alerts:

  • DataPathBytesInTotal (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • DataPathBytesOutTotal (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • BackendConnectionErrors (Static, > 10 errors): Monitor every 5 minutes, Alert if over 10 errors for 10 minutes
  • HealthProbeStatus (Static, unhealthy backend pool members): Monitor every 5 minutes, Alert if any unhealthy backend pool members detected

Scheduled Query Alerts:

  • Azure Load Balancer performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure CDN

Logs to Forward to Log Analytics:

  • CDNAccessLogs

Metric Alerts:

  • CacheHitRatio (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if under the threshold for 30 minutes
  • RequestCount (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
  • TotalLatency (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes

Scheduled Query Alerts:

  • Azure CDN performance alerts (Monitor every 15 minutes, Alert if multiple performance issues are detected)


Azure VPN Gateway

Logs to Forward to Log Analytics:

  • VPN_Gateway_IKE_Logs
  • VPN_Gateway_IKE_Errors
  • VPN_Gateway_P2S_Logs

Metric Alerts:

  • TunnelStatus (Static, down tunnels): Monitor every 5 minutes, Alert if any down tunnels detected
  • BytesIn (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • BytesOut (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes

Scheduled Query Alerts:

  • Azure VPN Gateway connectivity alerts (Monitor every 5 minutes, Alert if multiple connectivity issues detected)


Azure Batch

Logs to Forward to Log Analytics:

  • BatchNodeLogs

Metric Alerts:

  • TaskFailures (Static, > 10 failures): Monitor every 5 minutes, Alert if over 10 failures for 10 minutes
  • TaskExecutionTime (Dynamic, Low sensitivity): Monitor every 5 minutes, Alert if over the threshold for 10 minutes
  • NodeFailures (Static, > 1 failure): Monitor every 5 minutes, Alert if over 1 failure for 10 minutes

Scheduled Query Alerts:

  • Azure Batch performance alerts (Monitor every 5 minutes, Alert if multiple performance issues detected)


Azure Data Factory

Logs to Forward to Log Analytics:

  • ADFActivityRun
  • ADFPipelineRun

Metric Alerts:

  • FailedPipelineRuns (Static, > 5 failures): Monitor every 15 minutes, Alert if over 5 failures for 30 minutes
  • PipelineDuration (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
  • FailedActivityRuns (Static, > 10 failures): Monitor every 15 minutes, Alert if over 10 failures for 30 minutes

Scheduled Query Alerts:

  • Azure Data Factory performance alerts (Monitor every 15 minutes, Alert if multiple performance issues are detected)


Azure Data Lake Storage

Logs to Forward to Log Analytics:

  • StorageRead
  • StorageWrite
  • StorageDelete

Metric Alerts:

  • Egress (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
  • Ingress (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes
  • Transactions (Dynamic, Low sensitivity): Monitor every 15 minutes, Alert if over the threshold for 30 minutes

Scheduled Query Alerts:

  • Azure Data Lake Storage access failure alerts (Monitor every 15 minutes, Alert if multiple failures detected)


Conclusion

Setting up a comprehensive Azure monitoring strategy is essential for maintaining optimal performance and swiftly addressing any issues that arise. By focusing on the top 20 Azure resources outlined in this article, you'll be better equipped to proactively manage your infrastructure and minimize downtime. Additionally, leveraging Log Analytics and carefully configuring alert thresholds will enable you to stay informed without being flooded with notifications.



A special shoutout to my AI-powered writing buddy, GPT-Chat4, who assisted in crafting this article. Together, we're committed to bringing you the best in Azure news and coding tips! Thanks, GPT-Chat4
