Investigating Performance Issues - Stick to the Basics

Incidents and problems occur in every organization. To address them effectively, it is crucial to identify and resolve their underlying causes, achieving both short-term fixes and long-term improvements.

There are several root cause analysis (RCA) techniques that can be applied to problems ranging from simple to complex. But when a problem spans multiple components such as compute, storage, and network, it is especially important to stick to the basics. This disciplined approach produces reliable and comprehensive results.

Investigating system performance issues requires a structured approach to identify and address the underlying causes. One of the critical steps is analyzing metrics across the stack: looking for patterns, anomalies, spikes, dips, or trends that could indicate a bottleneck.

When investigating performance issues caused by capacity bottlenecks, relying solely on long-window average utilization reports, particularly those aggregated daily or weekly, can be misleading. Averages mask important details and peaks that occur at specific times of the day, potentially leading to inaccurate assessments and misdiagnoses.

Granular or real-time reports, by contrast, give a more detailed and accurate picture of system performance, allowing you to identify and address capacity issues far more effectively.
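
To make this concrete, here is a minimal sketch, using pandas and entirely hypothetical utilization numbers, of how the same day of data looks when summarized as a single daily average versus an hourly breakdown.

```python
# Hypothetical data only: one day of CPU utilization sampled every minute,
# with a quiet baseline around 35% and a spike to ~95% between 9 AM and 11 AM.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2024-01-01", periods=24 * 60, freq="min")
utilization = pd.Series(rng.normal(35, 5, size=len(index)), index=index)
morning_peak = (index.hour >= 9) & (index.hour < 11)
utilization[morning_peak] = rng.normal(95, 2, size=int(morning_peak.sum()))
utilization = utilization.clip(0, 100)

# Averaged report: one number for the whole day; the morning spike disappears into it.
print(f"Daily average utilization: {utilization.mean():.0f}%")

# Granular report: the hourly maximum exposes exactly when the bottleneck occurs.
hourly_max = utilization.resample("1h").max()
print(hourly_max[hourly_max > 90])
```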

Here's why granular or real-time reports are valuable:

  • Identify Patterns: Real-time or granular data allows you to identify patterns, trends, and anomalies that might not be apparent in averaged data. Certain hours of the day or specific events might be causing the bottlenecks.
  • Anomaly Detection: Systems experience varying levels of demand and usage throughout the day. Averages smooth out these variations and hide the critical peaks that might be causing the performance issues.
  • Identify Underlying Causes: By examining data at smaller intervals, you can better isolate the root causes of performance issues.
  • Misinterpretation: Averages might oversimplify complex data, potentially leading to misunderstandings or incorrect conclusions about the underlying patterns.
  • Lack of Granularity: Averages may lack the granularity needed to identify specific patterns or trends that occur at different intervals.
  • Masking Outliers: Averages can mask peaks or spikes that are significant for understanding overall performance or identifying issues.
  • Capacity Planning: Detailed data supports accurate capacity planning; you can determine precisely what resources are needed during peak times and size for the peak rather than the average (see the sketch after this list).
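
One practical way to act on these points is to summarize granular samples per hour and flag the hours whose 95th percentile crosses a threshold. The sketch below assumes pandas and a hypothetical 80% threshold; it is an illustration, not a prescribed tool.

```python
# Percentile-based peak analysis on granular samples: alert and size on the
# busy hours, not on the all-day average. Threshold of 80% is illustrative.
import pandas as pd

def peak_report(samples: pd.Series, threshold: float = 80.0) -> pd.DataFrame:
    """Per-hour 95th-percentile utilization, flagging hours above a threshold."""
    hourly_p95 = samples.resample("1h").quantile(0.95)
    return pd.DataFrame({
        "p95_utilization": hourly_p95.round(1),
        "over_threshold": hourly_p95 > threshold,
    })

# Usage with the minute-level 'utilization' series from the previous sketch:
#   report = peak_report(utilization)
#   print(report[report["over_threshold"]])   # only the hours that need capacity
```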

[Chart-1: hourly resource utilization report (image not available)]

The hourly report (Chart-1) shows a peak in resource demand from 9 AM to 11 AM, which likely corresponds to the business's peak hours. The overall daily average utilization, however, comes to only 43%, a figure that is highly likely to mislead the investigation. Examining daily reports for the past two weeks doesn't provide clarity either, as average daily utilization remains consistently below 50%.
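
To see how easily this happens, here is a back-of-the-envelope check with illustrative numbers (not the actual Chart-1 data): a short peak near 95% can still average out to roughly the 43% figure described above.

```python
# Illustrative hourly readings only: hours 0-8 and 12-23 near 36%, hours 9-11 near 95%.
hourly_utilization = [36] * 9 + [95] * 3 + [36] * 12   # 24 hourly readings
daily_average = sum(hourly_utilization) / len(hourly_utilization)
peak = max(hourly_utilization)
print(f"Peak hour: {peak}%  |  Daily average: {daily_average:.0f}%")
# -> Peak hour: 95%  |  Daily average: 43%
```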

As a result, always rely on granular or real-time reports, paying close attention to the times of day when users are most likely to complain about slow response.
