Performance Testing Cheatsheet - Diagnosing Server Congestion

When diagnosing performance issues on Windows-based servers (especially when perfmon stats are easily available), it pays to start with the biggest bang-for-buck items. I use a mnemonic to remember what they are: Most Probably Developer Negligence (MPDN, short for Memory, Processor, Disk and Network).

For each of these focus areas, I usually get most benefit from:

Memory

  • Available MBytes - This is the amount of physical RAM immediately available for allocation to processes. The lower this is, the more the server has to page data out to a disk-based page file, which can be many hundreds of times slower than accessing it from RAM. If less than 10 MB is available and Paging File % Usage is high, this generally indicates the server has insufficient memory.
  • Pages/sec - This is the rate at which pages are read from or written to disk to resolve hard page faults, that is, references to memory that has been swapped out of RAM and stored in the page file on disk. It should be read in conjunction with the Available MBytes counter.
  • Page Faults/sec - Broader than Pages/sec, this counts the hard faults that have to be served from the page file on disk, and also the "pseudo" (soft) page faults where the memory is still in RAM but has been moved to a lower priority area or is being referenced and shared by another process.
  • In addition to these, Pool Nonpaged Bytes increasing over time can be a good indicator of memory-leaky applications.

Taken together, these counters should give a clear indication of any memory constraints on the Windows-based server.
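As a rough illustration, the memory checks above can be written as two small Python functions. This is only a sketch: the function names are made up for this example, the 70% figure for "high" page file usage is an assumption (the 10 MB floor is the one quoted above), and the sample values would in practice come from a perfmon log rather than being typed in.

```python
from statistics import mean


def memory_pressure(available_mb, page_file_usage_pct,
                    min_available_mb=10.0, high_page_file_pct=70.0):
    """Return True when both warning signs are present.

    min_available_mb is the 10 MB floor from the text;
    high_page_file_pct is an assumed cut-off for "high" usage.
    """
    return (available_mb < min_available_mb
            and page_file_usage_pct >= high_page_file_pct)


def growth_per_sample(samples):
    """Least-squares slope of the samples against their index.

    A slope that stays positive across a long run of
    Pool Nonpaged Bytes samples hints at a possible leak.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    print(memory_pressure(available_mb=8, page_file_usage_pct=85))
    print(growth_per_sample([100, 110, 120, 130]))
```

A positive slope on its own proves nothing; it is the steady climb over a long test run that is worth chasing.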

Processor

The big two counters here are % Processor Time and the Processor Queue Length [System], which when looked at together give a clear view of how busy the server CPU(s) are. Anything consistently over 80-85% Processor Time should be a red flag (or at least an amber one), and a Processor Queue Length above 2 per CPU is indicative of processor congestion. High values here suggest the CPU is working hard (but remember to view them along with the previous memory counters, as the CPU could be stressed by a lack of memory).

If things are truly processor congested, you can drill down into the individual Process and Thread counters to determine whether any in particular are problematic.
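The two processor checks can be combined along these lines. Again this is a sketch rather than a rule: `cpu_status` and its labels are invented for the example, the 80% default is the lower end of the range quoted above, and the queue length is divided by the CPU count because the threshold of 2 is per CPU.

```python
def cpu_status(processor_time_pct, queue_length, cpu_count,
               busy_pct=80.0, max_queue_per_cpu=2.0):
    """Classify CPU load from the two counters viewed together.

    Returns "congested" when both thresholds are exceeded,
    "watch" when only one is, and "ok" otherwise.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    busy = processor_time_pct > busy_pct
    queued = (queue_length / cpu_count) > max_queue_per_cpu
    if busy and queued:
        return "congested"
    if busy or queued:
        return "watch"
    return "ok"


if __name__ == "__main__":
    # 12 queued threads across 4 CPUs is 3 per CPU.
    print(cpu_status(processor_time_pct=90, queue_length=12, cpu_count=4))
```

Feed it averages over a sustained period rather than single samples, since short spikes are normal.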

Disk

Physical Disk Avg. Disk Read Queue Length and Avg. Disk Write Queue Length, like the Processor Queue Length counter, should be less than 2. Values of 2 or greater, especially when Physical Disk % Disk Time is around 50% or more, show that the server is spending an awful lot of time reading from and writing to slow physical media. If the disk is reaching capacity (> 80% used), then consider a defrag or increasing the disk size, or both.
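The disk rules of thumb translate into a simple set of flags. This is a minimal sketch using the thresholds quoted above; the function name and flag labels are hypothetical, and the percentage of disk space used is assumed to come from wherever you normally read it.

```python
def disk_flags(read_queue, write_queue, disk_time_pct, used_pct):
    """Return a list of warning flags for one physical disk."""
    flags = []
    # Either queue at 2 or more is worth a look on its own.
    queue_high = max(read_queue, write_queue) >= 2.0
    if queue_high:
        flags.append("queue_high")
    # A long queue plus a busy disk is the stronger signal.
    if queue_high and disk_time_pct >= 50.0:
        flags.append("io_bound")
    if used_pct > 80.0:
        flags.append("near_capacity")
    return flags


if __name__ == "__main__":
    print(disk_flags(read_queue=2.5, write_queue=0.4,
                     disk_time_pct=65, used_pct=85))
```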

Network

The Network Interface Current Bandwidth counter shows the interface's nominal rate in bits per second (bps), and in conjunction with Bytes Total/sec, Bytes Sent/sec and Bytes Received/sec, gives an indication of how congested the network is. Note the units: Current Bandwidth is in bits, while the other three counters are in bytes. Don't fall into the trap of assuming things should be good if Bytes Total/sec still has some headroom against Current Bandwidth - the network can sometimes be a performance constraint when utilisation is as low as 50%, as collisions, retransmit requests and discard errors increase, causing additional network load.
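Because one counter is in bits and the other in bytes, it is easy to get the utilisation figure wrong by a factor of eight. A small sketch of the arithmetic, with hypothetical function names and the 50% warning level taken from the paragraph above:

```python
def network_utilisation(bytes_total_per_sec, current_bandwidth_bps):
    """Fraction of the interface's nominal bandwidth in use.

    Bytes Total/sec is converted to bits before dividing by
    Current Bandwidth, which is already in bits per second.
    """
    if current_bandwidth_bps <= 0:
        raise ValueError("current_bandwidth_bps must be positive")
    return (bytes_total_per_sec * 8) / current_bandwidth_bps


def network_at_risk(bytes_total_per_sec, current_bandwidth_bps,
                    warn_fraction=0.5):
    """True once utilisation reaches the warning level."""
    used = network_utilisation(bytes_total_per_sec, current_bandwidth_bps)
    return used >= warn_fraction


if __name__ == "__main__":
    # 6,250,000 bytes/sec on a 100 Mbit/s interface.
    print(network_utilisation(6_250_000, 100_000_000))
```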

In summary, there is no silver bullet for diagnosing server-side performance issues, but if you remember that most of your performance testing issues are Most Probably Developer Negligence, you'll have a head start on where to look. ;-)
