Performance Testing Cheatsheet - Diagnosing Server Congestion
When faced with diagnosing performance issues on Windows-based servers (especially when perfmon stats are readily available), it pays dividends to start with the biggest bang-for-buck items. I use a memory-jogging mnemonic to remember what they are: Most Probably Developer Negligence (MPDN - short for Memory, Processor, Disk and Network).
For each of these focus areas, I usually get the most benefit from the following counters:
Memory
- Available MBytes - The amount of RAM available for the server to allocate. The lower this is, the more the server has to push data out to the disk-based page file, which can be many hundreds of times slower to access than RAM. If this drops below 10 MB and page file % usage is high, it generally indicates the server does not have enough memory.
- Pages/sec - The rate at which pages are read from or written to disk to resolve hard page faults, i.e. references to memory that has been paged out of RAM to the page file on disk. Use this in conjunction with the Available MBytes counter.
- Page Faults/sec - Similar to Pages/sec, but it counts both hard page faults (where the page has to be fetched from the page file on disk) and "pseudo" (soft) faults, where the page is still in RAM but has been moved to a lower-priority area or is being shared with another process.
- In addition to these, a Pool Nonpaged Bytes value that keeps climbing over time is a good indicator of a memory-leaking application.
Together, these counters should give a clear indication of any memory constraints on the Windows-based server; a quick way to sample them from a script is sketched below.
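If you want to grab these counters without opening the perfmon GUI, the built-in typeperf command can sample them for you. Here is a minimal Python sketch, assuming it is run locally on the Windows server; the counter paths are the standard perfmon ones, and the parsing is deliberately simple.

```python
import csv
import io
import subprocess

# The four memory counters above, sampled twice via the built-in typeperf tool
# (rate counters such as Pages/sec need a pair of readings).
counters = [
    r"\Memory\Available MBytes",
    r"\Memory\Pages/sec",
    r"\Memory\Page Faults/sec",
    r"\Memory\Pool Nonpaged Bytes",
]
out = subprocess.run(
    ["typeperf", *counters, "-sc", "2"],
    capture_output=True, text=True, check=True,
).stdout

# typeperf emits CSV: a header row, then one row per sample with the timestamp first.
rows = [row for row in csv.reader(io.StringIO(out)) if len(row) > 1]
values = dict(zip(counters, (float(v) for v in rows[-1][1:])))

for name, value in values.items():
    print(f"{name}: {value:,.2f}")
if values[r"\Memory\Available MBytes"] < 10:
    print("Available RAM is critically low - check page file % usage and Pages/sec")
```

Scheduled to run periodically during a load test, something like this gives you a lightweight trend of the memory counters without leaving a perfmon session open.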
Processor
The big two counters here are % Processor Time and Processor Queue Length [System], which, viewed together, give a clear picture of how busy the server CPU(s) are. Anything consistently over 80-85% Processor Time should be a red flag (or at least an amber one), and a Processor Queue Length consistently above 2 per CPU indicates processor congestion. High values here suggest the CPU is working hard, but remember to read them alongside the memory counters above, as the CPU may only be stressed because a shortage of memory is forcing heavy paging.
If things are genuinely processor-congested, you can drill down into the individual Process and Thread counters to determine whether any in particular are problematic.
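As a rough illustration of that drill-down, the sketch below uses the psutil library (an assumption - it is not part of the standard library and would need a pip install). It measures overall CPU over a short window and, if it is past the 85% mark, lists the busiest processes. Processor Queue Length is not exposed by psutil, so that counter is best sampled with the same typeperf approach shown in the Memory sketch.

```python
import psutil  # assumed dependency: pip install psutil

# Prime the per-process CPU counters (the first cpu_percent() call per process
# only establishes a baseline), then measure overall CPU over a 5-second window.
for proc in psutil.process_iter():
    try:
        proc.cpu_percent()
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass

overall = psutil.cpu_percent(interval=5)
print(f"Overall % Processor Time (5s window): {overall:.1f}%")

if overall > 85:
    # Drill down: the five busiest processes over the same window.
    busiest = []
    for proc in psutil.process_iter(["name"]):
        try:
            busiest.append((proc.cpu_percent(), proc.info["name"]))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass
    for pct, name in sorted(busiest, key=lambda item: item[0], reverse=True)[:5]:
        print(f"{name}: {pct:.1f}%")
```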
Disk
Physical Disk Avg. Disk Read Queue Length and Avg. Disk Write Queue Length should, like the Processor Queue Length counter, be less than 2. Values of 2 or more, especially when Physical Disk % Disk Time is around 50% or higher, show that the server is spending a great deal of time reading from and writing to slow physical media. If the disk is also reaching capacity (more than 80% used), consider a defrag, a larger disk, or both.
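The same typeperf approach works for the disk counters. The sketch below is again an assumption that it runs locally on the Windows server, reads the aggregated _Total physical-disk instance and checks the C: drive for the capacity threshold mentioned above.

```python
import csv
import io
import shutil
import subprocess

# The disk counters above, aggregated across all physical disks via the _Total
# instance; sampled twice so the averaged/rate counters have a pair of readings.
counters = [
    r"\PhysicalDisk(_Total)\Avg. Disk Read Queue Length",
    r"\PhysicalDisk(_Total)\Avg. Disk Write Queue Length",
    r"\PhysicalDisk(_Total)\% Disk Time",
]
out = subprocess.run(
    ["typeperf", *counters, "-sc", "2"],
    capture_output=True, text=True, check=True,
).stdout
rows = [row for row in csv.reader(io.StringIO(out)) if len(row) > 1]
read_q, write_q, disk_time = (float(v) for v in rows[-1][1:])  # skip the timestamp column

# Capacity check on the system drive (C: here is an assumption).
usage = shutil.disk_usage("C:\\")
pct_used = 100 * usage.used / usage.total

if (read_q >= 2 or write_q >= 2) and disk_time >= 50:
    print(f"Disk congestion: queues {read_q:.1f}/{write_q:.1f}, % Disk Time {disk_time:.0f}%")
if pct_used > 80:
    print(f"Disk {pct_used:.0f}% full - consider a defrag, more capacity, or both")
```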
Network
The Network Interface Current Bandwidth counter shows the interface's rated speed in bits per second (bps) and, in conjunction with Bytes Total/sec, Bytes Sent/sec and Bytes Received/sec, gives an indication of how congested the network is. Don't fall into the trap of assuming all is well just because Current Bandwidth and Bytes Total/sec show some headroom - the network can become a performance constraint with utilisation as low as 50% of the available bandwidth, as collisions, retransmit requests and discard errors increase and add further load.
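A quick way to check that bandwidth/usage ratio is to compare the bytes moved over a short interval against the interface's link speed. The sketch below uses psutil again (assumed installed) rather than the raw perfmon counters; the 50% threshold is the one discussed above.

```python
import time

import psutil  # assumed dependency: pip install psutil

INTERVAL = 5  # seconds between the two samples

before = psutil.net_io_counters(pernic=True)
time.sleep(INTERVAL)
after = psutil.net_io_counters(pernic=True)

for nic, stats in psutil.net_if_stats().items():
    if not stats.isup or stats.speed == 0 or nic not in before or nic not in after:
        continue  # skip down interfaces and ones with no reported link speed
    moved = (after[nic].bytes_sent + after[nic].bytes_recv
             - before[nic].bytes_sent - before[nic].bytes_recv)
    bits_per_sec = moved * 8 / INTERVAL
    utilisation = bits_per_sec / (stats.speed * 1_000_000)  # speed is reported in Mbit/s
    flag = "  <-- worth investigating" if utilisation > 0.5 else ""
    print(f"{nic}: {utilisation:.1%} of link bandwidth{flag}")
```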
In summary, there is no silver bullet for diagnosing server-side performance issues, but if you remember that most of them are Most Probably Developer Negligence (Memory, Processor, Disk and Network), you'll have a head start on where to look. ;-)