CURT  -- The CPU Usage Reporting Tool

CURT -- The CPU Usage Reporting Tool

This article originally appeared in IBM Systems Magazine. The original paste up has been lost, so this is a posting of the edited, final article I submitted to the magazine.

CURT -- THE “CPU USAGE REPORTING TOOL”

 The AIX CPU Usage Reporting Tool -- or "CURT" – is a part of the bos.perf.tools fileset and ships with every AIX distribution. CURT adds great depth to your performance analysis arsenal and should be included in any deep dive into performance issues.

CURT reports are extracted from raw -- as opposed to formatted -- AIX kernel trace data, so you'll need to run a kernel trace and have the output available for this exercise. The trace data should also include performance information for every CPU in your system (and yes, you can use CURT to zero in on individual CPUs). There are about a million flag combinations to the trace tool; the sample syntax I give here will record information for every CPU, run for five seconds and write its data to the trace.raw file, then self-terminate. This is the syntax I use in my daily performance data collection activities (Note: To issue any of the commands presented in this article, you must be working as root.):

trace -a -fn -o trace.raw -T40M -L80M ; sleep 5 ; trcstop

We also need information about kernel extensions, shared libraries, processes and symbols that CURT uses. So, we need to create a “gensyms” file that stores this information. Creating your gensyms file is easy, just run this command:

gensyms > gensyms.out

This will take a while, so let it chug. Once you have your gensyms.out file, you're ready to extract a CURT report from your raw trace data. Over the years, I've relied on a combination of CURT flags that work well for me, which you'll see here. But by all means, use the flags that you feel most comfortable with and work best in your environment. Here is my preferred CURT extraction syntax:

curt -i trace.raw -n gensyms.out -o curt.all -Ptse

This syntax tells CURT to extract its information from the trace.raw file, use the gensyms.out file for, among other things, name-to-address mappings, write its data to the curt.all ("all" in this case meaning a report will be generated that covers activity on all logical processors), and include detailed process, thread, system call and elapsed time information for system calls and posix threads.

Now that you've generated your CURT report, let's take a look at some of the data it has produced. This first screen full of information is typical of most CURT reports:

Run on Wed Feb 17 14:40:00 2016

Command line was:

curt -i trace.out -n gensyms.out -o curt.all -Ptse

AIX trace file name = trace.out

AIX trace file size = 3053744

Fri Feb 12 17:18:51 2016

System: AIX 7.1 Node: lpar Machine: 00000000000

AIX trace file created = Fri Feb 12 17:18:51 2016

 

Command used to gather AIX trace was:

 trace -a -fn -o trace.out -T40M -L80M

 

                          System Summary

                    --------------

 processing      percent          percent

 total time   total time        busy time

    (msec) (incl. idle)     (excl. idle)    processing category

===========  ===========      ===========    ===================

      5.05         0.01        19.17       APPLICATION

      5.33         0.01        20.20       SYSCALL

      0.65         0.00         2.47       HCALL

      1.86         0.01         7.05       KPROC (excluding IDLE and NFS)

      0.00         0.00         0.00       NFS

     11.67         0.03        44.24        FLIH

      0.33         0.00         1.26       SLIH

      1.48         0.00         5.61       DISPATCH (all procs. incl. IDLE)

      0.71         0.00         2.69       IDLE DISPATCH (only IDLE proc.)

-----------   ----------      -------

     26.37         0.07       100.00 CPU(s) busy time

  36484.19        99.93               IDLE

-----------   ----------

  36510.56                             TOTAL

 

Avg. Thread Affinity = 0.98

Total Physical CPU time (msec) = 30087.32

Physical CPU percentage       = 99.89

First, we have header information that tells you things like the time and date CURT was run, the syntax used to extract the CURT report from the raw trace file -- as well as the trace syntax itself -- and the name of your system with its AIX level. You will then be presented with a system summary that shows you the averages of many important performance counters. You see the amount of time your CPUs spent handling application requests and servicing system calls, as well as the amount of time they spent handling Hypervisor calls. "Kproc" tells you how much time was spent executing kernel processes other than wait and NFS kprocs; NFS has its own summary which is next in the list. First and Second Level Interrupt Handlers (FLIH and SLIH) are next, followed by the amount of time the CPUs spent dispatching all threads. Idle Dispatch tells you how much time was spent dispatching the idle (or "wait") process. Since idle dispatch time obviously doesn't contribute to a CPU's busy time, it's broken out separately from the "dispatch" category. You then have several categories where all of these counters are summed. 

At this point, you should see how intuitive and self-explanatory CURT data is. It takes the study of just a few CURT reports to become proficient with the utility. As noted, I use a certain set of flags with CURT. Now we can look at these, along with some of their output:

1) The -P flag outputs detailed process information:

                  Application Summary (by Pid)

                    ----------------------------

   -- processing total (msec) --         -- percent of total processing time --

  combined application     syscall    combined application     syscall name (Pid)(Thread                                                                                                                            Count)

  ======== ===========     =======    ======== ===========     ======= ===================

    4.9167      2.2622      2.6545      0.0135      0.0062      0.0073 trcstop(5767334)(1)

    1.7363      0.7728      0.9634      0.0048      0.0021      0.0026 sleep(5767332)(1)

    1.0684      1.0684      0.0000      0.0029      0.0029      0.0000 getty(7078106)(1)

    0.7694      0.1377      0.6317      0.0021      0.0004      0.0017 ksh(7471348)(1)

The above screenshot is representative of several different versions of process data in the CURT report. Here, you see the name of your process with the number of threads it's spawned, along with the amount of time that process spent handling application and system calls. You also have that same time expressed in milliseconds; this is especially useful when you're trying to determine is a process is hogging your CPUs at the expense of other perhaps more deserving processes in your system.

2) The -t flag outputs detailed thread information:

-----------------------------------------------------------------------------

Report for Thread Id: 20775057 (hex 13d0091) Pid: 5767332 (hex 5800a4)

Process Name: sleep

---------------------

 Total Application Time (ms): 0.772834

 Total System Call Time (ms): 0.963434

 Total Hypervisor Call Time (ms): 0.000000

 

                          Thread System Call Summary

                          --------------------------

Count  Total Time Avg Time Min Time Max Time Tot ETime Avg ETime Min ETime Max ETime    SVC (Address)

             (msec)   (msec)   (msec)   (msec)    (msec)    (msec)    (msec)    (msec)

======== =========== ======== ======== ======== ========= ========= ========= =========       

18      0.2913   0.0162   0.0006   0.1045    0.3085    0.0171    0.0006    0.1217__loadx(3248380)

 1      0.2591   0.2591   0.2591   0.2591    0.2591    0.2591    0.2591    0.2591 _exit(32389e8)

 1      0.2363   0.2363   0.2363   0.2363    0.2502    0.2502    0.2502    0.2502 execve(32341b8)

.....lines omitted .....

Here, you're presented with detailed info on your working threads, such as their name, the time they spent handling application and system calls and the timings for those individual calls expressed in milliseconds. Thread details are rather lengthy, so I've had to omit some of their data. But thread details also tell you about hypervisor calls the thread made, any errors encountered while handling system calls (which are displayed when you invoke the -s flag) and the all-important affinity counters. The -e flag, which is also represented here, gives us all the elapsed time counters.

This is just a small sampling of the performance data a CURT report provides. Experiment with the other flags; try them in different combinations. Don't worry, you can't break anything. All of the actual processing has already been done by the AIX kernel trace utility.

CURT is invaluable for CPU evaluations. I'm certain once you use it a few times, you'll include itasa matter of routine in all your deep performance evaluations.


要查看或添加评论,请登录

社区洞察

其他会员也浏览了