MustGather information for high CPU problems

The documentation required to diagnose high CPU problems with web server processes includes

  • the results of attempting different types of requests
  • backtraces of the high CPU process
  • system call traces of the high CPU process
  • web server and plug-in configuration files
  • web server and plug-in log files

    The ServerDoc tool provided with ihsdiag automates much of the work of gathering this information. The user runs ServerDoc and provides the IHS installation directory and other information; ServerDoc creates a new directory to hold the required documentation, and stores information in that new directory.

    Once the ServerDoc tool has completed, the user should copy any remaining log files and configuration files used by the web server and the plug-in into the new directory, and send in the directory to IBM support.

    Special concerns for threaded processes on older Linux distributions
    Customers may be using single-threaded IHS 1.3 on older Linux distributions such as RedHat 7.x. If third-party modules are loaded which create threads in the IHS processes, this high CPU tool may not be able to analyze those processes due to limitations of the system tools gdb and strace. In addition to this high CPU tool, also run the hang documentation tool, as it may be able to retrieve additional information. Submit both the high CPU doc and hang doc to IHS support.

    identifying high CPU processes

    Operating systems provide special programs for displaying the CPU usage of processes running on the system. These programs include ps, top, topas, nmon, and others. When a high CPU problem is suspected, one of these programs should be used to determine which processes are actively consuming high CPU. If a web server process is the highest consumer of CPU, documentation about the suspected problem should be submitted to IBM HTTP Server support for analysis.

    Because ps is universally available, the example below will use ps to show how to identify the high CPU process.

    ps invocations which show percent CPU

    Platform ps invocation
    AIX
    ps -A -o pid,ppid,pcpu,time,args
    
    Linux
    ps -eo pid,ppid,pcpu,time,fname,cmd
    
    Solaris
    ps -A -o pid,ppid,pcpu,time,comm
    
    HP-UX
    UNIX95=1 ps -A -o pid,ppid,pcpu,time,args
    

    example of using ps to find high CPU processes

    Substitute the appropriate ps command for your platform.

    # ps -A -o pid,ppid,pcpu,time,args
      PID  PPID  %CPU        TIME COMMAND
        0     0   0.0    00:07:56 swapper
        1     0   0.0    00:01:18 /etc/init
      516     0  96.4 89-10:46:02 wait
      774     0   0.0    00:00:09 reaper
     1032     0   0.0    00:00:15 xmgc
     1290     0   0.0    00:00:27 netm
     1548     0   0.0    00:24:25 gil
     1806     0   0.0    00:00:00 wlmsched
     2850     0   0.0    00:00:00 lvmbb
     3170  8258   0.0    00:00:00 /usr/sbin/sshd -D
     3410     1   0.9    19:02:01 /usr/sbin/syncd 60
     3650     1   0.0    00:00:00 /usr/dt/bin/dtlogin -daemon
     3964     1   0.0    00:00:00 /usr/lib/errdemon
     4148  3650   0.0    00:00:00 dtlogin <:0>        -daemon
     4420  3650   0.0    00:07:31 /usr/lpp/X11/bin/X -D /usr/lib/X11//rgb  -T -force :0 -auth /var/dt/A:0-svd
     5160     1   0.0    00:00:00 /usr/ccs/bin/shlap
     5422     1   0.0    00:00:00 /usr/sbin/srcmstr
     5962  5422   0.0    00:05:03 /opt/freeware/sbin/prngd -f /dev/egd-pool -m 666
     6210  5962          00:00:00 <defunct>
     6456  5962          00:00:00 <defunct>
     6714  5962          00:00:00 <defunct>
     7004     1   0.0    00:00:00 /usr/sbin/biod 8
     7226  5422   0.0    00:00:03 /usr/sbin/syslogd
     7482  5422   0.0    00:00:32 sendmail: accepting connections
     7740  5422   0.0    00:00:22 /usr/sbin/portmap
     7998  5422   0.0    00:00:00 /usr/sbin/inetd
     8258  5422   0.0    00:03:50 /usr/sbin/sshd -D
     8516     0   0.0    00:00:01 rtcmd
     8772     0   0.0    00:00:00 kbiod
     9064     1   0.0    00:00:00 /usr/sbin/uprintfd
     9300     1   0.0    00:00:00 /usr/sbin/rpc.statd
     9564     1   0.0    00:00:00 /usr/sbin/rpc.lockd
     9820     1   0.0    00:00:20 /usr/sbin/cron
    10078  5422   0.0    00:00:01 /usr/sbin/qdaemon
    10320     1   0.0    00:06:52 rpc.lockd
    10578     0   0.0    00:00:00 cash
    10836     1   0.0    00:00:26 aump
    11094  5422   0.0    00:00:14 /usr/sbin/automountd
    11384  5422   0.0    00:00:00 /usr/sbin/writesrv
    12400     1   0.0    00:00:00 /usr/IMNSearch/httpdlite/httpdlite -r /etc/IMNSearch/httpdlite/httpdlite.c
    13434  5422   0.0    00:00:08 /usr/sbin/rsct/bin/IBM.AuditRMd
    13692     1   0.0    00:00:00 /usr/sbin/getty /dev/console
    13946  5422   0.0    00:00:01 /bin/sh /etc/gsa_updmn
    14450     1   0.0    00:00:00 /usr/sbin/getty /dev/lft0
    14988  5422   0.0    00:00:20 /usr/sbin/rsct/bin/rmcd -r
    15224  5422   0.0    00:22:29 /usr/opt/ifor/bin/i4llmd -b -n wcclwts -l /var/ifor/llmlg
    15484 15224   0.0    00:00:00 /usr/opt/ifor/bin/i4llmd -b -n wcclwts -l /var/ifor/llmlg
    15968 13946   0.0    00:00:00 sleep 14400
    16278  7998   0.0    00:00:00 rpc.ttdbserver 100083 1
    16888  5422   0.0    00:00:08 /usr/sbin/rsct/bin/IBM.ERrmd
    17050  5422   0.0    00:00:08 /usr/sbin/rsct/bin/IBM.ServiceRMd
    19404     1   0.0    00:00:00 frca
    20298 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    21772  4148   0.2    02:23:00 dtgreet
    23438     1   0.0    00:00:00 /usr/lpp/diagnostics/bin/diagd
    23920  5422   0.0    00:00:08 /usr/sbin/rsct/bin/IBM.CSMAgentRMd
    24432  5422   0.0    00:00:05 /usr/sbin/rsct/bin/ctcasd
    25988 45448   0.0    00:00:00 emacs ph/ihsdiag/ServerDoc/Platform.java
    28506 25988   0.0    00:00:00 /opt/freeware/lib/emacs/20.7/powerpc-ibm-aix4.3.3.0/emacsserver
    31872 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    32442 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    32662  3170   0.0    00:00:00 /usr/sbin/sshd -D
    33448     1   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    35482 33448  40.0    00:09:12 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    37564 38136   0.0    00:00:00 -bash
    37854 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    38136 42606   0.0    00:00:00 /usr/sbin/sshd -D
    39222 37564   0.0    00:00:00 ps -A -o pid,ppid,pcpu,time,args
    42606  8258   0.0    00:00:00 /usr/sbin/sshd -D
    45270 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    45448 32662   0.0    00:00:00 -bash
    

    The two processes highlighted are the only ones with a high current CPU utilization (third column in the ps output). This example is from AIX, where wait is a special process which represents all idle CPU on the system. So we have to ignore the wait process. This leaves process 35482, which is an IBM HTTP Server process.

    Some confirmation that 35482 is actively consuming high amounts of CPU can be done by examining the forth column in the ps output over time.

    # ps -A -o pid,ppid,pcpu,time,args | grep 35482
    35482 33448  96.2    00:10:28 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    
    (wait 10 seconds or so)
    
    # ps -A -o pid,ppid,pcpu,time,args | grep 35482
    35482 33448  96.1    00:10:35 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    
    (wait 10 seconds or so)
    
    [trawick@b80-2 trawick]$ ps -A -o pid,ppid,pcpu,time,args | grep 35482
    35482 33448  96.6    00:10:44 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    

    The subsequent checks for process 35482 show that cumulative CPU time (the fourth column) continues to increase. At this point, the suspect process is clearly identified, and we should proceed to collecting information about the proces using the instructions below.

    making sure required support programs are available

    Please refer to these instructions for verifying that required support programs are installed.

    running the tool

    Run the tool as root to avoid any permissions problems with obtaining backtraces or reading files, such as log files and configuration files. (More information about the requirement to run this tool as root is available here.)

    ServerDoc is passed in four parameters for gathering high CPU documentation:

    1. GatherHighCpuDoc
    2. the name of the IHS installation directory (e.g., /usr/HTTPServer)
    3. the process id of the high CPU process to examine
    4. the address of a non-SSL port handled by the web server (e.g., 127.0.0.1:80), or "-" if there is no non-SSL port
    # java -jar ServerDoc.jar GatherHighCpuDoc /path/to/IHS 35482 127.0.0.1:80
    

    Redhat Enterprise Linux 3.0 and SuSE SLES 9 - Some performance tools on these platforms may identify specific thread "pids" for high CPU. The high CPU tool cannot currently accept thread pids on these levels of Linux. Be sure to find the true process id for the entire process with high CPU, and specify that true process id on the invocation of the high CPU tool. The sample ps invocations shown above will only show real process ids which can be used with the high CPU tool.

    The tool creates a new directory which contains a timestamp in the name, and the high CPU documentation will be saved in that directory.

    determining the value of the non-SSL address parameter

    If the IHS installation only supports SSL, then use - for this parameter. Otherwise, specify an IP address and port which can be used to reach the server from the local machine without using SSL.

    Use the following table to determine the value of the non-SSL address parameter based on the form of a non-SSL Listen directive used in your configuration:

    Listen directive looks like this use this for address parameter
    (no non-SSL ports) -
    Listen 80 127.0.0.1:80
    Listen port 127.0.0.1:port
    Listen 192.168.1.15:80 192.168.1.15:80
    Listen ipaddress:port ipaddress:port
    Listen myhostname:80 myhostname:80

    a sample run

    For this example, IHS is installed in /usr/IBMIHS-1.3.28, the high CPU process is 35482, the non-SSL port can be reached from the web server machine on address 127.0.0.1:8080, and ihsdiag was unpacked into directory /root/ihsdiag-1.3.4.

    # java -jar /root/ihsdiag-1.3.4/ServerDoc.jar GatherHighCpuDoc /usr/IBMIHS-1.3.28 \
    35482 127.0.0.1:8080
    Tracing process for 10 seconds...
    
    Seconds remaining before gathering information again:
    60...54...48...42...36...30...24...18...12...6...
    
    Tracing process for 10 seconds...
    
    Seconds remaining before gathering information again:
    30...27...24...21...18...15...12...9...6...3...
    
    Tracing process for 10 seconds...
    
    Reports, log files, and configuration files have been saved to directory
      HighCpuDoc.200501201122
    If you have additional log files or configuration files, copy them there
    before packing up the directory.
    Web server log and conf files other than the default will have to be copied manually.
    WebSphere plug-in conf and log files will have to be copied manually.
    
    Hint for packing up the directory:
      tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122
      gzip HighCpuDoc.200501201122.tar
    [trawick@b80-2 ServerDoc]$ ls -l HighCpuDoc.200501201122/
    total 11479
    -rw-r--r--   1 root     system        32986 Jan 20 11:22 access_log
    -rw-r--r--   1 root     system         7129 Jan 20 11:22 apachectl
    -rw-r--r--   1 root     system         2251 Jan 20 11:22 error_log
    -rw-r--r--   1 root     system       662246 Jan 20 11:22 httpd
    -rw-r--r--   1 root     system        22835 Jan 20 11:22 httpd.conf
    -rw-r--r--   1 root     system       197962 Jan 20 11:24 log
    -rw-r--r--   1 root     system         1624 Jan 20 11:24 report
    -rw-r--r--   1 root     system      1641481 Jan 20 11:22 trace.0
    -rw-r--r--   1 root     system      1642385 Jan 20 11:24 trace.1
    -rw-r--r--   1 root     system      1650769 Jan 20 11:24 trace.2
    -rwxr-xr-x   1 root     system         1442 Jan 20 11:22 traceprocess.sh
    

    copying other web server and plug-in files

    The next step is to copy any other web server or plug-in configuration files and logs into the new HighCpuDoc directory. Here is a list of files to copy if they are being used:

  • any IHS configuration file other than httpd.conf in the IHS install directory
  • any additional web server error or access log files, such as log files specific to each virtual host or log files created by rotatelogs
  • the WebSphere plug-in configuration file
  • the WebSphere plug-in log file

    saving the documentation directory

    The last step is to pack up and compress the documentation directory using tar and gzip. The easiest way is to cut and paste the messages displayed by ServerDoc previously which showed the tar and gzip commands to use.

    a sample run

    # tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122
    # gzip HighCpuDoc.200501201122.tar
    

    understanding the root requirement

    When gathering information on high CPU problems, the tool must attach to a live web server process to obtain information about the state of that process.

    If the web server is started as root, then that process will be owned by root or by the web server user id (e.g., nobody or www). Only root has the authority to attach to any of the web server processes, so it is easiest if the tool itself is run as root. If the web server administrator does not have authority to log in or switch user to root, a simple script can be created to gather the high CPU documentation, and the system administrator can give the web server administrator sudo access to that script. sudo is a third-party tool available without cost for all appropriate platforms.

    If the web server is not started as root, there are no such concerns, and the high CPU documentation tool may be run by the user id which starts the web server.

    If the tool is run as non-root and it is unable to gather the required information, the problem will have to be recreated. It may not be possible to determine if this problem occurred until the documentation has been analyzed by IBM HTTP Server support.