MustGather information for high CPU problems

Provide feedback on the IBM HTTP Server forum on IBM developerWorks.

The documentation required to diagnose high CPU problems with web server processes includes

The ServerDoc tool provided with ihsdiag automates much of the work of gathering this information. The user runs ServerDoc and provides the IHS installation directory and other information; ServerDoc creates a new directory to hold the required documentation, and stores information in that new directory. This collector also supports IBM Edge Caching Proxy (ibmproxy, WTE) since version 1.4.19

Once the ServerDoc tool has completed, the user should copy any remaining log files and configuration files used by the web server and the plug-in into the new directory, and send in the directory to IBM support.

This tool uses native tools such as strace and truss to obtain system call traces, which include the contents of buffers used to read and write data from the network.

Known issues to check for first

High CPU on 8.0.0.13 and and 8.5.5.11

PI70829 can cause high CPU with logHandshakeError() or log_handshake_transcript() in the backtrace. The 8.5.5.11 and 8.0.0.13 interim fixes for PI76874 include a fix for this (PI76757).

High CPU on z/OS before 8.0.0.13, 8.5.5.11, and 9.0.0.2 due to PI68803

Prior to 8.0.0.13, 8.5.5.11, and 9.0.0.2, IHS on z/OS may use increasing CPU as individual processes age. The CPU usage may show up in BPXVRCAN (async I/O cancel), and BPX4AIO (async I/O poll) operations or the msgrcv() call. This is PI68803.

High CPU with IHS 8.0 or later on AIX

There are known issues in the ICC (IBM Crypto for C-Language) that is part of the GSKit bundled with earlier versions of IBM HTTP Server 8.0 and 8.5.5 that can result in high CPU, delays, or (frontend) timeouts. This is typically noticed by users migrating from IHS 7.0 on AIX or Linux/PPC.

The issues are resolved by GSKit 8.0.50.17 provided by PI09443, which is shipped in IHS 8.0.0.9 or 8.5.5.2. So, the problem no longer exists in those and higher versions, and the additional configuration described below is not required for those.

The recommended solution for earlier versions of IBM HTTP Server is to either upgrade to a minimum of IHS fixpack versions 8.0.0.9 or 8.5.5.2, or to upgrade the GSKit to a minimum version of 8.0.50.17 using the PI09443 interim fix.
NOTE: It is always recommended to use the latest available GSKit. To determine the latest recommended GSKit, you you can refer to the 'Comments' section for your fixpack version in the Recommended fixes for IBM HTTP Server document.

If you are experiencing the problem symptoms described above, but cannot apply IHS fixpacks 8.0.0.9, 8.5.5.2 or newer, nor published interim fix PI09443 or newer providing at least GSKit 8.0.50.17, then you can complete the steps below:

If FIPSEnable is required by your configuration, the above fixes will use a cryptographic library that is NOT  FIPS certified. The protocols and ciphers will however still be restricted to the protocols and ciphers deemed acceptable under FIPS140-2, but the implementation has not (yet) been certified. We currently have no timeframe for when a FIPS certified GSKit with the fixes will be available in an IHS fixpack, as it depends on external FIPS certification. When it is available, this document will be updated with the APAR number.

High CPU with SSL and 8.0.0.4 or earlier

Disable TLS compression: ssl_questions.html#compress

High CPU with the WebSphere Plugin on 6.0.2.33, 6.0.2.35, 6.1.0.23, 6.1.0.25, or 7.0.0.3

If you're using this level of the WebSphere Plugin, and have HTTPS transports configured, upgrade to a level including PK85105 before proceeding.

High CPU with SSL on Sun UltraSparc T1/T2 (Niagra) systems

GSKit v7 is inefficient on the Niagra architecture, but small RSA keysizes may help performance. This is a permanent limitation in IBM HTTP Server 7.0 and earlier. IHS 8.0 and later are more efficient on this platform, but PKCS11 off may still be necessary for suitable performance.

Hardware cryptographic offload, a part of the hardware platform, is documented here. Symmetric cryptographic offload should additionally be enabled with SSLAttributeSet 417 549 on IBM HTTP Server 7.0 and later if supported by the particular level of Niagra hardware/Sotware.

mod_rewrite related performance issues

Recursive regular expression captures

A very rare (usually accidental) pattern used in regular expressions can use excessive CPU. These are most easily identified by searching the configuration file for a repetition character immediately following a closing parenthesis. If a regex matches this pattern, refactor it to not repeat the capture itself.

grep -F ')+' httpd.conf
grep -F ')*' httpd.conf

Excessive mod_rewrite evaluation

Having many hundreds or thousands of mod_rewrite directives evaluated for common requests can be detrimental to performance, even moreso if they are specified in .htaccess.

Limit evaluation by defining them in <VirtualHost> context, using the 'L' flag in early rules to avoid further processing for frequent requests, and making matches in the first parameter of RewriteRule "fail fast".

High CPU with SSL

  1. Verify that SSL sessions are being reused.
  2. Verify that the number of handshakes hasn't increased due to a config change.
    The same SSL sessions documentation referenced above also contains information on counting handshakes. Count for both current and prior configuration for comparison.
  3. Verify that ThreadsPerChild has not been increased far beyond the default of 25.
  4. Verify that a supported, modern OS is being used.
  5. Assess the impact of using larger SSL keys
  6. Assess whether a more computationally expensive cipher is being used (such as 3DES).
    The negotiated cipher can be logged using %{HTTPS_CIPHER}e as described in the ihsdiag ciphers documentation.
  7. Verify sufficient dedicated CPU's are available. Avoid over-subscribed virtual CPU's or very few course for dozens of concurrent handshakes per second.
  8. Review access log to make sure there is a healthy mix of 304 responses for static content, otherwise needless CPU time is spent on encrypting content.

Other configuration issues or suggestions that relate to high CPU

Identifying high CPU processes

Special concerns for threaded processes on older Linux distributions
Customers may be using single-threaded IHS 1.3 on older Linux distributions such as RedHat 7.x. If third-party modules are loaded which create threads in the IHS processes, this high CPU tool may not be able to analyze those processes due to limitations of the system tools gdb and strace. In addition to this high CPU tool, also run the hang documentation tool, as it may be able to retrieve additional information. Submit both the high CPU doc and hang doc to IHS support.

Operating systems provide special programs for displaying the CPU usage of processes running on the system. These programs include ps, top, topas, nmon, and others. When a high CPU problem is suspected, one of these programs should be used to determine which processes are actively consuming high CPU. If a web server process is the highest consumer of CPU, documentation about the suspected problem should be submitted to IBM HTTP Server support for analysis.

While ps is universally available, it averages CPU usage across the lifetime of the process. A very old process that begins to use high CPU will often not show high CPU in ps output. Using "top" or "topas" to show processes with high CPU over some short (1 second) interval is usually best. If you cannot find a high CPU process via something like "top" or "topas", the example below demonstrates how to identify the high CPU process with ps.

ps invocations which show percent CPU

Platform ps invocation
AIX, z/OS
ps -A -o pid,ppid,pcpu,time,args
Linux
ps -eo pid,ppid,pcpu,time,fname,cmd
Solaris
ps -A -o pid,ppid,pcpu,time,comm
HP-UX
UNIX95=1 ps -A -o pid,ppid,pcpu,time,args

example of using ps to find high CPU processes

Substitute the appropriate ps command for your platform.

# ps -A -o pid,ppid,pcpu,time,args
  PID  PPID  %CPU        TIME COMMAND
    0     0   0.0    00:07:56 swapper
    1     0   0.0    00:01:18 /etc/init
  516     0  96.4 89-10:46:02 wait
  774     0   0.0    00:00:09 reaper
 1032     0   0.0    00:00:15 xmgc
 1290     0   0.0    00:00:27 netm
 1548     0   0.0    00:24:25 gil
 1806     0   0.0    00:00:00 wlmsched
 2850     0   0.0    00:00:00 lvmbb
 3170  8258   0.0    00:00:00 /usr/sbin/sshd -D
 3410     1   0.9    19:02:01 /usr/sbin/syncd 60
 3650     1   0.0    00:00:00 /usr/dt/bin/dtlogin -daemon
 3964     1   0.0    00:00:00 /usr/lib/errdemon
 4148  3650   0.0    00:00:00 dtlogin <:0>        -daemon
 4420  3650   0.0    00:07:31 /usr/lpp/X11/bin/X -D /usr/lib/X11//rgb  -T -force :0 -auth /var/dt/A:0-svd
 5160     1   0.0    00:00:00 /usr/ccs/bin/shlap
 5422     1   0.0    00:00:00 /usr/sbin/srcmstr
 5962  5422   0.0    00:05:03 /opt/freeware/sbin/prngd -f /dev/egd-pool -m 666
 6210  5962          00:00:00 <defunct>
 6456  5962          00:00:00 <defunct>
 6714  5962          00:00:00 <defunct>
 7004     1   0.0    00:00:00 /usr/sbin/biod 8
 7226  5422   0.0    00:00:03 /usr/sbin/syslogd
 7482  5422   0.0    00:00:32 sendmail: accepting connections
 7740  5422   0.0    00:00:22 /usr/sbin/portmap
 7998  5422   0.0    00:00:00 /usr/sbin/inetd
 8258  5422   0.0    00:03:50 /usr/sbin/sshd -D
 8516     0   0.0    00:00:01 rtcmd
 8772     0   0.0    00:00:00 kbiod
 9064     1   0.0    00:00:00 /usr/sbin/uprintfd
 9300     1   0.0    00:00:00 /usr/sbin/rpc.statd
 9564     1   0.0    00:00:00 /usr/sbin/rpc.lockd
 9820     1   0.0    00:00:20 /usr/sbin/cron
10078  5422   0.0    00:00:01 /usr/sbin/qdaemon
10320     1   0.0    00:06:52 rpc.lockd
10578     0   0.0    00:00:00 cash
10836     1   0.0    00:00:26 aump
11094  5422   0.0    00:00:14 /usr/sbin/automountd
11384  5422   0.0    00:00:00 /usr/sbin/writesrv
12400     1   0.0    00:00:00 /usr/IMNSearch/httpdlite/httpdlite -r /etc/IMNSearch/httpdlite/httpdlite.c
13434  5422   0.0    00:00:08 /usr/sbin/rsct/bin/IBM.AuditRMd
13692     1   0.0    00:00:00 /usr/sbin/getty /dev/console
13946  5422   0.0    00:00:01 /bin/sh /etc/gsa_updmn
14450     1   0.0    00:00:00 /usr/sbin/getty /dev/lft0
14988  5422   0.0    00:00:20 /usr/sbin/rsct/bin/rmcd -r
15224  5422   0.0    00:22:29 /usr/opt/ifor/bin/i4llmd -b -n wcclwts -l /var/ifor/llmlg
15484 15224   0.0    00:00:00 /usr/opt/ifor/bin/i4llmd -b -n wcclwts -l /var/ifor/llmlg
15968 13946   0.0    00:00:00 sleep 14400
16278  7998   0.0    00:00:00 rpc.ttdbserver 100083 1
16888  5422   0.0    00:00:08 /usr/sbin/rsct/bin/IBM.ERrmd
17050  5422   0.0    00:00:08 /usr/sbin/rsct/bin/IBM.ServiceRMd
19404     1   0.0    00:00:00 frca
20298 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
21772  4148   0.2    02:23:00 dtgreet
23438     1   0.0    00:00:00 /usr/lpp/diagnostics/bin/diagd
23920  5422   0.0    00:00:08 /usr/sbin/rsct/bin/IBM.CSMAgentRMd
24432  5422   0.0    00:00:05 /usr/sbin/rsct/bin/ctcasd
25988 45448   0.0    00:00:00 emacs ph/ihsdiag/ServerDoc/Platform.java
28506 25988   0.0    00:00:00 /opt/freeware/lib/emacs/20.7/powerpc-ibm-aix4.3.3.0/emacsserver
31872 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
32442 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
32662  3170   0.0    00:00:00 /usr/sbin/sshd -D
33448     1   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
35482 33448  40.0    00:09:12 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
37564 38136   0.0    00:00:00 -bash
37854 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
38136 42606   0.0    00:00:00 /usr/sbin/sshd -D
39222 37564   0.0    00:00:00 ps -A -o pid,ppid,pcpu,time,args
42606  8258   0.0    00:00:00 /usr/sbin/sshd -D
45270 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
45448 32662   0.0    00:00:00 -bash

The two processes highlighted are the only ones with a high current CPU utilization (third column in the ps output). This example is from AIX, where wait is a special process which represents all idle CPU on the system. So we have to ignore the wait process. This leaves process 35482, which is an IBM HTTP Server process.

Some confirmation that 35482 is actively consuming high amounts of CPU can be done by examining the forth column in the ps output over time.

# ps -A -o pid,ppid,pcpu,time,args | grep 35482
35482 33448  96.2    00:10:28 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28

(wait 10 seconds or so)

# ps -A -o pid,ppid,pcpu,time,args | grep 35482
35482 33448  96.1    00:10:35 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28

(wait 10 seconds or so)

[trawick@b80-2 trawick]$ ps -A -o pid,ppid,pcpu,time,args | grep 35482
35482 33448  96.6    00:10:44 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28

The subsequent checks for process 35482 show that cumulative CPU time (the fourth column) continues to increase. At this point, the suspect process is clearly identified, and we should proceed to collecting information about the proces using the instructions below.

Making sure required support programs are available

Please refer to these instructions for verifying that required support programs are installed.

Running the tool

Note: This executable mustgather is not used on Windows nor on z/OS.

Run the tool as root to avoid any permissions problems with obtaining backtraces or reading files, such as log files and configuration files. (More information about the requirement to run this tool as root is available here.)

ServerDoc is passed in four parameters for gathering high CPU documentation:

  1. GatherHighCpuDoc
  2. the name of the IHS installation directory (e.g., /usr/HTTPServer)
  3. the process id of the high CPU process to examine
  4. the address of a non-SSL port handled by the web server (e.g., 127.0.0.1:80), or "-" if there is no non-SSL port
# java -jar ServerDoc.jar GatherHighCpuDoc /path/to/IHS 35482 127.0.0.1:80

Redhat Enterprise Linux 3.0 and SuSE SLES 9 - Some performance tools on these platforms may identify specific thread "pids" for high CPU. The high CPU tool cannot currently accept thread pids on these levels of Linux. Be sure to find the true process id for the entire process with high CPU, and specify that true process id on the invocation of the high CPU tool. The sample ps invocations shown above will only show real process ids which can be used with the high CPU tool.

The tool creates a new directory which contains a timestamp in the name, and the high CPU documentation will be saved in that directory.

determining the value of the non-SSL address parameter

If the IHS installation only supports SSL, then use - for this parameter. Otherwise, specify an IP address and port which can be used to reach the server from the local machine without using SSL.

Use the following table to determine the value of the non-SSL address parameter based on the form of a non-SSL Listen directive used in your configuration:

Listen directive looks like this use this for address parameter
(no non-SSL ports) -
Listen 80 127.0.0.1:80
Listen port 127.0.0.1:port
Listen 192.168.1.15:80 192.168.1.15:80
Listen ipaddress:port ipaddress:port
Listen myhostname:80 myhostname:80

a sample run

For this example, IHS is installed in /usr/IBMIHS-1.3.28, the high CPU process is 35482, the non-SSL port can be reached from the web server machine on address 127.0.0.1:8080, and ihsdiag was unpacked into directory /root/ihsdiag-1.3.4.

# java -jar /root/ihsdiag-1.3.4/ServerDoc.jar GatherHighCpuDoc /usr/IBMIHS-1.3.28 \
35482 127.0.0.1:8080
Tracing process for 10 seconds...

Seconds remaining before gathering information again:
60...54...48...42...36...30...24...18...12...6...

Tracing process for 10 seconds...

Seconds remaining before gathering information again:
30...27...24...21...18...15...12...9...6...3...

Tracing process for 10 seconds...

Reports, log files, and configuration files have been saved to directory
  HighCpuDoc.200501201122
If you have additional log files or configuration files, copy them there
before packing up the directory.
Web server log and conf files other than the default will have to be copied manually.
WebSphere plug-in conf and log files will have to be copied manually.

Hint for packing up the directory:
  tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122
  gzip HighCpuDoc.200501201122.tar
[trawick@b80-2 ServerDoc]$ ls -l HighCpuDoc.200501201122/
total 11479
-rw-r--r--   1 root     system        32986 Jan 20 11:22 access_log
-rw-r--r--   1 root     system         7129 Jan 20 11:22 apachectl
-rw-r--r--   1 root     system         2251 Jan 20 11:22 error_log
-rw-r--r--   1 root     system       662246 Jan 20 11:22 httpd
-rw-r--r--   1 root     system        22835 Jan 20 11:22 httpd.conf
-rw-r--r--   1 root     system       197962 Jan 20 11:24 log
-rw-r--r--   1 root     system         1624 Jan 20 11:24 report
-rw-r--r--   1 root     system      1641481 Jan 20 11:22 trace.0
-rw-r--r--   1 root     system      1642385 Jan 20 11:24 trace.1
-rw-r--r--   1 root     system      1650769 Jan 20 11:24 trace.2
-rwxr-xr-x   1 root     system         1442 Jan 20 11:22 traceprocess.sh

Copying other web server and plug-in files

The next step is to copy any other web server or plug-in configuration files and logs into the new HighCpuDoc directory. Here is a list of files to copy if they are being used:

Saving the documentation directory

The last step is to pack up and compress the documentation directory using zip, tar followed by gzip, or pax followed by compress. The easiest way is to cut and paste the messages displayed by ServerDoc previously which showed the commands to use. The suggested commands will vary by platform. On z/OS, for example, pax and compress will be suggested instead of tar and gzip.

a sample run

# tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122
# gzip HighCpuDoc.200501201122.tar

The resulting compressed file is the file to send to IBM support.

Understanding the root requirement

When gathering information on high CPU problems, the tool must attach to a live web server process to obtain information about the state of that process.

If the web server is started as root, then that process will be owned by root or by the web server user id (e.g., nobody or www). Only root has the authority to attach to any of the web server processes, so it is easiest if the tool itself is run as root. If the web server administrator does not have authority to log in or switch user to root, a simple script can be created to gather the high CPU documentation, and the system administrator can give the web server administrator sudo access to that script. sudo is a third-party tool available without cost for all appropriate platforms.

If the web server is not started as root, there are no such concerns, and the high CPU documentation tool may be run by the user id which starts the web server.

If the tool is run as non-root and it is unable to gather the required information, the problem will have to be recreated. It may not be possible to determine if this problem occurred until the documentation has been analyzed by IBM HTTP Server support.