The documentation required to diagnose high CPU problems with web server processes includes
The ServerDoc tool provided with ihsdiag automates much of the work of gathering this information. The user runs ServerDoc and provides the IHS installation directory and other information; ServerDoc creates a new directory to hold the required documentation, and stores information in that new directory.
Once the ServerDoc tool has completed, the user should copy any remaining log files and configuration files used by the web server and the plug-in into the new directory, and send in the directory to IBM support.
Special concerns for threaded processes on older Linux distributions |
Customers may be using single-threaded IHS 1.3 on older Linux
distributions such as RedHat 7.x. If third-party modules are loaded
which create threads in the IHS processes, this high CPU tool may not
be able to analyze those processes due to limitations of the system
tools gdb and strace . In addition to this
high CPU tool, also run the hang
documentation tool, as it may be able to retrieve additional
information. Submit both the high CPU doc and hang doc to IHS
support. |
Operating systems provide special programs for displaying the CPU
usage of processes running on the system. These programs include
ps
, top
, topas
,
nmon
, and others. When a high CPU problem is suspected,
one of these programs should be used to determine which processes are
actively consuming high CPU. If a web server process is the highest
consumer of CPU, documentation about the suspected problem should be
submitted to IBM HTTP Server support for analysis.
Because ps
is universally available, the example below
will use ps to show how to identify the high CPU process.
ps
invocations which show percent CPUPlatform | ps invocation |
AIX |
ps -A -o pid,ppid,pcpu,time,args |
Linux |
ps -eo pid,ppid,pcpu,time,fname,cmd |
Solaris |
ps -A -o pid,ppid,pcpu,time,comm |
HP-UX |
UNIX95=1 ps -A -o pid,ppid,pcpu,time,args |
ps
to find high CPU processesSubstitute the appropriate ps
command for your
platform.
# ps -A -o pid,ppid,pcpu,time,args PID PPID %CPU TIME COMMAND 0 0 0.0 00:07:56 swapper 1 0 0.0 00:01:18 /etc/init 516 0 96.4 89-10:46:02 wait 774 0 0.0 00:00:09 reaper 1032 0 0.0 00:00:15 xmgc 1290 0 0.0 00:00:27 netm 1548 0 0.0 00:24:25 gil 1806 0 0.0 00:00:00 wlmsched 2850 0 0.0 00:00:00 lvmbb 3170 8258 0.0 00:00:00 /usr/sbin/sshd -D 3410 1 0.9 19:02:01 /usr/sbin/syncd 60 3650 1 0.0 00:00:00 /usr/dt/bin/dtlogin -daemon 3964 1 0.0 00:00:00 /usr/lib/errdemon 4148 3650 0.0 00:00:00 dtlogin <:0> -daemon 4420 3650 0.0 00:07:31 /usr/lpp/X11/bin/X -D /usr/lib/X11//rgb -T -force :0 -auth /var/dt/A:0-svd 5160 1 0.0 00:00:00 /usr/ccs/bin/shlap 5422 1 0.0 00:00:00 /usr/sbin/srcmstr 5962 5422 0.0 00:05:03 /opt/freeware/sbin/prngd -f /dev/egd-pool -m 666 6210 5962 00:00:00 <defunct> 6456 5962 00:00:00 <defunct> 6714 5962 00:00:00 <defunct> 7004 1 0.0 00:00:00 /usr/sbin/biod 8 7226 5422 0.0 00:00:03 /usr/sbin/syslogd 7482 5422 0.0 00:00:32 sendmail: accepting connections 7740 5422 0.0 00:00:22 /usr/sbin/portmap 7998 5422 0.0 00:00:00 /usr/sbin/inetd 8258 5422 0.0 00:03:50 /usr/sbin/sshd -D 8516 0 0.0 00:00:01 rtcmd 8772 0 0.0 00:00:00 kbiod 9064 1 0.0 00:00:00 /usr/sbin/uprintfd 9300 1 0.0 00:00:00 /usr/sbin/rpc.statd 9564 1 0.0 00:00:00 /usr/sbin/rpc.lockd 9820 1 0.0 00:00:20 /usr/sbin/cron 10078 5422 0.0 00:00:01 /usr/sbin/qdaemon 10320 1 0.0 00:06:52 rpc.lockd 10578 0 0.0 00:00:00 cash 10836 1 0.0 00:00:26 aump 11094 5422 0.0 00:00:14 /usr/sbin/automountd 11384 5422 0.0 00:00:00 /usr/sbin/writesrv 12400 1 0.0 00:00:00 /usr/IMNSearch/httpdlite/httpdlite -r /etc/IMNSearch/httpdlite/httpdlite.c 13434 5422 0.0 00:00:08 /usr/sbin/rsct/bin/IBM.AuditRMd 13692 1 0.0 00:00:00 /usr/sbin/getty /dev/console 13946 5422 0.0 00:00:01 /bin/sh /etc/gsa_updmn 14450 1 0.0 00:00:00 /usr/sbin/getty /dev/lft0 14988 5422 0.0 00:00:20 /usr/sbin/rsct/bin/rmcd -r 15224 5422 0.0 00:22:29 /usr/opt/ifor/bin/i4llmd -b -n wcclwts -l /var/ifor/llmlg 15484 15224 0.0 00:00:00 /usr/opt/ifor/bin/i4llmd -b -n wcclwts -l /var/ifor/llmlg 15968 13946 0.0 00:00:00 sleep 14400 16278 7998 0.0 00:00:00 rpc.ttdbserver 100083 1 16888 5422 0.0 00:00:08 /usr/sbin/rsct/bin/IBM.ERrmd 17050 5422 0.0 00:00:08 /usr/sbin/rsct/bin/IBM.ServiceRMd 19404 1 0.0 00:00:00 frca 20298 33448 0.0 00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28 21772 4148 0.2 02:23:00 dtgreet 23438 1 0.0 00:00:00 /usr/lpp/diagnostics/bin/diagd 23920 5422 0.0 00:00:08 /usr/sbin/rsct/bin/IBM.CSMAgentRMd 24432 5422 0.0 00:00:05 /usr/sbin/rsct/bin/ctcasd 25988 45448 0.0 00:00:00 emacs ph/ihsdiag/ServerDoc/Platform.java 28506 25988 0.0 00:00:00 /opt/freeware/lib/emacs/20.7/powerpc-ibm-aix4.3.3.0/emacsserver 31872 33448 0.0 00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28 32442 33448 0.0 00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28 32662 3170 0.0 00:00:00 /usr/sbin/sshd -D 33448 1 0.0 00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28 35482 33448 40.0 00:09:12 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28 37564 38136 0.0 00:00:00 -bash 37854 33448 0.0 00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28 38136 42606 0.0 00:00:00 /usr/sbin/sshd -D 39222 37564 0.0 00:00:00 ps -A -o pid,ppid,pcpu,time,args 42606 8258 0.0 00:00:00 /usr/sbin/sshd -D 45270 33448 0.0 00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28 45448 32662 0.0 00:00:00 -bash
The two processes highlighted are the only ones with a high current
CPU utilization (third column in the ps
output).
This example is from AIX, where wait
is a
special process which represents all idle CPU on the system. So we have
to ignore the wait
process. This leaves process 35482, which is an
IBM HTTP Server process.
Some confirmation that 35482 is actively consuming high amounts of
CPU can be done by examining the forth column in the ps
output over time.
# ps -A -o pid,ppid,pcpu,time,args | grep 35482 35482 33448 96.2 00:10:28 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28 (wait 10 seconds or so) # ps -A -o pid,ppid,pcpu,time,args | grep 35482 35482 33448 96.1 00:10:35 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28 (wait 10 seconds or so) [trawick@b80-2 trawick]$ ps -A -o pid,ppid,pcpu,time,args | grep 35482 35482 33448 96.6 00:10:44 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
The subsequent checks for process 35482 show that cumulative CPU time (the fourth column) continues to increase. At this point, the suspect process is clearly identified, and we should proceed to collecting information about the proces using the instructions below.
Please refer to these instructions for verifying that required support programs are installed.
Run the tool as root
to avoid any permissions problems
with obtaining backtraces or reading files, such as log files and
configuration files. (More information about the requirement to run
this tool as root
is available here.)
ServerDoc is passed in four parameters for gathering high CPU documentation:
GatherHighCpuDoc
# java -jar ServerDoc.jar GatherHighCpuDoc /path/to/IHS 35482 127.0.0.1:80
Redhat Enterprise Linux 3.0 and SuSE SLES 9 - Some
performance tools on these platforms may identify specific thread
"pids" for high CPU. The high CPU tool cannot currently accept thread
pids on these levels of Linux. Be sure to find the true process id for
the entire process with high CPU, and specify that true process id on
the invocation of the high CPU tool. The sample |
The tool creates a new directory which contains a timestamp in the name, and the high CPU documentation will be saved in that directory.
If the IHS installation only supports SSL, then use - for this parameter. Otherwise, specify an IP address and port which can be used to reach the server from the local machine without using SSL.
Use the following table to determine the value of the non-SSL
address parameter based on the form of a non-SSL Listen
directive used in your configuration:
Listen directive looks like this | use this for address parameter |
(no non-SSL ports) | - |
Listen 80 |
127.0.0.1:80 |
Listen port |
127.0.0.1:port |
Listen 192.168.1.15:80 |
192.168.1.15:80 |
Listen ipaddress:port |
ipaddress:port |
Listen myhostname:80 |
myhostname:80 |
For this example, IHS is installed in /usr/IBMIHS-1.3.28
,
the high CPU process is 35482, the non-SSL port can be
reached from the web server machine on address
127.0.0.1:8080
, and ihsdiag was unpacked into directory
/root/ihsdiag-1.3.4
.
# java -jar /root/ihsdiag-1.3.4/ServerDoc.jar GatherHighCpuDoc /usr/IBMIHS-1.3.28 \ 35482 127.0.0.1:8080 Tracing process for 10 seconds... Seconds remaining before gathering information again: 60...54...48...42...36...30...24...18...12...6... Tracing process for 10 seconds... Seconds remaining before gathering information again: 30...27...24...21...18...15...12...9...6...3... Tracing process for 10 seconds... Reports, log files, and configuration files have been saved to directory HighCpuDoc.200501201122 If you have additional log files or configuration files, copy them there before packing up the directory. Web server log and conf files other than the default will have to be copied manually. WebSphere plug-in conf and log files will have to be copied manually. Hint for packing up the directory: tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122 gzip HighCpuDoc.200501201122.tar [trawick@b80-2 ServerDoc]$ ls -l HighCpuDoc.200501201122/ total 11479 -rw-r--r-- 1 root system 32986 Jan 20 11:22 access_log -rw-r--r-- 1 root system 7129 Jan 20 11:22 apachectl -rw-r--r-- 1 root system 2251 Jan 20 11:22 error_log -rw-r--r-- 1 root system 662246 Jan 20 11:22 httpd -rw-r--r-- 1 root system 22835 Jan 20 11:22 httpd.conf -rw-r--r-- 1 root system 197962 Jan 20 11:24 log -rw-r--r-- 1 root system 1624 Jan 20 11:24 report -rw-r--r-- 1 root system 1641481 Jan 20 11:22 trace.0 -rw-r--r-- 1 root system 1642385 Jan 20 11:24 trace.1 -rw-r--r-- 1 root system 1650769 Jan 20 11:24 trace.2 -rwxr-xr-x 1 root system 1442 Jan 20 11:22 traceprocess.sh
The next step is to copy any other web server or plug-in configuration files and logs into the new HighCpuDoc directory. Here is a list of files to copy if they are being used:
The last step is to pack up and compress the documentation directory using tar and gzip. The easiest way is to cut and paste the messages displayed by ServerDoc previously which showed the tar and gzip commands to use.
# tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122 # gzip HighCpuDoc.200501201122.tar
root
requirementIf the web server is started as root
, then that
process will be owned by root
or by the web server user
id (e.g., nobody
or www
). Only
root
has the authority to attach to any of the web
server processes, so it is easiest if the tool itself is run as
root
. If the web server administrator does not have
authority to log in or switch user to root
, a simple
script can be created to gather the high CPU documentation, and the
system administrator can give the web server administrator
sudo
access to that script. sudo
is a
third-party tool available without cost for all appropriate
platforms.
If the web server is not started as root
, there are no
such concerns, and the high CPU documentation tool may be run by the
user id which starts the web server.
If the tool is run as non-root
and it is unable to
gather the required information, the problem will have to be
recreated. It may not be possible to determine if this problem
occurred until the documentation has been analyzed by IBM HTTP Server
support.