The documentation required to diagnose child process crashes includes
Obtaining and installing the collector, ihsdiag, is documented here
If core dumps are not being saved for the child process crashes, the first step is to perform any necessary operating system and web server configuration so that core dumps are saved. Core dump configuration information is described here.
When a core dump is available, the ServerDoc tool provided with ihsdiag automates much of the work of gathering and formatting the required documentation. The user runs ServerDoc with the IHS installation directory and the path to the core file; ServerDoc then creates a new directory and stores the required documentation in it.
Once the ServerDoc tool has completed, the user should copy any remaining log files and configuration files used by the web server and the plug-in into the new directory, and send in the directory to IBM support.
Note: If IBM HTTP Server has been upgraded to a newer maintenance level since the core dump was generated, the core dump needs to be reproduced with the new level of product code. Otherwise, the crash information will be incorrect since the core dump and the product won't match.
In addition to submitting the documentation described below, we also recommend enabling mod_whatkilledus and mod_backtrace so that key information about each subsequent crash is recorded in the web server error log. This provides additional insight into the crashes without requiring that the steps outlined in this document be followed for each and every crash.
These modules are not supported with very old maintenance levels of IBM HTTP Server. Check the Supported server versions section in the documentation for each module to confirm that the module works with your level of IBM HTTP Server.
AIX APAR IZ99394 (sysroutes IZ44282 IZ48935 IZ95001 IZ99394 IV13061 IV13834) can cause a system crash when running any networking software, such as IBM HTTP Server. Crashes will be in net_kmem_rmlist/net_malloc or related AIX OS code.
64-bit IHS builds on AIX mistakenly shipped with a default MAXDATA setting in bin/envvars that limits overall heap size to around 2GB. While this does not cause a leak, it can turn virtual address space size growth into memory allocation failures (or sometimes, crashes). The line should be commented out on 64-bit IHS installs that use a non-default ThreadsPerChild or otherwise have high heap memory requirements. Be sure that the userid that invokes apachectl has 'ulimit -d unlimited' in its environment, as this rlimit also caps the maximum heap size.
If you have this symptom (OOM and ~1.75GB core file) but a low ThreadsPerChild directive, consider setting MaxRequestsPerChild 10000 in addition to the bin/envvars fix above.
A similar problem can occur if ulimit -d is set to anything other than unlimited.
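As a minimal sketch, the data-segment limit that 'ulimit -d' reports can be inspected from Python's standard resource module (Unix only). The helper name data_limit_is_unlimited is illustrative and not part of IHS, and the result is only meaningful when the check runs under the same userid and environment that invokes apachectl.

```python
import resource


def data_limit_is_unlimited(soft_limit=None):
    """Return True if the soft data-segment limit (ulimit -d) is unlimited.

    When soft_limit is None, the limit of the current process is queried.
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_DATA)
    return soft_limit == resource.RLIM_INFINITY


if __name__ == "__main__":
    if data_limit_is_unlimited():
        print("ulimit -d is unlimited")
    else:
        print("ulimit -d is capped; heap growth may fail early")
```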
Exceeding 2000 ThreadsPerChild puts any 32-bit server at risk of exhausting all address space available in a single process. When no more memory is addressable, allocations will begin to fail and usually result in crashes.
2000 is not a magic number, and the exact limits on address space vary by system just as exact address space usage varies by configuration and workload.
GSKit prior to 7.0.4.48/8.0.5.17 can crash or corrupt memory under load.
Circumvention: Set SSLAttributeSet 445 1 in each context with SSLEnable if you cannot move to 8.0.0.5, 8.5.0.1, or apply the interim fix for PM72915.
The problem also occurs under later releases when SSLCompression ON is configured before GSKit is updated to 8.0.14.24 or later.
If SSLPKCSDriver is used, it is probably related to the symptom. See cryptohw.html for certified adapters and possible debugging tips.
Try setting SSLAcceleratorDisable globally. If this makes the crashes go away, IHS was unexpectedly using a legacy interface on a modern SSL co-processor and you should remain in this configuration.
PrimaryServers tag which causes a runtime crash in the WebServer Plugin. A CrashDoc will report a crash string including <listGetHead<serverGroupGetFirstPrimaryServer<
Make sure required Solaris AF_UNIX fixes have been applied, using one of the patches below or equivalent:
If crashes occur after apachectl restart or apachectl graceful on AIX 5.2, check for the following LoadModule directives in the configuration file (uncommented):
LoadModule dav_module modules/mod_dav.so
...
LoadModule dav_fs_module modules/mod_dav_fs.so
A child process crash can occur after a web server restart on AIX 5.2 if these are enabled. Other affected modules are mod_cache/mod_mem_cache and mod_proxy; however, these are not enabled by default.
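The check for uncommented LoadModule directives can be sketched as a small scan over the configuration text. This is an illustration only: the function name and the AFFECTED set are assumptions built from the two module identifiers shown above, and the scan does not follow Include directives.

```python
import re

# Module identifiers taken from the LoadModule lines shown above.
# Extend the set if other affected modules are loaded in your configuration.
AFFECTED = {"dav_module", "dav_fs_module"}

# Matches an active (uncommented) LoadModule line. A line that starts
# with "#" does not match because only whitespace may precede LoadModule.
LOADMODULE_RE = re.compile(r"^\s*LoadModule\s+(\S+)\s+(\S+)", re.IGNORECASE)


def active_affected_modules(conf_text, affected=AFFECTED):
    """Return affected module identifiers that are loaded in conf_text."""
    found = []
    for line in conf_text.splitlines():
        match = LOADMODULE_RE.match(line)
        if match and match.group(1) in affected:
            found.append(match.group(1))
    return found
```

To use it, read httpd.conf into a string and pass it to active_affected_modules; an empty list means neither directive is active in that file.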
Perform one of the steps below if your OS is AIX 5.2:
Modify the LDR_CNTRL directive in the IHSROOT/bin/envvars file to include @IGNOREUNLOAD in the value, as in the following example:
LDR_CNTRL="MAXDATA=0x60000000@IGNOREUNLOAD"
Simply add @IGNOREUNLOAD to the end of the current value of MAXDATA.
Stop IHS then start it again to activate the configuration change. It will not be activated across a restart.
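The LDR_CNTRL edit described above can be sketched as a string transformation on a single line of bin/envvars. The function name add_ignoreunload is hypothetical, and the sketch assumes the setting is written on one line as LDR_CNTRL=value, with or without quotes; review the result by hand before restarting the server.

```python
import re

# Matches an uncommented LDR_CNTRL assignment, with optional matching quotes.
LDR_CNTRL_RE = re.compile(r"""^(\s*LDR_CNTRL=)(["']?)(.*?)\2\s*$""")


def add_ignoreunload(line):
    """Append @IGNOREUNLOAD to an LDR_CNTRL line if it is not already set.

    Lines that are not LDR_CNTRL assignments are returned unchanged.
    """
    match = LDR_CNTRL_RE.match(line)
    if not match:
        return line
    prefix, quote, value = match.group(1), match.group(2), match.group(3)
    # LDR_CNTRL options are separated by "@".
    options = [opt for opt in value.split("@") if opt]
    if "IGNOREUNLOAD" in options:
        return line
    options.append("IGNOREUNLOAD")
    return prefix + quote + "@".join(options) + quote
```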
Comment out the LoadModule directives for the affected modules:
# LoadModule dav_module modules/mod_dav.so
...
# LoadModule dav_fs_module modules/mod_dav_fs.so
Stop IHS then start it again to activate the configuration change. A restart is not sufficient due to the nature of this problem.
AIX APAR IY78080 resolves the problem for AIX 5.3. This APAR fix is not available for AIX 5.2, so one of the configuration changes described above must be used.
The most common cause of a SIGBUS crash on these platforms is that a file is truncated while the web server is trying to send it to a client. Some file replacement methods cause the existing file to be truncated and then the new contents written, instead of writing the new contents to a temporary file and then renaming to the proper name.
If you have static files served from IHS which can be modified in place, try EnableMMap Off to see if the problem is resolved.
Note: On Solaris, many other types of crashes result in SIGBUS.
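The safer file replacement method mentioned above (write the new contents to a temporary file, then rename it to the proper name) can be sketched as follows. The function name replace_file_atomically is illustrative; the sketch assumes a POSIX filesystem, where a rename within one directory replaces the target in a single step, so the web server sees either the old file or the new one and never a truncated one.

```python
import os
import stat
import tempfile


def replace_file_atomically(path, data):
    """Write data (bytes) to a temporary file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target
    # for the rename to be atomic, so create it in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # mkstemp creates the file with mode 0600; copy the mode of the
        # existing file so the web server can still read the new version.
        if os.path.exists(path):
            os.chmod(tmp_path, stat.S_IMODE(os.stat(path).st_mode))
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```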
For U40xx or S0C4 abend in LE CELQLIB at httpd child process termination, check for applicability of LE APAR PK34252.
For a S0C4 abend in ATOI at IHS startup with LE trace enabled, check for applicability of LE APAR PK81097.
The PHP manual recommends against using PHP in a multithreaded web server; see "Why shouldn't I use Apache2 with a threaded MPM in a production environment?".
IHS 2.0.42 and higher is multithreaded on all platforms. (IHS 1.3 is multithreaded only on Windows or with certain third-party modules.)
Thread safety problems in PHP applications or third-party libraries referenced by PHP can cause crashes in a threaded web server. The recommended solution is to configure PHP as a FastCGI application and use mod_fastcgi to communicate with it.
On Linux, child process crashes can occur due to address space exhaustion when large numbers of threads are used with the default thread stack size.
A thread stack size of 128KB is sufficient for IBM HTTP Server and the WebSphere plug-in; however, the system default is typically 8MB or larger. With the system default and large values for ThreadsPerChild, most of the address space can be consumed by thread stacks. For example, with a stack size of 8MB, 256 threads reserve 2GB of the address space for thread stacks, and ThreadsPerChild set to 512 would require 4GB. Memory allocations during request processing can then exceed the address space limit, typically 3GB, and result in crashes in arbitrary components of the web server.
The system default can be displayed with ulimit -s. If the value is 'unlimited', the default is 8MB.
With high values for ThreadsPerChild, the ThreadStackSize directive should be used to specify a much smaller stack size, as in the following example:
# Default to 128KB stack size
ThreadStackSize 131072
Third-party modules may require a larger thread stack size. We recommend setting it to 256KB when third-party modules are used, unless the vendor is able to specify the exact requirement.
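The address space arithmetic behind this recommendation can be sketched in a few lines. The function names are illustrative, and the 3GB default is the typical limit quoted above rather than a value measured on any particular system.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def thread_stack_bytes(threads_per_child, stack_size_bytes):
    """Address space reserved by the thread stacks of one child process."""
    return threads_per_child * stack_size_bytes


def remaining_address_space(threads_per_child, stack_size_bytes,
                            address_space_bytes=3 * GB):
    """Address space left for everything else; negative means it cannot fit."""
    return address_space_bytes - thread_stack_bytes(
        threads_per_child, stack_size_bytes)


if __name__ == "__main__":
    for threads, stack in [(256, 8 * MB), (512, 128 * KB), (512, 256 * KB)]:
        used = thread_stack_bytes(threads, stack)
        print(f"{threads} threads x {stack // KB}KB stacks = {used // MB}MB")
```

With a 128KB ThreadStackSize, even 512 threads reserve only 64MB for stacks, which is why the smaller setting leaves most of the address space available for request processing.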
(IBM HTTP Server 2.0 and above)
If a crash occurs while processing apachectl stop, apachectl graceful, or apachectl restart, then the crash may be resolved by reversing the order of the LoadModule directives for mod_ibm_ssl and the WebSphere plug-in: move the mod_ibm_ssl directive (LoadModule ibm_ssl_module modules/mod_ibm_ssl.so) to the bottom of httpd.conf.
This problem is resolved with plugin APAR PK57529.
If the http-plugin.log has reached 2GB, remove or truncate it and restart IBM HTTP Server.
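A quick way to test for this condition is to compare the log size against the limit. In this sketch the function name is hypothetical and the default threshold is an assumption: it treats "2GB" as the largest offset a signed 32-bit file position can hold.

```python
import os

# Assumption: "2GB" means the largest size addressable with a signed
# 32-bit file offset. Adjust the limit if your platform differs.
TWO_GB_LIMIT = 2**31 - 1


def log_reached_limit(path, limit=TWO_GB_LIMIT):
    """Return True if the file at path is at or above the size limit."""
    return os.path.getsize(path) >= limit
```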
If you are experiencing crashes while using crypto hardware, refer to the information in the Cryptographic accelerator Questions and Answers / Things to check first section.
If none of the known issues above are responsible for the crash, proceed to collecting the CrashDoc mustgather.
A core dump and related information are critical for diagnosing the cause of child process crashes. Without the information, IBM support is limited to suggesting that the customer move to the current level of fixes. With the information, IBM support anticipates being able to make the following initial determination:
In cases where an IBM component crashed, the documentation often contains enough information to address the root cause of previously unknown problems. Even when the root cause cannot be determined from a particular core dump, the information is used to decide the next step.
In cases where a third-party component crashed, the vendor of that component will need to investigate further; IBM support is unable to diagnose problems in third-party components.
Please refer to these instructions for verifying that required support programs are installed.
Run the tool as root to avoid any permissions problems with reading the core file or other files, such as log files and configuration files. (More information about the requirement to run this tool as root is available here.)
ServerDoc is passed three parameters for gathering crash documentation (the GatherCrashDoc keyword, the IHS installation directory, and the path to the core file):
# java -jar ServerDoc.jar GatherCrashDoc /path/to/IHS /path/to/corefile
The tool creates a new directory which contains a timestamp in the name, and the crash documentation will be saved in that directory.
For this example, IHS is installed in /usr/HTTPServer, the core dump was written to /tmp/core, and ihsdiag was unpacked into /root/ihsdiag-1.1.0:
# cd /tmp
# java -jar /root/ihsdiag-1.1.0/ServerDoc.jar GatherCrashDoc \
    /usr/HTTPServer /tmp/core
Reports, log files, and configuration files have been saved to directory CrashDoc.200404121310
If you have additional log files or configuration files, copy them there before packing up the directory.
Hint for packing up the directory:
tar -cf CrashDoc.200404121310.tar CrashDoc.200404121310
gzip CrashDoc.200404121310.tar
# ls -l CrashDoc.200404121310/
total 8136
-rw-r--r-- 1 root system    8779 Apr 12 13:10 access_log
-rw-r--r-- 1 root system    7094 Apr 12 13:10 apachectl
-rw-r--r-- 1 root system 3593703 Apr 12 13:10 core
-rw-r--r-- 1 root system  478483 Apr 12 13:10 core_file_strings
-rw-r--r-- 1 root system   14419 Apr 12 13:10 error_log
-rw-r--r-- 1 root system   37141 Apr 12 13:10 httpd.conf
-rw-r--r-- 1 root system    7500 Apr 12 13:10 log
-rw-r--r-- 1 root system     173 Apr 12 13:10 report
The next step is to copy any other web server or plug-in configuration files and logs into the new CrashDoc directory. Here is a list of files to copy if they are being used:
The last step is to pack up and compress the documentation directory using zip, tar followed by gzip, or pax followed by compress. The easiest way is to cut and paste the messages displayed by ServerDoc previously which showed the commands to use. The suggested commands will vary by platform. On z/OS, for example, pax and compress will be suggested instead of tar and gzip.
# tar -cf CrashDoc.200404121310.tar CrashDoc.200404121310
# gzip CrashDoc.200404121310.tar
The resulting compressed file is the file to send to IBM support.
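Where tar and gzip are not convenient, the same packing step can be sketched with Python's standard tarfile module. This is an alternative illustration, not something ServerDoc provides; the function name pack_crashdoc is hypothetical, and the archive is written next to the directory with a .tar.gz suffix.

```python
import os
import tarfile


def pack_crashdoc(directory):
    """Create <directory>.tar.gz containing the directory and its files.

    Returns the path of the archive that was written.
    """
    directory = os.path.normpath(directory)
    archive = directory + ".tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the directory name inside the archive,
        # matching what "tar -cf" produces when run from the parent.
        tar.add(directory, arcname=os.path.basename(directory))
    return archive
```

For the example above, pack_crashdoc("/tmp/CrashDoc.200404121310") would write /tmp/CrashDoc.200404121310.tar.gz.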
root requirement
When gathering information on web server crashes, the tool must be able to read core files created for web server processes and web server logs and configuration files. Often the web server logs and configuration files are readable by normal user ids, but core files are readable only by root or by the web server user id (e.g., nobody or www).
If the web server is started as root, the permissions on generated core files, log files, and configuration files can be changed to allow a non-root user to run the crash documentation tool.
If the web server is not started as root, there are no such concerns, and the crash documentation tool may be run by the user id which starts the web server.
If the tool is run as non-root and it is unable to gather the required information, permissions on the core file or other files can be changed and the tool may be run again. It may not be possible to determine if this problem occurred until the documentation has been analyzed by IBM HTTP Server support.
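Before running the tool as a non-root user, a quick readability check on the files it needs can catch permission problems early. This is a minimal sketch; the function name is illustrative and the paths in the usage section are taken from the example earlier in this document.

```python
import os


def unreadable_files(paths):
    """Return the paths the current user cannot read (or that do not exist)."""
    return [path for path in paths if not os.access(path, os.R_OK)]


if __name__ == "__main__":
    # Example paths from this document; substitute your own locations.
    candidates = ["/tmp/core", "/usr/HTTPServer/conf/httpd.conf"]
    for path in unreadable_files(candidates):
        print("cannot read:", path)
```

An empty result does not guarantee that the tool will succeed, but any path that is listed will need its permissions changed, or the tool will need to be run as root.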