One type of problem that can occur with IBM HTTP Server (IHS) is a
child process crash. With the default setting for the AcceptMutex
directive on Solaris, if there are multiple Listen
directives in the IHS configuration and a child process crashes, the
accept mutex may become unusable, and IHS can no longer accept
connections. At that point clients are locked out and the web
server is hung. This does not usually occur for every child process
crash.
The information in this document does not apply if you have one of
the following levels of IBM HTTP Server:
If you are already using one of these levels of IBM HTTP Server, proceed with gathering MustGather information for web server hangs.
An easy way to check for this as the cause of a hang is to examine the IHS error log for a message like the following just prior to the web server hang:
[Tue Feb 22 20:03:14 2005] [notice] child pid 354 exit signal Bus error (10)
The error message may include a signal other than "Bus error (10)".
The key part of the message is "child pid nnnn exit signal".
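The log check described above can be sketched as a small shell helper. This is only an illustration: the function name find_child_crashes and the default log path are hypothetical, so substitute the ErrorLog location of your own IHS instance.

```shell
# Hypothetical default path; set ERROR_LOG to your instance's error log.
ERROR_LOG="${ERROR_LOG:-/opt/IBMIHS/logs/error_log}"

# Print every child-crash notice, whatever signal was reported.
# A basic regular expression is used so that plain grep can handle it.
find_child_crashes() {
    grep 'child pid [0-9][0-9]* exit signal' "$1"
}

# Only search if the log file is actually readable.
if [ -r "$ERROR_LOG" ]; then
    find_child_crashes "$ERROR_LOG"
fi
```

Compare the timestamps of any matching lines with the time the hang began.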
A more definitive way to determine the cause of the hang is to use the hang documentation tool during the occurrence of the hang and submit the generated documentation to IHS support. However, if the IHS error log shows that a child process crash occurred at the time the hang began, the crash is almost certainly the cause of the hang.
There are two issues that need to be resolved:
The default setting is AcceptMutex pthread, which has mostly good characteristics:
The potential problem with this default setting is that if there are unresolved causes of child process crashes, web server hangs will sometimes occur.
To avoid the potential hang after a child process crash, switch to
AcceptMutex fcntl. There is a small risk that Solaris
system tuning will need to be changed. Also, this type of accept
mutex does not perform as well, though the performance
degradation is not noticeable in practice.
Here are the steps to switch to AcceptMutex fcntl:
1. Add AcceptMutex fcntl to the bottom of the IHS
configuration file.
2. Add LockFile /path/to/local/file to the bottom of the IHS
configuration file.
3. Replace /path/to/local/file with a suitable lock file on
a local filesystem which is unique per instance of IHS.
IHS must be restarted to activate these changes.
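As a rough sketch, the configuration change above could be scripted as follows. The function name append_fcntl_mutex and both paths in the usage comment are hypothetical. Back up the configuration file before editing it.

```shell
# Append the two directives to the bottom of an IHS configuration file.
#   $1 = path to the IHS configuration file
#   $2 = lock file on a local filesystem, unique per IHS instance
append_fcntl_mutex() {
    conf=$1
    lock=$2
    cat >> "$conf" <<EOF
AcceptMutex fcntl
LockFile $lock
EOF
}

# Example invocation (hypothetical paths, left commented out on purpose):
# append_fcntl_mutex /opt/IBMIHS/conf/httpd.conf /var/run/ihs1/accept.lock
```

Because the directives are appended, they end up at the bottom of the file, as the steps require.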
For the vast majority of IHS installations, the default Solaris
tuning is sufficient for using AcceptMutex fcntl.
However, a small number of customers encounter a failure with IHS 2.0
or above and AcceptMutex fcntl due to an insufficient
Solaris limit on the number of outstanding lock requests. Resolving
that requires changing the Solaris tuning and rebooting.
Most customers will not encounter the tuning problem, and fixing it requires a system reboot, so it may be preferable to defer the tuning change until the current settings are known to be insufficient. If the failure does occur, the configuration can be restored to the default (which is prone to hangs) so that the tuning change and reboot can be scheduled for a more suitable time.
If the problem occurs, IHS will fail with this message in the IHS error log:
(46)No record locks available: couldn't grab the accept mutex
If that occurs, the number of outstanding lock requests supported by the kernel must be increased. The current value can be displayed as follows:
# mdb -k
> tune_t_flckrec/D
tune_t_flckrec:
tune_t_flckrec: 512
This can be increased by adding the following to /etc/system and rebooting:
set tune_t_flckrec=1024
If the current Solaris setting is different from 512, or there is
already a setting for tune_t_flckrec in /etc/system,
then double the current value.
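The doubling rule can be illustrated with a short sketch. The helper name doubled_flckrec_line is hypothetical, and the arithmetic syntax assumes a POSIX-style shell such as ksh or bash. The value passed in should be the one reported by mdb or already present in /etc/system.

```shell
# Print the /etc/system line that doubles a given tune_t_flckrec value.
#   $1 = current value of tune_t_flckrec
doubled_flckrec_line() {
    echo "set tune_t_flckrec=$(( $1 * 2 ))"
}

# With the default of 512 this prints: set tune_t_flckrec=1024
doubled_flckrec_line 512
```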
A core dump needs to be generated for the crash, and documentation on the core dump needs to be sent to IHS support for analysis. Please refer to these reference materials: