A core dump is a special file which represents the memory image of a process. Many operating systems can save a core dump when an application crashes. The core dump is an important part of diagnosing the cause of the crash, since it contains the data which the application was accessing at the time, along with information about which part of the application was running when the crash occurred.
There are various configuration requirements which must be met in order for the operating system to save a core dump when IBM HTTP Server crashes. This document describes the common configuration requirements.
Later sections of this document provide more information; here is a quick checklist to consider. For z/OS information, refer to this document.

Set the CoreDumpDirectory directive to a directory the web server user id can write to:
CoreDumpDirectory /tmp
Note: In some rare situations, core files will be larger than 2GB. They will be truncated unless the filesystem has large file support. By default, JFS filesystems do not support such files; large file support must be enabled explicitly when the filesystem is created. Also check ulimit -f if your IHS processes are larger than 1GB, to prevent the core files from being truncated at 1GB (the ulimit -f default).
ulimit -c unlimited
ulimit -f unlimited
[notice] Core file limit is 0; core dumps will not be written for server crashes
[notice] CoreDumpDirectory not set; core dumps may not be written for child process crashes

(Levels of IBM HTTP Server prior to mid-2004 do not report these potential configuration problems.)
CoreDumpDirectory /tmp
Note: On Solaris, /tmp is often mounted on paging space (swap device). If there is a potential paging space shortage, create another directory on a physical file system, make sure that the web server user id can write to it, and set CoreDumpDirectory to point to that new directory.
# coreadm -e global-setid -e proc-setid -e global
ulimit -c unlimited
ulimit -f unlimited
[notice] Core file limit is 0; core dumps will not be written for server crashes
[notice] CoreDumpDirectory not set; core dumps may not be written for child process crashes

(Levels of IBM HTTP Server prior to mid-2004 do not report these potential configuration problems.)
CoreDumpDirectory /tmp
ulimit -c unlimited
ulimit -f unlimited
[notice] Core file limit is 0; core dumps will not be written for server crashes
[notice] CoreDumpDirectory not set; core dumps may not be written for child process crashes

(Levels of IBM HTTP Server prior to mid-2004 do not report these potential configuration problems.)
Make sure that the process for which a coredump is needed has permission to write a coredump. For example, with Apache/IHS, the default location of the coredump is the Apache/IHS install directory or the directory specified by the CoreDumpDirectory directive. The user id associated with Apache/IHS must have permission to write files there. For most processes created by Apache/IHS, that user id and group id is specified by the User and Group directives in httpd.conf. This is often "nobody." A quick work-around to permission problems is to specify "CoreDumpDirectory /tmp" in httpd.conf.
Make sure there is plenty of room (possibly many megabytes) available on the partition/mount/volume where you expect the core file to be placed. If you get a core dump which is unusable for some reason, check available disk space with df -k on the partition/mount/volume containing the core after the core dump has been written, to ensure that the system did not run out of space. Note that with Apache/IHS, the core file will almost always be placed in the directory specified by the CoreDumpDirectory directive.
Make sure your ulimit is set appropriately so that you don't hit a limit on the size of the core file (some default limits cap the size at zero bytes :) ).
There are two parts: 1) the hard limits imposed by your system or system administrator and 2) the soft limits you can manipulate via the shell.
Please note that the limits in force for the user that starts the server (usually root) are what is important. When the server starts as root and switches user ids, the limits in force do not change.
On AIX, the hard limit can be set per user in smit:

smitty user
select "Change / Show Characteristics of a User"
enter user name
set "Hard CORE file size"
For bash or ksh, ulimit -a will display the limits, and ulimit -c unlimited will let you get as much as your system administrator allows.
On AIX, a soft limit can be set per user in smit:

smitty user
select "Change / Show Characteristics of a User"
enter user name
set "Soft CORE file size"

Note that ulimit manipulation in the shell is still effective.
The default location for coredumps is the directory specified by the ServerRoot directive. When the web server is started as root, the child processes run under a different user id, which does not have permission to write to that directory. This is handled by using the CoreDumpDirectory directive to specify an alternate location, such as /tmp.
Some platforms provide a mechanism for specifying an alternate coredump location. This will override the value of the CoreDumpDirectory directive.
syscorepath command on AIX 5.2 and above

AIX 5.2 and above provides the syscorepath command for specifying an alternate coredump directory which affects all applications on the system. If the web server was started without the CoreDumpDirectory directive, and that is preventing core dumps from being written because the default directory has unsuitable permissions, the syscorepath command can be used to specify a directory with the appropriate permissions, and coredumps can then be written without restarting the web server.
When syscorepath is used to specify an alternate directory, the file name of the coredump is no longer core, but instead includes the process id of the process which crashed and the time of day that the crash occurred. Refer to the syscorepath manpage for further information.
coreadm command on Solaris

Solaris provides the coreadm command, which controls several coredump settings, including an alternate coredump directory and the format of the name of the coredump. Refer to the coreadm manpage for further information.
With the Linux 2.4 kernel, if a thread crashes you'll get two coredumps: one for the main process, named core.pid, and one for the bad thread, named core.fakepid.
When IHS or Apache starts as root on Unix-like systems, it switches identity to the user and group specified in the configuration file. Sticky-bit programs and programs which start as root and then set their user id to something else have special issues for getting coredumps on some operating systems.
The current settings can be displayed by running coreadm with no arguments (see man coreadm). When all types of core dumps are enabled, it will display something like this:
% coreadm
      global core file pattern: /coredumps/core.%f.%p
        init core file pattern: /coredumps/init-core.%f.%p
             global core dumps: enabled
        per-process core dumps: enabled
       global setid core dumps: enabled
  per-process setid core dumps: enabled
      global core dump logging: enabled

This will turn on most types of core dumps:
# coreadm -e global-setid -e proc-setid -e global

This will set the global core file pattern:
# coreadm -g /tmp/core.%f.%p

Note: when you include a directory in the core file pattern, Apache's CoreDumpDirectory directive cannot override that.
Linux provides a syscall, prctl(PR_SET_DUMPABLE, 1), which an application that has switched user ids can issue to re-enable coredumps for itself. This syscall works only on Linux 2.4 and later kernels.
Important note: There are reports that some 2.4 kernels from some vendors may have the prctl() feature broken, such that a core dump is not written even when the prctl() call is issued.
Most versions of IHS 1.3 do not make the appropriate call to prctl(), but a special module, mod_prctl, is available to make the call so that core dumps can be taken by the kernel. Here is the documentation for the version of mod_prctl for IHS 1.3.
The feature is in IHS 2.0 starting with releases or PTFs after IHS 2.0.42.2. For prior releases, a special module, mod_prctl, is available to make the call so that core dumps can be taken by the kernel. Here is the documentation for the version of mod_prctl for IHS 2.0.42.
This is an issue involving programs that run first as root and then switch to another user. The solution is to poke the kernel. Specifically, set an undocumented kernel parameter called dump_all (works for 11.11, but not for 11.0). Here's how to activate dump_all:
# echo "dump_all/W 1" | adb -w /stand/vmunix /dev/kmem
dump_all: 0 = 1

To deactivate use:
# echo "dump_all/W 0" | adb -w /stand/vmunix /dev/kmem
dump_all: 1 = 0
Note: This issue is resolved automatically with the following levels of IHS on AIX:
AIX has a system-wide "full core" option which must be enabled in order for "user data" areas of memory to be written to the coredump. Without these areas of memory in the coredump, many types of problems cannot be diagnosed, and dbx will have problems analyzing the coredump of a threaded process. It is very important to enable the "full core" option so that all the necessary information is in the coredump.
Here is an example scenario from a dump which was not recorded properly because Enable full CORE dump was false:
[trawick@gorthaur platform_test]$ dbx ./a.out /tmp/core
Type 'help' for help.
warning: The core file is truncated.  You may need to increase the ulimit for file and coredump, or free some space on the filesystem.
reading symbolic information ...
[using memory image in /tmp/core]
warning: Unable to access address 0xf0203a48 from core
pthdb_session.c, 487: 1 PTHDB_CALLBACK (callback failed)
k_thread.c, 2124: PTHDB_CALLBACK (callback failed)
Segmentation fault in sig_coredump at line 24
24    kill(ap_my_pid, sig);
(dbx) up
warning: Unable to access address 0x8 from core
not that many levels
(dbx)

Once the "full core" option was enabled, the proper information was recorded and dbx could be used to determine the cause of the segfault.
Run this command:
# lsattr -El sys0 -a fullcore
The desired output is
fullcore true Enable full CORE dump True
If either the second word or the last word of the output is not "true", then the full core option is not currently enabled.
(Under some conditions, the full core option may not take effect immediately if set from smitty chgsys.)
Run this command to enable the option immediately:
# chdev -l sys0 -a fullcore=true
Important note: If the full core option took effect after the crashing application was started, the application should be stopped and then started again so that full core dumps are written.
Again, verify with the lsattr command above that the setting took effect.
Occasionally, core dumps will exceed 2GB in size, so the filesystem holding the coredump directory must support large files. This is specified during the creation of the JFS filesystem.
# lsfs -q
The output should report bf: true for the filesystem holding the coredump directory.
Edit /etc/security/limits and set everything to -1, then have the user log out and back in. Since the server is normally started as root, the user of interest is normally root.
Here is what the settings for the user should look like:
trawick:
        fsize = -1
        core = -1
        cpu = -1
        data = -1
        rss = -1
        stack = -1
        nofiles = -1
ulimit -a should show something like this:
trawick@tetra:~/wrk/port/testtool/platform_test% ulimit -a
core file size (blocks)     unlimited
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
max memory size (kbytes)    unlimited
open files                  unlimited
pipe size (512 bytes)       64
stack size (kbytes)         unlimited
cpu time (seconds)          unlimited
max user processes          128
virtual memory (kbytes)     unlimited