Troubleshooting Guide

Additional Enterprise - Extended Edition Considerations

There are other considerations that may assist you with problem determination and troubleshooting in a partitioned database environment. The following sections cover these considerations.

First Failure Data Capture

Information on FFDC specific to the EEE environment is presented here.

In an EEE environment, if your DIAGPATH is NFS-mounted or shared then a noticeable slow down will occur with the higher diagnostic levels chosen. Level 3 should be the highest level used in this environment. The only time level 4 should be considered for use is during:

Initial installation or configuration
Implementing a configuration change
Times when errors are recurring or when many different errors are encountered.

In an EEE environment, since the sqllib directory is mounted on all systems there is one db2diag.log file for all partitions or nodes. By changing the DIAGPATH database manager configuration parameter for all partitions to point to a local path, a separate db2diag.log file is kept for each partition. Each db2diag.log file then has log information specific to each partition making each log file easier to read. However, should all of the db2diag.log file information need to be seen, then gathering and integrating all the data from all the partitions can be difficult.

DB2 Trace Facility

Information on the DB2 Trace Facility specific to the EEE environment is presented here.

In an EEE environment, you may need to take a trace on more than one partition to determine the error. In a multiple logical node (MLN) environment, only one trace per physical node is required. Determining which partition or node on which to run the DB2 Trace Facility (db2trc) is done using SQLCA results following an error. In the SQLCA format, SQLERRD(6) presents the node number.

Other Methods

The db2_call_stack command can be used to assist in the identification of loops or hangs in the EEE environment. The db2_call_stack command sends a "terminate" signal to every DB2 process on every node recorded in the db2nodes.cfg file.

The db2_call_stack command should be run several times with some time in between to determine if there is a loop or hang; or, if the work is being done at all. See Trap Files for more information on db2_call_stack.

The DB2MEMDBG registry variable enables the tracking of memory allocation including the allocations of memory sets, memory pools, and memory blocks. It can be used to help identify memory corruption or memory leakage conditions. Memory corruption examples are:

Allocating fewer bytes than is required
Accessing a data structure after it was freed
Using the wrong data pointer to access a data structure.

Memory leakage examples are:

A slow resource leakage
An out-of-pool memory.

The process for using this registry variable is:

(Following a suspected memory problem, contact DB2 Customer Support for a set of options to use with this registry variable.)
db2stop
db2set DB2MEMDBG=<options>
db2start

The output from this procedure is placed in the db2dump subdirectory of the sqllib directory. The memdbg.log file contains log entries for all processes that are enabled for memory debugging. The p_<PID>.mem files contain per process memory debug information including corruption information.

When using LIST APPLICATIONS to collect data about a problem, remember to run this command on all partitions in your EEE environment. Running LIST APPLICATIONS from the catalog node gives information about all coordinator nodes.

When attempting to restart your database, always restart the catalog partition first and by itself. Once restarted, attempt a parallel restart of all the remaining partitions. If errors still occur during the parallel restart, then attempt to restart the partitions one at a time.

General Instructions When in a Bad State

If after checking your environmental status and you are sure that your system is completely hung, you should, with caution, consider the following process:

Use db2_kill to bring down the database.
Issue a db2start to bring up the database manager instance.
Issue a db2 restart database at the catalog partition/node to begin the process of bringing up the database.
Issue a db2 restart database at all the other partitions/nodes to complete the process of bringing up the database.

If after checking your environmental status and you determine that only one or two partitions/nodes are not working in your EEE environment, with caution, consider the following process:

Issue a db2start nodenum n to bring up the hung partition/node. Repeat this command where "n" is the node number of the partition that was hung until all of the partitions are up.
Issue a db2start database to complete the process of bringing up the database.

Hierarchy of Shutdowns

Note:

You should be cautious when proceeding to the latter commands to shutdown your database environment. Using the latter commands can remove the problem evidence so that DB2 Customer Support may not be able to determine the cause of your problem. Without determining the current problem, there is little or no chance for a preventative approach to be implemented in your environment. Your environment may then be susceptible to the same problem occurring again.

Here is the list of ways to shutdown your database environment from least to most forceful:

db2stop
db2stop force
db2_kill
kill -9 for processes, and ipcrm for IPC resources.

You should only use the most forceful commands under the direction of DB2 Customer Support.

[ Top of Page | Previous Page | Next Page ]