The capture and restore utility is controlled by the ZFCAP and ZFRST commands, respectively. These commands support options to start, abort, pause, and restart the utility, to add or delete tape devices, and to request status. A single-purpose, multiple-entry design is used so that I/O operations can overlap concurrently. The messages are verified and sent to the participating processors, where the appropriate routines are started. It is important that you identify the participating processors before the capture function starts; once the function starts, you cannot add more processors to the participation list. Use the ZPROT command to add and assign processors for the capture and restore utility. See TPF Operations for more information about the capture and restore utility commands.
Automatic tape mounting is suspended for capture and restore, except for devices used for exception recording and keypoint capture and restore. If capture processing starts on a device that is enabled for automatic tape mounting and the device contains an ALT tape, the suspend processing dismounts the ALT tape without unloading it.
Capture has a message analyzer that validates the input message and starts the capture processor activity. Similarly, the restore message analyzer validates the message and starts the restore processor activity.
Special purpose programs are called to provide specific elements of the capture and restore utility. Notable routines handle start/continue/EOJ, read/write, restart, abort/pause, tape add/delete, status, exception recording/logging, security, and error recovery.
Data filed out as part of normal capture (as opposed to exception recording) is written to general tapes referred to as capture tapes. These tapes can be subsequently read by the restore function. Tape drives and tape drive pairs for use by capture or restore are listed on the ZFCAP or ZFRST command. Additional drives or pairs can be added during capture or restore. Each tape drive or pair of drives can be used to capture 1 module at a time. Channel and control unit utilization limits determine how many of the assigned tape drives may be in use by capture at one time. See Capture Processing for more information about channel and control unit utilization.
One disk module can require more than 1 tape for its capture. If the module is being captured to a tape pair, at the end of the first tape, data continues to be filed out to a tape on the second drive in that pair (called the alternate drive). The drives in a tape pair flip-flop in this way until the entire disk module is captured. If the tape drive was assigned to capture as a single tape drive, and not as a tape drive pair, then capture first attempts to find another single drive (that was assigned to capture) that is not currently in use. If no other drive is available, the system prompts the operator to mount a standby capture tape.
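The flip-flop behavior can be pictured with a short sketch. The following Python fragment is a minimal illustration only, not TPF code; the drive names and the fixed per-tape record capacity are assumptions made for the example.

    # Minimal sketch of the capture flip-flop between the drives of a tape
    # pair. Drive names and per-tape capacity are hypothetical; real capture
    # switches when an end-of-tape condition is reached.
    def capture_module(records, drive_pair, tape_capacity):
        """Yield (drive, record) pairs, flipping drives at end of tape."""
        active = 0                     # index of the active drive in the pair
        used = 0                       # records written to the current tape
        for record in records:
            if used == tape_capacity:  # end of tape: flip to the alternate
                active = 1 - active
                used = 0
            yield drive_pair[active], record
            used += 1

    # Example: a 10-record module captured to a pair with 4 records per tape
    # lands on the first drive, the alternate, then the first drive again.
    for drive, record in capture_module(range(10), ("DRV1", "DRV2"), 4):
        print(drive, record)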
Restore works in a similar way, with all tapes for a given module loaded to a single drive or to a pair of drives.
Real-time tapes are used for exception recording. Exception recording tapes must be mounted by the operator for all active processors before capture can be started. To restore the records on these tapes, the tapes used for all processors active at the time of capture must be mounted for a single processor so that the records can be read from tape and filed in the correct order.
Either all or selected online disks can be captured. Duplicate copies of duplicated records on each disk are not captured. Many modules can be captured simultaneously. The modules to be captured and their respective tapes should be chosen to obtain the maximum possible channel separation. The total number of disk captures that are active at any one time depends on a particular system's configuration and the user-specified limits on load balancing. This number, however, can be changed by a command. To improve operator efficiency, the capture program selects a tape on which to capture a specific disk, assigns a symbolic tape name to each tape, and causes the tape to be internally mounted.
During disk capture, you control the load balancing across channel paths and control units by limiting the maximum number of captures that can occur simultaneously on a:
- DASD control unit (the DASDCU parameter)
- DASD channel path (the DASDCH parameter)
- Tape control unit (the TAPECU parameter)
- Tape channel path (the TAPECH parameter)
Use of a channel path and control unit is limited to the values set for the DASDCU, DASDCH, TAPECU, and TAPECH parameters. For example, if a channel path is handling the maximum number of captures for tapes, no additional captures are allocated to the tape devices serviced by that path until the number of active captures drops below the limit.
Capture sets the value for DASDCU to 1; you cannot change this value. This ensures that no group of channel paths or control units is overused and keeps DASD and tape module queues at manageable lengths. You can change the values for DASDCH, TAPECU, and TAPECH with the ZFCAP CHANGE command.
A typical configuration of DASD and tape channel paths has the DASD and tape control units on separate paths.
The maximum values for the preceding figure are:
DASDCU: 1    TAPECU: 1
DASDCH: 1    TAPECH: 1
In other words, only 1 DASD can participate in a capture under any given control unit and only 1 DASD capture can be started on a channel path, or CHPID, at a time; the same applies to tape.
If an installation takes advantage of multipathing, the channel paths can run through the same control units to provide additional pathway flexibility.
The maximum values for this installation are:
DASDCU: 1    TAPECU: 1
DASDCH: 1    TAPECH: 1
This is the same as the previous configuration, but in this case multipathing definitions have been put into the IOCP generation, providing more flexibility.
There is a special consideration when a channel path (CHPID) is shared between a tape device and a DASD on the same processor. The DASDCH or TAPECH values must include any DASD or tape running on the same channel path. If the number of captures on the shared channel path has reached the limit, the second kind of capture (DASD or tape) requested is not started until the first completes. This is shown in the following figure:
The maximum values for this installation are:
DASDCU: 1    TAPECU: 1
DASDCH: 2    TAPECH: 2
Notice that the number of captures allowed on the channel path for both DASD and tape equals the number of control units on that path that can reasonably be expected to be running capture.
The relationship between the DASDCU and DASDCH, and the corresponding TAPECU and TAPECH parameters provides a means for balancing capture resource requirements across numerous devices, control units, and channel paths. For example, in Figure 6, a single channel path or CHPID is shown with 2 DASD control units (C1 and C2) and 3 devices (D1, D2, D3). DASDCU is set to 1. This means that only a single device will operate at a time under a given control unit. The DASDCH parameter allows you to specify how many devices can be run under a single CHPID.
For example, when DASDCU equals 1 and DASDCH equals 1, you can capture device D1. Devices D2 and D3 are not available for capture at the same time. If you leave DASDCU equal to 1 but set DASDCH equal to 2, you can capture 2 devices, D1 and D2, at the same time. You cannot capture device D3 at the same time because DASDCU is 1. DASDCU is restricted to 1 for performance reasons. (An example of TAPECU > 1 appears later.) If you leave DASDCU equal to 1 but now set DASDCH equal to 3, 3 devices can be captured at the same time on the CHPID. To do this, the configuration would need another control unit below C2 (call it C3) with a third device attached to it. Device D3 in this example would still not participate in the capture at the same time as D1 because DASDCU equals 1 and D3 shares a control unit with D1.
Figure 6. Capture and Restore Load Balancing Using DASDCU and DASDCH
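The load-balancing rule can be restated as a simple admission check: a device can start capture only while the counts of active captures under its control unit and on its CHPID are both below the limits. The following Python sketch is illustrative only; the topology (devices D1 and D3 under control unit C1, device D2 under C2, all on one CHPID) is inferred from the walkthrough above, and the function name is hypothetical.

    # Hypothetical sketch of the DASDCU/DASDCH admission check.
    # Topology inferred from the Figure 6 discussion.
    TOPOLOGY = {"D1": ("C1", "CHPID1"),
                "D3": ("C1", "CHPID1"),
                "D2": ("C2", "CHPID1")}

    def can_start(device, active, dasdcu=1, dasdch=1):
        """True if starting capture on device stays within both limits."""
        cu, ch = TOPOLOGY[device]
        cu_count = sum(1 for d in active if TOPOLOGY[d][0] == cu)
        ch_count = sum(1 for d in active if TOPOLOGY[d][1] == ch)
        return cu_count < dasdcu and ch_count < dasdch

    # DASDCU=1, DASDCH=1: once D1 is active, D2 cannot start.
    print(can_start("D2", {"D1"}))                  # False (CHPID limit)
    # DASDCU=1, DASDCH=2: D1 and D2 can run together, but D3 cannot join.
    print(can_start("D2", {"D1"}, dasdch=2))        # True
    print(can_start("D3", {"D1", "D2"}, dasdch=2))  # False (same CU as D1)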
There are configuration considerations for the techniques used for balancing capture loads for the following IBM 3990 models:
The LLF static switch may cause even and odd addresses to be placed on different channels. For example, if your configuration defines device addresses 4E0, 6E1, 4E2, 6E3, and so on, the even addresses are placed on channel 4 and the odd addresses are placed on channel 6. However, for correct load balancing, all the device addresses of these control units must have the same channel address. For example, the addresses in the previous example can be defined as 4E0, 4E1, 4E2, 4E3, and so on. This is not a concern for the IBM 3990 Model 3 running record cache.
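Assuming the conventional three-digit device address format in which the first hexadecimal digit is the channel address, the requirement that all device addresses of a control unit share one channel address can be checked as in this sketch (a hypothetical helper, not part of any TPF tool):

    # Hypothetical check that all device addresses on a control unit share
    # one channel address (the first digit of a three-digit address).
    def same_channel(addresses):
        channels = {address.strip()[0].upper() for address in addresses}
        return len(channels) == 1

    print(same_channel(["4E0", "6E1", "4E2", "6E3"]))  # False: split on 4 and 6
    print(same_channel(["4E0", "4E1", "4E2", "4E3"]))  # True: all on channel 4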
Load balancing is directly related to site resource requirements and the degree of multipathing found in the configuration. The configuration is described during Input/Output Configuration Program (IOCP) generation. See ES/9000, ES/3090 Input/Output Configuration Program User's Guide and ESCON Channel-to-Channel Reference for more information about IOCP generation.
The TAPECU and TAPECH parameters function similarly. Unlike DASDCU, however, the TAPECU parameter is not restricted to 1.
The top part of Figure 7 shows 1 CHPID connecting a processor with a single tape control unit. Under this control unit there are 4 tape devices. If TAPECU equals 1 and TAPECH equals 1, only 1 of these tape devices can participate in capture at a time. If TAPECU equals 4 and TAPECH equals 1, only 1 tape can participate at a time. If TAPECU equals 4 and TAPECH equals 4, all 4 tape devices can run capture at the same time.
The bottom part of this figure shows 4 CHPIDs connecting the single tape control unit. This is a multipathing configuration. Using multipathing, you can set TAPECU equal to 4 (to allow 4 tape devices to operate under the same control unit) but set TAPECH equal to 1 (thereby limiting the number of tape devices operating at the same time on each CHPID to 1). The multiple channel paths to the same control unit satisfy this restriction while still allowing 4 tape devices to run at the same time. So, the two parts of this figure provide the same result: 4 tape devices participating in capture at the same time. The first part shows the TAPECH parameter set to allow more tapes on the CHPID (TAPECH=4). In the second part, the additional CHPIDs are used to run those same 4 tape devices while restricting each CHPID to a single tape device. In both cases, load balancing spreads the capture resource requirements across the available paths.
Figure 7. Capture and Restore Load Balancing Using the TAPECU and TAPECH Parameters
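The effect of the two Figure 7 configurations can be summarized with a small calculation. The sketch below is a simplification that assumes each tape device can reach its control unit over any of the CHPIDs and that captures spread evenly across them; it is not TPF logic.

    # Simplified concurrency bound for one tape control unit: simultaneous
    # captures are limited by the devices present, the TAPECU limit on the
    # control unit, and the TAPECH limit summed over the CHPIDs reaching it.
    def max_tape_captures(devices, chpids, tapecu, tapech):
        return min(devices, tapecu, tapech * chpids)

    # Top of Figure 7: 4 tape devices, 1 CHPID.
    print(max_tape_captures(4, 1, tapecu=1, tapech=1))  # 1
    print(max_tape_captures(4, 1, tapecu=4, tapech=1))  # 1
    print(max_tape_captures(4, 1, tapecu=4, tapech=4))  # 4
    # Bottom of Figure 7: 4 tape devices, 4 CHPIDs, TAPECH held to 1.
    print(max_tape_captures(4, 4, tapecu=4, tapech=1))  # 4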
The capture functions are independent entities for each subsystem in a multiple subsystem environment. Multiple subsystem capture can occur simultaneously. However, from a performance standpoint, it should be limited to a single subsystem per processor.
There are 2 options available when starting capture. The first option causes the capture of all online disk packs. The second option results in the capture of specified modules. For a capture all (the ZFCAP ALL command) request, specify tape devices from all tape channels. The capture tapes must be pre-initialized and placed on the tape drive for each participating processor and made ready. The capture function assumes that all tape drives specified to it as available are always made ready with a capture tape. The capture function selects a tape drive to achieve the best possible channel separation, mounts the tape internally, handles switching to multiple reels if necessary, and causes the tapes to be rewound and unloaded when completed. The capture function selects the module to be captured to a particular tape based on disk channel and control unit separation.
In a loosely coupled environment, capture is started from a single processor. Processor participation is determined by active entries in the processor resource ownership table (PROT) that are assigned before starting capture with the ZPROT command. The individual option of file capture is limited to a single processor. See TPF Operations for more information. Before capture starts, the input message is verified to determine the following:
When the previous conditions are met, the capture function is started in the designated processors and exception recording (XCP) is started in all active processors. Once started, each processor proceeds independently in selecting modules for capture so as not to interfere with captures already in progress on other processors. When a participating processor exhausts all selection possibilities, a PROCESSOR COMPLETE message is displayed and the capture function ends for that processor.
In loosely coupled and multiple database function (MDBF) systems, the control unit tables are processor-shared, though the values for DASDCU and TAPECU are not shared. These values are held in processor-unique capture working keypoints. This means that apparent inconsistencies can occur. For example, if TAPECU equals 5 on CPU B and TAPECU equals 10 on CPU C, 10 captures can be started on a control unit, even though processor B sets the limit to 5. In effect, the highest DASDCU or TAPECU value across all of the processors prevails. The same apparent inconsistency can occur for subsystems. For example, if TAPECH equals 5 in the basic subsystem (BSS) and TAPECH equals 10 in SS2 (a non-BSS), 10 captures can be started on the tape CHPID despite the limit in the BSS. There are no restrictions to prevent subsystems from sharing CHPIDs or control units.
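Because each processor checks the shared control unit counts only against its own keypoint value, the effective complex-wide limit is the highest value held by any participating processor, as this one-line illustration with the hypothetical CPU values from the example shows:

    # Each CPU compares the shared count against its own TAPECU value, so
    # captures keep starting until the largest per-CPU limit is reached.
    per_cpu_tapecu = {"CPU B": 5, "CPU C": 10}
    print(max(per_cpu_tapecu.values()))  # 10 captures can accumulate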
Capture is designed to run in a real-time environment. Therefore, records can be updated on disk files that have already been written to the capture tape. When file capture is started, an indicator is set for the control program, which examines the address of each record that is written to disk file while this indicator is on. If the address is behind the current capture location, a copy of the record is written to an exception recording (XCP) tape. You have the option of not performing exception recording for certain record types. For example, you may choose to ignore short-term record updates in order to reduce the exception recording (XCP) tape load on the TPF system. This is done by turning off the exception recording indicators for those record IDs in the record ID attribute table (RIAT). Exception recording is not stopped automatically when capture is completed, so that the captured data can be brought up to the state of the online files at a later time. Discontinue exception recording by entering a command. If an IPL-restart occurs, the capture function itself must be restarted by a console order, but exception recording continues automatically.
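The exception recording decision amounts to a check made on each disk write while the capture indicator is on. The following Python sketch is a schematic restatement, not control program code; the record fields, the riat_xcp_enabled table standing in for the RIAT indicators, and the interpretation of "behind" as a simple address comparison are assumptions.

    # Schematic sketch of the exception recording (XCP) decision made on
    # each disk write while capture is in progress. The riat_xcp_enabled
    # table stands in for the RIAT exception recording indicators.
    riat_xcp_enabled = {"LONG_TERM": True, "SHORT_TERM": False}

    def on_disk_write(record_id, address, data,
                      capture_active, capture_position, xcp_tape):
        if not capture_active:
            return
        if not riat_xcp_enabled.get(record_id, True):
            return                    # exception recording off for this ID
        if address < capture_position:
            # Capture already passed this address, so the captured copy is
            # stale; write the updated record to the XCP tape.
            xcp_tape.append((address, data))

    xcp = []
    on_disk_write("LONG_TERM", 100, b"new", True, 500, xcp)   # recorded
    on_disk_write("SHORT_TERM", 100, b"new", True, 500, xcp)  # RIAT off
    on_disk_write("LONG_TERM", 900, b"new", True, 500, xcp)   # not yet captured
    print(len(xcp))  # 1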
After a capture of all modules, and while exception recording is still in progress, individual modules can be recaptured and exception recording will continue on all modules. This is done by using the individual mode of capture and can be done as many times as necessary, but only following a capture of all modules. A capture of all modules cannot be started while exception recording is in progress.
The exception recording and logging tapes are real-time tapes. Therefore, tape drive utilization can be improved by mounting an alternate (ALT) tape instead of a standby output tape.
Record logging allows you to log to tape, at any time, all updates to user-specified record types on file. These tapes are the real-time logging tapes. Logging is started and discontinued by entering a command.
When multiple subsystems exist, record logging is started on a subsystem level. The real-time logging tapes may be unique or shared by the different subsystems.
In a loosely coupled environment, the start and stop logging functions affect all active processors and thereby require active logging tapes on all processors for the function to start.
If an IPL-restart occurs, logging continues automatically.
The keypoint capture function is provided to capture the system keypoint records, tape label directory (TPLD) records, and tape label mask records (TLMR) to the KPC tape. Keypoint capture is started automatically at the end of the file capture function when the stop exception recording message is displayed following a capture ALL request. The subsystem keypoints for all generated processors are captured to the KPC tape. Keypoints that are shared between subsystems are captured only when keypoint capture is processing in the basic subsystem (BSS). Configuration-dependent keypoints (keypoints 1, 6, and I) will not be captured. Keypoint capture can also be started by command input.
There are 3 phases to restore processing:
1. Restore the capture tape records.
2. Restore the logging and exception recording tape records.
3. Restore the keypoint records.
In phases 1 and 2, duplicated records are restored on both the prime and duplicate modules, if any, if the following conditions exist:
If none of these conditions exist, duplicate tracks are not restored on either the prime or duplicate modules.
The 3 phases for restore are described in more detail in the following sections.
Phase 1 restores the capture tape records. When a restore is required, you have 3 options for the restore of the capture tape file (CAP):
The optional parameter (D) may be added to the end of each message to indicate the duplicate restore option. In options 2 and 3, duplicated records are restored whether or not both the primary and duplicate copies lie in the area to be restored.
You may also specify the optional FZ and BP parameters. The FZ parameter means that zeroed records are to be filed. The BP parameter means that reel sequence checking will be bypassed.
If more than 1 tape is required, as in the case of a restore ALL request, tape drives should be selected from as many different channels as possible. This will minimize restore time. The drives specified in the command should be separated by slashes (/). The mounted tapes should be for disk modules on the maximum number of different channels. Subsequent restore tapes should be placed on the available drives in the same way.
If multireel tapes are to be restored, tape device pairs should be used to minimize delays because of tape rewind. The 2 drives must be on the same channel and control unit. The first reel should be mounted on the active drive and the next reel on the alternate drive. Subsequent reels would be mounted in a flip-flop fashion. When the rewind of the next-to-last reel is completed, that drive should be made ready with the first (or only) reel of another module to be restored (if any modules still remain to be restored).
You only need to ready the tape drives. Restore ensures that the tapes are mounted and will rewind and unload them when finished. However, restore assumes that all drives that are made available to it have CAP tapes on them. If a tape mount error occurs, the drive should be made ready with the correct input tape but you should not enter the tape mount command.
In a loosely coupled environment, as previously described, file restore is started from a single processor. Processor participation in the restore function is determined by active entries in the PROT table. See TPF Operations for more information about the ZPROT command.
Once started, each processor proceeds independently restoring from the tape devices assigned to that processor. As in capture, the processors that are to participate in the restore must be active when the function is started. No provisions are available to allow more processors to participate once the function is started. The selective module restore function is limited to a single processor. The restoration of data can be to a single disk or a number of disks concurrently. As with capture, the number of disks to be operated on at any one time depends on the system configuration.
Image-related records will not be restored when records associated with TPF images are bypassed. Records that are bypassed include:
You can restore these records to a module with the ZIMAG COPY command or by using one of the loaders (general file loader or auxiliary loader).
In addition, the tape label directory (TPLD) records and tape label mask records (TLMR) will not be restored because they are actively used during the restore function. These records are captured on the KPC tape by the keypoint capture function, and you can restore these records to a module using the ZFRST KPT command.
Phase 2 restores logging and exception recording tape records.
The records that are logged to the real-time logging and exception recording tapes can be restored to all modules, a specific module, or an address range in a module or encompassing many modules. Time parameters are required that specify where logging restore will start and where it will stop, determined by time-stamp records on the restore tapes. The time parameter for exception recording (XCP) specifies where XCP restore will start. This time is generally the time when capture was started.
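The time parameters can be pictured as filters over the time-stamped records on the restore tapes: logging restore replays records between the start and stop times, and XCP restore replays records from the capture start time onward. This is a conceptual sketch with hypothetical record tuples, not the restore implementation.

    # Conceptual filter over time-stamped tape records. Each record is a
    # (timestamp, address, data) tuple; the values are hypothetical.
    def select_logging(records, start, stop):
        """Logging restore replays records between start and stop times."""
        return [r for r in records if start <= r[0] <= stop]

    def select_xcp(records, capture_start):
        """XCP restore replays records from the capture start time onward."""
        return [r for r in records if r[0] >= capture_start]

    tape = [(t, 0x100 + t, b"data") for t in (5, 10, 15, 20)]
    print(select_logging(tape, start=10, stop=20))  # timestamps 10, 15, 20
    print(select_xcp(tape, capture_start=15))       # timestamps 15, 20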
The restore of logging and exception recording (XCP) tapes in a loosely coupled environment requires that you mount tapes for all processors that were active at the time of capture. This function is not shared by the participating processors but can be started in any one of them.
Phase 3 restores the keypoint records.
Keypointing is prevented while a restore is in progress to ensure the integrity of the captured database. While keypointing is prevented by restore, updates are still allowed to keypoints that are not captured (see TPF Operations). Because these keypoints cannot be restored, data corruption during restore is not a concern. Data corruption of restored keypoints (including globals) is also not a concern during a module restore because a subsequent IPL is not expected. Therefore, keypointing is prevented only by the following input messages:
Keypointing remains disabled until you perform the subsequent mandatory IPL. If you perform the IPL before the logging records are restored, another IPL is required to enable keypointing. IPL as soon as possible after the restore to prevent possible data corruption by the time-initiated keypoint update routines. In a loosely coupled environment, the complex must be collapsed down to 1 processor before you re-IPL. This ensures that the proper initialization of the keypoint records takes place at system restart.