IBM Books

Replication Guide and Reference


Storage planning

In addition to the storage required for DB2, replication requires storage for:

Database log and journal data
The additional data logged to support the replication of data.

Active log file size for Capture for VSE and VM and current receiver size for Capture for AS/400
You need to ensure the data needed for replication remains on the active log, rather than on archived logs.

Target tables and control tables
The replicated user data and control tables (including change data tables).

Spill files
The Apply program requires temporary space to store data. Apply for OS/390 can use memory rather than disk space for the spill files; the Apply program for all other operating-system environments uses disk space for the spill files.

If there is insufficient disk space for the spill files, the Apply program terminates. If you specify that Apply for OS/390 should use memory, but there is not enough memory for the spill files, the Apply program abends; in this case, specify that the Apply program should use disk space and restart it.

All of the sizes given in the following sections are estimates only. To prepare and design a production-ready system, you must also account for such things as failure prevention. For example, the holding period of data (discussed in Target tables and control tables) might need to be increased to account for potential line outage.

If storage estimates seem unreasonably high, reexamine the frequency interval of the Apply program (how often your subscriptions run) and pruning. Trade-offs frequently must be considered between storage usage, capacity for failure tolerance, and CPU overhead.

Database log and journal data

Before you can replicate a table, you must create it (or alter it) with the DATA CAPTURE CHANGES keywords. One of the effects of these keywords is that DB2 logs full-row images for each UPDATE statement. For a replica table (in an update-anywhere scenario), DB2 also logs the before-images for each update to the table. Another increase to the log or journal volume comes from DB2's logging insertions to and deletions from the unit-of-work (UOW) and change data (CD) tables.

Although estimating the increase in the log or journal volume is not easy, in general you will need an additional three times the current log volume for all tables involved in replication.

To make a more accurate estimate, you must have detailed knowledge of the updating application and the replication requirements. For example, if an updating application typically updates 60% of the columns in a table, the replication requirements could cause the log records to grow by more than half compared to a similar table that is not replicated. One of the replication requirements that adds the most to the log is the capturing of before- and after-images (as in the case of update-anywhere replication scenarios). One way to reduce the log volume is to reduce the number of columns defined for the replication source.

In addition to logging for the source database, there is also logging for the target database, where the rows are applied. Because the Apply program does not issue interim checkpoints, you should estimate the maximum amount of data that the Apply program will process in one time interval and adjust the log space (or the space for the current receiver for AS/400) to accommodate that amount of data.

Active log file size for Capture for VSE and VM and current receiver size for Capture for AS/400

For VM and VSE, when the active log is full, DB2 archives its contents. For AS/400, when the current receiver is full, you need to switch to a new one, and optionally save and delete the oldest one. When a system handles a large number of transactions, the Capture program can occasionally lag behind. If the log is too small, some of the log records could be archived before they are captured. Capture for VSE and VM running with DB2 for VSE & VM cannot recover archived log records. 6

For DB2 for VSE & VM, ensure that your log is large enough to handle at least 24 hours of transaction data. For DB2 for AS/400, ensure that the current receiver is large enough to handle at least 24 hours of data.

Target tables and control tables

The space required for a target table is usually no greater than that of the source table (or tables), but can be much larger if the target table is denormalized or includes before images (in addition to after-images) or history data. The following also affect the space required for a target table: the number of columns replicated, the data type of columns replicated, any row subsets defined for the subscription-set member, and data transformations performed during replication.

The CD tables and the UOW table also affect the disk space required for a target database. The space required for the replication control tables is generally small because each requires only a few rows.

The CD tables grow in size according to the amount of data replicated until the Capture program prunes them. To estimate the space required for the CD tables, first determine how long you want to keep the data before pruning it, then specify how often the Capture program should prune these tables or how often you issue the prune command. To determine the minimum size for the CD table, use the following formula:

minimum_CD_size = 
  ( (21 bytes) + sum(length of all registered columns) ) *
  (number of inserts, updates, and deletes to source table) *
  (exception factor)
 

When calculating the number of bytes of data replicated, you need to include 21 bytes for overhead data added to the CD tables by the Capture program. In the formula, determine the number of inserts, updates, and deletes to the source table within the interval between capturing and pruning of data. The exception factor allows for such things as network failures or other failures that prevent the Apply program from replicating data. Use a value of 2 initially, then refine the value based on the performance of your replication environment.

Example: If the Capture program prunes applied rows from the CD table once daily, your interval is 24 hours. If the rows in the CD table are 100 bytes long (plus the 21 bytes for overhead), and 100,000 updates are applied during a 24-hour period, the storage required for the CD table is about 12 MB.

The UOW table grows and shrinks based on the number of rows inserted in a particular time interval (the number of commits issued within that interval by transactions that update source tables or by Capture for AS/400). You should initially overestimate the size required and monitor the space actually used to determine if any space can be recovered. The size of each row in the UOW table is fixed at 79 bytes (except for DB2 for AS/400, where it is 109 bytes). For a first approximation of the space needed for the UOW table, multiply 79 bytes (or 109 bytes) by the number of updates applied during a 2-hour period. Use a formula similar to the one given above for CD tables to obtain a better estimate for the space needed for the UOW table. For more information, see Unit-of-work table.

Spill files

The Apply program stores updates to target tables in temporary files called spill files. 7 These files hold the updates until the Apply program applies them to the target tables. The Apply program uses multiple spill files for subscription sets with multiple subscription-set members: one spill file for each target table. The Apply program stores the spill file on disk for every operating-system environment, but Apply for OS/390 can use virtual memory instead. Unless you have virtual memory constraints, store the spill files in virtual memory rather than disk.

The size of the spill file is equal to the size of the data selected for replication during each replication interval. You can estimate the size of the spill file by comparing the frequency interval (or data-blocking interval; see Data blocking for large volumes of changes) planned for the Apply program with the volume of changes in that same time period (or in a peak period of change). The spill file's row size is the target row size, including any DB2 DataPropagator overhead columns. This row size is not in DB2 packed internal format, but is in expanded, interpreted character format (as fetched from the SELECT). The row also includes a row length and null terminators on individual column strings.

Example: If change volume peaks at 12,000 updates per hour and the Apply program frequency is planned for one-hour intervals, the spill file must hold one-hour's worth of updates, or 12,000 updates. If each update represents 100 bytes of data, the spill file will be about 1.2 MB.


Footnotes:

6
Capture for OS/390 running with DB2 for OS/390 V4 or higher and DB2 Universal Database V5 or higher can recover archived log records.

7
If you are using the ASNLOAD utility, you have a load input file instead of a load spill file.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]