Replication Guide and Reference

Staging data

During replication, DB2 DataPropagator stages changed data; that is, the Capture program captures changes to a source table only once and inserts changed rows into a change data (CD) table. The Apply program then retrieves the changes from the CD tables. The Capture program can also automatically prune changed rows from CD tables after they have been processed by all Apply programs and are no longer needed (if PRUNE is enabled). ¹⁰

This section describes how you can stage data during replication and the role that consistent change data (CCD) tables play.

CD tables

A CD table receives an arbitrary number of changed data rows from the Capture program: one for each INSERT, UPDATE, and DELETE statement executed against the source table. The CD table does not record whether the transactions issuing the updates are committed, incomplete, or in flight. The Apply program joins the CD tables with the unit-of-work (UOW) table to determine which changes are committed and can therefore be replicated to target tables. Uncommitted changes are eventually pruned, depending on the retention limit that you define in the tuning parameters control table.
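The join of a CD table with the UOW table can be sketched as follows. This is a minimal illustration that uses Python's sqlite3 module with an in-memory database rather than DB2. The table names (cd_orders, uow) and the column names (uow_id, intent_seq, commit_seq, operation, order_id) are hypothetical and greatly simplified; they are not the actual control-table layout.

```python
import sqlite3


def build_demo():
    """Create a hypothetical CD table and UOW table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cd_orders (
            uow_id     TEXT,     -- unit of work that made the change
            intent_seq INTEGER,  -- order of the change in the log
            operation  TEXT,     -- 'I', 'U', or 'D'
            order_id   INTEGER   -- key of the changed source row
        );
        CREATE TABLE uow (
            uow_id     TEXT PRIMARY KEY,
            commit_seq INTEGER   -- present only for committed units of work
        );
        INSERT INTO cd_orders VALUES ('T1', 1, 'I', 100);
        INSERT INTO cd_orders VALUES ('T2', 2, 'U', 200);  -- T2 never commits
        INSERT INTO cd_orders VALUES ('T1', 3, 'U', 100);
        INSERT INTO uow VALUES ('T1', 10);
        """
    )
    return conn


def committed_changes(conn):
    """Return only the CD rows whose unit of work appears in the UOW table."""
    return conn.execute(
        """
        SELECT cd.order_id, cd.operation, uow.commit_seq
        FROM cd_orders AS cd
        JOIN uow ON uow.uow_id = cd.uow_id
        ORDER BY uow.commit_seq, cd.intent_seq
        """
    ).fetchall()


if __name__ == "__main__":
    for row in committed_changes(build_demo()):
        print(row)
```

In this sketch the change made by T2 stays in the CD table but is never returned by the join, because no commit record exists for it in the UOW table.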

CCD tables

The Apply program joins the CD and UOW tables to determine which data is committed and can be replicated, but it does not otherwise save the results of the join. The Apply program can, however, save the results in consistent change data (CCD) target tables. The benefit of saving the join of the CD and UOW tables is that several subscriptions can refer to that join without having to perform it for each subscription cycle. CCD tables hold captured changes from INSERT, UPDATE, or DELETE operations against a source table. ¹¹

Because update-anywhere replication requires the most current change data for conflict detection, ¹² CCD tables are not used in update-anywhere replication. For update-anywhere replication, every update should be replicated to the target table in the order in which it occurs; when you use CCD tables, the order in which the rows are replicated is not guaranteed. Also, the delays that CCD tables impose on replication can increase the likelihood that update conflicts go undetected. Thus, if you define an internal CCD table, the Apply program ignores it when processing a subscription set with a replica as a target.

As described in Overview of data replication, there are many types of CCD table, each of which has a different use. The following sections describe these types, how to define them, and what uses they have.

When using the DB2 Control Center to create your CCD tables, be sure to review the generated SQL statements to ensure the Control Center will create the type of CCD table, subscription, and registration you want.

Local and remote CCD tables

A CCD table is first classified by its location: local or remote. A local CCD table resides in the source database. A remote CCD table resides outside the source database, that is, in any other database in the network that the Apply program can access.

When you specify the target table in the Subscription Definition window of the DB2 Control Center, you can choose whether the CCD table should be local or remote.

Complete and noncomplete CCD tables

In addition to its location, a CCD table can be classified by its contents. The first classification based on contents is whether the CCD table is complete or noncomplete. A complete CCD table contains all rows that satisfy the source view and subscription predicates from the source table or view. A noncomplete CCD table contains only modified rows from the source table. Thus, a noncomplete CCD table is initially empty and is populated as changes are made to the source table.

A complete CCD table is automatically registered as a replication source, so you will see it in the Replication Sources folder in the DB2 Control Center.

You specify that you want a CCD table to be complete by selecting the Used as source for future copies check box in the Advanced Subscription Definition window of the DB2 Control Center. If you do not select this check box, the CCD table is noncomplete.

Condensed and noncondensed CCD tables

The second classification of a CCD table based on its contents is whether it should be condensed or noncondensed. A condensed CCD table contains only the most current value for each row from the source table. A noncondensed CCD table contains all changes made to each row in the source table, that is, it represents the history of changes to each row.

For a condensed CCD table, the Apply program updates the values for a row; for a noncondensed CCD table, the Apply program inserts a new row into the CCD table for the updated row in the source table. For this reason, a condensed CCD table must have unique key values for each row, but a noncondensed CCD table can have multiple rows with the same key values. Because of these differences in key uniqueness, a condensed CCD table must have a unique index defined for it, whereas a noncondensed CCD table must not have a unique index.
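The difference between the two behaviors can be illustrated with a small Python sketch. It is a simplified model, not the Apply program's implementation: the condensed table is modeled as a dictionary keyed by the source key (which enforces uniqueness in the way a unique index would), and the noncondensed table is modeled as a list that accepts repeated keys. The function names and the (operation, key, value) tuple layout are assumptions made for the example.

```python
def apply_condensed(ccd, changes):
    """Keep only the most recent change for each key (one entry per key)."""
    for operation, key, value in changes:
        # A later change for the same key overwrites the earlier one.
        ccd[key] = (operation, value)
    return ccd


def apply_noncondensed(ccd, changes):
    """Append every change, so the same key can appear many times."""
    for operation, key, value in changes:
        ccd.append((operation, key, value))
    return ccd


if __name__ == "__main__":
    changes = [
        ("I", 1, "a"),
        ("U", 1, "b"),
        ("I", 2, "x"),
        ("U", 1, "c"),
    ]
    print(apply_condensed({}, changes))
    print(apply_noncondensed([], changes))
```

With the four sample changes, the condensed model ends with two entries (one per key), while the noncondensed model ends with four rows that together form the change history.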

If you select the Used as source for future copies check box in the Advanced Subscription Definition window of the DB2 Control Center, your CCD table will not only be complete, but also condensed. Likewise, if you do not select this check box, the CCD table will be both noncomplete and noncondensed.

If you select the Include Unit-of-Work (UOW) table columns check box in the Advanced Subscription Definition window of the DB2 Control Center, your CCD table will be noncomplete and noncondensed and it will include extra columns from the UOW table. If you do not select this check box, the CCD table will be both noncomplete and noncondensed and will not contain extra columns from the UOW table.

The DB2 Control Center provides no option for directly choosing whether a CCD table is condensed or noncondensed, but DJRA does. If you do not use DJRA, you must modify the register control table in the database where the CCD table resides and the subscription targets member table in the control database to specify that a CCD table should be condensed or noncondensed.

Internal and external CCD tables

The sequence in which you create CCD tables determines whether they are internal or external. The first local, noncomplete CCD table that you create is an internal CCD table; all other CCD tables are external. All remote CCD tables are external.
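The rule stated above can be expressed as a short Python sketch. It assumes that the input is the list of CCD tables defined for a single replication source, in creation order; the CcdTable class and the classify function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class CcdTable:
    name: str
    local: bool     # True if the table resides in the source database
    complete: bool  # True for a complete CCD table


def classify(tables):
    """Label each CCD table 'internal' or 'external' from creation order."""
    labels = {}
    internal_assigned = False
    for table in tables:
        # Only the first local, noncomplete CCD table is internal.
        if table.local and not table.complete and not internal_assigned:
            labels[table.name] = "internal"
            internal_assigned = True
        else:
            labels[table.name] = "external"
    return labels


if __name__ == "__main__":
    created = [
        CcdTable("REMOTE_HIST", local=False, complete=False),
        CcdTable("LOCAL_COPY", local=True, complete=True),
        CcdTable("LOCAL_CACHE", local=True, complete=False),
        CcdTable("LOCAL_HIST", local=True, complete=False),
    ]
    print(classify(created))
```

In the sample, only LOCAL_CACHE is labeled internal: it is the first table that is both local and noncomplete. The remote table and the later local, noncomplete table are external.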

You cannot use an internal CCD table as an explicit source for a subscription set, but if an internal CCD table is present, the Apply program uses it to replicate changes instead of using the CD table.

If you perform a full refresh on an external CCD table, the Apply program performs a full refresh on all target tables that use this external CCD table as a replication source. This process is often referred to as a cascade full refresh.

Uses of CCD tables

In addition to minimizing the effects of joining the CD and UOW tables, you can use CCD tables to improve the efficiency and flexibility of your replication environment. The following examples show some of the uses for CCD tables:

Maintaining complete histories of changes
Use noncomplete, noncondensed CCD tables to keep a history of updates to a source table or to maintain an audit trail of database usage. For improved auditing capability, include the extra columns from the UOW table.
Replicating data to multiple target tables
Use a remote CCD table to reduce network traffic from the source server to the target servers. Changes to the source table are copied to the remote CCD, which acts as the source table for multiple target tables, thus potentially saving multiple network connections to the source server.
Using remote CCD tables in this way is analogous to three-tier client/server configurations, where the source table acts as the first tier, the remote CCD table acts as the middle tier, and the target tables act as the third tier.

Use a local, internal, noncomplete CCD table to control the time consistency of updates replicated to multiple sites. Then define one or more subscription sets using this CCD table as the replication source. This CCD table can shield its target tables from the volatility of the source tables if the replication to the CCD table is infrequent enough. Using a CCD table in this way is an example of a two-tier model.
Condensing hot-spot updates
Use a condensed CCD table to keep only the most current values for each row of the source table as it is continually updated. A hot spot develops when your application programs update a particular row many times in a short time interval. By keeping only the most current update to each row in a condensed CCD table, you can reduce network traffic, because you replicate only the most current update to the target tables rather than every update.
In this case, ensure that the Apply program replicates less frequently than the hot-spot updates occur; otherwise you do not benefit from using a condensed CCD table.
Maintaining transaction-consistent replication
Use condensed CCD tables to maintain transaction-consistent replication, that is, to replicate only the net effect of all transactions that update the source table. To implement transaction-based replication, use noncondensed CCD tables to replicate every update from every transaction that updates the source table. Transaction-based replication is necessary for update-anywhere scenarios.
Using CCD tables as replication sources
A complete CCD target table is automatically registered as a replication source at the target server, and you can use this table when defining subscription sets. Using CCD tables as replication sources is useful, for example, for data warehousing and information repository scenarios.
Using CCD tables as local caches for committed changes
Use an internal CCD table as a local cache for committed changes to a source table. If an internal CCD table exists, the Apply program replicates changes from it rather than from the CD tables.

Using CCD tables for nonrelational data sources

Changes captured by application programs or other tools, such as DataPropagator NonRelational, can be defined as sources for subscription sets. The application program must create and maintain a complete CCD table. This CCD table must be external, but can be condensed or noncondensed. For example, DataPropagator NonRelational captures changes to IMS DB segments and updates its CCD table. You define the CCD table as a replication source using the DB2 Control Center or DJRA. You can then define subscription sets using this CCD table, regardless of where the original updates occur.

Pruning the CD and CCD tables

The Capture program can prune CD tables based on information inserted into the pruning control table by the Apply program. You control whether the Capture program prunes CD tables by using the PRUNE or NOPRUNE parameter. You can also control when the pruning takes place and how the prune interval is set by modifying the tuning parameters control table.

Several of the types of CCD table can continue to grow in size, especially noncondensed CCD tables. Pruning of these tables is not automatic; you must prune them manually or use an application program. For some types of CCD table, you may want to archive them and define new ones, rather than prune them.
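An application program that prunes a CCD table manually might look like the following sketch. It uses Python's sqlite3 module with an in-memory database in place of DB2, and it assumes a hypothetical CCD table named ccd_orders with a commit_ts column that holds the commit time as an ISO 8601 string. Choosing a safe cutoff is left to you: rows that a dependent target has not yet replicated must not be deleted, and this approach suits noncondensed history tables better than tables that hold the current value of every row.

```python
import sqlite3


def prune_ccd(conn, table, cutoff):
    """Delete rows committed before `cutoff`; return the number deleted."""
    # Table names cannot be bound as parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cursor = conn.execute(
        f"DELETE FROM {table} WHERE commit_ts < ?", (cutoff,)
    )
    conn.commit()
    return cursor.rowcount


def build_demo():
    """Create a hypothetical noncondensed CCD table with three changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ccd_orders (order_id INTEGER, operation TEXT, commit_ts TEXT)"
    )
    conn.executemany(
        "INSERT INTO ccd_orders VALUES (?, ?, ?)",
        [
            (100, "I", "2000-01-15T09:00:00"),
            (100, "U", "2000-02-10T09:00:00"),
            (100, "U", "2000-03-05T09:00:00"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo()
    print(prune_ccd(demo, "ccd_orders", "2000-02-01T00:00:00"))
```

Because ISO 8601 strings of equal length sort in chronological order, the string comparison in the DELETE statement selects exactly the rows older than the cutoff.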

When the source table is a non-IBM table, the Capture triggers prune the CCD table based on a synchpoint that the Apply program writes to the pruning control table.

Footnotes:

¹⁰: The Capture program does not, however, prune changed rows from consistent change data (CCD) tables. You must prune them manually.
¹¹: This statement is only true for noncondensed CCD tables; see Condensed and noncondensed CCD tables.
¹²: Also, referential constraints on the replica tables might not tolerate condensing that CCD tables allow.
