Replication Guide and Reference

DB2 data replication concepts

This section introduces some of the important concepts of DB2 data replication. You should read the whole section to get a complete overview.

A Replication source is a user table or view from which you want data copied. Before you can replicate data, you must define a replication source to describe the information that the change-capture mechanisms will use. When you define a replication source, you define the following attributes:

Which columns you want to replicate
Whether you want before-image values for a column
Whether you want to replicate without change capture (full-refresh only)
Whether you want updates treated as UPDATE operations or DELETE and INSERT operations
For update-anywhere replication (where a replication source has read/write target tables), what level of conflict detection to use

Full-refresh and differential-refresh copying

The Apply program copies data from the source to the target either by full-refresh only or differential-refresh copying.

During full-refresh only copying, the Apply program performs these steps:

Empties (deletes) all of the rows from the target table
Reads all of the rows from the source table
Copies the rows to the target table

During differential-refresh copying, the Apply program copies only the changed data to the target table.

After-image columns and before-image columns

An after-image column contains the value of a data column in a source table after the value in that data column is updated. A before-image column contains the value of a data column in a source table before that data column is updated. When you define a replication source, you can choose to capture only the after-image or both the after-image and the before-image. Your decision will depend both on the way in which you plan to use the data and on the types of tables that you are using.

Before-image columns are useful if your applications require auditing or rollback capability. Some restrictions apply to how you use these columns, and they are discussed later in this book (Replicating before and after images).

Levels of conflict detection

Conflict detection pertains only to update-anywhere replication configurations. It is the process of detecting if the same row was updated in the source and target tables during the same replication cycle. With standard conflict detection, the Apply program searches for conflicts in rows that are already captured in the CD tables. With enhanced conflict detection, the Apply program locks all of the target tables, thus preventing further transactions until the current changes are captured. Row-replica conflict detection applies only to tables that are maintained by DataPropagator for Microsoft Jet; where conflicts are detected on a row-by-row basis instead of a transaction-by-transaction basis.

Subscription sets and subscription-set members

Before you can replicate data from the replication source, you must associate the replication source with the target to which you want the changes replicated. You define this information using subscription sets and subscription-set members. The information that you provide is stored in various replication control tables.

A subscription set contains the attributes of a replication subscription. When you create a subscription set, you define the following attributes:

The relationship between the source server and the target server
The association between the Apply qualifier and the subscription set name
When to start replication, how often to replicate, and whether to use interval or event timing
Data blocking, if you expect large volumes of changes

A subscription set must have one subscription-set member for each target table or view. When you create a subscription-set member, you define the following attributes:

The relationship between a source table or view and a target table or view
The structure of the target table or view
The columns that you want replicated (subselect columns)
The rows that you want replicated (SQL predicates)

Subscription sets ensure that all subscription-set members are treated alike during replication: either changes are applied to all targets or to none of them. The changed data for all the subscription-set members in a subscription set is replicated to the specified target tables in a single transaction. Subscription sets optimize performance because the target tables in a set are processed in one transaction against the target server. Subscription sets also preserve referential integrity.

Each subscription set is processed by one Apply program; however, each Apply program can process many subscription sets.

Apply qualifier

The Apply qualifier associates an Apply program and one or more subscription sets. The Apply program is also associated with one control server, which contains the control tables for the subscription sets. The control tables are shared by more than one instance of the Apply program. You specify a case-sensitive string as the value for the Apply qualifier when you define a subscription set. ⁵

By using more than one Apply qualifier, you can run more than one instance of the Apply program from a single user ID. The Apply qualifier is used to identify records at the control server that define the work load of an instance of the Apply program; whereas the user ID is for authorization purposes only. For example, assume that you want to replicate data from two source databases to the target tables on your computer. The data in source table A is replicated using full-refresh copying to target table A, and the data in source table B is replicated using differential-refresh copying to target table B. You define two subscription sets (one for table A and one for table B), and you use separate Apply qualifiers to allow two instances of the Apply program to copy the data at different times.

Data manipulation

You might want to replicate only a subset of your source table, use a simple view to restructure the data from the source table to the target table, or use more complex joins and unions. You might also want to write applications to manipulate staged changes before they replicate, such as including some columns from file records.

Subsets of source tables

You can replicate certain columns or rows from the source table instead of replicating the whole source table. This process, which is sometimes known as table partitioning, is called column subsetting and row subsetting in this book.

Use column subsetting if you want to replicate only a subset of all of the columns from the source. This type of subsetting is appropriate, for example, if some of the columns in the source are very large, such as large objects (LOBs), or if the column data types are not supported by the intended target table.

Use row subsetting if you want to replicate only some of the rows from the source database. For example, when you are replicating data to more than one regional office, you might want to replicate only records that are relevant to that particular regional office. To subset rows, use the WHERE clause when defining the subscription-set member.

Views as sources

Simple views are useful in data warehouse scenarios if you want to restructure copies so that data in target tables is easily queried.

Joins and unions for targets

You can create and maintain target tables with contents that are joins or unions of existing source tables.

You can use the following types of joins:

Simple inner-joins over one or more defined replication sources, possibly in combination with other tables or views that are not themselves replication sources.
Simple inner-joins over CCD tables that are defined as replication sources. These CCD tables can be maintained by the Apply program or they can be maintained by another application for external data sources (such as DataPropagator NonRelational).

Specifically, you can use joins and unions to manipulate data in the following ways:

Joins of tables from a single DB2 source server (by defining a DB2 view as the join of certain tables)
Unions of tables from one source server (by using multiple subscription-set members in a set where each member has the same target table)
Unions of tables from multiple source servers, sometimes referred to as multisite unions (by creating multiple subscription-set members in multiple subscription sets because there are multiple source servers)

Target tables

When you define a subscription-set member, you must specify the type of target table that you want to use. The following types of tables are available:

User copy tables
Point-in-time tables
Aggregate tables
Consistent-change-data (CCD) tables
Replica or row-replica tables
User tables

The following sections describe the unique characteristics of each type of target table.

User copy tables

These tables are read-only copies of the replication source with no replication control columns added. They look like regular source tables and are a good starting point for replication. They are the most common type of target table.

Point-in-time tables

These tables are read-only copies of the replication source with a timestamp column added. The timestamp column is originally null. When changes are replicated, values are added to indicate the time when updates are made. Use these types of tables if you want to keep track of the time of changes.

Aggregate tables

These are read-only tables that use SQL column functions (such as SUM and AVG) to compute summaries of the entire contents of the source tables or of the recent changes made to the source table data. Rows are appended to aggregate tables over time. There are two kinds of aggregate tables: base aggregate tables and change aggregate tables.

Base aggregate tables summarize the contents of a source table. Use these types of tables for tracking the state of a source table on a regular basis. For example, assume that you want to know the average sales for a store on a day-by-day basis. If your source table has a row for each sales person, you could average the daily sales for everyone from the store in a base aggregate table. If the average sales for employees on Friday was $1 000, and the store was closed on Saturday and Sunday so that the average sales on each of those two days was $0, Table 1 shows the entries that would be made to a base aggregate table:

Table 1. Hypothetical entries in a base aggregate table

Friday Saturday Sunday
1 000 1 000 1 000

Change aggregate tables work with the change data in the control tables, not with the contents of the source table. Use these types of tables for tracking the changes made between each Apply program cycle. By setting a filter you can track recent UPDATE, INSERT, and DELETE operations collectively or individually. For example, assume that you want to know day-to-day changes for average sales for a branch. You could average the changes and store them in a change aggregate table. Therefore, if you use the average daily sales from the previous example ($1 000 on Friday, $0 on Saturday and Sunday), and store the averages of the changes in the change aggregate table, you would get the entries shown in Table 2.

Table 2. Hypothetical entries in a change aggregate table

Friday Saturday Sunday
1 000 0 0

Notice that the entries in the change aggregate table differ from those in the base aggregate table, even though the source data is the same.

Consistent-change-data (CCD) tables

These read-only tables contain data from committed transactions. CCD tables contain different data if they are condensed, noncondensed, complete, or noncomplete. A condensed CCD table contains only the most current value for a row, while a noncondensed CCD table contains a history of changes to the row. The Apply program appends rows to noncondensed tables; while it only updates rows that are already in condensed tables. A complete CCD table contains all the rows that you want to replicate from the source table; whereas a noncomplete CCD table is empty when it is created and has rows appended as changes are made to the source table. Use the type of table that meets your needs:

Condensed CCD tables are useful for staging changes to remote locations and for summarizing hot-spot updates before they are replicated to targets. Hot-spot updates are updates made repeatedly to the same rows over a short period of time. Condensing ensures that only the net change for a row is replicated to the target; therefore, it reduces the load on your network and prevents a hot-spot problem from occurring on the target. Use condensed, complete CCD tables if you want to initialize and maintain remote targets without accessing the original source each time that it is updated. Use condensed, noncomplete CCD tables to maintain remote targets by accessing the original source each time that it is updated.
Noncondensed CCD tables are used for auditing purposes. To collect audit information from only the time that you start collecting change data, use noncondensed, noncomplete CCD tables. These tables are initially empty, and information is appended by each UPDATE, INSERT, and DELETE operation. To collect complete audit information, use noncondensed, complete CCD tables. These tables start out as full copies of the source tables and are appended by each UPDATE, INSERT, and DELETE operation. They retain all of the information about the source tables and can be used if you need to make historical queries (for example, as of last Tuesday, a month ago, or yesterday).

Also, use consistent-change-data (CCD) tables if you want your change data maintained by these other change-capture mechanisms, instead of the Capture program:

DataPropagator NonRelational for change data from IMS
Data Difference utility for change data from VSAM
Capture triggers for change data from non-IBM data sources

Replica or row-replica tables

These are the only target tables that your applications can update directly. Changes made to replicas and row-replicas are replicated to the associated source table; the source table in turn replicates the changes to other replicas. Replicas are supported only in DB2 databases. A row-replica table is a special type of replica table for DB2 DataPropagator for Microsoft Jet. Use the replica table types for update-anywhere replication.

User tables

You don't actually specify a user table as a target; however, in update-anywhere replication, a user table is automatically a target for the replicas or row-replicas that are associated with it. The user table is the parent replica, and its copies are dependent replicas. The parent replica receives updates from a dependent replica and, if there are no conflicts detected, it replicates the changes to the other dependent replicas. The parent replica is the primary source of data. If there are any update conflicts detected, the contents of the parent replica prevail. Typically your applications access the dependent replica tables; however, they connect to the server containing the user table when the replicas are not available.

Schedule for applying updates

Synchronous replication delivers updates continually. When a change is made to the source data, it is immediately applied to the target database. A change is committed to the source database only after the change is replicated to the target database. If for some reason the change cannot be replicated to the target database, the change is not made to the source database. This type of replication is also called real-time replication. If your application requires synchronous updates, code your applications to update tables in a single, distributed transaction instead of using the products described in this book.

Asynchronous replication delivers updates in stages. When a change is made to the source data, it is stored temporarily for a preset interval and forwarded to the target at a later time. The interval can be a measure of time (seconds, minutes, hours) or can represent a prescribed event (midnight, or some other time of day). If changes cannot be made to a target database (for example, if the target database is down or the network is down), they are stored and applied later, in the order in which they were made to the source. This type of replication provides many benefits over synchronous replication: better use of network resources, less database contention, and the opportunity to enhance data before it reaches the target database.

DB2 DataPropagator performs asynchronous replication; therefore, changes made to the source are not made immediately to the targets. You can control how frequently the changes are applied to the target by specifying time intervals, events, or both. For environments that have occasionally connected clients, you can replicate data on demand.

Interval timing

This is the simplest method of controlling the timing of replication. To use interval timing, you choose a date and time for the Apply program to start replicating data to the target, and set a time interval that describes how frequently you want the data replicated. When the Apply program stops, it will not start again until the time interval passes. The time interval can be a period of time (from one minute to one year), or it can be continuous. A continuous time interval means that the Apply program starts replication cycles one after the other, with only a few seconds delay in-between (you can control the delay with the start parameter). The intervals that you provide are approximate. The interval actually used by the Apply program depends on the number of updates that the Apply program has to replicate and on the availability of resources (that is, database table, table space).

Event timing

This is the most precise method of controlling the timing of replication. To use event timing, you specify the name for an event when you define the subscription set, and you set the time when you want that event processed. Optionally, you can set an end-of-period time; the Apply program will not replicate any transactions committed after this time, but will defer their replication until a future date.

The information that you provide for event timing is stored in the subscription events table. The Apply program searches the subscription events table for the event name and the associated time and end-of-period information.

On-demand timing

For replication configurations that include occasionally connected systems, the Capture and Apply programs do not run continuously. You can replicate data to and from occasionally connected systems on demand by using the ASNCOPY or ASNSAT programs to operate the Capture and Apply components. These programs are described in Occasionally connected environments.

Footnotes:

⁵: The Apply qualifier appears in many control tables; therefore, do not attempt to change its value after you set it.

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]