IBM Books

Replication Guide and Reference


DB2 data replication concepts

This section introduces some of the important concepts of DB2 data replication. You should read the whole section to get a complete overview.

Replication sources

A Replication source is a user table or view from which you want data copied. Before you can replicate data, you must define a replication source to describe the information that the change-capture mechanisms will use. When you define a replication source, you define the following attributes:

Full-refresh and differential-refresh copying

The Apply program copies data from the source to the target either by full-refresh only or differential-refresh copying.

During full-refresh only copying, the Apply program performs these steps:

  1. Empties (deletes) all of the rows from the target table

  2. Reads all of the rows from the source table

  3. Copies the rows to the target table

During differential-refresh copying, the Apply program copies only the changed data to the target table.

After-image columns and before-image columns

An after-image column contains the value of a data column in a source table after the value in that data column is updated. A before-image column contains the value of a data column in a source table before that data column is updated. When you define a replication source, you can choose to capture only the after-image or both the after-image and the before-image. Your decision will depend both on the way in which you plan to use the data and on the types of tables that you are using.

Before-image columns are useful if your applications require auditing or rollback capability. Some restrictions apply to how you use these columns, and they are discussed later in this book (Replicating before and after images).

Levels of conflict detection

Conflict detection pertains only to update-anywhere replication configurations. It is the process of detecting if the same row was updated in the source and target tables during the same replication cycle. With standard conflict detection, the Apply program searches for conflicts in rows that are already captured in the CD tables. With enhanced conflict detection, the Apply program locks all of the target tables, thus preventing further transactions until the current changes are captured. Row-replica conflict detection applies only to tables that are maintained by DataPropagator for Microsoft Jet; where conflicts are detected on a row-by-row basis instead of a transaction-by-transaction basis.

Subscription sets and subscription-set members

Before you can replicate data from the replication source, you must associate the replication source with the target to which you want the changes replicated. You define this information using subscription sets and subscription-set members. The information that you provide is stored in various replication control tables.

A subscription set contains the attributes of a replication subscription. When you create a subscription set, you define the following attributes:

A subscription set must have one subscription-set member for each target table or view. When you create a subscription-set member, you define the following attributes:

Subscription sets ensure that all subscription-set members are treated alike during replication: either changes are applied to all targets or to none of them. The changed data for all the subscription-set members in a subscription set is replicated to the specified target tables in a single transaction. Subscription sets optimize performance because the target tables in a set are processed in one transaction against the target server. Subscription sets also preserve referential integrity.

Each subscription set is processed by one Apply program; however, each Apply program can process many subscription sets.

Apply qualifier

The Apply qualifier associates an Apply program and one or more subscription sets. The Apply program is also associated with one control server, which contains the control tables for the subscription sets. The control tables are shared by more than one instance of the Apply program. You specify a case-sensitive string as the value for the Apply qualifier when you define a subscription set. 5

By using more than one Apply qualifier, you can run more than one instance of the Apply program from a single user ID. The Apply qualifier is used to identify records at the control server that define the work load of an instance of the Apply program; whereas the user ID is for authorization purposes only. For example, assume that you want to replicate data from two source databases to the target tables on your computer. The data in source table A is replicated using full-refresh copying to target table A, and the data in source table B is replicated using differential-refresh copying to target table B. You define two subscription sets (one for table A and one for table B), and you use separate Apply qualifiers to allow two instances of the Apply program to copy the data at different times.

Data manipulation

You might want to replicate only a subset of your source table, use a simple view to restructure the data from the source table to the target table, or use more complex joins and unions. You might also want to write applications to manipulate staged changes before they replicate, such as including some columns from file records.

Subsets of source tables

You can replicate certain columns or rows from the source table instead of replicating the whole source table. This process, which is sometimes known as table partitioning, is called column subsetting and row subsetting in this book.

Use column subsetting if you want to replicate only a subset of all of the columns from the source. This type of subsetting is appropriate, for example, if some of the columns in the source are very large, such as large objects (LOBs), or if the column data types are not supported by the intended target table.

Use row subsetting if you want to replicate only some of the rows from the source database. For example, when you are replicating data to more than one regional office, you might want to replicate only records that are relevant to that particular regional office. To subset rows, use the WHERE clause when defining the subscription-set member.

Views as sources

Simple views are useful in data warehouse scenarios if you want to restructure copies so that data in target tables is easily queried.

Joins and unions for targets

You can create and maintain target tables with contents that are joins or unions of existing source tables.

You can use the following types of joins:

Specifically, you can use joins and unions to manipulate data in the following ways:

Target tables

When you define a subscription-set member, you must specify the type of target table that you want to use. The following types of tables are available:

The following sections describe the unique characteristics of each type of target table.

User copy tables

These tables are read-only copies of the replication source with no replication control columns added. They look like regular source tables and are a good starting point for replication. They are the most common type of target table.

Point-in-time tables

These tables are read-only copies of the replication source with a timestamp column added. The timestamp column is originally null. When changes are replicated, values are added to indicate the time when updates are made. Use these types of tables if you want to keep track of the time of changes.

Aggregate tables

These are read-only tables that use SQL column functions (such as SUM and AVG) to compute summaries of the entire contents of the source tables or of the recent changes made to the source table data. Rows are appended to aggregate tables over time. There are two kinds of aggregate tables: base aggregate tables and change aggregate tables.

Base aggregate tables summarize the contents of a source table. Use these types of tables for tracking the state of a source table on a regular basis. For example, assume that you want to know the average sales for a store on a day-by-day basis. If your source table has a row for each sales person, you could average the daily sales for everyone from the store in a base aggregate table. If the average sales for employees on Friday was $1 000, and the store was closed on Saturday and Sunday so that the average sales on each of those two days was $0, Table 1 shows the entries that would be made to a base aggregate table:

Table 1. Hypothetical entries in a base aggregate table
Friday Saturday Sunday
1 000 1 000 1 000

Change aggregate tables work with the change data in the control tables, not with the contents of the source table. Use these types of tables for tracking the changes made between each Apply program cycle. By setting a filter you can track recent UPDATE, INSERT, and DELETE operations collectively or individually. For example, assume that you want to know day-to-day changes for average sales for a branch. You could average the changes and store them in a change aggregate table. Therefore, if you use the average daily sales from the previous example ($1 000 on Friday, $0 on Saturday and Sunday), and store the averages of the changes in the change aggregate table, you would get the entries shown in Table 2.

Table 2. Hypothetical entries in a change aggregate table
Friday Saturday Sunday
1 000 0 0

Notice that the entries in the change aggregate table differ from those in the base aggregate table, even though the source data is the same.

Consistent-change-data (CCD) tables

These read-only tables contain data from committed transactions. CCD tables contain different data if they are condensed, noncondensed, complete, or noncomplete. A condensed CCD table contains only the most current value for a row, while a noncondensed CCD table contains a history of changes to the row. The Apply program appends rows to noncondensed tables; while it only updates rows that are already in condensed tables. A complete CCD table contains all the rows that you want to replicate from the source table; whereas a noncomplete CCD table is empty when it is created and has rows appended as changes are made to the source table. Use the type of table that meets your needs:

Also, use consistent-change-data (CCD) tables if you want your change data maintained by these other change-capture mechanisms, instead of the Capture program:

Replica or row-replica tables

These are the only target tables that your applications can update directly. Changes made to replicas and row-replicas are replicated to the associated source table; the source table in turn replicates the changes to other replicas. Replicas are supported only in DB2 databases. A row-replica table is a special type of replica table for DB2 DataPropagator for Microsoft Jet. Use the replica table types for update-anywhere replication.

User tables

You don't actually specify a user table as a target; however, in update-anywhere replication, a user table is automatically a target for the replicas or row-replicas that are associated with it. The user table is the parent replica, and its copies are dependent replicas. The parent replica receives updates from a dependent replica and, if there are no conflicts detected, it replicates the changes to the other dependent replicas. The parent replica is the primary source of data. If there are any update conflicts detected, the contents of the parent replica prevail. Typically your applications access the dependent replica tables; however, they connect to the server containing the user table when the replicas are not available.

Schedule for applying updates

Synchronous replication delivers updates continually. When a change is made to the source data, it is immediately applied to the target database. A change is committed to the source database only after the change is replicated to the target database. If for some reason the change cannot be replicated to the target database, the change is not made to the source database. This type of replication is also called real-time replication. If your application requires synchronous updates, code your applications to update tables in a single, distributed transaction instead of using the products described in this book.

Asynchronous replication delivers updates in stages. When a change is made to the source data, it is stored temporarily for a preset interval and forwarded to the target at a later time. The interval can be a measure of time (seconds, minutes, hours) or can represent a prescribed event (midnight, or some other time of day). If changes cannot be made to a target database (for example, if the target database is down or the network is down), they are stored and applied later, in the order in which they were made to the source. This type of replication provides many benefits over synchronous replication: better use of network resources, less database contention, and the opportunity to enhance data before it reaches the target database.

DB2 DataPropagator performs asynchronous replication; therefore, changes made to the source are not made immediately to the targets. You can control how frequently the changes are applied to the target by specifying time intervals, events, or both. For environments that have occasionally connected clients, you can replicate data on demand.

Interval timing

This is the simplest method of controlling the timing of replication. To use interval timing, you choose a date and time for the Apply program to start replicating data to the target, and set a time interval that describes how frequently you want the data replicated. When the Apply program stops, it will not start again until the time interval passes. The time interval can be a period of time (from one minute to one year), or it can be continuous. A continuous time interval means that the Apply program starts replication cycles one after the other, with only a few seconds delay in-between (you can control the delay with the start parameter). The intervals that you provide are approximate. The interval actually used by the Apply program depends on the number of updates that the Apply program has to replicate and on the availability of resources (that is, database table, table space).

Event timing

This is the most precise method of controlling the timing of replication. To use event timing, you specify the name for an event when you define the subscription set, and you set the time when you want that event processed. Optionally, you can set an end-of-period time; the Apply program will not replicate any transactions committed after this time, but will defer their replication until a future date.

The information that you provide for event timing is stored in the subscription events table. The Apply program searches the subscription events table for the event name and the associated time and end-of-period information.

On-demand timing

For replication configurations that include occasionally connected systems, the Capture and Apply programs do not run continuously. You can replicate data to and from occasionally connected systems on demand by using the ASNCOPY or ASNSAT programs to operate the Capture and Apply components. These programs are described in Occasionally connected environments.


Footnotes:

5
The Apply qualifier appears in many control tables; therefore, do not attempt to change its value after you set it.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]