IBM Books

Replication Guide and Reference


Deciding what to replicate

As part of planning for replication, you need to consider how the data will be used at the target site. Often, the source data will need to be subsetted, transformed, or enhanced for decision-support or data-warehousing applications. This section sorts these requirements into those that are easily fulfilled by using the DB2 Control Center and those that require direct manipulation of the control tables.

The Control Center and DJRA support the following data manipulations:

The following sections describe the data manipulations that you can perform using the Control Center. This chapter also describes replicating large-object (LOB) data, limits for column names for before-image data, and data-type restrictions.

Subsetting columns and rows

IBM Replication supports both column (vertical) and row (horizontal) subsetting of the source table. This means that you can specify that only a subset of the source table columns and rows be replicated to the target table, rather than all of the columns and rows:

Column subsetting
In some replication scenarios, you might not want to replicate all columns to the target table, or the target table might not support all data types defined for the source table. You can define a column subset that has fewer columns than your source table. Column subsetting is available for all tables except replica tables.

You can define column subsetting at one of two times:

Recommendation: When you define a replication source, select all columns (that is, do not subset any of them). Create your column subsets when you define subscription sets. By defining your column subsets in the subscriptions rather than in the replication sources, you will not have to redefine your replication sources if your subscription requirements change.

Row subsetting
In some replication scenarios, you might want to replicate different data from a source table to several target tables. You can define a row subset that contains rows matching a certain condition (a WHERE clause), for example, all rows for department "J35".

Use the advanced subscription options to define a WHERE clause when you define the subscription. All target table types support row subsetting.

If the primary key values of the target table will be updated, or the table (or view) contains a logical partitioning column that will be updated, you must specify replication logical-partitioning-key support when you define the replication source. Replication logical-partitioning-key support performs an UPDATE as a DELETE followed by an INSERT. See Enabling replication logical-partitioning-key support for more information.

Replicating joins using views

Join views fill many requirements, both for denormalizing (restructuring) copies in data-warehouse scenarios, which enables simpler queries of copied data, and also for addressing the routing problem, sometimes called the database partitioning problem in distributed computing scenarios. 8 Views are also useful when you need to specify predicates for a row subset that exceed 512 bytes (the capacity of the PREDICATES column of the subscription-targets-member control table). Thus, you can choose to manage your subset predicates using views rather than as part of the subscription-set definition.

To define a join view as a replication source, first define all tables that participate in the join as replication sources (you do not need to define subscriptions for them). Then, use the Control Center or DJRA to define the join. If the replication sources defined in the join have CD or CCD tables, the Control Center or DJRA creates a CD view from the replication sources' CD tables.

IBM Replication supports the following types of view definitions:

Replicating before and after images

You can define both before and after images in your replication sources and subscriptions. A before-image column is a copy of a column before it is updated, and an after-image column is a copy of a column after it is updated. DB2 logs both the before-image and after-image columns of a table for each change to that table. Replicating before images can be useful for auditing purposes, and is required for update-anywhere scenarios.

The before and after images have different values for different actions, as shown below:

Action
Column Value

Full refresh
All before-image columns have a NULL value.

Insert
The before-image column has a NULL value.

Update
Column values before the change are captured in the before-image columns; values after the change are in the after-image columns.

When you enable logical-partitioning key support, the before-image column appears in the deleted column and the after-image column appears in the inserted column. See Enabling replication logical-partitioning-key support for more information.

Delete
Both the before-image and after-image columns contain the before-image value.

Before images do not make sense for base aggregate target-table types (there is no before image for computed columns). All other target-table types can make use of before-image columns.

Renaming columns

You can rename columns for point-in-time and user-copy target-table types. For other table types, you must define views in order to rename columns.

Creating computed columns

Using SQL, you can derive new columns from existing source columns. For aggregate target-table types, you can define new columns by using aggregate functions such as COUNT or SUM. For other table types, you can define new columns using SQL expressions.

You can also create computed columns by specifying user-defined functions when you create a table using the DB2 Control Center.

Using stored procedures for before and after run-time processing

You can define run-time processing statements using SQL statements or stored procedures that can run before or after the Apply program processes a subscription set. Such statements can be useful for pruning CCD tables and controlling the sequence in which subscription sets are processed. You can run the run-time processing statements at the source server before a subscription set is processed, or at both the source and target servers before or after a subscription set is processed. For example, you can execute SQL statements before retrieving the data, after replicating it to the target tables, or both.

Stored procedures use the SQL CALL statement without parameters. If the source table is in a non-IBM database, DB2 DataJoiner processes the SQL statements. The procedure name must be eight characters or less in length. The run-time procedures of each type are executed together as a single transaction. You can also define acceptable SQLSTATEs for each statement.

Depending on the DB2 platform, the SQL before and after processing statements can perform other processing, such as calling stored procedures.

Replicating large objects

DB2 Universal Database and DB2 for OS/390 V6 support large object (LOB) data types, which include: binary LOB (BLOB), character LOB (CLOB), and double-byte character LOB (DBCLOB). This section refers to all of these types as LOB data.

The Capture program reads the LOB descriptor to determine if any data in the LOB column has changed and thus should be replicated, but does not copy the LOB data to the CD tables. When a LOB column changes, the Capture program sets an indicator in the CD tables. When the Apply program reads this indicator, it then copies the entire LOB column (not just the changed portions of LOB columns) directly from the source table to the target table.

To allow the Capture program to detect changes to LOB data, you must include the DATA CAPTURE CHANGES keywords when you create (or alter) the source table.

Because a LOB column can contain up to two gigabytes of data, you must ensure that you have sufficient network bandwidth for the Apply program. Likewise, your target tables must have sufficient disk space to accommodate LOB data.

Restrictions:

Limits on column names for capturing before-image data

DB2 DataPropagator limits column names to 17 characters. Because DB2 DataPropagator adds a before-image column identifier (usually X) to target tables and because you must ensure that each column name is unique, you cannot use longer column names for those tables you replicate. For tables you do not plan to replicate, you can use longer column names, but consider using 17-character names in case you might want to replicate these tables in the future. For tables in DB2 for OS/390, you can use 18-character column names, but DB2 DataPropagator will replace the 18th character with the before-image column identifier in target tables, so you must ensure that the first 17 characters of the name are unique.

Data restrictions

Currently, DB2 DataPropagator cannot replicate certain types of data. The major restrictions are:


Footnotes:

8
For example, knowing where to send a bank account update may require a join of the account table with the customer table to determine which branch of the bank the customer deals with. Typically, production databases are normalized so that the geographic details, such as branch-number, are not stored redundantly throughout the database.

9
The CCD tables for simple inner-joins must be complete and condensed. See Staging data.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]