
Administrator's Guide


Cluster Components

A cluster is a group of independent computers working together. When you use cluster configurations, you enhance the availability of your servers. Clustering allows you to join two to four Windows servers, or nodes, by using a shared disk subsystem. The nodes can then share data, which provides high server availability.

Clusters consist of many components such as nodes, cluster objects, Microsoft Cluster Server (MSCS) virtual servers, and even the hardware and software. If any one of these components is missing, the cluster cannot work.

Understanding Nodes in a Cluster

Nodes have the following characteristics:

When a node starts, it searches for active nodes on the networks designated for internal communication. If it finds an active node, it attempts to join the node's cluster. If it cannot find an existing cluster, it attempts to form a cluster by taking control of the quorum resource. The quorum resource stores the most current version of the cluster database, which contains cluster configuration and state data. A server cluster maintains a consistent, updated copy of the cluster database on all active nodes.
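
That startup decision can be pictured with a short sketch. The following Python fragment is purely illustrative and does not use any MSCS interface; the node and quorum structures are hypothetical stand-ins for the behavior described above.

    # Illustrative model (not MSCS code): a starting node either joins an
    # existing cluster or forms one by taking control of the quorum resource.
    def start_node(node, discovered_nodes, quorum):
        active = [n for n in discovered_nodes if n["state"] == "active"]
        if active:
            # An active node was found on the internal network: join its
            # cluster and copy the current cluster database.
            cluster = active[0]["cluster"]
            node["cluster_db"] = dict(quorum["cluster_db"])
            cluster.append(node)
            return cluster
        # No existing cluster: form one by taking ownership of the quorum
        # resource, which stores the most current cluster database.
        quorum["owner"] = node["name"]
        node["cluster_db"] = dict(quorum["cluster_db"])
        return [node]

    # Hypothetical example: node B starts while node A is already active.
    quorum = {"owner": "A", "cluster_db": {"groups": {}}}
    node_a = {"name": "A", "state": "active", "cluster": []}
    node_a["cluster"].append(node_a)
    cluster = start_node({"name": "B", "state": "starting"}, [node_a], quorum)
    print([n["name"] for n in cluster])   # ['A', 'B']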

A node can host physical or logical units, referred to as resources. Administrators organize these cluster resources into functional units called groups and assign these groups to individual nodes. If a node fails, the server cluster transfers the groups that were being hosted by the node to other nodes in the cluster. This transfer process is called failover. The reverse process, failback, occurs when the failed node becomes active again and the groups that were failed over to the other nodes are transferred back to the original node.
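
A minimal sketch of that group transfer, again using hypothetical node and group names rather than MSCS interfaces:

    # Conceptual model (not MSCS code) of group failover and failback.
    def failover(groups, failed_node, surviving_node):
        # Move every group hosted by the failed node to a surviving node.
        for name, owner in groups.items():
            if owner == failed_node:
                groups[name] = surviving_node

    def failback(groups, original_owners, recovered_node):
        # Return groups to the recovered node once it is active again.
        for name, owner in original_owners.items():
            if owner == recovered_node:
                groups[name] = recovered_node

    original = {"TSM Group 1": "Node A", "TSM Group 2": "Node B"}
    current = dict(original)
    failover(current, "Node A", "Node B")     # both groups now on Node B
    failback(current, original, "Node A")     # TSM Group 1 returns to Node A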

Cluster Objects

Nodes, resources, and groups are three kinds of cluster objects. The others are networks, network interfaces, and resource types. All server cluster objects are associated with a set of properties, with data values that describe an object's identity and behavior in the cluster. Administrators manage cluster objects by manipulating their properties, typically through a cluster management application such as Cluster Administrator (part of the MSCS application).

MSCS Virtual Servers

MSCS lets you place TSM server cluster resources into a virtual server. A virtual server is an MSCS cluster group that looks like a Windows server. The virtual server has a network name, an IP address, one or more physical disks, and a service. A TSM server can be one of the virtual services provided by an MSCS virtual server.

The virtual server name is independent of the name of the physical node on which the virtual server runs. The virtual server name and address migrate from node to node with the virtual server. Clients connect to a TSM server using the virtual server name, rather than the Windows server name. The virtual server name is implemented as a cluster network name resource and maps to a primary or backup node. The mapping is dependent on where the virtual server currently resides. Any client that uses WINS or directory services to locate servers can automatically track the virtual server as it moves between nodes. Automatically tracking the virtual server does not require client modification or reconfiguration.
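
Because clients target the virtual server name, the connection logic on the client side stays the same regardless of which node currently owns that name. The following sketch shows the idea with the standard Python socket library; the server name and port are assumptions chosen for illustration, not values taken from this configuration.

    import socket

    VIRTUAL_SERVER = "TSMSERVER1"   # hypothetical virtual server network name
    TCP_PORT = 1500                 # assumed TSM TCP/IP port for this example

    def connect_to_tsm(name, port):
        # Name resolution (WINS, DNS, or directory services) returns the
        # address of whichever node currently owns the virtual server, so the
        # client never needs to know where the server is actually running.
        address = socket.gethostbyname(name)
        return socket.create_connection((address, port), timeout=30)

    # conn = connect_to_tsm(VIRTUAL_SERVER, TCP_PORT)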

As mentioned earlier, each virtual server has its own disk as part of a cluster resource group; therefore, virtual servers cannot share data. Each TSM server that is implemented as a virtual server has its own database, recovery log, and set of storage pool volumes on a separate disk owned by that virtual server.
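
As an illustration of that separation, the layout below shows one way the disk resources might be divided; the drive letters and paths are hypothetical, not a prescribed configuration.

    # Hypothetical layout: each TSM virtual server owns its own disks for the
    # database, recovery log, and storage pool volumes; nothing is shared.
    virtual_servers = {
        "TSMSERVER1": {"database": "J:\\tsmdata\\db",
                       "recovery_log": "J:\\tsmdata\\log",
                       "storage_pools": ["K:\\tsmdata\\stgpool"]},
        "TSMSERVER2": {"database": "L:\\tsmdata\\db",
                       "recovery_log": "L:\\tsmdata\\log",
                       "storage_pools": ["M:\\tsmdata\\stgpool"]},
    }

    # Confirm that no drive letter is used by both virtual servers.
    letters = {name: {p[0] for p in [vs["database"], vs["recovery_log"],
                                     *vs["storage_pools"]]}
               for name, vs in virtual_servers.items()}
    assert letters["TSMSERVER1"].isdisjoint(letters["TSMSERVER2"])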

Because the server's location is transparent to client applications, TSM gains the maximum ease of failover and failback while minimizing the impact on TSM clients.

Note:
MSCS supports only an IP address as a resource. Therefore, any TSM server running on a cluster must limit its supported communication method to TCP/IP. Any client that does not use TCP/IP as its communication method cannot reach the virtual server if it fails over to the other cluster node.

The following example demonstrates the way the MSCS virtual server concept works.

Assume a clustered TSM server called TSMSERVER1 is running on node A and a clustered TSM server called TSMSERVER2 is running on node B. Clients connect to the TSM server TSMSERVER1 and the TSM server TSMSERVER2 without knowing which node currently hosts their server. The MSCS concept of a virtual server ensures that the server's location is transparent to client applications. To the client, it appears that the TSM server is running on a virtual server called TSMSERVER1.

Figure 39. Clustering with TSMSERVER1 on Node A and TSMSERVER2 on Node B

When one of the software or hardware resources fails, failover occurs. Resources (for example: applications, disks, or an IP address) migrate from the failed node to the remaining node. The remaining node takes over the TSM server resource group, restarts the TSM service, and provides access to administrators and clients.

If node A fails, node B assumes the role of running TSMSERVER1. To a client, it is exactly as if node A were turned off and immediately turned back on again. Clients experience the loss of all connections to TSMSERVER1 and all active transactions are rolled back to the client. Clients must reconnect to TSMSERVER1 after this occurs. The location of TSMSERVER1 is transparent to the client.
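
From the client's point of view, a failover therefore looks like a brief server restart: the session drops, uncommitted transactions roll back, and the client reconnects to the same virtual server name. A hedged sketch of that reconnect behavior follows; the connect callable and the timing values are hypothetical.

    import time

    def reconnect_after_failover(connect, name, retries=10, delay=30):
        # Retry until the surviving node has restarted the TSM service.
        # `connect` is any callable that opens a session to the named server
        # and raises ConnectionError while the failover is still in progress.
        for attempt in range(retries):
            try:
                return connect(name)      # same virtual server name as before
            except ConnectionError:
                time.sleep(delay)
        raise RuntimeError(f"{name} did not come back after {retries} attempts")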

Figure 40. Clustering with Node B Running TSMSERVER2 and Assuming the Role of TSMSERVER1 from Node A

Hardware and Software Considerations

Generally, the following considerations are accounted for during the cluster planning stage and before actual installation. However, because of their importance to the overall success of a working cluster, these considerations are restated here.

  1. Decide on the cluster configuration that you need to use with servers that use disk devices. Each virtual server needs a separate set of disk resources on the shared disk subsystem. Therefore, you may have problems if you configure the I/O subsystem as one large array when configuring a two-server cluster and later decide to expand to a four-server cluster.
  2. Identify the disk resources to be dedicated to TSM. Do not divide a shared disk into multiple partitions with each partition assigned to a different application and thus a different cluster group. For example, if Application A and Application B use partitions on the same physical disk, a software problem with Application B causes the Cluster Service to fail over Application B and its corequisite disk resource. Because both partitions are on the same physical drive, Application A, an otherwise stable application, is forced to fail over as well. Therefore, we recommend that you dedicate a shared disk as a single resource that fails over together with the Tivoli Storage Manager application.
  3. Ensure that you have enough IP addresses. Microsoft recommends at least seven addresses to set up a cluster involving two TSM virtual servers.
  4. Obtain network names for each TSM server instance in the configuration. For a cluster involving two TSM virtual servers, two network names are required, and each must be associated with the IP address set aside for that TSM server.
  5. Each TSM server instance requires a cluster resource group. Initially, the group should contain only disk resources. You can create a new group and move disk resources to it, or you can simply rename an existing resource group that contains only disk resources. (Items 3 through 5 are tallied in the planning sketch that follows this list.)
  6. TSM is installed to a local disk on each node in the cluster. Determine the disk to be used on each node. We strongly recommend that the same drive letter be used on each machine.
  7. MSCS does not provide for resource management for SCSI tape devices. However, TSM provides tape failover support that requires an additional shared SCSI bus between two nodes in the cluster. If you choose not to use TSM tape failover support, you can attach tape devices in either of the following configurations:
    * Attach the tape devices to the node on which the TSM server instance is currently active.
      - This configuration allows high performance backup and restore. However, it is not entirely automated; operator intervention is required to service a failover in which repair delays take more than two days.
      - Define enough disk-based data volume space to keep more than two days' worth of average data.
      - Set up a storage pool hierarchy so that data is migrated efficiently to the tape device.
      - When a failover occurs, manually disconnect the tape devices and reattach them to the node on which the server is newly active.
    * Attach the tape devices to a third, nonclustered system on which an additional instance of the TSM server is active.
      - This configuration may not be acceptable in installations with low bandwidth communications between the servers in the cluster and the tape device controller server.
      - Define enough disk-based data volume space to keep more than two days' worth of average data.
      - Use virtual volumes to enable migration of the data from the local disk volumes to the tape device.
      - When a failover occurs, no operator intervention is required; the newly active server continues to use the virtual volumes as before.
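
The per-instance resources from items 3 through 5 can be tallied with a small planning sketch. This is only an illustration of the counts discussed above, not a Microsoft or Tivoli sizing tool.

    def plan_cluster_resources(tsm_virtual_servers):
        # One network name, one IP address resource, and one resource group
        # per TSM server instance (items 3 through 5 above).
        return {
            "network_names": tsm_virtual_servers,
            "ip_address_resources": tsm_virtual_servers,
            "resource_groups": tsm_virtual_servers,
        }

    # For the two-virtual-server cluster described above, remember that
    # Microsoft's guidance of at least seven IP addresses also covers the
    # cluster itself and the physical nodes.
    print(plan_cluster_resources(2))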

