A cluster is a set of independent computers working together as a single system. This grouping ensures that mission-critical applications and resources are as highly available as possible. The cluster is managed as a single system and is specifically designed to tolerate component failures in a way that is transparent to users. Clustered systems have several advantages: fault-tolerance, high availability, and simplified management.
Microsoft Cluster Server (MSCS) is a feature of Windows. It is software that supports the connection of two or more computers into a cluster. The software provides services such as failure detection, recovery, and the ability to manage the cluster as a single system.
An MSCS cluster consists of nodes, individual computers complete with their own processor, memory, and system disks. Nodes in an MSCS cluster must have access to at least one shared disk. The data files, Internet Protocol (IP) addresses, network shares, and other parts of the installed server applications on the nodes are the cluster's resources. A resource can be active on only one node at a time. When the cluster detects that a resource has failed, it relocates the failed resource to a different node.
MSCS organizes resources into groups and lets you define relationships among the resources in a group. One type of relationship is a dependency relationship. For example, an application might require a network name and a Transmission Control Protocol/Internet Protocol (TCP/IP) address to be active before its service can come online. You can specify the TCP/IP address resource, the network name resource, and the service as resources that belong to the same group. When MSCS detects a resource failure, it moves all the resources in the failed resource's group to a different node and restarts the failed resource. In addition, you can establish dependencies between resources in the same group so that they come online in a specific order, as illustrated in the sketch below.
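The effect of dependencies on the bring-online order can be pictured with a short sketch. This is not MSCS or TSM code; the resource names below are illustrative only.

```python
# A minimal sketch (not MSCS code) of how dependencies inside a resource group
# imply a bring-online order. The resource names are illustrative only.
from graphlib import TopologicalSorter  # Python 3.9+

# Each resource maps to the set of resources it depends on.
dependencies = {
    "IP Address": set(),
    "Network Name": {"IP Address"},          # the network name needs its IP address first
    "TSM Server Service": {"Network Name"},  # the service needs the network name
}

# A topological order is a valid order in which to bring the resources online.
online_order = list(TopologicalSorter(dependencies).static_order())
print(online_order)  # ['IP Address', 'Network Name', 'TSM Server Service']
```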
When configuring a cluster resource group, you can designate which node in the cluster takes over when another node in the cluster fails. You can assign one or more nodes as possible owners of the group being failed over. In addition, you can indicate the order in which the cluster should select the new owner. In this way you are defining the failover pattern for the resource group.
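The failover pattern can be thought of as an ordered list of possible owners. The following sketch is illustrative only (the node names are assumed examples), not cluster code:

```python
# A minimal sketch of how an ordered preferred-owner list determines the
# failover target. Node names are illustrative, not real cluster objects.
def select_new_owner(preferred_owners, online_nodes, failed_node):
    """Return the first preferred owner that is online and is not the failed node."""
    for node in preferred_owners:
        if node != failed_node and node in online_nodes:
            return node
    return None  # no eligible owner; the group cannot be brought online elsewhere

preferred_owners = ["NODE1", "NODE2"]   # possible owners, in takeover order
print(select_new_owner(preferred_owners, online_nodes={"NODE2"}, failed_node="NODE1"))  # NODE2
```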
From the outside, an MSCS cluster appears to be one computer because MSCS supports the concept of virtual servers. MSCS creates a virtual server to represent a particular application. When MSCS moves a virtual server from a failed system to a working node in the cluster, clients are unaware of the change because they communicate with the virtual server, not with the node to which the virtual server is currently mapped. As a result of the move, a client might notice only a pause in service.
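Because clients address the virtual server by its network name, failover is largely invisible to them. The following minimal sketch (the host name is an assumed example, not a real server) shows a client resolving the virtual server name rather than any physical node name:

```python
# Illustrative only: a client resolves the virtual server's network name.
# MSCS keeps this name and its IP address with the virtual server as it moves
# between nodes, so the client never needs to know the physical node name.
import socket

virtual_server_name = "tsmserver1.example.com"   # assumed example name

try:
    address = socket.gethostbyname(virtual_server_name)
    print(f"Clients connect to {virtual_server_name} ({address}), not to a node name")
except socket.gaierror:
    print("Name not resolvable here; shown for illustration only")
```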
The TSM server is configured as an MSCS virtual server. The cluster resource group that makes up the virtual server contains a Network Name resource, an IP address resource, one or more physical disk resources, and a TSM server resource. The virtual server name is independent of the name of the physical node on which the virtual server runs. The virtual server name and address migrate from node to node with the virtual server. Because virtual servers cannot share data, each virtual server has a separate database, recovery log, and set of storage pool volumes.
Each TSM virtual server must contain a TSM server instance that is unique across all the nodes of the cluster. The TSM wizards used to configure the TSM virtual server group enforce this restriction. However, you can configure multiple TSM virtual servers into the cluster. In addition each TSM virtual server must have a private set of disk resources. Although nodes can share disk resources, only one node can actively control a disk at a time.
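The two planning rules in the preceding paragraph can be checked mechanically. The following sketch (group, instance, and disk names are assumed examples) verifies that instance names are unique across the cluster and that no disk resource is assigned to more than one virtual server:

```python
# A minimal sketch of the planning rules above: unique TSM server instance
# names, and a private (non-overlapping) set of disk resources per virtual
# server. All names here are illustrative examples.
virtual_servers = {
    "TSM Group 1": {"instance": "Server1", "disks": {"Disk J:", "Disk K:"}},
    "TSM Group 2": {"instance": "Server2", "disks": {"Disk L:"}},
}

instances = [vs["instance"] for vs in virtual_servers.values()]
assert len(instances) == len(set(instances)), "TSM server instance names must be unique"

all_disks = [disk for vs in virtual_servers.values() for disk in vs["disks"]]
assert len(all_disks) == len(set(all_disks)), "disk resources must not be shared between groups"

print("cluster plan is consistent")
```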
A TSM server typically makes heavy use of tape devices, but MSCS does not support the failover of tape devices. This limitation can be difficult to work around and reduces the effectiveness of the cluster. To solve this problem, TSM supports tape device failover. Rather than providing a generic solution, TSM provides a solution based on a specific hardware and software combination that supports tape failover.
TSM tape failover support does not change the basic concept of a TSM virtual server; all the same resources are still required. Tape failover support is an additional capability that can be added to a TSM virtual server. Even though Windows 2000 Datacenter Server supports a four-node cluster, TSM tape failover support functions on only two nodes of the cluster.
TSM uses a shared SCSI bus connecting the two nodes of the cluster that will host the TSM virtual server. This requires that each node contain an additional SCSI adapter card. The tape devices (library and drives) are connected to this shared bus. When failover occurs, the TSM server issues a SCSI bus reset during initialization. The bus reset is expected to clear any SCSI reserves held on the tape devices, which allows the server to acquire the devices after the failover.
Setting up a cluster requires considerable planning by the administrator. You need to answer the following questions, and it is recommended that you record the critical information on the Cluster Configuration worksheet (described later in this section).
Only certain versions of Windows support clusters of more than two nodes. The use of tape failover support also affects the failover pattern.
Consider how the TSM virtual server will use tape devices. Remember that tape failover support limits the number of nodes in the failover pattern to two.
Attach tape devices to the node on which the TSM server instance is currently active | Attach tape devices to a third, non-clustered system on which an additional instance of the TSM server is active |
---|---|
This configuration allows high-performance backup and restore. However, it is not entirely automated: operator intervention is required to service a failover when repair delays take more than two days. | This configuration may not be acceptable in installations with low-bandwidth communications between the servers in the cluster and the tape device controller server. |
Define enough disk-based data volume space to keep more than two days' worth of average data (a rough sizing sketch follows this table). | Define enough disk-based data volume space to keep more than two days' worth of average data. |
Set up a storage pool hierarchy so that data is migrated efficiently to the tape device. | Use virtual volumes to enable migration of the data from the local disk volumes to the tape device. |
When a failover occurs, manually disconnect the tape device and reattach it to the node on which the server is newly active. | When a failover occurs, no operator intervention is required; the newly active server continues to use the virtual volumes as before. |
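The disk space guideline in the table can be turned into a rough sizing calculation. The daily backup volume and safety margin below are assumed example values, not recommendations:

```python
# A rough sizing sketch for the "more than two days of average data" guideline.
average_daily_backup_gb = 500   # assumption: replace with your site's measured daily volume
days_to_cover = 2               # guideline from the table above
headroom = 1.25                 # assumed safety margin for growth and peaks

required_disk_pool_gb = average_daily_backup_gb * days_to_cover * headroom
print(f"Provision at least {required_disk_pool_gb:.0f} GB of disk storage pool volumes")
```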
This section describes a specific hardware and software configuration for TSM tape failover support. Currently, it is the only configuration that has been tested and that is officially supported. Other configurations might also work, but they have not yet been tested by IBM. Table 24 describes the hardware and software tested for use with TSM tape failover.
Table 24. Hardware and Software Supported for Tape Failover
Component | Supported hardware and software |
---|---|
Operating System | Windows 2000 Advanced Server or Datacenter Server |
SCSI Adapter | Adaptec AHA-2944UW PCI SCSI Controller |
SCSI Tape Library | IBM 3590-B11 or IBM 3570-C12 |
To use TSM tape failover, do the following before installing and configuring TSM:
The following describes methods for terminating the shared SCSI bus. You must terminate the shared SCSI bus as part of the initial setup of SCSI tape failover. The shared SCSI bus must also be terminated before you bring a server back online.
There are several different methods that can be used to terminate the shared SCSI bus:
SCSI controllers have internal termination that can be used to terminate the bus; however, this method is not recommended with Cluster Server. If a node is offline in this configuration, the SCSI bus will not be properly terminated and will not operate correctly.
Storage enclosures also have internal termination. This can be used to terminate the SCSI bus if the enclosure is at the end of the SCSI bus.
Y cables can be connected to devices if the device is at the end of the SCSI bus. A terminator can then be attached to one branch of the Y cable in order to terminate the SCSI bus. This method of termination requires either disabling or removing any internal terminators the device may have.
Trilink connectors can be connected to certain devices. If the device is at the end of the bus, a trilink connector can be used to terminate the bus. This method of termination requires either disabling or removing any internal terminators the device may have.
Installing TSM in a cluster requires that the cluster be fully functional, so MSCS must already be installed and configured on your servers. It is not the intent of this publication to duplicate the MSCS documentation that explains how this is done. Instead, check the end results of that installation by doing the following:
Prepare one or more cluster groups. Each TSM server instance requires a cluster resource group. Initially, the group must contain only disk resources. You can create a new group and move disk resources to it. You can choose to rename an existing resource group that contains only disk resources. Use the Cluster Administrator program on the computer that owns the shared disk or tape resource to prepare your resource group.
As you construct your resource groups, consider the following:
On every node that will host a TSM virtual server, install TSM.
The TSM cluster configuration procedure must be performed on each node in the set of nodes that will host a TSM virtual server. However, the steps in the procedure vary depending on which node you are currently configuring. When you configure the first node in the set, the TSM server instance is actually created and configured. When you configure the remaining nodes in the set, each node is updated in a way that permits it to host the TSM server instance created on the first node. A TSM server must be installed on the first node in the set before you configure the remaining nodes; violating this requirement will cause the configuration to fail. When configuring multiple virtual servers, it is also recommended that you completely configure one virtual server before moving on to the next. Because each virtual server has its own IP address and network name, configuring the virtual servers separately lessens the possibility of mistakes.
From within the TSM Console:
The first page of the wizard is an introduction page. Your input starts on the second page displayed, which is titled Select the Cluster Group.
After completing the initial configuration, you will stop the server instance and prepare to configure the next node in the set.
After you complete a TSM install on the first node, you can configure TSM on the remaining nodes in the set.
The TSM Cluster Configuration Wizard has started and the second page displayed is the Select the Cluster Group page.
At this point, you have completed the configuration of another node in the set of nodes. If this is the last node in the set, the cluster has been updated and the resource group has been completed. The TSM virtual server is now functional and must be managed from the Cluster Administrator.
To create additional TSM virtual servers, you need to create an additional cluster group with a unique name. You must also provide a unique set of cluster resources for the new virtual server.