Clustering Q&A: Deploying Clusters

Last updated on April 12, 1999

Cluster Deployment

Deployment
Software Licensing
Troubleshooting
Developer Issues

Deployment

What are the support services available for MSCS?
Microsoft® Windows® NT Server Enterprise Edition, including MSCS, is eligible for support from all of Microsoft's customer support resources, including Enterprise Phone Support, Premier Support Technical Account Managers, and Microsoft Consulting Services. In addition, MSCS customers can acquire training from Microsoft Authorized Training and Education Centers (ATECs), support services from the system vendors providing MSCS-validated cluster configurations, and value-added services from Microsoft Solution Providers that choose to offer MSCS-related services.

Will Microsoft extend the Microsoft Certified Professional (MCP) program to include certification of cluster-related skills?
Microsoft has no immediate plans to include cluster-related certification in the MCP program. Cluster-related certification is being considered for future updates to the program.

In the two-server cluster configuration, should the second server be a "hot standby," or can the two servers be running separate jobs up until the time when one fails and the other takes over?
MSCS provides true "active/active clustering," which means every machine in the cluster is available to do real work, and each machine in the cluster is also available to recover the resources and workload of any other machine in the cluster. Thus, there is no need to have a wasted, idle server standing by waiting for a failure. Of course, a customer might choose to run a light workload or a noncritical function that can be easily preempted on one of the machines in an MSCS cluster if they want to make sure there's sufficient processing power available for recovery of performance-sensitive workloads.

Besides clustering, what else should be done to provide highly available Windows NT Server services?
MSCS complements other high-availability techniques such as data mirroring, RAID disk protection, uninterruptible power supplies, and duplicated hardware such as fans and network interface cards. The availability role of MSCS is to automatically restore user access to data and services following the failure of individual applications or servers. MSCS and other high-availability technology should be used in concert with prudent IT administration procedures for data backup and disaster-site recovery to ensure continuous availability of mission-critical IT resources.

Does client software have to be updated to take advantage of an MSCS cluster?
No. MSCS does not require any special software on the client for transparent recovery of services that connect to clients through standard IP protocols, such as Web sites or Windows file shares. Note that, since server resources and applications can potentially be unavailable for up to a minute or so during MSCS recovery procedures, the client component of a client/server application should ideally be able to gracefully handle pauses in service. However, that characteristic is already common in Microsoft client software, browsers, and most modern packaged applications.
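The answer leaves pause handling to the client application. Below is a minimal Python sketch of one approach, assuming the client sees the outage as a `ConnectionError` and that about a minute is an acceptable wait. `call_with_retry`, the timeout, and the retry interval are illustrative choices, not part of any Microsoft API.

```python
import time


def call_with_retry(operation, timeout=60.0, interval=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call `operation` until it succeeds or `timeout` seconds have passed.

    `operation` is any zero-argument callable that raises ConnectionError
    while the clustered service is unavailable (for example, mid-failover).
    `clock` and `sleep` are injectable so the logic can be tested offline.
    """
    deadline = clock() + timeout
    while True:
        try:
            return operation()
        except ConnectionError:
            # Give up only if waiting another interval would pass the deadline.
            if clock() + interval > deadline:
                raise
            sleep(interval)
```

A client would wrap each request in this helper, so a failover that finishes within the timeout appears to the user as a short delay rather than as an error.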

Does an application need to be installed separately on both servers in a cluster?
It depends on the application. However, typically each application that is part of a cluster group must be installed separately on both nodes so that it can be started on either node during a failover. This can be done by (1) "failing over" the application's disks on the shared SCSI bus to the first server, (2) installing the application on the first server using those disks for application files, (3) failing the disks over to the other server, and (4) repeating the installation process on the second server, using the same disks. Increasingly, customers will find that enterprise applications come with setup routines that automate the installation on MSCS clusters. Examples of this type of product today include Microsoft SQL Server Enterprise Edition, Microsoft Exchange Server Enterprise Edition, and Oracle Failsafe.

Suppose there are several services running on one node (say, Internet Information Server (IIS), Windows NT Server's built-in Web server; SQL Server; and Exchange). On the failure of that node, can you set up the cluster so that only one service fails over to the second node?
Yes. Only the services you set up in the MSCS cluster administration console will fail over. If you set up only one service to fail over, the other two will not fail over.

Should servers in a cluster be directory service (domain) controllers?
Domain controllers already have their own high availability backup capability, so there are no additional restrictions or issues related to clusters. For example, without an MSCS cluster:

If you have a Primary Domain Controller (PDC) and a Backup Domain Controller (BDC) and one of them fails, the other is still available to process logons.
If you have two BDCs and one of them fails, the other is still available.
If you have a single PDC and it fails, then you have no domain controller.

All of this remains true if the servers are in a cluster. MSCS neither adds to nor subtracts from the current high availability capabilities of Windows NT Directory Services.

Should servers in an MSCS cluster use the Microsoft Distributed File System (Dfs)?
With the current release 4.0 of Windows NT® Server Enterprise Edition, clustered servers should not use Dfs for file shares that are set up as cluster resources for high availability. Dfs directory failover is not supported at this time. Microsoft plans to enhance Dfs in Windows 2000 Server so that its directory will fail over when used in an Enterprise Edition cluster.

How can MSCS help do load balancing between Web servers?
The two most common techniques used to load-balance between multiple mirrors of a Web site are Network Address Translation (NAT) routing, and DNS round-robin routing. Cisco and other vendors sell routers that use NAT as well as some sort of load balancing. A site has one URL and one IP address. If a server goes down, the router sees this and stops sending requests to the Web server. This offers good performance and easy manageability, but these NAT routers can be expensive.

An easier, less expensive technique is to use simple round-robin DNS to split requests among a number of Web servers that all have the same data on them. A site has one URL, but several IP addresses, and the DNS server rotates through those addresses so that requests are spread across all of them. A problem with round-robin DNS is that, if a server goes down, someone typically has to remove its IP address from the DNS round-robin list manually.

MSCS can complement round-robin routing by eliminating the need to manually remove failed IP addresses from the round-robin list. You set up an MSCS cluster running IIS on each server with each site's Web files on the shared SCSI bus. You synchronize the data between the two sites. If one of the servers fails, the virtual root of the failed machine is transferred to the other server in the cluster along with its IP addresses, so both sites continue to serve customers. And, once the failed server resumes operation, MSCS can automatically "fail back" its virtual root to rebalance the workload.
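A toy Python model of the arrangement above, assuming two nodes named WEB1 and WEB2 and two documentation-range addresses (all hypothetical). The DNS list never changes; what changes is which node answers each address, which is why no one has to edit DNS after a failure. Choosing the survivor with `min()` is a simplification, not how MSCS selects a new owner.

```python
class TwoNodeWebCluster:
    """Toy model: round-robin DNS over virtual IP addresses hosted by an MSCS pair."""

    def __init__(self, ip_owners):
        self.ip_owners = dict(ip_owners)   # virtual IP -> node currently hosting it
        self.home = dict(ip_owners)        # original owner, used for failback
        self.up = set(self.ip_owners.values())

    def fail_node(self, node):
        """Move every address hosted by `node` to a surviving node."""
        self.up.discard(node)
        if not self.up:
            raise RuntimeError("no surviving node")
        survivor = min(self.up)            # simplification: MSCS uses preferred owners
        for ip, owner in self.ip_owners.items():
            if owner == node:
                self.ip_owners[ip] = survivor

    def restore_node(self, node):
        """The node is back: fail its addresses back to rebalance the load."""
        self.up.add(node)
        for ip, home in self.home.items():
            if home == node:
                self.ip_owners[ip] = node

    def answer(self, ip):
        """Return the node that answers requests sent to `ip`."""
        return self.ip_owners[ip]


def route(cluster, dns_list, requests):
    """Send `requests` requests, rotating through the DNS list as round-robin DNS does."""
    return [cluster.answer(dns_list[i % len(dns_list)]) for i in range(requests)]
```

Before a failure the requests alternate between WEB1 and WEB2; after `fail_node("WEB2")` every address in the unchanged DNS list is answered by WEB1, and `restore_node("WEB2")` models the automatic failback.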

I need to create many file shares. Is there an alternative to doing them one at a time through the MSCS New Resource Wizard?
One answer would be to write a resource DLL patterned after the SMB Share sample to manage the shares. It would use API calls to create the shares when coming online and destroy the shares when going offline.
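A real resource DLL would be written in C from the SMB Share sample in the MSCS SDK. The Python sketch below models only the logic: one resource that owns a whole list of shares, creating them all when it comes online and removing them when it goes offline. `InMemoryShares` is a hypothetical stand-in for the real share-management calls, and rolling back on a partial failure is a design choice of this sketch.

```python
class InMemoryShares:
    """Hypothetical stand-in for the real share-management calls."""

    def __init__(self):
        self.shares = {}  # share name -> directory

    def create(self, name, path):
        if name in self.shares:
            raise FileExistsError(name)
        self.shares[name] = path

    def delete(self, name):
        self.shares.pop(name, None)

    def exists(self, name):
        return name in self.shares


class ShareSetResource:
    """One resource owning many shares, loosely modeled on Online/Offline entry points."""

    def __init__(self, shares, share_api):
        self.shares = dict(shares)   # share name -> directory on a shared disk
        self.api = share_api
        self.created = []            # shares this resource created, in order

    def online(self):
        """Create every share; on any failure, undo the partial work and report failure."""
        try:
            for name, path in self.shares.items():
                self.api.create(name, path)
                self.created.append(name)
        except OSError:
            self.offline()
            return False
        return True

    def offline(self):
        """Destroy the shares this resource created, newest first."""
        while self.created:
            self.api.delete(self.created.pop())

    def is_alive(self):
        """Report healthy only if every configured share currently exists."""
        return all(self.api.exists(name) for name in self.shares)
```

Adding a hundred shares then becomes one resource with a hundred entries in its configuration, rather than a hundred passes through the New Resource Wizard.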

What are the criteria for running a resource in a separate cluster resource monitor?
The tradeoff is extra isolation from application and resource failures versus greater consumption of server resources by MSCS. You should run a resource in a separate resource monitor when testing a new resource DLL. This ensures that, if the resource DLL compromises the resource monitor, it won't affect the core cluster services of MSCS.

Should the quorum disk be on a separate physical disk?
The quorum disk does not have to be on a separate physical disk. You can use the quorum disk for applications, also. However, if you want to allocate a specific volume for this role you can do so. This will, in some cases, marginally improve failover time.

Would a shared solid state drive provide higher availability than standard disk drives?
Perhaps. Solid state drives reduce the seek and rotational latency that is associated with conventional DASD. Applications can use this performance to minimize the possibility of data loss by essentially writing through the cache without severely degrading system performance. But even in such a case, there remains the possibility of cable, operator, and other failures that can result in inconsistent data. No matter how quickly the data is written to the media, there is a window of vulnerability. For this reason, applications still need to provide some model for persistence to ensure that state can be recaptured. A good example of this is the transaction semantics used by database management systems to maintain the integrity of their on-disk data.

What is required to set up a file share for failover?
A file share cluster group must have all four of the following resources: a network name, an IP address, one or more disk drives, and one or more file shares. Users will use the network name to access the file share. The network name resource must be in the group so that it is always available on failover. This means you can use the cluster name as the network name in one file share group if you wish, but any other file share group must use a separate network name. Each file share should have a Dependency on the appropriate disk(s). The disks and network name should have a Dependency on the IP address. Assuming you use NTFS permissions to authorize access to the file share, the user account that is used to start the Cluster Service must have at least Read access to the directory. If not, MSCS will not be able to bring the file share resource online.
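The dependency rules in this answer determine the order in which the group's resources can come online (dependencies first) and go offline (the reverse). A small Python sketch, assuming hypothetical resource names and Python 3.9 or later for `graphlib`; the dependencies are exactly the ones stated above, and the check for all four resource types mirrors the first sentence of the answer.

```python
from graphlib import TopologicalSorter

# Resource name -> (resource type, names of resources it depends on).
# Names are hypothetical; the dependencies follow the answer above.
FILE_SHARE_GROUP = {
    "Public IP": ("IP Address", set()),
    "PUBLICSRV": ("Network Name", {"Public IP"}),
    "Disk S:": ("Physical Disk", {"Public IP"}),
    "PUBLIC share": ("File Share", {"Disk S:"}),
}

REQUIRED_TYPES = {"IP Address", "Network Name", "Physical Disk", "File Share"}


def online_order(group):
    """Order in which resources can be brought online: dependencies first."""
    present = {rtype for rtype, _ in group.values()}
    missing = REQUIRED_TYPES - present
    if missing:
        raise ValueError(f"file share group is missing: {sorted(missing)}")
    graph = {name: deps for name, (_, deps) in group.items()}
    return list(TopologicalSorter(graph).static_order())


def offline_order(group):
    """Take resources offline in reverse, so nothing loses a dependency while running."""
    return online_order(group)[::-1]
```

`TopologicalSorter` also raises `CycleError` if dependencies ever form a loop, which is one more configuration mistake this kind of check would catch.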

Page 68 of the Cluster Administrator's Guide mentions a "utility for backing up the clustering solution". There is no information on this utility in the ReadMe file. Where is it?
This unsupported utility was not ready in time to ship with Windows NT Server/E 4.0. It will be made available for download from the Microsoft web site in the first quarter of 1998; watch for a link from the cluster web site. The utility enables an administrator to back up and restore much of the setup information associated with cluster resources and groups. This capability is currently planned to be a fully supported feature of Windows 2000 Server/E.

Can you use DHCP to assign IP addresses on a cluster?
No. Cluster IP Address resources must be configured with static IP addresses. All adapters attached to networks that are enabled for cluster use, as shown in the Network properties in the Cluster Administrator, must also be configured with static IP addresses. These adapters may not be configured using DHCP. Any adapters attached to networks that are not enabled for cluster use may be configured using DHCP.

How long should it take to failover the quorum disk?
It takes a minimum of 10.5 seconds to fail over the quorum disk. It may take longer, depending on the time required for the SCSI bus to stabilize following a reset.

How long should it take to start an IP address and a network name resource?
When starting an IP Address resource, MSCS performs 4 pings at 1-second intervals, so it takes a minimum of 4 seconds to bring online. This is done to prevent duplicate IP addresses on the network. When starting a network name resource, MSCS also verifies that there are no duplicate names on the network, and this takes an additional second or two.
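A Python sketch of the duplicate-address check as the answer describes it: four probes, one second apart, failing the resource if any other host replies. The `ping` callable, the return convention, and waiting a full interval after each probe are assumptions made for illustration; MSCS's actual implementation is not shown here.

```python
import time


def bring_ip_online(address, ping, probes=4, interval=1.0, sleep=time.sleep):
    """Return True if no other host answered for `address` during the probe window.

    `ping` is a hypothetical callable that returns True when some host replies.
    """
    for _ in range(probes):
        if ping(address):
            return False      # address already in use: fail the resource
        sleep(interval)       # wait for a late reply before the next probe
    return True
```

When nobody answers, the sketch waits 4 × 1 = 4 seconds, matching the minimum above. Because a network name depends on its IP address and runs its own duplicate-name check afterward, the figures in this answer add up to roughly 4 + (1 to 2) = 5 to 6 seconds before a name-plus-address pair is online.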

We want to multi-home a service running on a cluster so clients can access it via multiple IP addresses representing different VLANs. The Windows NT Server Enterprise Edition 4.0 release notes say, "A Network Name Resource cannot depend upon more than one IP address." How do we do this?
The release notes are incorrect. This issue was resolved shortly before release. You can add multiple IP address resources to the cluster group, and make the network name dependent on all of the IP address resources.

How do you do a "rolling upgrade" to an MSCS cluster? For example, how would you add NICs to a server in a cluster?
The procedure to do this as a "rolling upgrade" is:

Unload the server: Wait until the workload on the server has dropped to a point where you can move the workload to the other server while maintaining required response time. Then use the cluster administrator console to move the workload, unloading the server you want to upgrade.
Set up the cluster for manual restart: Go to the Services applet in the Control Panel, select the Cluster Service, and change it from "Automatic" to "Manual". This makes sure the cluster service does not start up again until you are ready for it.
Install the NIC: Shut down the unloaded server, and physically install the NIC. Restart the server. Complete the installation, following the installation instructions from the NIC's vendor. (This may require another reboot.)
Rejoin the cluster: Go to the Services applet in the Control Panel. Change the Cluster Service back to "Automatic", and then start the service. When the cluster service restarts, it automatically rejoins the cluster and detects the additional network interface. Note that failback may take place at this time if you've enabled failback on any cluster groups and the time is within their failback window.
Define the NIC's role: When initially installing the cluster, you used the graphical installation wizard to define a "role" for each network interface card (private network, public network, or both). You must now do that manually, using the CLUADMIN command-line interface. Instructions for doing this are in the MSCS documentation.
Define cluster resources that use the NIC: Finally, use the graphical cluster administrator's console to set up IP addresses and related resources that will use the NIC. Once they are defined, use the admin console to bring the resources online.

Can applications use the cluster's private network (i.e., its interconnect) to communicate between the servers in a cluster?
In general, no. The private network and its IP addresses are reserved for the cluster service. Applications would only be able to use the private network through the communication services provided by the cluster API.

Is it possible to hide the cluster and real node names from the browser lists, only showing the virtual names I've created for the cluster groups?
No. The NetBIOS namespace is flat, and doesn't lend itself well to the context of hierarchical names within a node. For that reason, it's best to use a naming convention that tells your users which names to use (the virtual names) and which not to use (the physical node names).

Are there any special backup requirements for a cluster?
Since disks on the shared SCSI bus might be connected to either server, create a cluster file share resource for the disks, and back up using the share name. Other than that, you can continue to back up using today's tools if you prefer. You may also want to investigate backup tools that have been enhanced for operation with Microsoft Cluster Server.

How do you setup a disaster-site mirroring solution like Octopus with an MSCS cluster so that the mirroring solution doesn't conflict with MSCS recovery of a failed server?
Disaster-site mirroring solutions that include heartbeats and recovery options should be set up so that loss of a heartbeat does not trigger automatic recovery, since that might conflict with on-site recovery by MSCS. This typically means setting the mirroring solution to its manual recovery option (instead of automatic recovery), or setting automatic recovery so that it simply executes a batch routine which fires off an alert to the operator at the disaster site.

How do you migrate from Digital Clusters for Windows NT to MSCS?
The DIGITAL Clusters for Windows NT/Migration Wizard may be downloaded from the Digital web site at http://www.windows.digital.com/clusters/migration/. For customers choosing to migrate from DIGITAL Clusters for Windows NT V1.1 to Windows NT Server Enterprise Edition (Microsoft Cluster Server (MSCS)), the Migration Wizard facilitates a smooth migration.

Have any Quick Fix Engineering (QFE) modifications for MSCS been issued since its release?
Yes. QFEs for MSCS can be found at ftp://ftp.microsoft.com/bussys/winnt/winnt-public/fixes/usa/NT40/hotfixes-postSP3/roll-up/cluster/.

How do you upgrade the version of Microsoft Transaction Server (MTS) included with Windows NT Server/E (MTS 1.1) to the enhanced version that is included in the Windows NT Option Pack (MTS 2.0)?
MTS 2.0 setup automatically upgrades MTS 1.1. You must run MTS setup on each computer in the cluster. Do not run MTS setup in parallel on cluster nodes—completely install MTS on one node, then install MTS on the second node without rebooting the first node. When all nodes have MTS installed, reboot all nodes. The MS DTC must be offline before running setup. You should remove any package resources because they are no longer needed. To install MTS 2.0 on an existing Windows NT Server/E cluster:

First install MTS on the node that is the owner of the shared disk. See "Setting Up Microsoft Transaction Server" in the MTS ReadMe file for more information.
When MTS detects that MSCS is installed, it will display a dialog box. Select the virtual server name for the cluster.
In the same dialog box, specify the location for the MS DTC log file on the shared disk.
Click OK to continue MTS setup.
Fail over the shared disk to the other server. Then install MTS on the other computer in the cluster. You will not be prompted for the virtual server and log file location during setup.

Software Licensing

How does Microsoft license MSCS?
MSCS is a built-in feature of Windows NT Server, Enterprise Edition (Windows NT Server/E), so customers must license Windows NT Server/Enterprise Edition for both servers in a cluster.

Are Client Access Licenses required for accessing an MSCS cluster?
The question of whether a Client Access License (CAL) is required is unaffected by whether a server is standalone or in an MSCS cluster. For example, the standard Microsoft End User License Agreement for Windows NT Server requires a CAL for each client that accesses the shared file services of Windows NT Server. This is true whether the client is accessing a file share on a standalone server, or on an MSCS cluster. Put another way: there is no special CAL requirement related to accessing an MSCS cluster.

How are applications licensed on MSCS clusters?
Each application vendor will determine its own licensing policies for applications running on MSCS clusters. Microsoft's standard policy for server application licensing applies to MSCS clusters: an application must be separately licensed for each server on which it is installed. In an MSCS cluster, if an application is to run on both servers, or even if it only runs on one server at a time but must be installed on both servers to permit failover, then the application must be licensed for both servers.

How are Microsoft Client Access Licenses for BackOffice applications handled on MSCS clusters?
If the customer is using "per-seat" Client Access Licenses for the application, then those licenses apply when a client is accessing the application on either server in the cluster. If the customer is using "per-server" (or "concurrent use") Client Access Licenses for the application, then each machine in the cluster should have a sufficient number of per-server Client Access Licenses for the expected peak load of the application on that machine. (Note that "per-server" Client Access Licenses do not "failover" from one machine in the cluster to the other.)

Troubleshooting

When diagnosing problems that appear to be cluster-related, how can I determine what is happening in the cluster services?
For problem reporting with the initial release of MSCS, you must use the "cluster log". (Future releases will make greater use of the Windows NT Server Event Monitor.) To turn on the cluster log, define a system environment variable named clusterlog and set it to the full path of a log file. For example, set clusterlog to %windir%\cluster\cluster.log and then reboot. When the cluster service starts, it logs failure reasons and other information to that file, which makes the problem easier to diagnose.

Should the cluster administration console be connected to the cluster name, or to a node name?
Connect using the node name instead of the cluster name, as documented in the Cluster Administrator's Guide. If you connect to the cluster name, the console uses the RPC service through the cluster endpoint mapper. Because that endpoint fails over, your cluster administration RPC session has to wait for a timeout, which can take a relatively long time. When you connect using the node name, the cluster does not thrash in the event of such a failure. Instead, it simply arbitrates for ownership of the quorum device. After this is settled, one cluster node remains, running the appropriate failover services. You can then reconnect the cluster administration console to the surviving server.

From CMD shell on one server, if you try to access a drive owned by the other server you get "Incorrect function." Why?
MSCS is a "shared nothing" environment, meaning that disk resources are owned by only one server at any point in time. "Incorrect function" is the message you get when trying to do local access to disks that are owned by a different server.

How come stopping the server service on either cluster node does not cause failover?
It appears that the cluster software is not monitoring the server service but just the local cluster objects directly, not through the server service.

MSCS does not explicitly check for the server service, but it does monitor the LanManServer. Therefore, with SMB shares, it will fail these over in the event that the LanManServer service failed or was stopped. If you want MSCS to monitor and restart the server service also, you can easily do so using the admin wizard to set it up as a "generic service."

A corporate network failure didn't cause failover of any resources. The Cluster Admin tool fails with an error dialog stating that the cluster service has stopped. How do the cluster nodes identify when a net failure occurs?
This is a case where clustering by itself cannot eliminate every potential single point of failure in a system. Just as highly available clusters should employ hardware RAID to protect against loss of physical disk drives, they should also include dual-path SCSI and redundant NICs to protect against loss of a single SCSI controller or network interface card.

How do you move the quorum resource to another disk?
This is done in Cluster Admin by right-clicking the cluster and opening its properties. One of the three tabs is Quorum Resource, which allows you to modify this entry.

If the heartbeat link is down and both machines are performing quorum, how should the machine that cannot reserve the SCSI bus react? In the normal case, should only the machine that can reserve the SCSI bus survive and the other machine go down?
First, the two nodes cannot both own the quorum resource. However, both nodes can be operating in the cluster if one node has the quorum resource and the second node joins the cluster. When a partition is discovered (that is, the servers cannot communicate with each other), both nodes arbitrate for the quorum resource. One node wins the arbitration (if they are still partitioned) and the other node loses. The loser shuts down the cluster service, and the winner fails over all groups and continues to operate.
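A toy Python model of the rule in this answer: exactly one partitioned node can win the quorum resource, the winner takes every group, and the loser stops its cluster service. The reservation logic is a deliberate simplification (real arbitration works through SCSI reserve and bus-reset operations and their timing), and the node and group names are hypothetical.

```python
class QuorumDisk:
    """Toy stand-in for the SCSI reservation on the quorum disk."""

    def __init__(self, owner=None):
        self.owner = owner

    def try_reserve(self, node):
        # Simplification: whoever already holds the reservation keeps it;
        # any other node succeeds only if the disk is unreserved.
        if self.owner in (None, node):
            self.owner = node
            return True
        return False


def resolve_partition(nodes, groups, quorum):
    """Every partitioned node arbitrates; the winner takes all groups, losers stop."""
    status = {}
    winner = None
    for node in nodes:            # in reality the attempts race; here order decides
        if quorum.try_reserve(node):
            winner = node
            status[node] = "cluster service running"
        else:
            status[node] = "cluster service stopped"
    return status, {group: winner for group in groups}
```

Whatever the order of attempts, at most one node ends up running, which is the property that prevents both partitions from bringing the same disks online.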

What's the recommended procedure if you want to run CHKDSK on a disk connected to the shared SCSI bus of a cluster?
CLUSSVC has start options that allow the service to be started without quorum logging. Specify the noquorumlogging option either at a command prompt or in the Services panel. At that point, the storage devices on the shared SCSI bus can be checked with CHKDSK.

Under what conditions does MSCS automatically run CHKDSK? Can this happen during setup or while the cluster is running?
MSCS automatically checks for volume and file system corruption whenever a disk resource is brought online. If corruption is detected, it runs CHKDSK. It can also run CHKDSK if it detects corruption in any of the files in the \mscs directory (that is, the quorum directory), which may happen during setup if the quorum disk is faulty.

How long should you wait after starting the cluster service before you can start up the cluster administrator's console? I tried starting the console right after a reboot and got an RPC error. After waiting a minute or so, the console started right up. Is this normal?
It typically takes the cluster service between 30 seconds and a minute to start up (that is, to start the cluster service and bring online the name and IP address resources that are required for connecting the administrator's console). If you try to connect before it's ready, you'll usually get an error saying the RPC Service isn't available. If this happens, and assuming there are no other problems, just wait a minute and then try again.

I changed an IP address resource, and it took a long time before some clients could find resources dependent on the new address. Why did it take so long?
If you're using WINS (Windows Internet Name Service) to map network names to dynamic IP addresses, it can take time for new mappings to be replicated out to all of your WINS servers. The time required cannot easily be estimated since it's dependent on your particular network and WINS setup.

Developer Issues

Customers and software vendors are interested in developing DLLs to make applications "cluster aware." Is there any documentation, sample code, and so on, to assist them in the process?
Yes, there is a Software Development Kit (SDK) for MSCS. The MSCS SDK has an SMB file share example DLL (with code). Developers can take this as a template and fill in their own application-specific code in the specific routines (Online, Offline, Is Alive, Looks Alive, and so on). There is also a white paper on writing Cluster Resource DLLs that can be downloaded from the Microsoft web site.

How does Microsoft distribute the MSCS Software Development Kit (SDK)?
The Microsoft Cluster Server SDK can be obtained through a Microsoft Developer Network (MSDN) Universal subscription, or can be downloaded in the Windows Base Services of the Microsoft Platform SDK. To download, go to http://microsoft.com/msdn/sdk/winbase.htm—the headers and libraries are part of the "Build Environment" and the samples are in the "Windows Base Services". To subscribe to MSDN, go to http://microsoft.com/msdn/join/.

MSCS SDK documentation says, "Registry replication is a configurable feature that is available to the Generic Application and Generic Service resource types. Basically, you tell it what registry key to watch/replicate and that's all there is to it. If the application/service stores volatile information in a specific registry key, then the key should be declared in the properties section of the resource so that it may be replicated. If this is done, when the resource comes online on another node, it will have the same registry information as the previously online resource. Application/service registry keys, by default, are not replicated or stored within the cluster database." Why should anyone use the cluster APIs to write registry keys to the cluster database instead of just using the registry checkpointing feature of a Generic DLL created with the Resource Wizard?
If you're just going to use a generic application resource, then you should just use registry checkpointing. However, using the generic application resource type has some limitations. For example:

You can't do active/active, which will limit load balancing if you also want failover (more about this below).
When you go offline, it simply terminates the process. If you have a GUI application, you may only get 300ms to clean up.
The application isn't configurable through the cluster administrator's tools.

Alternatively, you can write a resource DLL for the application. At that point you face additional issues. First of all, user-configurable parameters should be stored as private properties associated with the resource type. Private properties give admin tools a common method for querying and setting the parameters of a given resource, and these property requests are ultimately handled by the resource DLL. That leads to the question of why the resource DLL and application should use the cluster database.

Each resource has its own section of the cluster database, as opposed to the general per-application focus of the Windows NT registry. This becomes an issue if you want your resource to be more granular than just your application. For example, if your resource is your database server, you can only run the server on one node at a time. On the other hand, if your resource is the databases presented by that server, then you can have the database server running on both nodes. (For example, one node might have a payroll database, while the other has an orders database.) If one node goes down, the server on the other node can pick up the database that no longer has a host. This is the active/active configuration mentioned above. To do this, your settings need to be per-resource, not per-application. Also, registry checkpointing is only done when the resource is running. If you make any settings changes through a separate admin tool when the resource isn't online, those changes won't get propagated.
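A Python model of the payroll/orders example, assuming hypothetical node names and a hypothetical `data_file` setting. Each database resource carries its own settings, as a per-resource section of the cluster database would, so both nodes can run the server at once and a failed node's database moves with its configuration intact. A single per-application registry key would offer only one set of settings to move.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseResource:
    """One database presented by the server; its settings travel with the resource."""
    name: str
    owner: str
    settings: dict = field(default_factory=dict)


def hosts(resources):
    """Map each node to the databases it currently serves."""
    placement = {}
    for r in resources:
        placement.setdefault(r.owner, []).append(r.name)
    return placement


def fail_over(resources, failed_node, survivor):
    """Move every database on `failed_node` to `survivor`, settings included."""
    for r in resources:
        if r.owner == failed_node:
            r.owner = survivor
```

Before the failure, `hosts()` shows one database per node (active/active); afterward the surviving node serves both, with the moved database's settings unchanged.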

With service resources there is an option to have part of the registry entries fail over to the secondary node. Since all file share information is stored in the registry, can this be used as an alternate way to provide file share failover?
No. Share information is stored in the registry, but that doesn't mean modifying the registry is the correct way to create shares. One problem would be that you have to reboot for the registry changes to result in the creation of a share. There also remains the problem of what you do when you fail over. If both machines are set up with shares pointing to a drive on the shared bus, one machine is going to have shares referring to a device the machine can't access.

What mechanisms are advised with respect to Named Pipes and Semaphores in a cluster application environment for process-to-process communication (for example, registry settings changed on one node of the cluster, how are they updated at the other node, and so on)?
Since the main issue is the transfer of internal transactional state information, you could use the transacted registry feature of MSCS to get registry information over to the other node in case of a failover or, even better, make your transactions small enough that they can be replayed easily. Use MTS to get the best support for your (D)COM objects.
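One common way to make replay safe, shown here as an illustration rather than as anything MSCS or MTS provides, is to give each small transaction a unique ID and skip IDs that were already applied. In a real system the update and the applied-ID record must be committed atomically to storage that fails over with the resource; this in-memory Python sketch ignores that, and all names in it are hypothetical.

```python
class ReplayableLedger:
    """Applies small transactions keyed by ID so that replaying them is harmless."""

    def __init__(self):
        self.balances = {}
        self.applied = set()   # in practice, persisted with the data on the shared disk

    def apply(self, txn_id, account, amount):
        """Apply one transaction; return False if it was already applied."""
        if txn_id in self.applied:
            return False
        self.balances[account] = self.balances.get(account, 0) + amount
        self.applied.add(txn_id)
        return True
```

After a failover, a client that is unsure which of its transactions completed can simply resend all of them; duplicates are ignored and only the missing ones take effect.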

The MSCS SDK references the file MSCLUS.DLL. What is this and where is it located?
MSCLUS.DLL is the COM interface to the CLUSAPI. Because it was close to completion, it was included in the initial MSCS SDK documentation. However, it was not completed in time to ship with the original release of Windows NT Server, Enterprise Edition 4.0. Microsoft plans to release it through Web and MSDN distribution in the first quarter of 1998. Note that an early copy was inadvertently shipped with the Platform SDK to MSDN subscribers in October/November 1997. That version has a "modified" time/date of Wednesday September 17, 1997 4:10:52 am. That early copy of MSCLUS.DLL is not supported by Microsoft and should not be used.

If you have cluster calls in an application, what do you need to do to make your application work in a non-cluster environment as well?
You should ensure that your application can be installed on a cluster as well as on a single machine. Note that MSCS does not yet support an application-level channel through the cluster. The Cluster SDK gives you an idea of what you can do today to make your application aware of a cluster and what you can do with it.

Under what conditions will Microsoft Transaction Server (MTS) failover? Are individual MTS packages monitored with heartbeats?
MTS will fail over when the MTS service or the server it is on fails, or when it is moved manually by the cluster administrator. It does not put heartbeats on individual packages, so developers should continue to code MTS clients to handle package failures or hangs.

How do I programmatically distinguish whether I'm running on a cluster? I tried using the GetVersionEx() API, but it returns exactly the same information on Enterprise Edition as on standard Windows NT Server. Is there some registry key I can check or other API I can call?
HKEY_LOCAL_MACHINE\Cluster will not exist on a machine unless it has been enabled for clustering.
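A minimal Python sketch of that check using the standard-library `winreg` module, which is only available on Windows. The key-lookup function is injectable so the logic can be exercised elsewhere; the function names are illustrative.

```python
import sys


def key_exists_hklm(subkey):
    """Return True if HKEY_LOCAL_MACHINE\\<subkey> exists (Windows only)."""
    import winreg  # standard library, but only present on Windows
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey):
            return True
    except FileNotFoundError:
        return False


def is_cluster_node(key_exists=None):
    """Return True if this machine has been enabled for clustering."""
    if key_exists is None:
        if sys.platform != "win32":
            return False  # no Windows registry, so certainly not an MSCS node
        key_exists = key_exists_hklm
    return key_exists("Cluster")
```

An application can call `is_cluster_node()` once at startup and skip its cluster-specific code paths when it returns False, which also covers the earlier question about running the same application outside a cluster.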