Wolfpack Clustering White Paper - Windows NT 4.0
Last updated on September 22, 1999

MSCS System Validation Test Plan

Final — Version 1.3 — September 16th 1999

This document describes the test plan for Wolfpack system validation, including the hardware and software requirements. "Wolfpack" is a code name for Windows NT® Server support for clustering.

 

For Windows 2000 clustering, see Wolfpack Clustering Whitepaper - Windows 2000.

Also see the Clustering Readme.

Contents

Introduction

Definitions

Obtaining a Wolfpack CD and Wolfpack Self-Test Kit

Checking the Cluster HCL on the Web

Service Pack

Systems Requirements and Configurations

Server Requirements for a Wolfpack System

Network requirements for running tests

Client Requirements for a Wolfpack System

Wolfpack Configuration Components

Setup Instructions for Validation Testing

Phase 1 Testing (24 hours)

Two Initiator SCSI Testing

Netcard Validation for Wolfpack (optional, no logs required)

Phase 2 Testing – Validate 1 Node (24 hours)

Phase 3 Testing – Validate Move Group 2 Node

Phase 4 Testing – Validate Crash 2 Node

Phase 5 Testing – Validate 2 Node (24 hours)

Client Server Tests

Setting up and running client/server tests

Interpreting the log

File I/O Testing Using an SMB Share

IIS Testing

Print Server Testing

Causing Cluster Failovers During Client-Server Tests

Failover Program

Interpreting the failover log

Troubleshooting Failover

Additional Stress Testing (24 hours) (Optional for Release 1.0 of Wolfpack)

Simultaneous Reboot Test (optional, no logs required)

Move Group 2 Node Test (optional, no logs required)

Crash 2 Node Test (optional, no logs required)

How to Submit results to WHQL

What to do if tests fail, but you think it is a test bug?

How to Return Log results

Cluster description on the HCL

 

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

This documentation is an early release of the final product documentation. It is meant to accompany software that is still in development. Some of the information in this documentation may be inaccurate or may not be an accurate representation of the functionality of the final retail product. Microsoft assumes no responsibility for any damages that might occur either directly or indirectly from these inaccuracies.

Microsoft Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. The furnishing of this document does not give you any license to the patents, trademarks, copyrights, or other intellectual property rights except as expressly provided in any written license agreement from Microsoft Corporation.

Microsoft does not make any representation or warranty regarding specifications in this document or any product or item developed based on these specifications. Microsoft disclaims all express and implied warranties, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, and freedom from infringement. Without limiting the generality of the foregoing, Microsoft does not make any warranty of any kind that any item developed based on these specifications, or any portion of a specification, will not infringe any copyright, patent, trade secret, or other intellectual property right of any person or entity in any country. It is your responsibility to seek licenses for such intellectual property rights where appropriate. Microsoft shall not be liable for any damages arising out of or in connection with the use of these specifications, including liability for lost profit, business interruption, or any other damages whatsoever. Some states do not allow the exclusion or limitation of liability for consequential or incidental damages; the above limitation may not apply to you.

ActiveMovie, ActiveX, BackOffice, Developer Studio, Direct3D, DirectDraw, DirectInput, DirectPlay, DirectSound, DirectVideo, DirectX, Microsoft, NetMeeting, NetShow, Visual Basic, Win32, Windows, and Windows NT are trademarks or registered trademarks of Microsoft Corporation in the United States and/or other countries. Other product and company names mentioned herein may be the trademarks of their respective owners.

© 1997 Microsoft Corporation. All rights reserved.

Introduction

The final release of Windows NT 4.0 Enterprise Edition should be used with this test kit. The latest Service Pack should be applied to both nodes of the cluster.

This document is the test plan for Wolfpack system validation. It describes the hardware and software requirements for the validation process. The intended audience is people who are involved in validation of Wolfpack systems and also IHVs who wish to have systems validated. This document does not go into great detail about each specific test. Microsoft has other documents for each test that give specific testing criteria and methodology. This document is in draft form, and several issues have not yet been resolved. Issues that Microsoft is still resolving include:

The exact step-by-step procedure for running the test is not in this document because the entire test CD is not yet complete. Microsoft will provide this when the CD is completed.

The contents of this document are subject to change. Please refer to the most recent HCT CD for Wolfpack validation and print the latest copy of the Wolfpack System Validation test plan to obtain an update.

Definitions

The following terms are used throughout this document.

HCL: Hardware Compatibility List. The list of hardware components that are validated for the Microsoft® Windows NT or Windows® 95 operating systems.

HCT: Hardware Compatibility Tests. The set of tests that are run to perform validation of hardware that will be added to the HCL. An HCT kit is available from Microsoft, as described in the following section.

HW RAID: A RAID set implemented entirely in hardware, with no knowledge of or help from the operating system. As far as Windows NT knows, these RAID sets appear to be a normal physical disk; the RAID operations are all done in hardware.

SW RAID: Software RAID, implemented by using the Windows NT Server Ftdisk driver to take several physical disks and make one logical fault-tolerant (FT) volume out of them. This driver supports RAID levels 0, 1, and 5.

WHQL: Windows Hardware Quality Labs. The Microsoft lab that performs the component validation testing for components that must be submitted to Microsoft.

Wolfpack: The internal product code name for MSCS.

Obtaining a Wolfpack CD and Wolfpack Self-Test Kit

Visit http://www.microsoft.com/hwtest/hctcd to obtain an official MSCS self-test CD.

Windows NT Server, Enterprise Edition CDs are available through OEM, Select, Retail, and MSDN licenses. Do not contact WHQL for NT Server, Enterprise Edition CDs.

Checking the Cluster HCL on the Web

Visit the site at:

www.microsoft.com/hwtest/hcl

You can search under cluster for the list of all complete cluster configurations for each vendor.
You can also search under cluster/raid, cluster/scsi, and cluster/fiberchannel to see a list of MSCS components that can be used for complete cluster configurations. Please note that no support is offered at the MSCS component level. Only complete configurations listed under "cluster" are valid configurations.

Service Pack

Please make sure that the latest Service Pack (SP4 or later) has been applied to both cluster nodes as well as the client test machines before starting the Wolfpack validation tests.
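
If you want to confirm the installed service pack programmatically on each machine, the Win32 GetVersionEx call reports it in the szCSDVersion field on Windows NT 4.0. The fragment below is a minimal sketch of our own and is not part of the HCT kit.

    /* spcheck.c - print the Windows NT version and service pack string.
       Build with: cl spcheck.c */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        OSVERSIONINFO vi;

        vi.dwOSVersionInfoSize = sizeof(vi);
        if (!GetVersionEx(&vi)) {
            printf("GetVersionEx failed, error %lu\n", GetLastError());
            return 1;
        }

        /* On Windows NT 4.0, dwMajorVersion is 4 and szCSDVersion holds a
           string such as "Service Pack 4" (empty if no service pack). */
        printf("Windows NT %lu.%lu %s\n", vi.dwMajorVersion, vi.dwMinorVersion,
               vi.szCSDVersion);
        return 0;
    }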

Example Configuration

Creating a Wolfpack cluster requires two PCI-based Intel® Pentium or DEC Alpha systems configured as described in the following list. For development purposes, any PCI-based Intel Pentium or DEC Alpha system listed on the Windows NT version 4.0 HCL can be used as a Wolfpack cluster node.

  1. At least one shared SCSI bus, formed by a PCI-based SCSI controller installed in each system.
  2. At least one external SCSI disk is attached to one of the shared buses. Each disk must be formatted for Windows NT file system (NTFS) only. A single partition is recommended on each disk because logical partitions cannot be independently failed over. The same drive letter should be permanently assigned to a given shared disk on each system.
  3. At least one disk on each system is not attached to any of the shared buses.
  4. Windows NT 4.0 is installed entirely on the nonshared disk(s) of each system. All paging files and system files must be on nonshared disks.
  5. At least one shared LAN is for intracluster communication. A single network adapter in each system must be attached to this LAN and configured with the TCP/IP protocol. Each adapter must be assigned an address on the same IP subnet. The intracluster network must use PCI NICs.
  6. At least one shared LAN is for client access to the cluster. A single network adapter in each system must be attached to this LAN and configured with the TCP/IP protocol. Each adapter must be assigned an address on the same IP subnet. Clients can be connected to this LAN by a routed IP network. The same LAN (and IP subnet) can be used for both intracluster communication and client access.
  7. One static TCP/IP address is for the cluster and one is for each resource group that will be created. These addresses will be used by clients to access cluster services. These addresses must be supplied to the Wolfpack setup and administration programs when resource groups are created.

Figure 1 illustrates the standard configuration of a Wolfpack cluster.

Figure 1. Standard Wolfpack configuration

Systems Requirements and Configurations

This section presents the system configuration criteria for a Wolfpack system. Note that all components in a cluster system must be validated to run on Windows NT 4.0 and be on the HCL before they will be considered for Wolfpack-specific testing. Components that are not on the HCL must pass HCT tests prior to Wolfpack testing, because Wolfpack testing is designed to test Wolfpack requirements, not general Windows NT 4.0 requirements.

For this type of hardware testing, the HCT kit and BackOffice® testing programs are used. The following lists constitute a Wolfpack configuration.

Server Requirements for a Wolfpack System

In addition to the minimum system requirements for a cluster, Microsoft will require the following for system validation:

Network requirements for running tests


The phase 2 and phase 5 tests will generate a lot of network traffic doing client/server I/O. We recommend that all of the client machines and cluster nodes be on a private network. The cluster nodes may be a primary domain controller and a backup domain controller. However, we find that the best results for this heavy level of stress testing come from having another server, which is always up, act as the PDC for the domain.

The client nodes, monitoring node, and the cluster nodes must all be members of this same domain. We typically set up our lab so that all machines are logged on with the same domain account, which has local administrator rights on each node. We use this same account for the cluster service as well.

The server nodes will experience very high stress loads with file I/O, IIS queries, and FTP queries. This is by design and is done to simulate what we believe will be the real-world customer usage of high-end cluster configurations. The network stress loads, however, are probably higher than what any customer would utilize. We recommend that all testing be done on a private network.

Client Requirements for a Wolfpack System

The client machines will be used to simulate client-server stress against the cluster. The eight required client nodes cannot be used to test more than one cluster at a time. The idea of having eight dedicated clients is that Microsoft can run many tests on each node, simulating many real-world clients. For each cluster you want to test in parallel, you must have a different set of client machines. These clients must meet the following hardware and software requirements:

An additional client node will also be needed. This client should be separate from the clients used for client-server stress testing. It has the same hardware requirements as other cluster machines, but serves as the client monitoring node. It can be used as the client monitoring node for more than one cluster. This node is used to:

Wolfpack Configuration Components

A Wolfpack configuration consists of three main components. All three must be on the Windows NT HCL. Microsoft views Wolfpack requirements as a superset of normal Windows NT HCL requirements. However, the normal Windows NT Server HCL is a starting point for Wolfpack configurations. Wolfpack validation is designed primarily to ensure that a given hardware configuration will work with Wolfpack. The HCT kit is used to ensure proper functionality of hardware with normal one-node Windows NT Server. The three major components are:

The Wolfpack configurations listed on the HCL will be a complete configuration as described in the previous section. However, a particular configuration is not a validated Wolfpack configuration until it has gone through the system validation process, which is described in the rest of this document. These three components are the starting point for putting together complete Wolfpack configurations that can be self-validated and put onto the HCL.

One goal of each configuration should be to eliminate any single point of failure (SPOF). This can include power failures, SCSI cables coming loose, disk cabinets failing, and so on. Because Wolfpack is intended to serve as a foundation for highly available systems, it is recommended (though not required) that configurations minimize or eliminate single points of hardware failure.

Servers in a Wolfpack Configuration

The Wolfpack server requirements are fairly minimal. Microsoft requires the server to have PCI slots because all the shared SCSI adapters that Microsoft has tested are PCI-based controllers. Wolfpack is designed to work in any Pentium-based server on the HCL list, including uniprocessor and multiprocessor machines. All servers must go through normal HCL testing before any Wolfpack configuration testing and must be on the appropriate HCL list.

Currently, uniprocessor machines can be validated by running the test from the HCT kit via self-validation. Symmetric multiprocessing (SMP) machines must come to Microsoft for this validation. For information about getting your server onto the HCL list, please see the web site at http://www.microsoft.com/hwtest/hwtest.htm. There is also an e-mail alias for questions about WHQL testing at whqlinfo@microsoft.com.

Shared SCSI Bus in a Wolfpack Configuration

The shared SCSI bus is probably the most sensitive part of a Wolfpack configuration. There are three components that make up the Wolfpack shared SCSI bus, although not all three are required for any given configuration. The components are:

All of these components must be on their respective HCL before any Wolfpack testing will be done. Microsoft will have a new section on the HCL list for Wolfpack components of this type. Any of these components that are already on the HCL can be submitted to Microsoft for Wolfpack component testing. If the component is not on the HCL, then both normal HCL testing and Wolfpack component testing can be done with the same submission. They will be tested in a variety of configurations to help ensure that any IHV will be able to use them in a Wolfpack configuration.

For information on submitting shared SCSI components for Wolfpack validation, please see the web site at http://www.microsoft.com/hwtest/hwtest.htm. There is also an e-mail alias for questions about WHQL testing at whqlinfo@microsoft.com.

 

Figure 2. Wolfpack components with Windows NT HCL

Interconnect Cards in a Wolfpack Configuration

Interconnect cards are what Wolfpack uses to communicate between the nodes of a cluster. This network needs to be high-speed and highly reliable. The heartbeat of a cluster is constantly sent back and forth across this network to keep track of which nodes are up in the cluster. The cluster database records and checkpoint information are sent across this network as well. The most important aspects of this are the speed at which packets can be sent over the network, the number of dropped packets, and the reliability.

There will be a Wolfpack component HCL for interconnect cards. The only interconnect cards that need to be on the separate Wolfpack HCL for interconnect cards are those that meet one or more of the following:

  1. Cards that use a private NDIS driver. Standard third-party miniport drivers do not require listing on the Wolfpack interconnect HCL.
  2. Any card that cannot pass 100% of the Windows NT NDIS tests. Point-to-point network cards can function as interconnect cards but will have to be tested only as interconnect cards, not as fully functional NDIS cards.

For submitting interconnect components for Wolfpack validation, please see the web site at http://www.microsoft.com/hwtest/hwtest.htm. There is also an e-mail alias for questions about WHQL testing at whqlinfo@microsoft.com.

Changes That Constitute a Different Wolfpack Configuration

As shown in Figure 2, the final Wolfpack configuration consists of three main components. Any major change to these components will result in a new Wolfpack configuration and therefore will require new validation testing. Defining configurations in this way ensures that the end product will work when Wolfpack is installed. Major changes are defined as the following:

 

Changes That Do Not Constitute a Different Wolfpack Configuration

Microsoft wants to provide as much flexibility as possible for system vendors to build Wolfpack configurations while at the same time ensuring that the configurations will work. Changes to configurations that don’t have a major impact on the operability of Wolfpack will not constitute a new configuration.

Depending on the results of the system validation testing, Microsoft can change the process to allow more or fewer variations within a configuration. The following changes are believed to have no major impact upon Wolfpack and therefore do not constitute a new configuration. Therefore, when making these changes, no new validation needs to be done. However, it is recommended that all systems be tested periodically. Nonmajor changes are defined as the following:

Setup Instructions for Validation Testing

This section summarizes setup for both the hardware and software. The setup order corresponds directly to how the tests will be run, and should be followed precisely.

Before Phase 1:

  1. Set up the hardware as shown in Figure 1 earlier in this document.
  2. Install Windows NT Server version 4.0 and the latest service pack on both machines.
  3. Install Microsoft Internet Information Server (IIS) version 2.0 on both nodes while running Windows NT Server Setup. (Note: applying SP4 updates IIS to version 3.0.)
  4. After the latest service pack is installed, turn off logging for IIS. To do this, go to Programs->Microsoft Internet Server (Common)->Internet Service Manager from the Start menu. Right-click the WWW service and go to Service Properties. Select the Logging tab and uncheck the Enable Logging check box.
  5. Make sure that the WolfpackA node can see the WolfpackB node on the network.
  6. Run Phase 1 tests.
  7. Reboot WolfpackA, and then turn off WolfpackB.
  8. Partition each drive on the shared bus for one partition.

After the Phase 1 test completes:

  1. Format each drive for NTFS.
  2. Install the Wolfpack software on WolfpackA.
  3. Turn on WolfpackB.
  4. Join WolfpackB to WolfpackA, forming a cluster.
  5. Turn off WolfpackB.
  6. Run the Validate 1 Node Cluster test from the HCT kit (see instructions later in this document).
  7. Start the client test on each client (see instructions later in this document).

After completing step 7, Phase 2 testing will be started and timed on the Client Monitoring node.

  1. Reboot WolfpackA.
  2. Turn on WolfpackB.
  3. Run the Validate 2 Node Cluster test from the HCT kit (see instructions later in this document).

After completing step 3, Phase 5 will be started and timed on the Client Monitoring node.

Phase 1 Testing (24 hours)

Phase 1 is designed to take only one day. In this configuration, Microsoft will have the hardware as depicted in Figure 1. In this phase of the testing, Microsoft will not have any cluster software installed on either node. Also, FT sets should not be set up at this point by using the Windows NT Server Ftdisk driver. This phase of testing is designed to stress the hardware. Microsoft has specific hardware tests designed to stress the hardware in a manner similar to what Wolfpack software will do, but even more so.

All Phase 1 tests will be included on the Windows NT 4.0 (or higher) HCT kit that will ship with Wolfpack. These are some of the tests that Microsoft will use to validate components for the Wolfpack component HCL. By running these tests before you submit hardware to Microsoft, you will have a good idea of whether your hardware will pass those tests. The testing at Microsoft will involve numerous configurations and some other testing, but the Phase 1 tests are a good starting point.

Two Initiator SCSI Testing

Configure the controllers and shared SCSI bus as shown in Figure 1 for a typical cluster configuration. The drives should not be partitioned or have any file system installed on them. Doing so will not prevent the test from running, but the test will do I/O to the drive directly without using the file system. Any data on the drive will be lost, and after the test is completed, the file system will most likely not be able to mount the drive. The test should be run with a minimum of two drives on the bus, but more can be used. The test should run for about two hours per disk.

If you are planning on using the Windows NT Server Ftdisk driver, you should not have FT sets at this point, because the test will not work correctly with FT sets. If you are using HW RAID, then you should set up the HW RAID sets before running this test.

This test is designed to stress the shared SCSI bus. The test will run on each drive found on the shared bus in a serial fashion. The test will perform all SCSI commands on both initiators and communicate over a named pipe. The test is broken down into two main phases:

NOTES:

  1. You should not have Wolfpack installed on either node at this point.
  2. You should not have any FT disk sets on the shared SCSI bus at this point.
  3. To make sure all hardware is set up correctly, run Windisk on each node and make sure each side sees the same shared drives. If Windisk displays a pop-up box about a drive not having a signature, reply ‘yes’ to write a signature to the disk.

To start the test run on WolfpackA:

  1. On the server list, click the Plus box.
  2. Click the Cluster Plus box, and then click the CliSrv box.
  3. Select "Cluster Test Server," and then click the Add button.
  4. Click the Start button to start the test.

To start the test run on WolfpackB:

  1. On the Server list, click the Plus box.
  2. Click the Cluster Plus box, and then click the CliSrv box.
  3. Select "Cluster Test Client," and then click the Add button.
  4. Click the Plus box beside the test in the Selected Test box.
  5. Double-click the text under the Plus box and change Param2 "Server Name" to "WolfpackA" (or use your actual server name).
  6. Click the OK button.
  7. Click the Start button to start the test.

Each node will produce a standard Windows NT log file with the results from the test. The client side will produce a log file called Clustsim_client.log. The server side will produce a log file called Clustsim_server.log. The distinction between client and server should not be confused with the function of a normal client and server; this is strictly the naming convention used for this test. One node of a named pipe has to be the client and one node has to be the server. From the HCT test manager, you can easily view the results of each test. If the test fails, the client log will report why the test failed. If the actual failure happened on the server node, then you should look at both log files to determine what command failed and why.
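
To make the client/server terminology concrete, the fragment below is a minimal sketch of our own (not the clustsim test) showing two nodes coordinating over a named pipe: the "server" side simply creates and owns the pipe, while the "client" side opens it remotely. The pipe name, message text, and command-line syntax are hypothetical.

    /* pipedemo.c - minimal sketch of two nodes coordinating over a named pipe.
       Run "pipedemo server" on one node and "pipedemo client \\WolfpackA" on
       the other.  Build with: cl pipedemo.c */
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    #define PIPE_LOCAL "\\\\.\\pipe\\clussim_demo"

    int main(int argc, char **argv)
    {
        char  buf[64];
        DWORD bytes = 0;

        if (argc >= 2 && strcmp(argv[1], "server") == 0) {
            /* The "server" side owns the pipe and waits for the other node. */
            HANDLE pipe = CreateNamedPipe(PIPE_LOCAL, PIPE_ACCESS_DUPLEX,
                                          PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
                                          1, sizeof(buf), sizeof(buf), 5000, NULL);
            if (pipe == INVALID_HANDLE_VALUE) return 1;
            ConnectNamedPipe(pipe, NULL);
            ReadFile(pipe, buf, sizeof(buf) - 1, &bytes, NULL);
            buf[bytes] = '\0';
            printf("server received: %s\n", buf);
            WriteFile(pipe, "ack", 4, &bytes, NULL);
            CloseHandle(pipe);
        } else if (argc >= 3) {
            /* The "client" side opens \\<servernode>\pipe\clussim_demo. */
            char   remote[128];
            HANDLE pipe;
            sprintf(remote, "%s\\pipe\\clussim_demo", argv[2]);
            pipe = CreateFile(remote, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                              OPEN_EXISTING, 0, NULL);
            if (pipe == INVALID_HANDLE_VALUE) return 1;
            WriteFile(pipe, "reserve disk 1", 15, &bytes, NULL);
            ReadFile(pipe, buf, sizeof(buf) - 1, &bytes, NULL);
            buf[bytes] = '\0';
            printf("client received: %s\n", buf);
            CloseHandle(pipe);
        } else {
            printf("usage: pipedemo server | pipedemo client \\\\servernode\n");
        }
        return 0;
    }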

Problem Resolution:

If the clustsim.exe test fails, you should look at the two log files. The server node will produce a log file called clustsim_server.log and the client node will produce a log file called clustsim_client.log. The server log will not have as much information as the client log. All variation and statistic gathering is done on the client node only. However, if the last I/O command that failed was issued by the server node, you need to look there to find the exact details. Here is a list of the common problems this test has found:

  1. Release command failures. This SCSI-2 command should never fail, even if the initiator issuing the release does not currently hold a reservation. The semantics of this command are that after it completes, the issuing initiator no longer owns a reservation.
  2. Reservation conflicts. No commands besides Inquiry and Request Sense should work from an initiator if another initiator currently holds a reservation on a disk. This test will attempt write, read, and Test Unit Ready commands that should fail in this scenario (see the sketch after this list).
  3. Write caching problems. If the controller or RAID device does any write caching, it must guarantee the cache for both paths.
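
For readers who want to reproduce the reservation behavior by hand, the fragment below is a rough sketch of our own, not the clustsim test, that issues RESERVE(6), TEST UNIT READY, and RELEASE(6) through the SCSI pass-through IOCTL. It assumes the DDK header ntddscsi.h, uses a hypothetical drive number, and leaves the PathId/TargetId/Lun addressing at zero; depending on the controller these may need to be filled in explicitly (compare the SPTI sample in the DDK). Run it only against the shared test disks.

    /* reservedemo.c - rough sketch of issuing RESERVE(6), TEST UNIT READY, and
       RELEASE(6) through the SCSI pass-through interface (not the clustsim
       test).  Build with: cl reservedemo.c */
    #include <windows.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <ntddscsi.h>

    typedef struct {
        SCSI_PASS_THROUGH spt;
        UCHAR             sense[32];
    } SPT_WITH_SENSE;

    static void IssueCdb(HANDLE disk, UCHAR opcode, const char *name)
    {
        SPT_WITH_SENSE req;
        DWORD          returned;

        ZeroMemory(&req, sizeof(req));
        req.spt.Length          = sizeof(SCSI_PASS_THROUGH);
        req.spt.CdbLength       = 6;                           /* 6-byte CDB */
        req.spt.SenseInfoLength = sizeof(req.sense);
        req.spt.SenseInfoOffset = (ULONG)offsetof(SPT_WITH_SENSE, sense);
        req.spt.DataIn          = SCSI_IOCTL_DATA_UNSPECIFIED; /* no data phase */
        req.spt.TimeOutValue    = 10;                          /* seconds */
        req.spt.Cdb[0]          = opcode;
        /* PathId/TargetId/Lun are left at zero here; some controllers need the
           real bus address filled in. */

        if (!DeviceIoControl(disk, IOCTL_SCSI_PASS_THROUGH, &req, sizeof(req),
                             &req, sizeof(req), &returned, NULL))
            printf("%s: DeviceIoControl failed, error %lu\n", name, GetLastError());
        else
            printf("%s: SCSI status 0x%02X%s\n", name, req.spt.ScsiStatus,
                   req.spt.ScsiStatus == 0x18 ? " (RESERVATION CONFLICT)" : "");
    }

    int main(void)
    {
        /* \\.\PhysicalDrive1 is assumed to be one of the shared disks. */
        HANDLE disk = CreateFile("\\\\.\\PhysicalDrive1", GENERIC_READ | GENERIC_WRITE,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                                 OPEN_EXISTING, 0, NULL);
        if (disk == INVALID_HANDLE_VALUE) {
            printf("CreateFile failed, error %lu\n", GetLastError());
            return 1;
        }

        IssueCdb(disk, 0x16, "RESERVE(6)");      /* other node should now see conflicts */
        IssueCdb(disk, 0x00, "TEST UNIT READY"); /* succeeds on the reserving initiator */
        IssueCdb(disk, 0x17, "RELEASE(6)");      /* must succeed even without a reservation */

        CloseHandle(disk);
        return 0;
    }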

Netcard Validation for Wolfpack (optional, no logs required)

 

CPU Usage

The network cards should take as little CPU time as possible. Furthermore, on newer cards the CPU usage will normally decrease as the size of the packet increases. Some of the older cards will consume 100% CPU when stressed, even with large packets. The test card should have CPU usage values less than or equal to the values in the CPU Usage graph shown below.

 

Throughput

The test card throughput values should be greater than or equal to the throughput values shown in the following graph.

 

Packet dropping

The test card's percentage of dropped frames should be less than or equal to the values specified in the following graph. Note that the values in the graph are multiplied by 1000, so to get the actual percentage you have to divide the values shown in the graph by 1000 (for example, a graphed value of 500 corresponds to 0.5%). Ideally, packets should not be dropped at all. At most 1% of packets can be dropped, as indicated by the above test.

 

 

 

 

Step by Step Instructions to run net card validation tests

The NDIS performance tests require two machines. One machine acts as a server and the other acts as a client. In the following test, Node 1 will act as the server and Node 2 will act as the client. The net card to be tested is placed in the client (on the private network).

Node 1:

  1. Run the HCT test manager.
  2. Click on the Device box
  3. Click on the Ndis Box
  4. Select "Manual NIC Tests (Variety)"
  5. Click on the Add button
  6. Click on the Start button. This should start the "Ndtest" user interface.
  7. Select the Vendor tab and fill in the vendor information.
  8. Select the LAN/ATM tab.
  9. In the Test Card list box, select the network card connected to the private network.
  10. For Message Card, select the netcard that is connected to the public network. (This should be different from the private network.)
  11. Check the server mode check box.
  12. Click Start to start the test on the server end.

Node 2:

  1. Run the HCT test manager.
  2. Click on the Device box
  3. Click on the Ndis Box
  4. Select "Manual NIC Tests (Variety)"
  5. Click on the Add button
  6. Click on the Start button. This should start the "Ndtest" user interface.
  7. Select the Vendor tab and fill in the vendor information.
  8. Select the LAN/ATM tab.
  9. In the Test Card list box, select the network card connected to the private network. This is the netcard that is to be validated for Wolfpack.
  10. Trusted card remains as "nocard".
  11. For the Message Card, select the netcard that is connected to the public network. (This should be different from the private network.)
  12. Check the "Enable" checkbox in the Runtest group box.
  13. A list of script files appears. Choose 2m_clus.tst.
  14. Click Start to start the test on the client end.
  15. This test takes about 15 minutes to complete.
  16. After the test 2m_clus.tst is completed, view the results by selecting Results->Test Logs.
  17. You will get a dialog box displaying all the log files. The file name of the log file is located in the first column. The log file for 2m_clus.tst will be named 2m_clus.leth*. Double-clicking it will open the log file in Notepad. It may be a good idea to save it to a separate directory, for example as 2m_clus.log.

Procedure to generate the graphs

  1. Run convert.cmd (the syntax is convert <file where 2m_clus.log exists>). It produces a file called all.log.
  2. Open netcard.xls and run the macro Drawgraph. (Please note this requires Excel 97.) It will prompt you for the filename; give the complete path to the all.log file generated above.
  3. It will draw graphs as shown in the sample below.

 

 

 

Phase 2 Testing – Validate 1 Node (24 hours)

Phase 2 of the testing will use the same hardware configuration shown in Figure 1. At this point you should install the cluster software on WolfpackA. Then do a join operation from WolfpackB to form a complete cluster. This phase of the testing ensures that all of the cluster resources will work with only a single node up. This is an important case because, after a failure of WolfpackB, users will expect their system to function just as if both nodes were up, although with some loss of performance. Phase 2 and Phase 5 run the same set of regression tests. Because this is not the normal two-node configuration, the tests only need to run for a shorter time period. Also, no failovers can happen at this point. WolfpackB should be turned off during this part of the testing, in order to simulate what would happen with a normal cluster node failure.

 

Phase 3 Testing – Validate Move Group 2 Node

This test is intended for Windows 2000 only.

Phase 4 Testing – Validate Crash 2 Node

This test is intended for Windows 2000 only.

Please note that Phase 3 and 4 do not exist for Windows NT 4.0. Phase 3 and 4 are two new tests intended only for Windows 2000 Cluster Validation. Phase 5 in the latest test kit is identical to phase 3 in the original test kit. Because of the order that the tests are run, it was necessary to rename the original phase 3 test to phase 5.

Phase 5 Testing – Validate 2 Node (24 hours)

Phase 5 of the testing tests a cluster with both nodes joined into a cluster and powered on. When you start the Phase 5 testing, you should turn WolfpackB on. We will utilize a minimum of 8 client machines to run client/server stress tests. While these client tests are running against the cluster, we will have both nodes of the cluster rebooting. This will cause the resources on the cluster to move back and forth between the nodes. All of the client tests are designed to retry operations in the case of a cluster node failure. They will allow a reasonable amount of time for the resource to fail over. If the resource is still unavailable after that time period, the test will consider this a failure. In the case of a failure, the test will report an error and log why the test stopped.

During all of the Phase 5 tests we will have the failover test running on the client monitoring node. The person running the tests may also run Cluadmin on this node to monitor the failovers of the cluster.

Please note that in a future release of the test kit, Phase 5 will be extended to 48 hours. Currently, 24 hours is the minimum submission run time for Phase 5.

 

Client Server Tests

All of the client server tests will be used both in Phase 2 and Phase 5 of the Wolfpack validation process. These tests can be broken down into the following types of tests:

  1. File I/O using an SMB share
  2. IIS tests
  3. SQL tests
  4. Print Spooler tests

These tests are designed to simulate the most common cluster resources that users will run on a clustered system. All of these tests will log their results to a log file. The tests should be run for the period defined by that particular phase. The tests run in an infinite loop. The machine should be shut down after the allotted test time. The log files can then be examined to see which tests passed and which tests failed during the test run.

All of the client/server tests will be run together on each client node. The client tests will use well-known names to access the cluster resources. We will have a GUI interface from the HCT test kit to set up all of the cluster resources.

Setting up and running client/server tests

The HCT test manager will set up cluster resources so that client tests can attach to well-known cluster names. When running our program to set up the resources, the following information will be needed:

  1. Cluster Name
  2. Static IP addresses
  3. Subnet mask for the IP addresses

From this input, the setup program from the HCT test manager will set up the following cluster groups (if needed) and resources on the cluster.

The cluster groups will look like this:

  1. Cluster Group
  2. Disk Group 1
  3. Disk Group 2

If you have set up more than two disks as cluster resources, the additional groups will look like the Disk Groups above, except the number will change (Disk Group X). The only limit to the number of groups that you can set up is how many static IP addresses you supply to the group setup program. The more groups that you use, the more client tests we will start, and the better the overall testing of your hardware.

 

Step by Step Instructions to set up cluster resources and start Phase 2 tests

NOTE: You must install Cluster Administrator on the monitoring node from which you are going to run the HCT test kit. Previous versions of the HCT test kit shipped with a clusapi.dll. We no longer ship this DLL. Instead, installing Cluster Administrator will put clusapi.dll in your path so the test kit can load it. This allows the HCT test kit to run on various Wolfpack builds. You should use the exact Wolfpack CD that was used to install the cluster nodes.

  1. On the client monitoring node, insert the HCT CD. Run the HCT test manager. Make sure you have a plus box for "Cluster" on the first menu. If not, you need to get a newer HCT CD.
  2. Click on the Cluster box
  3. Click on the Nodes Box
  4. Select "Validate 1 Node Cluster"
  5. Click on the Add button
  6. Click on the Start button. This should start the "Cluster Validation Tests" process.
  7. Type the cluster name into the ‘Cluster Name:’ edit box. If you make a mistake simply type over the name.
  8. (Optional) Click on the Verify button. This will attempt a connect with the cluster and query the node names. The cluster name and the node names should show up in the list box. Please note that for the single node validation test only one node name, and the cluster name, should show up. If there is more than one node name, please check that one of the cluster nodes is powered off before continuing.
  9. Type in each client name and add the name to the list by using the Add button. If you make a mistake select the client name you wish to remove and click on the Del button. The number of entered client names is displayed immediately to the right in the ‘Num Clients:’ field.
  10. (Optional) The ‘Min Clients:’ field can be edited to match the ‘Num Clients:’ field. This is so the test can be run with fewer clients and still pass. Please note that a validation run for submission must have at least eight clients.
  11. (Optional) Individual client tests can be removed from a test run. To do this click on the ‘Configure’ tab. Then select the test to remove and click on the Del button. Clicking on the Scan button will restore the default list of tests.
  12. (Optional) The length of a test run can also be configured under the ‘Configure’ tab by editing the ‘Run Test(hrs):’ field. Please note that a validation run for submission must be at least 24 hrs long
  13. Click on the Start button
  14. At this point a dialog titled "Specify Account Password" will appear. Enter the account and password for the cluster administrator account. This will allow a special monitoring service (qserv) to be installed and run in the desired security context. If the password/account information entered is incorrect, a dialog will indicate this; when this happens, return to step 13.
  15. A dialog titled "Specify static IP Addresses" will now appear. The dialog will request one IP address for every disk resource. Type each static IP address and click on the Add button. If you enter the wrong address select the incorrect address and click on the Del button to remove it.
  16. Type the subnet mask and click on the OK button. All of the static IP addresses will be assigned this subnet mask. Once you click OK the IP, NetName, Share, Web and FTP resources will be created.

After step 14, the client master will install qserv and other needed test services and files onto the cluster node. It will then proceed to do the same for each client in the client list. When step 16 completes, the tests will be started on each of the clients, on the nodes, and then finally on the local system (that is, the client master). When that is completed, the valdclus process will switch to the ‘Status’ tab and start the clock. The tests will then run for 24 hours. After that period all of the client nodes will shut down the tests to start Phase 5.

 

Instructions to set up cluster resources and start Phase 5 tests

Please refer to the Phase 2 test; the setup procedure is identical except for the following differences.

The Cluster Name can be changed by simply typing a new name in the ‘Cluster Name:’ edit box, and optionally hitting the Verify button. The client names can be changed using the Add/Del buttons. The Phase 5 test runs for 24 hours. At a random time interval the cluster nodes will crash back and forth during this part of the testing. After that period all of the client nodes will shut down.

 

When the Start button is pressed, the client monitor will initiate the client/server tests on each of the client node(s) automatically. After the client tests are started, they will be added to the list displayed in the "Cluster Validation Tests" process under the ‘Status’ tab. As the tests are started, "Generic Application" resource(s) will be created on the cluster. The "Generic Application" resource(s) will run a local lrgfile test against each of the shared disks. For the "Validate 2 Node Cluster" test an additional test process, "spfail.exe", will be started on the client monitoring node. This process will periodically crash one of the nodes in the cluster.

The client tests are designed to constantly access the server and put stress on the network and also the shared SCSI bus. The tests can handle a node crash while they are running. The tests will just resume whatever type of client/server I/O they were doing before the crash happened. These tests are designed to simulate what will happen in a real cluster environment when you have hundreds of clients accessing the cluster and a failure happens. These tests will be continually asking for service from the cluster, so they simulate many real-world clients that only ask for server services a small percentage of the time.

We currently have 5 different client/server tests and more will be added in the future. So at least 10 test instances will be started against the cluster from each client node.

When the client and failover tests are completed, the client monitoring node will shut down all of the tests and produce a summary log. This summary log is reported to the test manager. In addition, each of the client node instances may also have more detailed log file(s) for each test. At the moment no automatic means exists for gathering these logs.

Clicking the ‘Abort’ button will stop any running tests; in some cases this will also force a close of the window if a critical process (spfail.exe) was stopped. Closing the window at any time will abort the tests and initiate the cleanup code. This will attempt to stop all the test processes and delete all the test resources. This will also return the summary log (vald1nod.log or vald2nod.log) to the test manager.

 

Interpreting the log

When the Validate Cluster Tests exit, a summary log (vald1nod.log or vald2nod.log) should be generated. If all of the tests started successfully, the first part of the log will be a listing of the machines involved. It should look something like this:

****

+VAR+INFO 0 : [<computerName>] 1381 Service Pack 3, cpu:2, level:21164, mem:63, page:112

cpu: is the platform type ( 0 for x86, 2 for alpha)

level: is the level of the chip ( 4, 5, 6 for x86 21064, 21164 for alpha)

mem: is the amount of physical memory (in meg)

page: is the max commit limit (in meg)

If the test failed to complete there should be a line like this:

+VAR+SEV2 0 : Test stopped with XXX minutes elapsed.

Where XXX < the expected time. This should be next to another line indicating the state the test was in when it exited.

+VAR+SEV2 1 : Tests did not complete

Exiting while in state: <State> : <StateNum>

Possible states include:

Unknown: <StateNum> - this is an unmapped error the <StateNum> indicates the value

Stopped, Connect_Error, Start_Error, Running, and Stop_Error:

- the test was aborted by the user while in this state.

Running_Error: - this likely indicates a failure in one of the ‘critical’ processes. This usually means that spfail.exe exited unexpectedly. But it could also mean that all instances of a critical node process (clussvc etc) are not running on any of the cluster nodes. To figure out which it is you need to look at the next piece of the log file, which contains the last state of all the test processes. These are in the form:

+VAR+INFO 2 : <process>|<pid>|<state>|<elapsed time>|<computer>|<type>

Scan down the list looking for processes of type critical_<something>. For processes of type critical_node, there should be at least one instance in the running state. For the vald2nod.log there should also be an spfail.exe process of type critical_local. If the failover pass rate is less than 100%, or if the last state isn't Running or Stopped with an elapsed time equal to the run time, then the failure was due to some problem encountered by spfail.exe. You should then look at the spfail.log file for further information.

If the test ran to completion it is still possible for the test to fail if:

This last point can be verified by counting the number of ‘qserv’ processes of each of the respective types. As well as counting the number of ‘clussvc’ processes.

 

File I/O Testing Using an SMB Share

Microsoft has rewritten some of the Windows NT file I/O tests so they can handle failovers while running I/O tests. This test requires that one of the shared drives be set up with both a file share and a network name. The file share allows mapping from a logical name to a physical partition; the network name allows clients to access the file share. All of this will be set up by the scripts when they are run from the client monitoring node.

Syscache Test

Syscache is designed to perform a series of I/O operations to a network drive. It will precisely verify the write operations using read operations. If a write operation fails due to a system failover, the test will redo the write operation. If a read operation fails, the test will redo the read operation and compare the result to the last write operation.

The Microsoft client-server test launcher will start Syscache. Currently, tests in Cluster mode perform only unbuffered I/O and retry on every file I/O operation.
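
As an illustration only, the fragment below is a minimal sketch under our own assumptions, not the Syscache source. It performs an unbuffered, write-through write to a file on a clustered share, verifies it with a read, and retries the whole operation if the handle is lost during a failover. The UNC path, sector size, and retry policy are hypothetical.

    /* unbufwrite.c - sketch of an unbuffered write with failover retry.
       Build with: cl unbufwrite.c */
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    #define SECTOR     512        /* assumed; query GetDiskFreeSpace for the real size */
    #define RETRIES    30
    #define RETRY_WAIT 10000      /* ms - time allowed for the group to fail over */

    int main(void)
    {
        /* FILE_FLAG_NO_BUFFERING requires sector-aligned buffers and sizes;
           VirtualAlloc returns page-aligned memory, which satisfies this. */
        BYTE *data  = (BYTE *)VirtualAlloc(NULL, SECTOR, MEM_COMMIT, PAGE_READWRITE);
        BYTE *check = (BYTE *)VirtualAlloc(NULL, SECTOR, MEM_COMMIT, PAGE_READWRITE);
        int   attempt;

        if (data == NULL || check == NULL) return 1;
        memset(data, 0xA5, SECTOR);

        for (attempt = 1; attempt <= RETRIES; attempt++) {
            DWORD  written = 0, read = 0;
            BOOL   ok = FALSE;
            HANDLE h = CreateFile("\\\\WolfpackShare\\TestShare\\sysc.dat",
                                  GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_ALWAYS,
                                  FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH, NULL);
            if (h != INVALID_HANDLE_VALUE) {
                /* Write one sector, then read it back and compare. */
                ok = WriteFile(h, data, SECTOR, &written, NULL) &&
                     SetFilePointer(h, 0, NULL, FILE_BEGIN) == 0 &&
                     ReadFile(h, check, SECTOR, &read, NULL) &&
                     memcmp(data, check, SECTOR) == 0;
                CloseHandle(h);
            }
            if (ok) {
                printf("write verified on attempt %d\n", attempt);
                return 0;
            }
            /* The share is unreachable or the I/O failed - assume a failover is
               in progress, wait, then redo the whole operation. */
            printf("attempt %d failed, retrying...\n", attempt);
            Sleep(RETRY_WAIT);
        }
        printf("giving up - the resource never came back online\n");
        return 1;
    }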

Lrgfile Test

This test creates a large file and then reads backward and shrinks the file while checking read data.

Lrgfile uses unbuffered I/O only and retries on every file I/O operation.

The user will not be required to know the syntax of these test programs. The test launcher will start the test when it is run on each client node. If the cluster is set up using the well-known names that Microsoft provides, no further input will be required.

The Lrgfile program will run as a generic cluster application on the server. This allows the tests to provide local heavy I/O stress on the cluster as well as client I/O stress. This is helpful in ensuring that the SCSI bus can handle failovers when large I/O operations are present on the bus. The validation test runs two slightly different variations of LRGFILE: one locally on a cluster server node (as a Generic Application resource) and one as a client on a client machine.
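
For reference, the heart of the grow/verify/shrink idea looks roughly like the sketch below (our own simplification, not the LRGFILE source); the drive letter, chunk size, and file count are hypothetical, and no failover retry logic is shown.

    /* growshrink.c - simplified sketch of the grow/verify/shrink idea. */
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    #define CHUNK  (64 * 1024)
    #define CHUNKS 256                      /* 16 MB test file - hypothetical size */

    int main(void)
    {
        static BYTE pattern[CHUNK], check[CHUNK];
        DWORD  n;
        int    i;
        HANDLE h = CreateFile("X:\\lrgdemo.dat", GENERIC_READ | GENERIC_WRITE,
                              0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        /* Grow phase: append chunks stamped with their index. */
        for (i = 0; i < CHUNKS; i++) {
            memset(pattern, (BYTE)i, CHUNK);
            if (!WriteFile(h, pattern, CHUNK, &n, NULL) || n != CHUNK) return 1;
        }

        /* Shrink phase: read backward, verify each chunk, then truncate it away. */
        for (i = CHUNKS - 1; i >= 0; i--) {
            SetFilePointer(h, i * CHUNK, NULL, FILE_BEGIN);
            if (!ReadFile(h, check, CHUNK, &n, NULL) || n != CHUNK) return 1;
            memset(pattern, (BYTE)i, CHUNK);
            if (memcmp(pattern, check, CHUNK) != 0) {
                printf("data error in chunk %d\n", i);   /* disk or cache problem */
                return 1;
            }
            SetFilePointer(h, i * CHUNK, NULL, FILE_BEGIN);
            SetEndOfFile(h);                             /* shrink the file */
        }

        CloseHandle(h);
        DeleteFile("X:\\lrgdemo.dat");
        printf("grow/verify/shrink completed\n");
        return 0;
    }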

Problem resolution:

If the LRGFILE test fails, look for the LRGFILE.LOG log file. This file should contain error information and the reason for the failure. Since the LRGFILE test is run with the -Vn switch (leave n MB of free space on disk), a common problem is that the test did not start at all because there was not enough space to start the test. LRGFILE retries 15 times with a 1-second pause between retries before exiting with this error. Another common problem is disk media failure. Such a problem is reported as data corruption because the expected data was not read from the disk. To eliminate that kind of error, run the LRGFILE test locally on the server. You can find LRGFILE.EXE in your HCT\TESTBIN directory or on the HCT CD. Copy it to the server node and run the following command from a Windows NT console.

lrgfile -r10 -p32 -b8 -dX:

-r10 means run 10 times

-p32 means that LRGFILE will use 32*4 kB chunks of data for each write/read operation

-b8 means that LRGFILE will use 8 buffers for asynchronous write/read

-dX: replace X with the suspected shared drive letter. Be sure that the disk is online on the node that runs LRGFILE!

LRGFILE will grow a temporary test file until it consumes the entire available disk space, then shrink the file back while checking the data. DO NOT MOVE/FAIL OVER THE DISK RESOURCE DURING THIS TEST. After LRGFILE finishes, look in the log file LRGFILE.LOG (in the same directory LRGFILE.EXE was run from) and search for data errors (for example, Disk data error: Chunk#xpct 0x12, chunk#got 0x0, Page#=0). If you see this error, your disk failed and is unreliable. If your disk passes this test but does not pass the node test with random moves/failovers, that points to a cache problem. If both tests pass but the client test does not, that points to a problem in the redirector. In most cases, the data was not written to the disk but was cached (usually at the hardware level) and reported as saved prior to the failover/move.

Most other errors are due either to cluster service/resource failure or to network failure.

 

Mapfile Test

This file system test sets up memory-mapped files to the cluster. It then modifies a section of the mapped file, committing these changes to disk by flushing the file mapping. After the flush operation completes, the test will read the data back into memory and compare that the correct data was written to the disk. If a failover happens before the flush operation, all changes are discarded and the test restarts. If a failover happens while the test is in the verification phase, it will simply redo the verification.
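
A minimal sketch of the modify/flush/verify pattern follows (our own illustration, not the Mapfile test); the UNC path and mapping size are hypothetical, and the failover retry logic is omitted.

    /* mapdemo.c - sketch of the mapped-file modify/flush/verify idea. */
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    #define MAP_SIZE (4 * 1024 * 1024)      /* the test needs about 4 MB of space */

    int main(void)
    {
        DWORD  n;
        BYTE   check[4096];
        HANDLE file = CreateFile("\\\\WolfpackShare\\TestShare\\mapdemo.dat",
                                 GENERIC_READ | GENERIC_WRITE, 0, NULL,
                                 OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        HANDLE map;
        BYTE  *view;

        if (file == INVALID_HANDLE_VALUE) return 1;

        map = CreateFileMapping(file, NULL, PAGE_READWRITE, 0, MAP_SIZE, NULL);
        if (map == NULL) return 1;
        view = (BYTE *)MapViewOfFile(map, FILE_MAP_WRITE, 0, 0, MAP_SIZE);
        if (view == NULL) return 1;

        /* Modify a section of the mapping, then commit it with a flush. */
        memset(view, 0x5A, sizeof(check));
        if (!FlushViewOfFile(view, sizeof(check))) return 1;

        /* Verification phase: read the same range back through the file handle
           and compare it with what was written. */
        SetFilePointer(file, 0, NULL, FILE_BEGIN);
        if (!ReadFile(file, check, sizeof(check), &n, NULL) || n != sizeof(check))
            return 1;
        printf(memcmp(view, check, sizeof(check)) == 0 ?
               "data verified\n" : "data mismatch - possible cache problem\n");

        UnmapViewOfFile(view);
        CloseHandle(map);
        CloseHandle(file);
        return 0;
    }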

Problem resolution:

The Mapfile test requires 4 MB of space on the tested drive. If there is not enough space, Mapfile exits. Another common problem is network failure or cluster service/resource failure. If the test fails because of a data corruption problem (the data read differs from what was expected), the cause is not easy to determine. In most cases, the data was not written to the disk but was cached (usually at the hardware level) and reported as saved prior to the failover/move. See the LRGFILE Problem resolution paragraph for how to eliminate a disk-media problem.

 

IIS Testing

In the list of cluster groups, Microsoft created the IIS group. This group will contain a network name and a shared drive. As part of the server setup script, Microsoft will copy some HTML files to the shared drives. Microsoft has modified its IIS client tests to continually poll the IIS server for information. This test simulates what an Internet Explorer version 3.0 (or higher) or other browser client would see by constantly accessing the IIS pages on the cluster. It will also retry in the case of errors being returned from the IIS server on the cluster. The test will perform operations to make sure that the IIS server is returning correct data from the shared drive.

This test will be totally automated and will connect to the IIS virtual root in each disk group using the network name. A virtual root is a mapping from an IIS alias to a physical location on a disk currently owned by an IIS server machine. A typical example would be http://WolfpackIIS/home mapped to I:\wwwroot\home.html, where WolfpackIIS is the network name and home is the IIS alias or virtual root.

There will be two virtual roots for each disk group:

There will be specific files that clients will access for each virtual root. For the WWW root, Microsoft will copy HTML files from the client monitoring node to the server. For FTP roots, Microsoft will copy files to the server from the client-monitoring node.

Gqstress Test

This test is designed to do constant IIS queries against the virtual root set up in each disk group. The test will make sure that the virtual root is online. If the test is unable to access the root, it will retry the operation. The test will allow the root to be offline for a certain period of time. This is expected during failovers because the network name and IP address have to be moved to the other server. This test can simulate thousands of queries in a short time; it is designed to stress the IIS virtual roots. Each client will have one instance of this test doing queries against each IIS virtual root that is a WWW root.
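
A stripped-down illustration of this kind of polling loop, written against the WinInet API rather than the actual gqstress code, might look like the sketch below; the URL (taken from the virtual-root example above), query count, and retry delay are our own assumptions.

    /* webpoll.c - sketch of polling an IIS virtual root with WinInet.
       Build with: cl webpoll.c wininet.lib */
    #include <windows.h>
    #include <wininet.h>
    #include <stdio.h>

    int main(void)
    {
        HINTERNET session = InternetOpen("webpoll", INTERNET_OPEN_TYPE_DIRECT,
                                         NULL, NULL, 0);
        int i;

        if (session == NULL) return 1;

        for (i = 0; i < 100; i++) {
            /* WolfpackIIS is the network name of the disk group's IIS root. */
            HINTERNET req = InternetOpenUrl(session, "http://WolfpackIIS/home",
                                            NULL, 0, INTERNET_FLAG_RELOAD, 0);
            if (req != NULL) {
                char  buf[1024];
                DWORD got, total = 0;
                while (InternetReadFile(req, buf, sizeof(buf), &got) && got > 0)
                    total += got;           /* the real test also checks content */
                printf("query %d: read %lu bytes\n", i, total);
                InternetCloseHandle(req);
            } else {
                /* The root may be offline while the group fails over - retry. */
                printf("query %d failed (error %lu), retrying\n", i, GetLastError());
                Sleep(5000);
            }
        }
        InternetCloseHandle(session);
        return 0;
    }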

Problem resolution:

The name of the log file is gqstress.log. A SEV2 error indicates a failure. It is normal to see select timeouts, as these happen during failover. The most probable causes for test failure are that IIS is not started, IIS has crashed, or there are security problems. Ensure that IIS is installed and running. Use the IIS Service Manager to check whether the IIS service is started. You can use Microsoft Internet Explorer to see whether IIS is accessible from the client. If IIS is up and running, check that all the virtual roots that were created are up and running (you can do this from the IIS Service Manager).

Do the following from the client browser to check whether IIS, network connectivity, and security are fine.

Past experience has shown that the most common cause of gqstress failures is security. This occurs if one (or both) of the cluster nodes is a PDC or a BDC, because the IIS user account (IUSR_NODENAME) becomes invalid if the node is a PDC or BDC.

 

Ftpstress Test

As part of the setup, Microsoft will copy some small test files to the shared disks. This test will use FTP transfers to move that file back and forth from the client nodes. In the case of a failure, it will redo the last operation. It will keep track of which files have been successfully transferred to the server, and then verify that those files are actually on the server.
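
A rough sketch of the put/get pattern, written against the WinInet FTP functions rather than the actual ftpstress code, is shown below; the server name, local paths, and anonymous logon are our own assumptions.

    /* ftpcheck.c - sketch of the FTP put/get/verify idea.
       Build with: cl ftpcheck.c wininet.lib */
    #include <windows.h>
    #include <wininet.h>
    #include <stdio.h>

    int main(void)
    {
        HINTERNET session = InternetOpen("ftpcheck", INTERNET_OPEN_TYPE_DIRECT,
                                         NULL, NULL, 0);
        HINTERNET ftp;

        if (session == NULL) return 1;

        /* WolfpackIIS is the network name that owns the FTP virtual root;
           anonymous logon is assumed here. */
        ftp = InternetConnect(session, "WolfpackIIS", INTERNET_DEFAULT_FTP_PORT,
                              NULL, NULL, INTERNET_SERVICE_FTP, 0, 0);
        if (ftp == NULL) {
            printf("connect failed, error %lu\n", GetLastError());
            return 1;
        }

        /* Transfer a small local file to the server, then pull it back under a
           different name; the real test compares the copies afterward. */
        if (!FtpPutFile(ftp, "C:\\temp\\testfile.dat", "testfile.dat",
                        FTP_TRANSFER_TYPE_BINARY, 0) ||
            !FtpGetFile(ftp, "testfile.dat", "C:\\temp\\testfile.chk", FALSE,
                        FILE_ATTRIBUTE_NORMAL, FTP_TRANSFER_TYPE_BINARY, 0)) {
            /* During a failover the group is briefly offline - the real test
               would wait and redo the last operation here. */
            printf("transfer failed, error %lu\n", GetLastError());
        } else {
            printf("put/get completed\n");
        }

        InternetCloseHandle(ftp);
        InternetCloseHandle(session);
        return 0;
    }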

Problem resolution:

The name of the log file is ftpcont.log. A "FAIL: Max time out" indicates a failure. The most probable cause for test failure is that the FTP server is not started or has crashed. Ensure that the FTP service is installed, running, and accessible. Use the IIS Service Manager to check whether the FTP service is started. You can use the ftp client program that comes with Windows 95 or Windows NT to check whether the FTP service is accessible from the client.

Print Server Testing

Note: This test is not included in the current version of the test kit.

Microsoft has not yet finished designing the print server testing. The basic idea is that clients will spool print jobs to the print server, the print server will spool the jobs to the shared drive, and then the clients will check their print jobs. In the case of failover, the clients will check that their print jobs are still available when the print server moves to the other node.

Causing Cluster Failovers During Client-Server Tests

When running the client-server stress tests during Phase 5, the most important test case to have is cluster nodes crashing asynchronously. This will simulate what a real-world server might encounter. When the client-server stress tests are running, Microsoft will be simulating many clients simultaneously accessing the server. The cluster must be able to lose one node with all of this client activity. The clients should not experience more than a 30-second delay for all of their resources coming available on the nonfailing node. To accomplish this, Microsoft will install a special service on both nodes of the cluster. Microsoft will also install a test program on the client-monitoring machine, which will communicate with the service on both nodes of the cluster.

This test will also verify that all cluster resources present when the test starts will be moved back and forth when the nodes crash. If this test finds any problems with the state of the cluster, it will cause both nodes of the cluster to break into the kernel debugger. Without this, it is almost impossible to debug problems with the state of cluster resources. The failover test will wait one hour between each reboot before crashing the other node. This allows a substantial amount of client I/O between failovers.

The failover test is designed to crash one node of a cluster and then the other node. It waits for the crashed node to reboot before it crashes the next node. This means that the client programs can expect to have access to the cluster resources at all times, except when actual failovers are happening. This test ensures that the controller firmware and also the miniport driver for the controller don’t stall while rebooting when the other side has SCSI reservations on the shared drives. It allots time for each server to reboot after a crash. If the server fails to reboot within the allotted time, it registers a failure. This is how most Windows NT Server setups will work when the default is to have the node crashdump and then automatically reboot after a failure.

As part of Phase 5, Microsoft will set up this test on both servers and also on the client-monitoring node. No changes are needed to the other client nodes. This will install a new service and a special driver on each server in the cluster.

The client-monitoring node will log all information and print out its status on the kernel debugger for each server in case of problems. The log file on the client-monitoring node is called spfail.log.

The Crashtst test should be run during all of Phase 5. The number of reboots will depend upon how fast the machine reboots. If anything goes wrong or if the client node detects any inconsistent resource states on the cluster, it will cause each cluster node to enter the kernel debugger, and the test on the client node will stop. To analyze the problem, the log file generated by the Spfail.exe test can be analyzed along with the logs on the cluster nodes from the cluster service.

Failover Program

The components of the failover program are Spfail.exe on the client monitoring node, plus spsrv.exe, remclus.dll, and crashtst.sys on each cluster node. Spsrvcl.exe is a client program for spsrv that is used to debug spsrv. Spfail.exe sends the crash command to spsrv. After receiving the crash command, spsrv passes the command to the kernel-mode device driver crashtst, which in turn calls the HAL routine HalReturnToFirmware to produce a node crash.

Interpreting the failover log

The name of the log file is spfail.log. Search for "SEV2" from the beginning of the log file. The first occurrence of a SEV2 error is the cause of the spfail.exe failure.

At least one node should be up during the entire Phase 5 testing. If both nodes die, spfail typically gets error 6BA (hex), RPC server unavailable. All the error numbers in spfail are in hex.

Another typical reason spfail fails is that a node fails to boot after the crash. In this case spfail.exe will time out.

Troubleshooting Failover

To check if the spsrv is installed and functioning properly on each node run the following commands from the client monitor.

"spsrvcl –host:<node name> -cmd:ping" and

"spsrvcl –host:<node name> -cmd:ping –input:crash"

Both the above commands should return status=0 for success. Success of the first command implies that spsrv is up and running. Success of the second command implies that the spsrv has loaded the remclus.dll which is a required component to crash the node.

To manually crash the node run the following command from the cmd window.

"spsrvcl –host:<node name> -cmd:crash"

Additional Stress Testing (24 hours) (Optional for Release1.0 of Wolfpack)

NOTE: This set of tests is not included in the test kit. We were unable to finish automating these tests as part of the HCT tests. This test log is not required in the submission to Microsoft for a cluster configuration to be listed on the configuration HCL. We believe this is a valid test mix, and we are working on making these tests available to OEMs and other vendors.

The objective of stress testing is to make sure the clustering software can maintain communications under heavy processor load and I/O load at the same time. The nodes of a cluster constantly send packets back and forth to verify that each node is still functioning properly. If one cluster node does not receive a packet from another node within a specified time, it declares the other node out of the cluster. It is therefore very important that Microsoft put as much load as possible on Wolfpack servers to see how well they and the clustering software handle it.

To accomplish this, Microsoft will leave Disk Group 1 on WolfpackA and Disk Group 2 on WolfpackB. This means that of the two or more shared drives, one or more will be owned by WolfpackA and one or more will be owned by WolfpackB. Having the disks owned by different nodes tests sending I/O commands to drives from different controllers on the same shared SCSI bus. While this test is running, the cluster should never declare either node down. Cluadmin should be started on a remote node to monitor the cluster.
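In addition to Cluadmin, the cluster.exe command-line utility (also used later in this document when gathering logs) can be used from the remote node to spot-check that both nodes stay up during the run, for example:

cluster.exe <node-name> node

Both nodes should report an Up status for the entire 24-hour period.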

Test objective: Run Windows NT I/O stress on each node for a 24-hour period.

A separate document will describe this set of tests. These tests will become part of the Windows NT Server test kit and can be run from the HCT kit in the future. The I/O stress tests are all part of the normal Windows NT file system tests. They will execute all of the supported file system operations in a stress mode.

Simultaneous Reboot Test (optional no logs required)

This is not an automated test. However, Microsoft has found so many problems with controllers and firmware on a shared bus that this test was deemed necessary. The main objective is to make sure the controllers never hang during boot when both nodes are turned on at the same time.

For this test, the system should be set up in the same fashion as for the crashtst test. As soon as both machines have booted, use the cluster administrator tool to make sure the shared drives can be moved back and forth between both nodes. Next, shut down both nodes and repeat the test. This should be performed 10 times to ensure that booting works properly; neither node should hang or require a reboot to recover from a hang.

Also, the controller firmware or the miniport driver should not hang for an unreasonable length of time while timing out commands or getting SCSI RESERVATION_CONFLICT errors.

Move Group 2 Node Test (optional no logs required)

Please note that this test is different from the Validate Move Group 2 Node Test.

This test is run from a client master in the same manner as the Validate 2 Node test; however, it is configured with no additional clients. This test simply exercises the ability of the cluster and devices to handle continuous move-group operations over a 24-hour period. A generic resource test (lrgfile.exe) is run in each of the configured disk groups to simulate disk I/O stress and activity on the cluster nodes during the moves.
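For reference, a single move-group operation of the kind this test performs continuously can also be issued by hand with cluster.exe. The group name below is an example, and the exact switch syntax may vary between releases:

cluster.exe <node-name> group "Disk Group 1" /moveto:<other node name>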

Crash 2 Node Test (optional no logs required)

Please note that this test is different from the Validate Crash 2 Node Test.

This test is also run from a client master in the same manner as the Validate 2 Node Test (and the Move Group 2 Node Test). Like the move-group test, it is configured with no additional clients and completes in a 24-hour period. This test exercises the ability of the cluster and devices to handle continuous reboots of the cluster nodes. A generic resource test (lrgfile.exe) is run in each of the configured disk groups to simulate disk I/O stress and activity on the cluster nodes during failover.

How to Submit results to WHQL

To submit results you must have run all of the required tests. You will be required to submit three floppy disks with log sets on them. The following list shows which log files must be on each of the required floppy disks.

  1. Phase 1 Server Log. This log file, called clustsim_server.log, contains the output of the server side of the low-level shared SCSI test. You need to enter all system information for the server on which this test ran.
  2. Phase 1 Client Log. This log file, called clustsim_client.log, contains the output of the client side of the low-level shared SCSI test. You need to enter all system information for the server on which this test ran. When you gather the results for the complete configuration testing, list how you want your configuration to appear on the HCL in the notes section. Guidelines are provided below.
  3. Phase 2 and Phase 5 Logs. These log files are generated on the monitoring node. Put the log files vald1nod.log and vald2nod.log on the diskette. You will need to fill in the system information on this node; however, this information describes only the client node. The HCT Test Manager has no way to return log results without going through this process; only the log results are checked, not the specifics of the monitoring node.

 

What to do if tests fail, but you think it is a test bug?

We realize that in some cases you may run into a problem that you believe is a test bug blocking the tests from passing at 100%. Please check the troubleshooting section first. Failing that, you can send the required log information to wolfhct@Microsoft.com so we can look at your problem. If we determine that it is a test bug, we will allow your configuration to be listed even if the test results are not a 100% pass. We believe, however, that most test results for valid configurations should pass at 100%. At a minimum, the required log information sent to wolfhct@Microsoft.com should consist of:

The cluster logs for each node, the vald2nod.log (or vald1nod.log), spfail.log, and the output from the following commands (see the example after this list for one way to capture the output to files):

  1. cluster.exe <node-name> res
  2. cluster.exe <node-name> group
  3. cluster.exe <node-name> node
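One way to capture this output for submission is to redirect each command to a text file; the file names here are only suggestions:

cluster.exe <node-name> res > clusres.txt

cluster.exe <node-name> group > clusgrp.txt

cluster.exe <node-name> node > clusnode.txt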

Cluster logging is turned off by default in a retail install. To enable cluster logging, use the System applet in Control Panel to create the system environment variable ClusterLog and set it to the path of the log file to create.

Example: ClusterLog=C:\cluster.log
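After creating the variable, a reboot of the node may be required before the Cluster Service picks up the new system environment variable. You can confirm the setting from a newly opened command prompt with:

set ClusterLog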

 

How to Return Log results

After running the Phase 1, Phase 2, or Phase 5 tests, go to the HCT Test Manager and select Return Cluster Results. You will need a floppy disk for each of the three test machines involved. Please label the diskettes as follows:

  1. Cluster node #1 (should have had the Client/Server – Server test run on it)
  2. Cluster node #2 (should have had the Client/Server – Client test run on it)
  3. Monitoring node (should have vald1nod.log and vald2nod.log on it)

 

Cluster description on the HCL

We allow each vendor submitting a cluster configuration to choose, to some extent, the format in which it will be listed. No obvious marketing material may be included. Here is the general format that should be followed. This information should be provided in the Notes section when you return the cluster logs for diskette #2 above (Client/Server – Client test).

Cluster Configuration Name

Server #1 name

Server #2 name

Shared SCSI components (only controllers, RAID controllers, or RAID devices should be listed; do not list drive cabinets, drives, cables, and so on). NOTE: If you are using a hardware array device, you must list the SCSI controller you are using. If you are using a PCI-based SCSI RAID controller, you need only list that device.

Notes (any support information; you may also include a link to your own URL for further information)