Failover

If a metadata server fails and the metadata-server automatic restart service cannot bring it back into the cluster, or if you manually stop the metadata server, SAN File System automatically and non-disruptively fails over the metadata-server workload by redistributing its filesets and, if necessary, reassigning the master role to other active metadata servers.

SAN File System also detects rogue metadata servers. A rogue metadata server is one that is not reachable from the cluster and fails to respond to requests, but might still be running or have latent queued I/O. If a rogue metadata server is detected, the cluster first attempts to communicate with it through disk so that it completes and quiesces all I/O activity that is failing. The cluster then stops the engine that is running the rogue metadata server before failing over its workload.

Note:
  1. After a failover, review the workload reassignments.
  2. Administrative commands that are interrupted by a failover need to be manually restarted against the new master metadata server.

Redistributing filesets

SAN File System uses a distribution algorithm to reassign filesets across the remaining active metadata servers. The algorithm first attempts to move the static filesets to a spare, idle metadata server that is set aside for failover. A spare metadata server is one that has no static filesets assigned to it. If more than one spare exists, all static filesets assigned to the failed metadata server are distributed to a single spare. If no spare metadata server exists, the static filesets are treated as dynamic filesets. The dynamic filesets are then distributed in round-robin fashion among the metadata servers that have the fewest assigned filesets.
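The distribution steps described above can be sketched roughly as follows. This is an illustrative model only, not the product's implementation: the function name redistribute, the dictionary layout, the choice of the first spare in list order, and the reading of "round-robin" as "give each fileset to the currently least-loaded server" are all assumptions made for the example.

```python
from collections import Counter


def redistribute(failed, assignments, static, active):
    """Return a new fileset -> server map after `failed` leaves the cluster.

    assignments: dict fileset -> current server
    static:      dict fileset -> statically assigned server (others are dynamic)
    active:      list of remaining active servers, in a stable order
    All names here are hypothetical; they are not SAN File System APIs.
    """
    # Filesets on healthy servers stay where they are.
    new = {fs: srv for fs, srv in assignments.items() if srv != failed}

    orphans = sorted(fs for fs, srv in assignments.items() if srv == failed)
    static_orphans = [fs for fs in orphans if fs in static]
    dynamic_orphans = [fs for fs in orphans if fs not in static]

    # A spare is an active server with no static filesets assigned to it.
    static_homes = set(static.values())
    spares = [s for s in active if s not in static_homes]

    if spares:
        # All static filesets of the failed server go to a single spare.
        # Assumption: the first spare in list order is used.
        for fs in static_orphans:
            new[fs] = spares[0]
    else:
        # No spare: static filesets are treated as dynamic filesets.
        dynamic_orphans = sorted(dynamic_orphans + static_orphans)

    # Hand each remaining fileset to the server with the fewest filesets.
    load = Counter(new.values())
    for fs in dynamic_orphans:
        target = min(active, key=lambda s: load[s])
        new[fs] = target
        load[target] += 1
    return new
```

For example, with filesets a and b on a failed server m1, where a is static and m3 is a spare, this sketch moves a to m3 and places b on whichever remaining server holds the fewest filesets.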

The failover is temporary for static filesets. A static fileset is a fileset that you manually assigned to a specific metadata server (using the mkfileset or setfilesetserver command). These filesets fail back to their statically assigned metadata server when that metadata server rejoins the cluster. Dynamic filesets, which are assigned to a metadata server by the system, do not return to their previously assigned metadata server; however, they might be redistributed to rebalance the workload after the static filesets fail back.
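The difference between static and dynamic filesets at failback can be illustrated with a small sketch. The function name fail_back and the data layout are hypothetical, and the optional rebalancing of dynamic filesets is deliberately left out because the text does not describe how it is performed.

```python
def fail_back(rejoined, assignments, static):
    """Return the fileset -> server map after `rejoined` returns to the cluster.

    assignments: dict fileset -> current server
    static:      dict fileset -> statically assigned server
    Hypothetical helper for illustration; not a SAN File System API.
    """
    new = dict(assignments)
    for fs, home in static.items():
        # Only static filesets whose assigned server has rejoined move back.
        if home == rejoined and fs in new:
            new[fs] = rejoined
    # Dynamic filesets keep their current server in this sketch.
    return new
```

In this model, a static fileset that was hosted temporarily on another metadata server returns to its assigned server, while a dynamic fileset that was moved during failover stays where it is.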

Reassigning the master role

When a failure affects the master metadata server, the master role is reassigned to another metadata server according to a quorum algorithm. This algorithm makes use of a quorum disk and a majority voting procedure to assign the master role to a metadata server that is a member of the largest active, mutually-connected group of metadata servers that all have access to the system storage pool.
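The group-selection idea can be sketched as follows. This is a simplified model under stated assumptions: it covers only the search for the largest mutually-connected group with access to the system storage pool, and it does not model the quorum disk or the voting procedure. The function names, the brute-force search (reasonable only for a small number of servers), and the tie-break of picking the lowest server name are all hypothetical.

```python
from itertools import combinations


def largest_group(servers, links, has_pool_access):
    """Return the largest set of servers that can all reach one another
    and that all have access to the system storage pool.

    servers:         iterable of server names
    links:           iterable of (a, b) pairs that can communicate
    has_pool_access: dict server -> bool
    """
    eligible = sorted(s for s in servers if has_pool_access.get(s, False))
    connected = {frozenset(pair) for pair in links}
    # Try the biggest candidate groups first; stop at the first that is
    # fully mutually connected.
    for size in range(len(eligible), 0, -1):
        for group in combinations(eligible, size):
            if all(frozenset(p) in connected for p in combinations(group, 2)):
                return set(group)
    return set()


def pick_master(servers, links, has_pool_access):
    """Pick a master from the largest group, or None if no group exists.

    Assumption: the lowest server name wins; the real selection rule
    is not described in this topic.
    """
    group = largest_group(servers, links, has_pool_access)
    return min(group) if group else None
```

Note that the sketch never looks at client connectivity, which mirrors the behavior described in the next paragraph.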

The quorum algorithm does not take into account the network connectivity between the metadata servers and the clients. As a result, if a network partition separates the clients from a metadata server, that metadata server can still be chosen as the master even though it is not the ideal choice for those clients.

Restriction: You cannot specify a preferred master metadata server as the failover target, and you cannot predict the failover target. Because any metadata server might take over the master role, reserve some spare capacity for it on each metadata server in the cluster. The master role requires only a small amount of processing.


(C) Copyright IBM Corporation 2003, 2004. All Rights Reserved.