SERVERware 3 Split-brain dangers and best practices

In computing, split-brain is a state in which all of the private links that a high-availability cluster
uses to monitor the health and status of its nodes go down simultaneously, but the cluster nodes keep
running, each one believing it is the only one still running.


Each node may then serve clients and apply updates to its own data set independently, without any
coordination with the other nodes. If the nodes share storage, this can corrupt the data. If each node
keeps its own separate storage, the data sets drift apart, and the resulting inconsistencies require
operator intervention and cleanup.
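The scenario above can be illustrated with a minimal Python sketch. It is not SERVERware code: the `Node` class, its `link_up` flag and the `balance` key are hypothetical. Two nodes lose their private link, each concludes it is alone, and each applies a different client update, leaving a conflict that an operator would have to resolve.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node holding its own copy of a key/value data set."""
    name: str
    data: dict = field(default_factory=dict)
    # Whether the private link used to watch the peer is working.
    link_up: bool = True

    def believes_it_is_alone(self) -> bool:
        # With every private link down, the node cannot see its peer
        # and concludes that it is the only node still running.
        return not self.link_up

    def write(self, key: str, value: str) -> None:
        # Client writes are applied locally, with no coordination with the peer.
        self.data[key] = value


def simulate_split_brain() -> tuple[Node, Node]:
    a = Node("node-a", {"balance": "100"})
    b = Node("node-b", {"balance": "100"})

    # All private links fail at once, but both nodes keep running.
    a.link_up = b.link_up = False
    assert a.believes_it_is_alone() and b.believes_it_is_alone()

    # Different clients reach different nodes; each node applies its own update.
    a.write("balance", "150")
    b.write("balance", "40")
    return a, b


def conflicting_keys(a: Node, b: Node) -> list[str]:
    """Keys whose values now disagree and would need operator cleanup."""
    keys = a.data.keys() | b.data.keys()
    return sorted(k for k in keys if a.data.get(k) != b.data.get(k))


if __name__ == "__main__":
    node_a, node_b = simulate_split_brain()
    print(conflicting_keys(node_a, node_b))  # ['balance']
```

Neither copy of `balance` is "correct" on its own, which is why this kind of divergence ends in manual cleanup rather than automatic recovery.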


To prevent this and to ensure data integrity, SERVERware allows only one node to run a given VPS at a
time. Power switches in the hardware configuration enable a node to power-cycle another node before
the surviving node restarts the failed node's high-availability services during failover. This prevents
two nodes from accessing the same data simultaneously and corrupting it. Fence devices are used to
guarantee data integrity under all failure conditions.
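The fence-then-take-over rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SERVERware's actual failover logic: `Node`, `PowerSwitch`, `FenceError` and `fail_over` are hypothetical names. The surviving node starts the VPS only after the power switch reports that the peer was power-cycled; if fencing fails, it refuses to take over.

```python
from dataclasses import dataclass


class FenceError(RuntimeError):
    """Raised when the peer cannot be fenced, so takeover must not happen."""


@dataclass
class Node:
    name: str
    running_vps: bool = False


@dataclass
class PowerSwitch:
    """Hypothetical power switch wired to one node's power supply."""
    target: Node
    working: bool = True

    def power_cycle(self) -> bool:
        if not self.working:
            return False
        # Assumption: after a power cycle the node reboots without the VPS
        # running, so whatever it was doing before the failure has stopped.
        self.target.running_vps = False
        return True


def fail_over(survivor: Node, switch: PowerSwitch) -> None:
    """Start the VPS on `survivor` only after `switch.target` has been fenced."""
    failed = switch.target
    if survivor.running_vps:
        return  # already the active node; nothing to take over
    # Fence first: a peer that cannot be power-cycled might still be running
    # the VPS, and starting a second copy could corrupt the shared data.
    if not switch.power_cycle():
        raise FenceError(f"could not fence {failed.name}; refusing to start the VPS")
    survivor.running_vps = True
    # The invariant fencing protects: at most one node runs the VPS.
    assert not (survivor.running_vps and failed.running_vps)


if __name__ == "__main__":
    a, b = Node("node-a"), Node("node-b", running_vps=True)
    fail_over(a, PowerSwitch(b))
    print(a.running_vps, b.running_vps)  # True False
```

Raising an error instead of starting the VPS anyway trades availability for integrity: a VPS that stays down can be restarted later, while two copies writing to the same storage may leave damage that cannot be undone.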


Fencing is the process of isolating a malfunctioning node, or protecting shared resources from it,
within a high-availability environment. The fencing process locates the malfunctioning node and
disables it. Common fencing methods include PDU (power distribution unit) fencing and IPMI
(Intelligent Platform Management Interface) fencing.


Both methods move a cluster member that is in an unknown-but-presumed-inoperable state into one
where service takeover by the survivor is safe and the chance of a split-brain occurring is small. PDU
fencing achieves this by cutting power to the box, while IPMI fencing allows a more graceful shutdown
of the troubled system.
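The difference between the two methods can be sketched in Python. `IpmiFence` builds real `ipmitool` commands (`chassis power soft`, `chassis power status` and `chassis power off` over the `lanplus` interface), but the grace period, polling interval and soft-then-hard escalation are assumptions, not SERVERware settings. `PduFence.switch_outlet` is a hypothetical hook standing in for a vendor-specific PDU driver. The command runner is injectable so the logic can be exercised without hardware.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Callable

Runner = Callable[[list[str]], str]


def run_command(cmd: list[str]) -> str:
    """Run a command and return its stdout (raises on a non-zero exit)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


@dataclass
class IpmiFence:
    """Graceful-first fencing through the node's BMC using ipmitool."""
    bmc_host: str
    user: str
    password: str
    grace_seconds: float = 30.0  # assumed value
    poll_seconds: float = 1.0    # assumed value
    run: Runner = run_command

    def _ipmi(self, *args: str) -> str:
        return self.run(["ipmitool", "-I", "lanplus", "-H", self.bmc_host,
                         "-U", self.user, "-P", self.password, *args])

    def is_off(self) -> bool:
        # ipmitool reports the state as e.g. "Chassis Power is off".
        return "is off" in self._ipmi("chassis", "power", "status").lower()

    def fence(self) -> str:
        # Ask the operating system to shut down cleanly (ACPI soft-off).
        self._ipmi("chassis", "power", "soft")
        deadline = time.monotonic() + self.grace_seconds
        while time.monotonic() < deadline:
            if self.is_off():
                return "soft"
            time.sleep(self.poll_seconds)
        # The node did not stop in time, so force the power off.
        self._ipmi("chassis", "power", "off")
        if not self.is_off():
            raise RuntimeError(f"{self.bmc_host}: power still on after hard off")
        return "hard"


@dataclass
class PduFence:
    """Hard fencing: switch off the PDU outlet that feeds the node."""
    outlet: str
    switch_outlet: Callable[[str, bool], None]  # hypothetical PDU driver hook

    def fence(self) -> str:
        # No shutdown request is sent; power is simply removed.
        self.switch_outlet(self.outlet, False)
        return "hard"
```

The PDU path cannot be ignored by a hung operating system, which makes it the more forceful option; the IPMI path gives the node a chance to stop cleanly and only falls back to a forced power-off when that chance runs out.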