The Startup Behavior of a 2-Node Pacemaker Cluster
This article describes fencing considerations, quorum settings, and commands for sensible behavior in a 2-node Pacemaker with Corosync cluster. Sensible behavior here means:
- Do not go online with stale data (replication case).
- Do not cause a startup fencing loop.
- Run Pacemaker primitive services exactly once (prevent IP conflicts, data corruption, and other issues).
To be allowed to start services, a node needs to be quorate, that is, the node needs to be a member of a cluster partition that has quorum. The node also must be certain that the respective Pacemaker primitive service is not (and cannot possibly be) running anywhere else.
Setting two_node: 1 within the quorum section of a Corosync configuration file enables 2-node cluster operations. Enabling this setting automatically enables the Corosync wait_for_all quorum option.
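For reference, a minimal quorum section in a corosync.conf file with this setting might look like the following sketch (the provider line names the standard votequorum quorum provider; the totem and nodelist sections of the file are omitted here):
quorum {
    provider: corosync_votequorum
    two_node: 1
}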
Enabling wait_for_all is sensible behavior because quorum in a 2-node cluster is two nodes (50% of the votes + one). So on startup, a node will always wait for the other node, and only then become ready to provide services.
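You can inspect the quorum state at any time by using the corosync-quorumtool utility that ships with Corosync. On a 2-node cluster configured this way, its status output typically shows two expected votes and a flags line that includes the 2Node and WaitForAll flags:
corosync-quorumtool -s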
Pacemaker will then start, see both nodes in the membership, probe for the current service status, and try to change the state of the world using start, stop, or possibly other actions, based on the configured policy.
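You can watch this from the Pacemaker side, for example with a one-shot run of the crm_mon status tool, which displays the current cluster membership and the state of configured resources:
crm_mon -1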
If the 2-node cluster now loses a node, the other node will continue to run services, or take over services. This presents a special problem, because a 2-node cluster does not have real quorum (a simple majority, as in a cluster with an odd number of three or more nodes). If the cluster lost the node because of a communication problem, both nodes are still alive, but from either node, the other node will appear to be unresponsive, and so each node needs to fence the other node before taking any further action. After a successful fencing operation, services cannot possibly run on the fenced node.
In this scenario, typically one of the nodes won the race to fence the other node, and so the other node is rebooting, due to Pacemaker options that you would have configured as part of a typical setup. After the reboot, if communication is still down between the nodes and without the wait for all behavior, the node would, perhaps after some timeout, start Pacemaker. However, because communication between the nodes is down, the newly rebooted node would think that the other node is unresponsive, and try to fence the other node that you, as the omniscient global observer, know is happily running services. This pattern would repeat with changing roles, each time with the newly rebooted node fencing the other node. You might consider not using fencing for your 2-node cluster. However, without fencing, with formerly replicated data, and with no communication, you would get diverging data sets.
Without fencing, and with shared data, you would get data corruption. With proper fencing configured on both the Pacemaker and DRBD® levels, you might get successful STONITH behavior, but then DRBD would still refuse to take over with only consistent or outdated data on one of the nodes.
With the implicit wait for all behavior, the newly rebooted node will not become quorate until communication with the peer has been reestablished, and so Pacemaker will not fence anything or start any services. This avoids the repeating startup-and-fence-the-other-node loop.
The remainder of this article repeats and rephrases the above information, with some subtleties and some additional commands that you might want to use in certain circumstances. If you have understood the startup behavior of a 2-node Pacemaker cluster from the article so far, you can stop reading here, or else continue reading to reinforce and deepen your understanding.
In Pacemaker with Corosync 2-node clusters, you should use the two_node: 1 quorum setting in your Corosync configuration file. Remember from the earlier discussion that this Corosync configuration setting enables an implicit wait for all behavior for each node.
You should also consider that no-quorum-policy=stop is the default setting in Pacemaker, if you have not configured it differently in your Pacemaker configuration. However, if you have configured no-quorum-policy=freeze, the behavior described in this article for a 2-node cluster will be the same as for no-quorum-policy=stop.
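If you are unsure which policy your cluster is using, you can query (and, if necessary, set) the no-quorum-policy cluster property with the crm_attribute tool that ships with Pacemaker, for example:
crm_attribute --type crm_config --name no-quorum-policy --query
crm_attribute --type crm_config --name no-quorum-policy --update stop
Higher-level shells such as crm and pcs provide equivalent property commands.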
So after startup, and without communication to the other node, the newly rebooted node does NOT have quorum, and without quorum, it will not fence the other node or start anything, or do anything else, really, because of the stop (or freeze) no-quorum-policy setting.
Once communication is reestablished between the two nodes and the newly rebooted node has seen its peer (and so become quorate), you then can lose the peer (and, because of the two_node Corosync quorum setting, keep quorum).
Should you ever need to bring up an isolated single node, you can then explicitly cancel the initial wait for all stage, at runtime with the following command:
corosync-cmapctl -s quorum.cancel_wait_for_all u8 1
Of course, before doing this, you should confirm that this is the right thing to do in your specific situation by following some documented administrative best practices procedure that you should have in place around your data or services.
But properly configured DRBD might still prevent your node from going online, if your node suspects that its peer might have better data. If you, as an omniscient global observer, know better, then you can use the drbdadm primary --force command to manually try to have the node go online with outdated or possibly stale data, or even with data that is inconsistent, but hopefully only slightly so. (An fsck is strongly recommended here!)
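A minimal sketch of that manual intervention, assuming a DRBD resource named r0 backing the device /dev/drbd0 with a file system directly on it (both names are placeholders for your actual setup), could look like this:
drbdadm dstate r0            # show the local disk state, for example UpToDate, Outdated, or Inconsistent
drbdadm primary --force r0   # promote the resource despite outdated or inconsistent data
fsck /dev/drbd0              # strongly recommended before mounting or starting services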
The key point is that in a 2-node cluster, as in this case, Corosync (the communication and membership layer of Pacemaker clusters) is configured for two_node quorum behavior, which implies quorum.wait_for_all, as in the corosync-cmapctl command above.
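If you want to see what Corosync currently stores in its in-memory configuration database (CMAP), you can dump the keys and filter for quorum-related entries; the exact key names vary between Corosync versions:
corosync-cmapctl | grep quorum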
That means that you can shut down one node from a 2-node cluster, keep the other node running, and that should work just fine.
However, if you then choose to stop that single node as well, and restart it as an isolated single node, the wait_for_all Corosync setting will block the node from starting services because it will be waiting for its peer.
This behavior is by design, and in general a good thing. If you actually mean to bring up that single isolated node, and you know it has good data, and you know the other node is down, then you can explicitly cancel the initial wait for all stage with this command:
corosync-cmapctl -s quorum.cancel_wait_for_all u8 1
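After canceling the wait, you can verify that the isolated node now considers itself quorate, before expecting Pacemaker to start services, for example with:
corosync-quorumtool -s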
The recommended Corosync quorum setting for a 2-node Pacemaker with Corosync cluster is two_node: 1, which automatically enables an implicit wait_for_all: 1 quorum setting.
The consequence is that you have to bring both nodes up, and in communication with each other, before the cluster will start services. You then can lose either node, so long as the other node keeps running.
If you have to boot a single, isolated node, and you know that this node has the most recent and good data, and you know that the other node is down, and will stay down, and you now want this node to start services without bringing up the other node, you can cancel the wait, using the corosync-cmapctl command above.
If you disable the wait_for_all Corosync quorum setting, and set no-quorum-policy=ignore in your Pacemaker configuration, and get into a situation where fencing does work, but the cluster communication does not, then you might end up with two nodes repeatedly rebooting and fencing each other.
This is why this is NOT the default setup. This situation could be mitigated by not starting the cluster software on regular boot. But that would always require operator interaction after a reboot for any reason.
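As a sketch of that mitigation on a systemd-based distribution, you could disable the automatic start of the cluster stack and have an operator start it manually after each boot; the unit names below assume the usual corosync and pacemaker service units:
systemctl disable corosync pacemaker
systemctl start corosync pacemaker
The first command prevents the cluster stack from starting on boot; the second is what the operator runs manually after verifying that it is safe to start or rejoin the cluster.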
Created by MAT (based on original content by LE) - 2022-07-21
Reviewed by DJV 2022-07-25