Decreasing File System Unmount Times in DRBD Clusters
There are a few aspects of DRBD® nodes and your network that can slow
down the umount process:
- Round-trip time (RTT) between the nodes that data is being copied and replicated to
- Large amounts of unwritten data, for example dirty cache, that is, data not yet written to disk (see the sketch after this list for a quick way to check this)
- Data in RAM that requires a write-out to persistent storage
- Network throughput (when working with remote nodes)
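To gauge the second and third factors on a given node, you can read the kernel's Dirty and Writeback counters. The following is a minimal Python sketch, assuming a Linux host; the helper name pending_writeout_kb is ours:

```python
#!/usr/bin/env python3
"""Report pending write-out data from /proc/meminfo (Linux only)."""


def pending_writeout_kb():
    """Return (dirty_kb, writeback_kb) parsed from /proc/meminfo."""
    values = {"Dirty": 0, "Writeback": 0}
    with open("/proc/meminfo") as meminfo:
        for line in meminfo:
            key, _, rest = line.partition(":")
            if key in values:
                # Lines look like "Dirty:     123456 kB"
                values[key] = int(rest.split()[0])
    return values["Dirty"], values["Writeback"]


if __name__ == "__main__":
    dirty, writeback = pending_writeout_kb()
    print(f"Dirty: {dirty} kB, Writeback: {writeback} kB")
```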
When a DRBD stack spans a WAN and is configured to use the synchronous
replication protocol (Protocol C), limited network throughput can increase
the time it takes the umount process to complete. This can be true even
with a stack that has premium hardware on all nodes in the cluster. If
the network connection between the nodes is, for example, 300 Mbps, then
that link caps how quickly the cluster can synchronize between the nodes,
regardless of how fast the local storage is.
Consider an example of a DRBD stack with the following network throughput and dirty cache size:
- 100 Mbps throughput on the connections between nodes
- 10 GiB of dirty cache
With a maximum possible throughput of 100 Mbps, it would take a minimum
of 800 seconds to process the umount: 10 GiB of dirty cache is roughly
10 x 8 x 1000 = 80,000 megabits, and 80,000 megabits / 100 Mbps = 800 seconds.
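To make that arithmetic concrete, here is a minimal Python sketch of the same lower-bound estimate. The function name unmount_floor_seconds is ours, and, like the article's round numbers, it deliberately approximates GiB as GB:

```python
def unmount_floor_seconds(dirty_gib: float, throughput_mbps: float) -> float:
    """Lower bound on the time needed to flush dirty data over the link.

    dirty_gib       -- unwritten data in GiB (approximated as GB here)
    throughput_mbps -- usable link throughput in megabits per second
    """
    megabits_to_send = dirty_gib * 8 * 1000  # bytes -> bits, GB -> Mb
    return megabits_to_send / throughput_mbps


print(unmount_floor_seconds(dirty_gib=10, throughput_mbps=100))  # -> 800.0
```

This is only a floor: RTT, protocol overhead, and disk speed on the peer nodes can all push the real time higher.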
However, with adjustments to the cluster, the bottlenecks can be reduced, and
you can speed up unmount times. For example, increasing the network throughput
to 1 Gbps, with a low enough round-trip time (RTT), could improve the umount
time in the example above from 800 seconds to 80 seconds.
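Rerunning the hypothetical helper from the sketch above with the faster link confirms the improvement:

```python
# Same helper as above, now with the faster 1 Gbps link:
print(unmount_floor_seconds(dirty_gib=10, throughput_mbps=1000))  # -> 80.0
```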