Tuning the DRBD Resync Controller
This article describes how to tune the DRBD® resync controller to optimize resynchronization speed without oversaturating the replication network. This can lead to a better performing and healthier DRBD device.
The dynamic sync-rate controller for DRBD® was introduced back in version 8.3.9 as a way to slow down DRBD resynchronization speeds. The idea is that if you have a write-intensive application running atop the DRBD device, it might already be close to filling your I/O bandwidth. Using a dynamic rate limiter ensures that recovery resync does not compete for bandwidth with the ongoing write replication. To keep the resync from competing with application I/O, the defaults are typically conservative values.
If the defaults seem slow to you or your use case, you can speed things up with a little bit of tuning in the DRBD configuration.
It is nearly impossible for DRBD to know just how much activity your storage and network back end can handle. It is fairly easy for DRBD to know how much activity it generates itself, which is why we tune how much network activity we allow DRBD to generate.
The dynamic sync-rate controller is configured using the following DRBD settings:
- resync-rate
- rs-discard-granularity
- c-max-rate
- c-min-rate
- c-fill-target
- max-buffers
- sndbuf-size
- rcvbuf-size
The following sections will help you tune each of the settings mentioned above.
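For orientation, the sketch below shows which configuration section each of these options belongs to in a DRBD 9 resource file. The resource name and the values are placeholders for illustration, not recommendations; the controller (peer-device) options may be set in the disk section so that they apply to all peers:

```
resource "r0" {
  net {
    max-buffers 40k;
    sndbuf-size 10M;
    rcvbuf-size 10M;
  }
  disk {
    # peer-device options, set here for all peers:
    resync-rate            100M;
    c-max-rate             300M;
    c-min-rate             100M;
    c-fill-target          1M;
    rs-discard-granularity 1048576;
  }
  # nodes, volumes, and other settings omitted
}
```

After changing values in the resource file, `drbdadm adjust <resource>` applies them to a running resource.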
Set the resync-rate to ⅓ of the c-max-rate.
With the dynamic resync-rate controller, this value is only used as a starting point. Changing this will only have a slight effect, but will help things speed up faster.
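As a quick sanity check of the one-third rule, the arithmetic can be done in the shell. The 800M figure is a hypothetical c-max-rate, matching the disk-throughput example later in this article:

```shell
# Hypothetical starting point: resync-rate at roughly 1/3 of an
# 800M c-max-rate (integer division, so the result rounds down).
echo "$(( 800 / 3 ))M"
```

This prints `266M`, which you could use as the resync-rate starting value.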
Tune rs-discard-granularity for your backing storage.
The rs-discard-granularity size is specified in bytes, with a default value of zero and a maximum value of 1048576.
From the drbd.conf-9.0 man page:

When rs-discard-granularity is set to a non zero, positive value then DRBD tries to do a resync operation in requests of this size. In case such a block contains only zero bytes on the sync source node, the sync target node will issue a discard/trim/unmap command for the area. […] This feature only gets active if the backing block device reads back zeroes after a discard command.
This setting should be of particular interest in cases where LINSTOR® is managing ZFS volumes. LINSTOR sets the rs-discard-granularity to 8K (8x1024=8192) bytes on ZFS volumes (zvols). For a LINSTOR volume backed by an empty (for example, newly created) ZFS volume, LINSTOR’s default value results in a slower resync than could otherwise be possible. Increasing the rs-discard-granularity value, for example, to 1M (1024K) bytes, will result in a significant speed increase. During one test, resync speeds increased from 200KiB/s to 200MiB/s!
You can view the current value of the setting using the command:
linstor volume-definition list-properties <resource> <volume_number>
It is possible to override LINSTOR’s value for the rs-discard-granularity setting using the following command:
linstor volume-definition set-property <resource> <volume_number> DrbdOptions/Disk/rs-discard-granularity 1048576 # or a different value
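Because the property takes a value in bytes, it can help to double-check the unit arithmetic before setting it. A quick shell check of LINSTOR's ZFS default (8K) and the larger value suggested above (1M):

```shell
# rs-discard-granularity is specified in bytes:
echo $(( 8 * 1024 ))      # 8K, LINSTOR's default for zvols
echo $(( 1024 * 1024 ))   # 1M, the larger value suggested above
```

This prints `8192` and `1048576`, the latter being the value used in the set-property command above.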
Set c-max-rate to 100% of (or slightly more than) what your hardware can handle.
For example: if you know your network is capable of 10Gbps, but your disk throughput is only 800MiB/s, then set this value to 800M.
Increase the c-min-rate to a third of the c-max-rate.
You should usually leave this value alone, as the idea behind the dynamic sync-rate controller is to “step aside” and let application I/O take priority. If you really want to ensure that resync always moves along at a minimum speed, then feel free to tune this a bit. As mentioned earlier, you might want to start with a lower value and work up if doing this on a production system.
Set c-fill-target to 1M.
This should be enough to get the resync rate going well beyond the defaults.
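Putting the controller settings discussed so far together, a disk section might look like the following sketch. The values assume the 800MiB/s disk-throughput example above and are starting points, not recommendations:

```
disk {
  c-max-rate    800M;  # ~100% of measured disk throughput
  resync-rate   266M;  # ~1/3 of c-max-rate; starting point only
  c-min-rate    266M;  # optional: only if you need a guaranteed floor
  c-fill-target 1M;
}
```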
Increase max-buffers to 40k.
40k is usually a good starting point, but good results have been seen with values anywhere between 20k and 80k.
Set sndbuf-size and rcvbuf-size to 10M.
The kernel usually tunes TCP buffers automatically, but setting these buffers to a static value might help improve resync speeds. Again, on a production system, start with a more conservative value, such as 4M, and increase it slowly while observing the system’s behavior and response to the changed setting.
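The network-related settings above all live in the net section of the resource configuration. A sketch with the starting values suggested in this article:

```
net {
  max-buffers 40k;  # good results seen between 20k and 80k
  sndbuf-size 10M;  # consider starting at 4M on production systems
  rcvbuf-size 10M;
}
```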
Temporarily disable the dynamic sync-rate controller.
In certain situations, you might find that resyncing is taking longer than you want. This could be due to a variety of factors particular to your setup and environment. One way that you can achieve full synchronization speed is to temporarily disable the dynamic sync-rate controller. However, full synchronization speed will not magically make your network or disk speeds faster. Full speed is still subject to the limits of your hardware.
IMPORTANT: This should not be done in production environments outside of a planned maintenance window. Disabling the dynamic sync-rate controller to achieve full synchronization speed means that DRBD synchronization is favored over application I/O. Use this parameter setting with extreme caution.
To disable the dynamic sync-rate controller, set the following in the peer-device-options section of your DRBD resource configuration file:
c-plan-ahead 0;
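In context, the setting might look like the sketch below. The resource name r0 is a placeholder, and the option is placed in the disk section here, where peer-device options may also be set:

```
resource "r0" {
  disk {
    c-plan-ahead 0;  # 0 disables the dynamic sync-rate controller
  }
}
```

After editing the file, `drbdadm adjust <resource>` applies the change to a running resource.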
If you are using LINSTOR:
linstor resource-definition drbd-options --c-plan-ahead 0 <resource>
To re-enable the dynamic sync-rate controller in LINSTOR:
linstor resource-definition drbd-options --unset-c-plan-ahead <resource>
You can read more about the c-plan-ahead
parameter in the
drbd.conf-9.0
man page.
Reviewed 2022/10/14 – RJR
Updated 2022/10/14 - MAT