Tips for Using ZFS Layered Over DRBD with Pacemaker
ZFS is a popular modern file system, but layering it on top of DRBD® requires some special handling. This article describes the considerations and requirements for building a system that runs ZFS over DRBD, managed by Pacemaker.
This article assumes that:
- A DRBD® 9.x resource has been created and started on all nodes.
- Pacemaker and Corosync have been configured and started on all nodes.
- ZFS has been installed for your distribution (see the ZFS “Getting Started” documentation).
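Before continuing, you can quickly check each prerequisite from the command line: drbdadm status should show the resource connected on all nodes, crm_mon should show the cluster stack up, and zfs version (available in OpenZFS 0.8 and later) should confirm the ZFS packages are installed. A minimal sketch, run on one node, assuming the resource is named r0:
# drbdadm status r0
# crm_mon -1
# zfs version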
For DRBD to be used underneath ZFS, you must disable DRBD 9’s auto-promote feature. ZFS does not hold the block device open in the kernel the way other file systems or processes in Linux do, and therefore will not cooperate with DRBD’s auto-promote feature. Disable it in the resource’s options section, as in the following configuration:
resource r0 {
    device    /dev/drbd0;
    disk      /dev/sdb;
    meta-disk internal;

    options {
        auto-promote no;
    }

    on zfs-0 {
        address 192.168.222.20:7777;
        node-id 0;
    }
    on zfs-1 {
        address 192.168.222.21:7777;
        node-id 1;
    }
    on zfs-2 {
        address 192.168.222.22:7777;
        node-id 2;
    }

    connection-mesh {
        hosts zfs-0 zfs-1 zfs-2;
    }
}
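If the r0 resource was already up before you added auto-promote no, you can apply the changed option to the running resource, on each node, without recreating it:
# drbdadm adjust r0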
Also, if you are planning on using multiple DRBD devices to create a zpool, you will want to use a multi-volume DRBD configuration, for example:
resource r0 {
    options {
        auto-promote no;
    }

    volume 0 {
        device    minor 0;
        disk      /dev/sdb;
        meta-disk internal;
    }
    volume 1 {
        device    minor 1;
        disk      /dev/sdc;
        meta-disk internal;
    }

    on zfs-0 {
        address 192.168.222.20:7777;
        node-id 0;
    }
    on zfs-1 {
        address 192.168.222.21:7777;
        node-id 1;
    }
    on zfs-2 {
        address 192.168.222.22:7777;
        node-id 2;
    }

    connection-mesh {
        hosts zfs-0 zfs-1 zfs-2;
    }
}
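With the multi-volume configuration, each volume shows up as its own DRBD device (/dev/drbd0 and /dev/drbd1 here), and both can be passed to zpool create in the next step. A sketch of that variant; the simple striped layout is only illustrative:
# zpool create new-pool /dev/drbd0 /dev/drbd1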
Once your DRBD resource is created, promote it on a single node and begin creating your zpool.
# drbdadm primary r0
# zpool create new-pool /dev/drbd0
# zpool status
  pool: new-pool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        new-pool    ONLINE       0     0     0
          drbd0     ONLINE       0     0     0

errors: No known data errors
You should now see that the ZFS file system is mounted at /new-pool.
# mount | grep new-pool
new-pool on /new-pool type zfs (rw,xattr,noacl)
Test putting something into the mount point, then export (unmount and stop) the zpool, and demote the DRBD device.
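For example, you might drop a small marker file into the mount point first; the file name and contents here are arbitrary:
# echo "written on zfs-0" > /new-pool/testfile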
# zpool export -f new-pool
# zpool status
no pools available
# mount | grep new-pool
# drbdadm secondary r0
Now the DRBD device can be promoted on a different node, and the zpool can be imported and used there. You should see that whatever data you placed into /new-pool on the previous node has been replicated to all peers. Importing with -o cachefile=none keeps the pool out of the ZFS cache file, so it will not be imported automatically at boot, which is what you want when the cluster manager controls where the pool runs.
# drbdadm primary r0
# zpool import -o cachefile=none new-pool
# zpool status
  pool: new-pool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        new-pool    ONLINE       0     0     0
          drbd0     ONLINE       0     0     0

errors: No known data errors
# mount | grep new-pool
new-pool on /new-pool type zfs (rw,xattr,noacl)
To make a zpool backed by DRBD part of a Pacemaker configuration, you need to ensure that the DRBD device backing the zpool is started and promoted before the zpool is imported. You also need to ensure that the zpool is imported on the node where DRBD has been promoted. In Pacemaker-specific terms, the zpool must be colocated with the DRBD primary, and ordered to start only after DRBD is promoted.
The following examples satisfy these requirements, first as a crmsh configuration and then as the equivalent Pacemaker CIB XML (which can be loaded with pcs or cibadmin):
primitive p_drbd_r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op start interval=0s timeout=240 \
    op promote interval=0s timeout=90 \
    op demote interval=0s timeout=90 \
    op stop interval=0s timeout=100 \
    op monitor interval=29 role=Master \
    op monitor interval=31 role=Slave

primitive p_zfs ocf:heartbeat:ZFS \
    params pool=new-pool \
    op start interval=0 timeout=60s \
    op stop interval=0 timeout=60s \
    op monitor interval=20 timeout=40s

ms ms_drbd_r0 p_drbd_r0 \
    meta master-max=1 master-node-max=1 notify=true clone-node-max=1 clone-max=3

colocation cl_p_zfs-with-ms_drbd_r0 inf: p_zfs:Started ms_drbd_r0:Master

order o_ms_drbd_r0-before-p_zfs ms_drbd_r0:promote p_zfs:start
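One way to load the crmsh configuration is from a file; this sketch assumes it was saved as zfs-drbd.crm (the file name is arbitrary):
# crm configure load update zfs-drbd.crm
# crm configure show
The equivalent CIB XML follows: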
<resources>
  <primitive id="p_zfs" class="ocf" provider="heartbeat" type="ZFS">
    <instance_attributes id="p_zfs-instance_attributes">
      <nvpair name="pool" value="new-pool" id="p_zfs-instance_attributes-pool"/>
    </instance_attributes>
    <operations>
      <op name="start" interval="0" timeout="60s" id="p_zfs-start-0"/>
      <op name="stop" interval="0" timeout="60s" id="p_zfs-stop-0"/>
      <op name="monitor" interval="20" timeout="40s" id="p_zfs-monitor-20"/>
    </operations>
  </primitive>
  <master id="ms_drbd_r0">
    <meta_attributes id="ms_drbd_r0-meta_attributes">
      <nvpair name="master-max" value="1" id="ms_drbd_r0-meta_attributes-master-max"/>
      <nvpair name="master-node-max" value="1" id="ms_drbd_r0-meta_attributes-master-node-max"/>
      <nvpair name="notify" value="true" id="ms_drbd_r0-meta_attributes-notify"/>
      <nvpair name="clone-node-max" value="1" id="ms_drbd_r0-meta_attributes-clone-node-max"/>
      <nvpair name="clone-max" value="3" id="ms_drbd_r0-meta_attributes-clone-max"/>
    </meta_attributes>
    <primitive id="p_drbd_r0" class="ocf" provider="linbit" type="drbd">
      <instance_attributes id="p_drbd_r0-instance_attributes">
        <nvpair name="drbd_resource" value="r0" id="p_drbd_r0-instance_attributes-drbd_resource"/>
      </instance_attributes>
      <operations>
        <op name="start" interval="0s" timeout="240" id="p_drbd_r0-start-0s"/>
        <op name="promote" interval="0s" timeout="90" id="p_drbd_r0-promote-0s"/>
        <op name="demote" interval="0s" timeout="90" id="p_drbd_r0-demote-0s"/>
        <op name="stop" interval="0s" timeout="100" id="p_drbd_r0-stop-0s"/>
        <op name="monitor" interval="29" role="Master" id="p_drbd_r0-monitor-29"/>
        <op name="monitor" interval="31" role="Slave" id="p_drbd_r0-monitor-31"/>
      </operations>
    </primitive>
  </master>
</resources>
<constraints>
  <rsc_colocation id="cl_p_zfs-with-ms_drbd_r0" score="INFINITY" rsc="p_zfs" rsc-role="Started" with-rsc="ms_drbd_r0" with-rsc-role="Master"/>
  <rsc_order id="o_ms_drbd_r0-before-p_zfs" first="ms_drbd_r0" first-action="promote" then="p_zfs" then-action="start"/>
</constraints>
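To apply the XML form instead, one approach is to dump the current CIB to a file with pcs, merge in the resources and constraints shown above, and push the file back to the cluster; the file name here is arbitrary:
# pcs cluster cib zfs-drbd.xml
# pcs cluster cib-push zfs-drbd.xml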
After configuring and committing your changes to Pacemaker, you should have a simple ZFS failover cluster.
# crm_mon -1rD
Node List:
  * Online: [ zfs-0 zfs-1 zfs-2 ]

Full List of Resources:
  * p_zfs       (ocf::heartbeat:ZFS):    Started zfs-2
  * Clone Set: ms_drbd_r0 [p_drbd_r0] (promotable):
    * Masters: [ zfs-2 ]
    * Slaves: [ zfs-0 zfs-1 ]
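To exercise failover, you can put the active node into standby, confirm that p_zfs and the DRBD master move to a peer, and then bring the node back online. A minimal sketch using crmsh (pcs node standby and pcs node unstandby work similarly):
# crm node standby zfs-2
# crm_mon -1rD
# crm node online zfs-2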
Reviewed by MDK – 2022/4/20