In environments where uptime is critical, having a High Availability (HA) Network File System (NFS) storage solution can make a huge difference. An HA NFS setup ensures that services remain online and data remains accessible even if one node fails. In this post, we’ll guide you through creating an HA NFS storage solution using DRBD (Distributed Replicated Block Device), Pacemaker, and Corosync for redundancy and automatic failover.
Step 1: Prepare the Servers
To set up an HA NFS solution, start with two servers that will act as NFS nodes. They can either share external storage or, as in this guide, use DRBD to replicate data between them. Each node should be able to resolve the other by hostname (see the /etc/hosts sketch after the list below).
- Node 1: node1
- Node 2: node2
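If you do not have DNS entries for the nodes, a hosts-file entry on each server keeps things simple. The 192.168.1.1 and 192.168.1.2 addresses below are taken from the DRBD configuration used later in this guide; adjust them to your network.

# /etc/hosts (add on both nodes)
192.168.1.1   node1
192.168.1.2   node2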
Step 2: Install Required Software
Install the required software on both nodes. The pcs utility is included because Step 5 uses it to configure the cluster:

sudo apt update
sudo apt install nfs-kernel-server drbd-utils pacemaker corosync pcs
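On most Ubuntu releases the DRBD kernel module ships with the kernel, but it is worth confirming that it loads before continuing; the two commands below are a quick sanity check.

sudo modprobe drbd
lsmod | grep drbd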
Step 3: Configure DRBD for Data Replication
DRBD will replicate the storage between the two nodes, creating a shared storage layer for the NFS system.
Configure DRBD: Create a DRBD resource configuration file on both servers, for example /etc/drbd.d/nfs.res. The hostnames in the on blocks must match each node's output of uname -n.
resource nfs_data {
  protocol C;
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;   # Adjust this to your specific setup
    address   192.168.1.1:7789;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;   # Adjust this to your specific setup
    address   192.168.1.2:7789;
    meta-disk internal;
  }
}
Initialize DRBD: Set up and start DRBD on both nodes.
sudo drbdadm create-md nfs_data
sudo drbdadm up nfs_data
Set Primary Node: Make one node the primary:
sudo drbdadm -- --overwrite-data-of-peer primary nfs_data
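The -- --overwrite-data-of-peer form above is the classic DRBD 8 syntax. If your distribution ships DRBD 9, the equivalent invocation is:

sudo drbdadm primary --force nfs_data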
Check DRBD Status: Verify the DRBD sync status.
cat /proc/drbd
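On DRBD 8.x this shows the connection state (cs:), node roles (ro:), and disk states (ds:) for each resource; wait until both sides report UpToDate before creating a filesystem. On DRBD 9, /proc/drbd only prints version information, so query the resource directly instead:

sudo drbdadm status nfs_data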
Step 4: Set Up NFS on the DRBD Volume
Create a Filesystem: Format the DRBD resource (e.g., using ext4 or xfs). Run this on the primary node only; DRBD replicates the result to the peer.
sudo mkfs.ext4 /dev/drbd0
Mount the Volume: Mount the DRBD resource to the NFS directory on the primary node. Do not add this mount to /etc/fstab; Pacemaker will manage it in Step 5.
sudo mkdir -p /mnt/nfs_share
sudo mount /dev/drbd0 /mnt/nfs_share
Configure NFS Exports: Set up NFS exports on both nodes in /etc/exports.
/mnt/nfs_share *(rw,sync,no_root_squash)
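The wildcard above allows any host to mount the share. To restrict access to your client subnet, a narrower entry works the same way; the 192.168.1.0/24 range below is an assumption based on the addresses used elsewhere in this guide.

/mnt/nfs_share 192.168.1.0/24(rw,sync,no_root_squash)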
Restart the NFS Service:
sudo systemctl restart nfs-kernel-server
Step 5: Configure Pacemaker and Corosync for Failover
Pacemaker and Corosync will manage HA for NFS by handling failover between the two nodes.
Set up Corosync: Configure Corosync in /etc/corosync/corosync.conf on both nodes so the cluster can communicate over a multicast or unicast address.
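As a rough sketch, a minimal two-node unicast configuration might look like the following. The node addresses are the ones used for DRBD earlier, and option names can differ slightly between Corosync 2 and 3. If you create the cluster with pcs cluster setup as shown below, pcs generates this file for you and manual editing is usually unnecessary.

totem {
    version: 2
    cluster_name: mycluster
    transport: udpu    # unicast; adjust for multicast if preferred
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.1.2
        name: node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}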
Start Pacemaker and Corosync: Enable and start both services on both nodes:
sudo systemctl enable corosync
sudo systemctl enable pacemaker
sudo systemctl start corosync
sudo systemctl start pacemaker
Set Up Pacemaker Resources: Add the DRBD device, filesystem, NFS server, and virtual IP as Pacemaker resources (see the note on constraints after these commands).
sudo pcs cluster auth node1 node2 -u hacluster -p yourpassword
sudo pcs cluster setup --name mycluster node1 node2
sudo pcs cluster start --all
sudo pcs resource create drbd_nfs_data ocf:linbit:drbd drbd_resource=nfs_data op monitor interval=20s
sudo pcs resource master nfs_data_master drbd_nfs_data master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
sudo pcs resource create fs_nfs_data Filesystem device="/dev/drbd0" directory="/mnt/nfs_share" fstype="ext4" --group nfs_group
sudo pcs resource create nfs-server systemd:nfs-server --group nfs_group
sudo pcs resource create nfs_ip ocf:heartbeat:IPaddr2 ip=192.168.1.100 cidr_netmask=24 --group nfs_group
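Two caveats, depending on your Pacemaker and pcs versions: newer pcs releases replace pcs cluster auth with pcs host auth and pcs resource master with pcs resource promotable, so adjust the commands above if yours complains. You will also generally want colocation and ordering constraints so the filesystem, NFS server, and virtual IP only run on the node where DRBD is primary; a sketch using the resource names from above:

sudo pcs constraint colocation add nfs_group with master nfs_data_master INFINITY
sudo pcs constraint order promote nfs_data_master then start nfs_group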
Check Cluster Status: Verify that all resources are active and properly configured.
sudo pcs status
With this setup, you now have a High Availability NFS storage system that automatically fails over between nodes if one becomes unavailable. Clients connect to the shared IP (192.168.1.100) and should see at most a brief pause if a failover occurs.
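For reference, mounting from a client works the same as with any NFS server, just pointed at the floating IP; the /mnt/data mount point below is a hypothetical example.

# On an NFS client
sudo apt install nfs-common
sudo mkdir -p /mnt/data
sudo mount -t nfs 192.168.1.100:/mnt/nfs_share /mnt/data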
Conclusion
Setting up a High Availability NFS solution with DRBD, Pacemaker, and Corosync ensures that data remains accessible, even in the event of a node failure. By following this guide, you’ve created a robust, fail-safe NFS setup that provides redundancy, reliability, and peace of mind for critical storage needs.