Create a highly available NFS service with Oracle Linux 7

Before You Begin

This lab will show you how to install and configure a highly available NFS service on Oracle Linux 7 using Corosync, Pacemaker, Gluster and Ganesha.

Background

In this lab we will create an NFS service hosted by three VMs: master1, master2 and master3. Each of these VMs will replicate a Gluster volume for data redundancy and use clustering tools for service redundancy.

A fourth VM named client1 will mount this NFS service for demonstration and testing.

Components

  • Corosync provides clustering infrastructure to manage which nodes are involved, their communication and quorum
  • Pacemaker manages cluster resources and rules of their behavior
  • Gluster is a scalable and distributed filesystem
  • Ganesha is an NFS server which can use many different backing filesystem types including Gluster

What Do You Need?

If you're attending the Hands-On Lab at Oracle OpenWorld 2019, your laptop already has the required software installed.

If you're following this from home then you will first need to install Vagrant, a virtualization provider such as Oracle VM VirtualBox, and git.

Environment

[Diagram of the lab network: lab-network.svg]

This table describes the network addressing of our lab environment.

    Hostname   IP Address      Fully Qualified Hostname
    master1    192.168.99.101  master1.vagrant.vm
    master2    192.168.99.102  master2.vagrant.vm
    master3    192.168.99.103  master3.vagrant.vm
    client1    192.168.99.104  client1.vagrant.vm
    nfs        192.168.99.100  nfs.vagrant.vm
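
Name resolution between these hostnames is expected to be handled by the lab's Vagrant provisioning. For reference only, equivalent static /etc/hosts entries for this addressing would look like the following sketch (illustrative; you should not need to create them yourself):

    192.168.99.100  nfs.vagrant.vm      nfs
    192.168.99.101  master1.vagrant.vm  master1
    192.168.99.102  master2.vagrant.vm  master2
    192.168.99.103  master3.vagrant.vm  master3
    192.168.99.104  client1.vagrant.vm  client1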

This lab involves multiple VMs and you will need to perform different steps on different VMs. The easiest way to do this is through the vagrant command:

vagrant ssh <hostname>

Once connected you can return to your desktop with a standard logout or exit command.

For example:

    [user@demo lab]$ vagrant ssh master1
    Last login: Tue Aug 20 23:56:58 2019 from 10.0.2.2
    [vagrant@master1 ~]$ hostname
    master1.vagrant.vm
    [vagrant@master1 ~]$ exit
    logout
    Connection to 127.0.0.1 closed.
    [user@demo lab]$ vagrant ssh master3
    Last login: Tue Aug 20 05:25:49 2019 from 10.0.2.2
    [vagrant@master3 ~]$ hostname
    master3.vagrant.vm
    [vagrant@master3 ~]$ logout
    Connection to 127.0.0.1 closed.

All steps are executed as the user root, so you will need to change to the root user using sudo -i

    [user@demo lab]$ vagrant ssh master1
    Last login: Tue Aug 20 23:56:58 2019 from 10.0.2.2
    [vagrant@master1 ~]$ sudo -i
    [root@master1 ~]# exit
    logout
    [vagrant@master1 ~]$ exit
    logout
    Connection to 127.0.0.1 closed.
    

To avoid continually logging in and out you may want to open four terminal windows, one for each of master1, master2, master3 and client1, to easily switch between VMs. If you are doing this, ensure that all your terminals are in the same directory so the vagrant ssh commands succeed.

A step may say "On all masters..." and should be performed on master1, master2 and master3. This is done to avoid repetition, as the action and result are identical on each node.


Start the Lab Environment

You will first download and start the VMs we will be using in this lab environment. This is simplified through the use of Vagrant.

If you're at Oracle OpenWorld 2019 you can skip this step and move on to Install Software.

  1. Download the lab configuration
    # git clone https://github.com/oracle/linux-labs.git
  2. Change to the lab directory
    # cd linux-labs/HA-NFS/
  3. Start the lab virtual machines
    # vagrant up

Vagrant will download an Oracle Linux image and create four virtual machines with unique networking configuration. A fifth IP address is used for our shared NFS service.

Remember you can access them using the vagrant ssh <hostname> command mentioned above; you do not need to connect via IP address.
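
If you want to confirm that all four VMs came up before continuing, vagrant can report their state from the lab directory (an optional sanity check, not a required lab step):

    [user@demo lab]$ vagrant status

Each of master1, master2, master3 and client1 should be listed as running.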


Install software

Now enable the required Oracle Linux repositories before installing the Corosync, Ganesha, Gluster and Pacemaker software.

Remember to use vagrant ssh <hostname> to log in to each of the masters and sudo -i to elevate your privileges to root before running these commands.

  1. (On all masters) Install the Gluster yum repository configuration
    # yum install -y oracle-gluster-release-el7
  2. (On all masters) Enable the repositories
    # yum-config-manager --enable ol7_addons ol7_latest ol7_optional_latest ol7_UEKR5
  3. (On all masters) Install the software
    # yum install -y corosync glusterfs-server nfs-ganesha-gluster pacemaker pcs 
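
As an optional check that the packages landed on each master, you can query the RPM database; this is not a required lab step:

    # rpm -q corosync glusterfs-server nfs-ganesha-gluster pacemaker pcs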

Create the Gluster volume

You will prepare each VM's disk, create a replicated Gluster volume and activate the volume.

  1. (On all masters) Create an XFS filesystem on /dev/sdb with a label of gluster-000
    # mkfs.xfs -f -i size=512 -L gluster-000 /dev/sdb
  2. (On all masters) Create a mountpoint, add an fstab(5) entry for a disk with the label gluster-000 and mount the filesystem
    # mkdir -p /data/glusterfs/sharedvol/mybrick
    # echo 'LABEL=gluster-000 /data/glusterfs/sharedvol/mybrick xfs defaults  0 0' >> /etc/fstab
    # mount /data/glusterfs/sharedvol/mybrick
  3. (On all masters) Enable and start the Gluster service
    # systemctl enable --now glusterd
  4. On master1: Create the Gluster environment by adding peers
    # gluster peer probe master2.vagrant.vm
    peer probe: success.
    # gluster peer probe master3.vagrant.vm
    peer probe: success.
    # gluster peer status
    Number of Peers: 2
    
    Hostname: master2.vagrant.vm
    Uuid: 328b1652-c69a-46ee-b4e6-4290aef11043
    State: Peer in Cluster (Connected)
    
    Hostname: master3.vagrant.vm
    Uuid: 384aa447-4e66-480a-9293-8c10218395a4
    State: Peer in Cluster (Connected)
    
  5. Show that our peers have joined the environment

    On master2:

    # gluster peer status
    Number of Peers: 2
    
    Hostname: master3.vagrant.vm
    Uuid: 384aa447-4e66-480a-9293-8c10218395a4
    State: Peer in Cluster (Connected)
    
    Hostname: master1.vagrant.vm
    Uuid: ac64c0e3-02f6-4814-83ca-1983999c2bdc
    State: Peer in Cluster (Connected)
    

    On master3:

    # gluster peer status
    Number of Peers: 2
    
    Hostname: master1.vagrant.vm
    Uuid: ac64c0e3-02f6-4814-83ca-1983999c2bdc
    State: Peer in Cluster (Connected)
    
    Hostname: master2.vagrant.vm
    Uuid: 328b1652-c69a-46ee-b4e6-4290aef11043
    State: Peer in Cluster (Connected)
  6. On master1: Create a Gluster volume named sharedvol which is replicated across our three hosts: master1, master2 and master3.
    # gluster volume create sharedvol replica 3 master{1,2,3}:/data/glusterfs/sharedvol/mybrick/brick
    For more details on volume types see the Gluster: Setting up Volumes link in the Additional Information section of this page
  7. On master1: Enable our Gluster volume named sharedvol
    # gluster volume start sharedvol

Our replicated Gluster volume is now available and can be verified from any master

    # gluster volume info
    Volume Name: sharedvol
    Type: Replicate
    Volume ID: 466a6c8e-7764-4c0f-bfe6-591cc6a570e8
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: master1:/data/glusterfs/sharedvol/mybrick/brick
    Brick2: master2:/data/glusterfs/sharedvol/mybrick/brick
    Brick3: master3:/data/glusterfs/sharedvol/mybrick/brick
    Options Reconfigured:
    transport.address-family: inet
    nfs.disable: on
    performance.client-io-threads: off
    
    # gluster volume status
    Status of volume: sharedvol
    Gluster process                             TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick master1:/data/glusterfs/sharedvol/myb
    rick/brick                                  49152     0          Y       7098
    Brick master2:/data/glusterfs/sharedvol/myb
    rick/brick                                  49152     0          Y       6860
    Brick master3:/data/glusterfs/sharedvol/myb
    rick/brick                                  49152     0          Y       6440
    Self-heal Daemon on localhost               N/A       N/A        Y       7448
    Self-heal Daemon on master3.vagrant.vm      N/A       N/A        Y       16839
    Self-heal Daemon on master2.vagrant.vm      N/A       N/A        Y       7137
    
    Task Status of Volume sharedvol
    ------------------------------------------------------------------------------
    There are no active volume tasks
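
As an optional extra health check, Gluster can also report whether any files are pending self-heal across the three bricks; on a freshly created, empty volume the list for each brick should be empty:

    # gluster volume heal sharedvol info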

Configure Ganesha

Ganesha is the NFS server that shares out the Gluster volume. In this example we allow any NFS client to connect to our NFS share with read/write permissions.

  1. (On all masters) Populate the file /etc/ganesha/ganesha.conf with our configuration
    # cat << EOF > /etc/ganesha/ganesha.conf
    EXPORT{
        Export_Id = 1 ;       # Unique identifier for each EXPORT (share)
        Path = "/sharedvol";  # Export path of our NFS share
    
        FSAL {
            name = GLUSTER;          # Backing type is Gluster
            hostname = "localhost";  # Hostname of Gluster server
            volume = "sharedvol";    # The name of our Gluster volume
        }
    
        Access_type = RW;          # Export access permissions
        Squash = No_root_squash;   # Control NFS root squashing
        Disable_ACL = FALSE;       # Enable NFSv4 ACLs
        Pseudo = "/sharedvol";     # NFSv4 pseudo path for our NFS share
        Protocols = "3","4" ;      # NFS protocols supported
        Transports = "UDP","TCP" ; # Transport protocols supported
        SecType = "sys";           # NFS Security flavors supported
    }
    EOF

For more options to control permissions, see the EXPORT {CLIENT{}} section of config_samples-export in the Additional Information section of this page.
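
As an illustrative sketch only (the lab configuration above intentionally allows any client), such a restriction could be expressed with a CLIENT sub-block inside the EXPORT block, for example limiting read/write access to the lab subnet:

    EXPORT{
        # ... existing Export_Id, Path, FSAL and other settings ...

        CLIENT {
            Clients = 192.168.99.0/24;  # Hosts allowed to match this block
            Access_Type = RW;           # Permissions for those hosts
        }
    }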


Create a Cluster

You will create and start a Pacemaker/Corosync cluster made of our three master nodes

  1. (On all masters) Set a shared password of your choice for the user hacluster
    # passwd hacluster
    Changing password for user hacluster.
    New password: examplepassword
    Retype new password: examplepassword
  2. (On all masters) Enable the Corosync and Pacemaker services. Enable and start the configuration system service
    # systemctl enable corosync
    # systemctl enable pacemaker
    # systemctl enable --now pcsd

    Note that we did not start the Corosync and Pacemaker services; this happens later in step five

  3. On master1: Authenticate with all cluster nodes using the hacluster user and password defined above
    # pcs cluster auth master1 master2 master3 -u hacluster -p examplepassword
    master1: Authorized
    master3: Authorized
    master2: Authorized
  4. On master1: Create a cluster named HA-NFS
    # pcs cluster setup --name HA-NFS master1 master2 master3
    Destroying cluster on nodes: master1, master2, master3...
    master1: Stopping Cluster (pacemaker)...
    master2: Stopping Cluster (pacemaker)...
    master3: Stopping Cluster (pacemaker)...
    master2: Successfully destroyed cluster
    master1: Successfully destroyed cluster
    master3: Successfully destroyed cluster
    
    Sending 'pacemaker_remote authkey' to 'master1', 'master2', 'master3'
    master1: successful distribution of the file 'pacemaker_remote authkey'
    master2: successful distribution of the file 'pacemaker_remote authkey'
    master3: successful distribution of the file 'pacemaker_remote authkey'
    Sending cluster config files to the nodes...
    master1: Succeeded
    master2: Succeeded
    master3: Succeeded
    
    Synchronizing pcsd certificates on nodes master1, master2, master3...
    master1: Success
    master3: Success
    master2: Success
    Restarting pcsd on the nodes in order to reload the certificates...
    master1: Success
    master3: Success
    master2: Success
  5. On master1: Start the cluster on all nodes
    # pcs cluster start --all
    master1: Starting Cluster (corosync)...
    master2: Starting Cluster (corosync)...
    master3: Starting Cluster (corosync)...
    master1: Starting Cluster (pacemaker)...
    master3: Starting Cluster (pacemaker)...
    master2: Starting Cluster (pacemaker)...
  6. On master1: Enable the cluster to run on all nodes at boot time
    # pcs cluster enable --all
    master1: Cluster Enabled
    master2: Cluster Enabled
    master3: Cluster Enabled
  7. On master1: Disable STONITH
    # pcs property set stonith-enabled=false

Our cluster is now running

On any master:

    # pcs cluster status
    Cluster Status:
     Stack: corosync
     Current DC: master2 (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
     Last updated: Wed Aug 21 03:37:15 2019
     Last change: Wed Aug 21 03:34:48 2019 by root via cibadmin on master1
     3 nodes configured
     0 resources configured
    
    PCSD Status:
      master3: Online
      master2: Online
      master1: Online
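
If you would like to double-check that the STONITH change took effect, pcs can display individual cluster properties (optional; this assumes the pcs property show syntax of the pcs release shipped with Oracle Linux 7):

    # pcs property show stonith-enabled

The output should list stonith-enabled with a value of false.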

Create Cluster services

You will create a Pacemaker resource group that contains the resources necessary to host NFS services from the hostname nfs.vagrant.vm (192.168.99.100)

  1. On master1: Create a systemd-based cluster resource to ensure nfs-ganesha is running
    # pcs resource create nfs_server systemd:nfs-ganesha op monitor interval=10s
  2. On master1: Create an IP cluster resource used to present the NFS server
    # pcs resource create nfs_ip ocf:heartbeat:IPaddr2 ip=192.168.99.100 cidr_netmask=24 op monitor interval=10s
  3. On master1: Join the Ganesha service and IP resource in a group to ensure they remain together on the same host
    # pcs resource group add nfs_group nfs_server nfs_ip

Our service is now running

    # pcs status
    Cluster name: HA-NFS
    Stack: corosync
    Current DC: master2 (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
    Last updated: Wed Aug 21 03:45:46 2019
    Last change: Wed Aug 21 03:45:40 2019 by root via cibadmin on master1
    
    3 nodes configured
    2 resources configured
    
    Online: [ master1 master2 master3 ]
    
    Full list of resources:
    
     Resource Group: nfs_group
         nfs_server	(systemd:nfs-ganesha):	Started master1
         nfs_ip	(ocf::heartbeat:IPaddr2):	Started master1
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
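
Because nfs_ip is an IPaddr2 resource, the node currently running nfs_group (master1 in the output above) also holds the 192.168.99.100 address. As an optional check you can look for it on that node; the grep avoids assuming a particular interface name from the Vagrant network setup:

    # ip -o addr show | grep 192.168.99.100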

Test NFS availability using a client

You may want to have two terminal windows open for these steps as we test failover with master1 and client1

  1. On client1: Mount the NFS service provided by our cluster and create a file
    # yum install -y nfs-utils
    # mkdir /sharedvol
    # mount -t nfs nfs.vagrant.vm:/sharedvol /sharedvol
    # df -h /sharedvol/
    Filesystem                 Size  Used Avail Use% Mounted on
    nfs.vagrant.vm:/sharedvol   16G  192M   16G   2% /sharedvol
    # echo "Hello from OpenWorld" > /sharedvol/hello
  2. On master1: Identify the host running the nfs_group resources and put it in standby mode to stop running services
    # pcs status
    Cluster name: HA-NFS
    Stack: corosync
    Current DC: master2 (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
    Last updated: Wed Aug 21 03:45:46 2019
    Last change: Wed Aug 21 03:45:40 2019 by root via cibadmin on master1
    
    3 nodes configured
    2 resources configured
    
    Online: [ master1 master2 master3 ]
    
    Full list of resources:
    
     Resource Group: nfs_group
         nfs_server	(systemd:nfs-ganesha):	Started master1
         nfs_ip	(ocf::heartbeat:IPaddr2):	Started master1
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    # pcs node standby master1
  3. On master1: Verify that the nfs_group resources have moved to another node
    # pcs status
    Cluster name: HA-NFS
    Stack: corosync
    Current DC: master2 (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
    Last updated: Wed Aug 21 04:02:46 2019
    Last change: Wed Aug 21 04:01:51 2019 by root via cibadmin on master1
    
    3 nodes configured
    2 resources configured
    
    Node master1: standby
    Online: [ master2 master3 ]
    
    Full list of resources:
    
     Resource Group: nfs_group
         nfs_server	(systemd:nfs-ganesha):	Started master2
         nfs_ip	(ocf::heartbeat:IPaddr2):	Started master2
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
  4. On client1: Verify your file is still accessible

    There may be a short delay as the service moves from one master to another

    # ls -la /sharedvol/
    total 9
    drwxr-xr-x.  3 root root 4096 Aug 21 03:59 .
    dr-xr-xr-x. 20 root root 4096 Aug 21 02:57 ..
    -rw-r--r--.  1 root root   21 Aug 21 03:59 hello
    # cat /sharedvol/hello
    Hello from OpenWorld
    
  5. On master1: Bring our standby node back into the cluster
    # pcs node unstandby master1
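
If you are curious where the resources end up once master1 rejoins, an optional check from any master is shown below; whether they move back to master1 depends on the cluster's resource stickiness settings:

    # pcs status resources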

You now have an understanding of how you could use Pacemaker/Corosync to create highly available services backed by Gluster.


Enable Gluster encryption (Optional)

You will create a self-signed certificate for each master and have it trusted by its peers.

For more options see Setting up Transport Layer Security in the Additional Information section of this page

  1. (On all masters) Create a private key and then create a certificate for this host signed with this key
    # openssl genrsa -out /etc/ssl/glusterfs.key 2048
    # openssl req -new -x509 -days 365 -key /etc/ssl/glusterfs.key \
                                       -out /etc/ssl/glusterfs.pem \
                                       -subj "/CN=${HOSTNAME}/"
  2. (On all masters) Combine the certificate from each node into one file all masters can trust
    # cat /etc/ssl/glusterfs.pem >> /vagrant/combined.ca.pem
  3. (On all masters) Copy the combined list of trusted certificates to the local system for Gluster use
    # cp /vagrant/combined.ca.pem /etc/ssl/glusterfs.ca
  4. (On all masters) Enable encryption for Gluster management traffic
    # touch /var/lib/glusterd/secure-access
  5. On master1: Enable encryption on the Gluster volume sharedvol
    # gluster volume set sharedvol client.ssl on
    # gluster volume set sharedvol server.ssl on
    
  6. (On all masters) Restart the Gluster service
    # systemctl restart glusterd

Our Gluster volume now has transport encryption enabled

    # gluster volume info
    Volume Name: sharedvol
    Type: Replicate
    Volume ID: 970effb5-5d9a-4ece-9188-7f0525010acf
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: master1:/data/glusterfs/sharedvol/mybrick/brick
    Brick2: master2:/data/glusterfs/sharedvol/mybrick/brick
    Brick3: master3:/data/glusterfs/sharedvol/mybrick/brick
    Options Reconfigured:
    server.ssl: on
    client.ssl: on
    transport.address-family: inet
    nfs.disable: on
    performance.client-io-threads: off
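
An equivalent spot check is to query the two SSL options directly from any master; both should be reported as on:

    # gluster volume get sharedvol client.ssl
    # gluster volume get sharedvol server.ssl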
    

Want to Learn More?