GlusterFS for Persistent Docker Volumes

GlusterFS is a distributed file system that can span and replicate data volumes across multiple Gluster hosts over a network. Like an NFS file server, Gluster volumes can be mounted simultaneously by multiple Gluster clients. Unlike NFS, however, a Gluster cluster makes horizontal scaling far easier.

Gluster provides built-in support for replication and high availability. For a Gluster volume used in production, it is recommended to replicate the data across at least 3 nodes (triple replication). Using a minimum of 3 replicas prevents the data corruption that can result from “split brain”, since a quorum of 2 replicas must be online for the Gluster volume to remain accessible.

Each replica of a Gluster volume is stored on a different Gluster host and is known as a “brick”. Triple replication therefore requires 3 times the raw disk space of the usable Gluster volume size: a 100 GB replicated volume requires 300 GB of raw disk space (100GB X 3 bricks on 3 nodes).

A Gluster volume brick can be located on a local hard or solid-state disk, or for a cloud-based deployment, on an attached block volume to the Gluster host.

Another option is to use 2 storage bricks and 1 arbiter brick (stores metadata only) instead of 3 replicas. In this mode, just over 2 times the raw disk space is required (100GB X 2 bricks on 2 nodes + metadata on 1 node).

A Gluster volume using an arbiter node provides the same level of data consistency as a triple-replicated volume, but with lower availability: the volume may not remain accessible when one of the 3 bricks goes offline. For a Gluster volume that requires high availability, such as one used to host persistent data for Docker, using replica 3 is generally preferred to an arbiter node.
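For reference, an arbiter volume is created by passing the arbiter keyword to gluster volume create, with the last brick listed acting as the metadata-only arbiter. The following is only a sketch; the volume name and brick paths are placeholders, not the volume created later in this article.

$ gluster volume create gfs-arbiter replica 3 arbiter 1 \
gluster1:/mnt/gluster1/brick \
gluster2:/mnt/gluster2/brick \
gluster3:/mnt/gluster3/arbiter-brick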

The GlusterFS plugin for Docker is a managed Docker plugin that lets containers mount sub-directories of a Gluster volume as Docker volumes. The plugin is also compatible with Docker Swarm, where it is particularly useful. Compared to the REX-Ray plugin, which relies on the block storage APIs of major cloud providers, the GlusterFS plugin has several advantages.

Docker volumes using the GlusterFS volume driver

  • Can be mounted on more than one Swarm node simultaneously.
  • Move seamlessly between Swarm nodes when a container is rescheduled.
  • Attach in a few seconds versus 30 seconds or more with REX-Ray.
  • Are not obsoleted by changes to cloud provider APIs.

If you are willing to manage and maintain a Gluster cluster, using the GlusterFS plugin can be a vastly superior solution to REX-Ray for persistent storage with Docker in production. The Gluster hosts can reside on the same servers as your Docker Swarm nodes (if you have 3 or more Swarm managers), or on separate servers accessible from the same network. This article outlines a setup which can be used in production to handle persistent data in Docker Swarm, without the cost or complexity of solutions like Portworx or StorageOS.

Most of the storage plugins that follow the CSI (Container Storage Interface) standard are written for Kubernetes first, with theoretical compatibility with other container orchestrators such as Mesos or Cloud Foundry. Although Mirantis is sponsoring the lead developer of SwarmKit to make Swarm compatible with CSI plugins, there’s no ETA on when the work will be completed.

Using a Gluster volume as a storage backing for Docker persistent volumes seems to be the most future-proof solution for Docker Swarm users at the moment.

How to set up 3 Gluster nodes with Docker Swarm

Create 3 VMs with internal networking in the same region and datacenter as your Docker Swarm nodes. In this example, we install the Gluster packages from the apt repositories of Ubuntu 18.04, which at the time of this writing is Gluster 3.13.2.

Set up firewall rules that permit internal communication between the Gluster nodes (peers) and Swarm nodes (clients) on TCP and UDP port 111 (portmapper), and TCP ports 24007 (Gluster Daemon), 24008 (Management). Additionally, one TCP port per brick starting from 49152 should be opened to the peers and clients.

Assume, for instance, that you have the following servers:

  • 10.0.1.101 gluster1 – Gluster host (brick 1)
  • 10.0.1.102 gluster2 – Gluster host (brick 2)
  • 10.0.1.103 gluster3 – Gluster host (brick 3)
  • 10.0.1.104 docker1 – Swarm node 1
  • 10.0.1.105 docker2 – Swarm node 2
  • 10.0.1.106 docker3 – Swarm node 3

Add the above hostnames and IPs to /etc/hosts for each server.
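For example, the following lines would be appended to /etc/hosts on every server:

10.0.1.101 gluster1
10.0.1.102 gluster2
10.0.1.103 gluster3
10.0.1.104 docker1
10.0.1.105 docker2
10.0.1.106 docker3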

If your internal network is a secure environment, you might choose to simply open all TCP and UDP ports between the Gluster nodes and Swarm nodes. As with any other file server mount target, properly configured firewall rules are crucial to prevent unauthorized access to your data.

# ufw allow from 10.0.1.101/32 && \
ufw allow from 10.0.1.102/32 && \
ufw allow from 10.0.1.103/32 && \
ufw allow from 10.0.1.104/32 && \
ufw allow from 10.0.1.105/32 && \
ufw allow from 10.0.1.106/32

# ufw allow from 10.0.1.101/32 proto udp && \
ufw allow from 10.0.1.102/32 proto udp && \
ufw allow from 10.0.1.103/32 proto udp && \
ufw allow from 10.0.1.104/32 proto udp && \
ufw allow from 10.0.1.105/32 proto udp && \
ufw allow from 10.0.1.106/32 proto udp
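If you prefer to open only the ports Gluster actually uses, rules along the following lines can be added instead, repeated for each of the 6 internal IPs. This is a sketch for a single source address (10.0.1.104), assuming no more than 3 bricks per host:

# ufw allow from 10.0.1.104/32 to any port 111 && \
ufw allow from 10.0.1.104/32 to any port 24007:24008 proto tcp && \
ufw allow from 10.0.1.104/32 to any port 49152:49154 proto tcp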

Attach a block storage volume to each Gluster host with the desired size of each Gluster volume brick, for example 100GB. Create the directory /mnt/gluster1/brick on gluster1, /mnt/gluster2/brick on gluster2, and /mnt/gluster3/brick on gluster3. Format the volumes as xfs, mount them to their respective directories on each Gluster host, and add an entry to /etc/fstab.
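On gluster1, for example, the preparation might look like the following, assuming the attached block volume appears as /dev/sdb (the device name is an assumption and will vary by provider). Repeat on gluster2 and gluster3 with their respective directories.

$ sudo mkfs.xfs /dev/sdb
$ sudo mkdir -p /mnt/gluster1
$ echo '/dev/sdb /mnt/gluster1 xfs defaults,nofail 0 2' | sudo tee -a /etc/fstab
$ sudo mount -a
$ sudo mkdir -p /mnt/gluster1/brick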

On all Gluster hosts, install the following packages using apt.

$ sudo apt install xfsprogs attr glusterfs-server glusterfs-client glusterfs-common -y

On all Gluster hosts, enable and start the glusterd service.

$ sudo systemctl enable glusterd
$ sudo systemctl start glusterd

On each Gluster host, probe the other Gluster nodes that will form part of your cluster (probing the local host is a harmless no-op). Then, confirm that all peers are known to each node in the cluster.

$ gluster peer probe gluster1
$ gluster peer probe gluster2
$ gluster peer probe gluster3

$ gluster pool list
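If the probes succeeded, gluster pool list on each node should show all 3 peers in the Connected state, along these lines (the UUIDs are placeholders and will differ):

UUID                                    Hostname        State
<uuid-of-gluster2>                      gluster2        Connected
<uuid-of-gluster3>                      gluster3        Connected
<uuid-of-local-node>                    localhost       Connected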

Now, from gluster1, create and start the replicated GlusterFS volume gfs across the 3 bricks. You can view the information and status of the volume to confirm it has started.

$ gluster volume create gfs replica 3 \
gluster1:/mnt/gluster1/brick \
gluster2:/mnt/gluster2/brick \
gluster3:/mnt/gluster3/brick

$ gluster volume start gfs

$ gluster volume info gfs

$ gluster volume status
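The output of gluster volume info gfs should report the volume as replicated across the 3 bricks, roughly as follows (some fields omitted):

Volume Name: gfs
Type: Replicate
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster1:/mnt/gluster1/brick
Brick2: gluster2:/mnt/gluster2/brick
Brick3: gluster3:/mnt/gluster3/brick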

On all Gluster hosts, mount the gfs volume locally. Create the directory /mnt/gluster to use as the mount point, add an entry to /etc/fstab, and mount the volume. Afterwards, df -h should show the volume mounted with its usable size.

$ sudo mkdir -p /mnt/gluster
$ echo 'localhost:/gfs /mnt/gluster glusterfs defaults,_netdev,backupvolfile-server=gluster1 0 0' | sudo tee -a /etc/fstab
$ sudo mount -a

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
...
localhost:/gfs  100G   21G   81G  21% /mnt/gluster

Install and set up the GlusterFS plugin for Docker

The GlusterFS plugin for Docker must be installed on each Swarm node, one node at a time. The following command installs trajano/glusterfs-volume-plugin under the alias glusterfs, so it can be referenced by that name from the Docker command line.

$ docker plugin install --alias glusterfs trajano/glusterfs-volume-plugin --grant-all-permissions --disable

Set the hostnames of the GlusterFS nodes for the plugin.

$ docker plugin set glusterfs SERVERS=gluster1,gluster2,gluster3

Be sure the entries for gluster1, gluster2, and gluster3 have been added to /etc/hosts on each Swarm node. Otherwise you will encounter the “Transport endpoint is not connected” error when trying to create and mount any Docker volumes.

Then, enable the plugin.

$ docker plugin enable glusterfs
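To verify the plugin is active on each node, list the installed plugins; the glusterfs alias should appear with ENABLED set to true.

$ docker plugin ls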

From any Swarm node, deploy a stack consisting of a single simple service with a persistent volume managed by the GlusterFS volume driver. This portion is adapted from Ruan Bekker’s blog.

Before deploying the stack, create the vol1 subdirectory at the root of the Gluster volume. From a Gluster host (with the gfs volume mounted locally), execute the command $ mkdir -p /mnt/gluster/vol1.

Go back to one of the Swarm nodes and create this stack file.

docker-compose.yml

version: "3.4"

services:
  foo:
    image: alpine
    command: ping localhost
    networks:
      - net
    volumes:
      - vol1:/tmp

networks:
  net:
    driver: overlay

volumes:
  vol1:
    driver: glusterfs
    name: "gfs/vol1"

$ docker stack deploy -c docker-compose.yml test

Once the service has deployed successfully, run $ docker service ps test_foo --no-trunc to see which of the Swarm nodes the test_foo service is running on.

Switch to the terminal for that Swarm node and find the container ID of the test_foo service with $ docker ps.

Exec into the container’s /bin/sh shell with $ docker exec -it <container_id> /bin/sh. You can benchmark the write speed to the GlusterFS volume mounted inside the container at /tmp with the following simple script.

The dd command writes 10 gigabytes of test data to the Docker volume backed by GlusterFS and the time command outputs the total time elapsed.

In our test, with the Gluster volume replicated across 3 bricks over DigitalOcean’s internal network and backed by Block Storage volumes, writing a 10GB file (of zeroes) to the Gluster volume from a Docker container took 154.4 seconds, which works out to roughly 64.8MB/s.

$ touch test.sh
$ echo 'dd if=/dev/zero of=/tmp/test.bin bs=1024k count=10000' > test.sh
$ chmod +x test.sh

$ time sh test.sh
10000+0 records in
10000+0 records out
real    2m 34.40s
user    0m 0.05s
sys     0m 12.37s
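To confirm that data written by the container actually lands on the Gluster volume, write a file inside the container and look for it on a Gluster host (the file name below is arbitrary). Inside the test_foo container:

$ echo 'hello from swarm' > /tmp/hello.txt

Then, on any Gluster host with the gfs volume mounted locally:

$ cat /mnt/gluster/vol1/hello.txt
hello from swarm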