Scalable Backend Storage Options for NextCloud Deployments

Most small deployments of NextCloud begin with a single server with local storage, which might consist of a single physical disk, multiple physical disks formatted as an LVM volume group, or a software or hardware RAID array. Assuming a minimal configuration, the most typical RAID levels are RAID 5, where (n-1)/n of the raw storage capacity is usable (two-thirds with the minimum of 3 disks), or RAID 10, where half of the raw storage capacity is usable (minimum of 4 disks). Whether the environment is an on-premises datacenter or a public cloud, one of the most important design decisions will be whether to use block storage (local, SAN, Amazon EBS) or network-attached storage (NFS server, Amazon EFS, Gluster cluster).
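As a quick sanity check on those usable-capacity figures, the arithmetic can be worked through for a hypothetical array of 4 TB disks (the disk size and counts below are illustrative examples, not a recommendation):

```shell
# Illustrative arithmetic only: RAID 5 keeps (n-1)/n of the raw capacity,
# RAID 10 keeps half. Disk size and counts are hypothetical examples.
size_tb=4

disks=3                                   # minimum disk count for RAID 5
raid5_usable=$(( (disks - 1) * size_tb ))
echo "RAID 5:  ${raid5_usable} TB usable of $(( disks * size_tb )) TB raw"

disks=4                                   # minimum disk count for RAID 10
raid10_usable=$(( disks * size_tb / 2 ))
echo "RAID 10: ${raid10_usable} TB usable of $(( disks * size_tb )) TB raw"
```

With three 4 TB disks, RAID 5 yields 8 TB usable out of 12 TB raw (two-thirds); with four 4 TB disks, RAID 10 yields 8 TB usable out of 16 TB raw (half).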

Block storage is the more performant solution, whereas network-attached storage is the more scalable solution for NextCloud. Each has advantages and trade-offs that should be weighed against the number of users, the number of concurrent users, and the performance and data capacity needed.

As a rule of thumb, if you do not plan to scale your NextCloud deployment beyond two nodes (one active, one passive), we recommend that you choose block storage located on a SAN. On the other hand, if you require a massively scalable NextCloud instance that can go up to thousands or tens of thousands of users, you should look at network-attached storage options such as NFS or Gluster.

Block Storage (SAN or Amazon EBS)

The reason for this recommendation is that a SAN or Amazon EBS volume can only be attached to one server at a time. This means that your two NextCloud servers will function in an active/passive configuration, where the data volume needs to be reattached to the passive server in a failover event. What you gain with block storage is high performance at a low price, with the storage being much less bottlenecked by network bandwidth and replication. Also, you do not need to handle file locking between NextCloud servers, because only one application server can ever write to the data volume at any given time.

The major limitation of SAN-based storage for NextCloud is scalability. With an active/passive configuration, no load balancing is possible between NextCloud servers. This means that your NextCloud deployment will be able to handle only a small number of concurrent users, say fewer than 100. You can only scale your application server up to a certain extent (in terms of memory and CPU) before you hit the limits of serving all of your users from a single node.

SAN implies that a disk resides on a Storage Area Network, but for the purposes of this article we have simply classified it as block storage along with its public cloud counterpart, Amazon EBS (or similar services). This is because solutions such as VMware vSAN are commonly used to provide attachable disks to VMs in a vCenter cluster of ESXi hosts.

Network Attached Storage (NFS or Gluster servers)

If you switch over to network-attached storage, using an NFS or Gluster mount to store the NextCloud data directory for all of your users, you can theoretically scale your deployment up to an unlimited number of NextCloud application servers. A load balancer such as HAProxy or Nginx in front of any number of NextCloud servers (each running a local web server and PHP-FPM) can distribute requests according to an algorithm such as round robin, IP hashing, or least connections. For most deployments, least connections is the most logical choice for the load balancer and also among the simplest to implement. Session stickiness using a cookie-based method or IP hashing is not necessary, as long as you have a shared PHP session store, such as Redis, that all of the NextCloud app nodes use. Without a shared session store or session stickiness, your users will encounter CSRF token errors, because their requests may be handled by a different backend node on each refresh.
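As a concrete illustration, a minimal HAProxy configuration for two NextCloud app nodes might look like the following sketch (the hostnames, IP addresses, and certificate path are placeholders, not part of any official configuration):

```
frontend nextcloud_https
    bind *:443 ssl crt /etc/haproxy/certs/nextcloud.pem
    default_backend nextcloud_app

backend nextcloud_app
    balance leastconn                  # least-connections algorithm
    option httpchk GET /status.php     # NextCloud's built-in status endpoint
    server nc-app1 10.0.0.11:80 check
    server nc-app2 10.0.0.12:80 check
```

With a Redis session store, the corresponding PHP settings on each app node would be along the lines of session.save_handler = redis and session.save_path = "tcp://redis-host:6379" in php.ini (the Redis host and port here are examples).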

NFS v3 and NFS v4 can automatically handle file locking when multiple write requests come in from separate NextCloud servers, so it is not necessary to handle transactional file locking at the application level.
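For example, each app node could mount the shared data directory over NFS v4 with an /etc/fstab entry along these lines (the server name and paths are placeholders):

```
nfs.example.com:/export/nextcloud  /var/www/nextcloud/data  nfs4  rw,hard,noatime,_netdev  0  0
```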

The drawback of network-attached storage for NextCloud, where the blocks are abstracted away from the individual nodes and only the file system is exposed, is the relative complexity of implementing it compared to locally attached or SAN storage. Also, the performance of NFS or Gluster is variable based on the bandwidth of the local network interfaces between your NextCloud servers and the NFS server or Gluster mount point. In addition, the ability of the NFS or Gluster servers to handle a large number of concurrent requests varies based on their processing capacity.

Redundant and/or Highly Available NFS vs Gluster Cluster

Any storage administrator will tell you that making an NFS server highly available is a challenging endeavor. The most barebones solution is to tolerate a single point of failure (the NFS server), but if some redundancy is needed, the cost and complexity rise significantly. You could use an open source solution such as the lsyncd live syncing daemon to mirror changes between a primary and secondary NFS server: as soon as inotify detects a new write on the primary, rsync is triggered over an SSH connection to transfer the deltas to the secondary. However, a manual failover to the secondary NFS server (e.g. by modifying a DNS record) is needed in the event of an outage, and the consistency of data written in the moments before the outage is not guaranteed. Furthermore, the changes made after the failover must be transferred back to the primary server by an administrator to resume normal operations.
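A minimal lsyncd configuration implementing the primary-to-secondary mirroring described above might look like this sketch, written in lsyncd's Lua config format (the hostname and paths are assumptions for illustration):

```
-- /etc/lsyncd/lsyncd.conf.lua (sketch)
settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
}

sync {
    default.rsyncssh,                       -- rsync over SSH
    source    = "/export/nextcloud",        -- export on the primary NFS server
    host      = "standby-nfs.example.com",  -- secondary NFS server
    targetdir = "/export/nextcloud",
    delay     = 5,                          -- batch inotify events before syncing
}
```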

For the most demanding implementations, there are more sophisticated solutions for high availability NFS like DRBD + Heartbeat, or Corosync and Pacemaker. However, high availability NFS is often eschewed altogether for more modern solutions such as Gluster – a project supported by Red Hat.

A Gluster cluster consists of 2 or more storage servers. It supports different volume types similar to RAID levels. An export that is made available for Gluster’s use on a storage server is denoted as a “brick.” A Glusterfs volume resides on such Gluster bricks, and can be mounted using /etc/fstab on the storage clients.
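As an illustration, an app node could mount a Glusterfs volume named "ncdata" with an /etc/fstab entry along these lines (the server names and volume name are placeholders; backup-volfile-servers lets the client fall back to another Gluster node at mount time):

```
gluster1.example.com:/ncdata  /var/www/nextcloud/data  glusterfs  defaults,_netdev,backup-volfile-servers=gluster2.example.com  0  0
```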

List of Gluster Volume Types

Distributed Glusterfs volume – Each file is stored once on the Gluster cluster. Provides the most storage capacity, but no redundancy against data loss, or downtime of Gluster nodes.

Replicated Glusterfs volume – The simplest configuration required for any level of redundancy. The number of replicas is equal to the number of bricks. If there are 2 bricks, there are 2 replicas of each file.

Distributed replicated Glusterfs volume – Similar to a replicated Glusterfs volume, but with a custom number of replicas and number of bricks. For example, a 4×2 volume consists of 8 bricks with a replica count of 2. A 2×4 volume consists of 8 bricks with a replica count of 4. When scaling the storage, additional bricks must be added in multiples of the number of replicas.

Striped Glusterfs volume – Large files are divided into multiple chunks (stripes), and the stripes are distributed across bricks on multiple servers. Provides enhanced read performance for busy storage servers under concurrent use, but no redundancy.

Distributed striped Glusterfs volume – Similar to striped Glusterfs volume, but files may have a custom number of stripes. The chunks (stripes) for a given file are distributed across a number of bricks equal to the stripe count. Additional bricks must be added in multiples of the number of stripes.
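To make the brick arithmetic concrete, a 4×2 distributed replicated volume could be created from 8 bricks across four storage servers roughly as follows (the hostnames and brick paths are placeholders). Gluster groups consecutive bricks on the command line into replica sets, so each adjacent pair below forms one replica set:

```shell
gluster volume create ncdata replica 2 \
    gluster1:/bricks/brick1 gluster2:/bricks/brick1 \
    gluster3:/bricks/brick1 gluster4:/bricks/brick1 \
    gluster1:/bricks/brick2 gluster2:/bricks/brick2 \
    gluster3:/bricks/brick2 gluster4:/bricks/brick2
gluster volume start ncdata
```

The resulting volume distributes files across four replica pairs, with each pair holding two copies of its files.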

If you are deploying NextCloud in a public cloud environment such as AWS or Oracle Cloud, a managed NFS service such as Amazon EFS or Oracle File Storage is a convenient option, as the responsibility for performance and fault tolerance is delegated to the cloud provider. However, at roughly US$0.30/GB per month, the cost can become prohibitive for larger deployments.
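Mounting EFS from a NextCloud app node then reduces to a standard NFS v4.1 mount, roughly like the following (the file system ID and region are placeholders, and the mount options shown are common NFS tunables rather than requirements):

```shell
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600 \
    fs-0123456789abcdef0.efs.us-east-1.amazonaws.com:/ /var/www/nextcloud/data
```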

This article provides an introductory look into choosing an appropriate backend for NextCloud primary storage, considering pertinent factors such as cost, complexity, reliability (durability and availability), and performance. Our team of architects has considerable experience in planning and deploying NextCloud in enterprise production environments, and can help you design a solution that scales with your future needs.