Achieving Planetary Scale Cloud Storage with NextCloud + Tardigrade.io
Tardigrade.io by Storj Labs is an emerging decentralized competitor to Amazon S3, powered by the STORJ token on the Ethereum blockchain. The company has a strong pedigree, being headed by Ben Golub, former CEO of Docker, Inc. One would think that object storage is a commodity service that isn’t exciting to wrest from incumbents such as AWS. Curiously enough, object storage is actually one of the most compelling use cases for distributed ledger technology, given the massive, planetary scale of cloud-native applications that leverage object storage. As of Apr 2013, Amazon S3 stored over two trillion objects, up from 905 billion objects just one year prior in 2012. Assuming an average object size of 100KB, two trillion objects already amounted to roughly 200 PB in 2013; several years on, Amazon S3 almost certainly stores exabytes of data for its customers.
Why Decentralize Your Object Storage with Blockchain
Amazon S3 has an excellent track record of durability. It is designed for eleven 9s (99.999999999%) of durability, which works out to an expected loss of about 10 objects per trillion stored per year, so losing data on it is virtually impossible in practice. High-profile downtime such as the US-East-1 outage of February 28, 2017, however, has called its availability into question. When web applications worldwide, including Amazon’s own status page, rely on Amazon S3 to function, the architects of the Internet who originally conceived the World Wide Web (WWW) as a decentralized, peer-to-peer network rightfully question whether a single company such as Amazon should become a “choke point” for the entire Internet.
How Tardigrade Cloud Storage Achieves Decentralization
Tardigrade.io, Storj Labs’ answer to Amazon S3, seeks to address the growing centralization of the Internet with a service that it claims will be 20% faster than Amazon S3 at half the price. Where Amazon S3 charges roughly $20/TB-mo for storage and $90/TB for outbound data, Tardigrade.io will come in at $10/TB-mo and $45/TB. It’s crucial to note that the “20% faster” claim applies only to downloads from the Tardigrade.io service. Because an object stored on Tardigrade.io must be broken into 30 pieces, encrypted with AES-256, and stored across decentralized storage nodes on the Tardigrade.io network, it’s reasonable to expect that PUTs will be much slower than with a centralized object storage service. Using a Reed-Solomon erasure code with k = 18, n = 36, however, Tardigrade.io can achieve seven 9s of durability (assuming 10% node churn) with a 2X expansion factor (2MB to store 1MB of data), whereas 9X replication (9MB to store 1MB of data) would yield just six 9s of durability.
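The storage overhead quoted above can be checked with a few lines of shell arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes that every one of the n pieces is exactly 1/k of the object, which ignores metadata and padding.

```shell
#!/bin/sh
# Storage consumed (in MB) to hold an object under a k-of-n erasure code.
# Each of the n pieces is object_mb / k, so the total is object_mb * n / k.
# Shell arithmetic is integer-only; sizes that do not divide evenly truncate.
erasure_stored_mb() {
    object_mb=$1; k=$2; n=$3
    echo $(( object_mb * n / k ))
}

# Storage consumed (in MB) when keeping `copies` full replicas of the object.
replicated_stored_mb() {
    object_mb=$1; copies=$2
    echo $(( object_mb * copies ))
}

echo "erasure k=18, n=36: $(erasure_stored_mb 1 18 36) MB stored per 1 MB of data"
echo "9X replication:     $(replicated_stored_mb 1 9) MB stored per 1 MB of data"
```

The ratio n / k is the expansion factor, so any code with n = 2k stores two bytes for every byte of user data regardless of how many pieces it uses.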
Even if one-third of the nodes are lost (12 out of 36) with an erasure code of k = 18, n = 36, an 18 MB object can be rebuilt from the 24 remaining nodes using at most 30MB of bandwidth, with each node storing a 1MB piece: 18MB to download the 18 pieces needed for reconstruction, plus 12MB to upload the 12 regenerated pieces. Compare this with the 9X replication scenario, where losing 3 nodes costs 54MB of bandwidth (three full 18MB copies) to rebuild the object. The caveat, of course, is the additional CPU time needed to rebuild the object with Reed-Solomon encoding, versus the minimal processing needed to rebuild an object with replication. However, “a reasonable erasure encoding library can generate encoded data at least 650 MB/sec”, says JT Olio, VP Engineering with Storj Labs.
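The repair figures above follow from a simple model, sketched here in shell. The model is our reading of the numbers rather than a description of Storj's actual repair process: an erasure-coded repair downloads k pieces and uploads one regenerated piece per lost node, while a replicated repair re-sends one full copy per lost node. The function names are illustrative, and the object size is assumed to divide evenly by k.

```shell
#!/bin/sh
# Bandwidth (in MB) to repair an erasure-coded object after `lost` pieces vanish:
# download k pieces to reconstruct, then upload `lost` regenerated pieces.
erasure_repair_mb() {
    object_mb=$1; k=$2; lost=$3
    piece_mb=$(( object_mb / k ))        # size of a single piece
    echo $(( (k + lost) * piece_mb ))
}

# Bandwidth (in MB) to repair a replicated object: one full copy per lost replica.
replication_repair_mb() {
    object_mb=$1; lost=$2
    echo $(( object_mb * lost ))
}

echo "erasure, 12 of 36 pieces lost: $(erasure_repair_mb 18 18 12) MB"
echo "9X replication, 3 copies lost: $(replication_repair_mb 18 3) MB"
```

Under this model the erasure-coded repair restores twelve lost pieces for 30MB, whereas replication spends 54MB to restore three lost copies.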
What Reed-Solomon encoding ultimately means for Tardigrade Cloud Storage users is a high degree of durability, even though their data is dispersed across storage nodes around the world. The k value represents the minimum number of pieces, and therefore healthy nodes, needed to rebuild an object. According to this press release from Storj Labs, the Tardigrade Cloud Storage Service has never lost a file since the beta network was launched in April 2019.
Storj Labs cautiously manages the Tardigrade.io network by gradually onboarding storage node operators. Storage nodes must meet a certain threshold of durability and availability (during an onboarding period) to show that they can reliably store user data before they can receive the full amount of STORJ payouts. We are particularly excited about the Tardigrade.io project because it allows the petabytes of unused storage capacity in datacenters across the globe to be monetized, as an environmentally friendly alternative to other activities such as Bitcoin mining.
The Storj coin project has a great chance of being more widely adopted than other blockchain-based storage tokens such as Sia (SC) or Filecoin (FIL) because it leverages Reed-Solomon erasure coding rather than merely using replication.
Although the Tardigrade Cloud Storage network relies on storage node operators to provide a reliable storage medium and network connection, no trust in either Storj Labs or the storage node operators is required from a data privacy and security perspective. All data is encrypted with a user’s own passphrase on their local machine with AES-256-GCM before being sent to the Tardigrade.io network. This provides security comparable to uploading an object to a conventional object storage service with client-side encryption.
Integrating Tardigrade Cloud Storage with NextCloud
We received an invitation to test Tardigrade Cloud Storage during Beta 2 and put it to work as the primary storage for NextCloud using the S3 gateway. Since we haven’t mentioned it yet, Tardigrade.io has a native API called uplink as well as an S3 gateway built on Minio, another project backed by a Docker alumnus, former CEO Steve Singh.
Note that Tardigrade Cloud Storage is still in beta and should not be relied upon to store mission-critical data. Also, the API limits are set low enough during the beta that you will run into rate limits when using Tardigrade with NextCloud.
Using NextCloud’s support for S3 compatible primary storage is the best way to integrate Tardigrade.io with NextCloud presently, although the developers of NextCloud have joined Storj Labs’ partner program for independent software vendors to create a native integration with Tardigrade in the future. This will deliver higher performance, and circumvent the doubling of bandwidth cost that occurs when an app uses the S3 gateway instead of the uplink API.
Since the NextCloud community is particularly conscious about data privacy and security, we are pleased that leveraging Tardigrade Cloud Storage means that data is encrypted in-transit (to Tardigrade) and at-rest with AES-256. The data does not leave your NextCloud server without being encrypted with a passphrase that is only known to you. This is a big win for zero-knowledge, encrypted cloud storage.
Using an email invite, we created a Tardigrade.io account on the us-central-1 Satellite, which is the name by which Tardigrade refers to its endpoints. Users must verify their account with a valid credit card, which will not be charged until the beta is over. The Storj native token is also accepted for payments. Currently, the Stripe payment gateway is running only on the us-central-1 Satellite; it is coming soon to the other official Tardigrade Satellites, europe-west-1 and asia-east-1. In the future, third parties will be able to operate their own Satellites using the open-source Tardigrade software, furthering decentralization. Aside from acting as an endpoint, a Satellite stores the metadata for your projects and buckets, but not the object data itself, which is distributed across the storage nodes.
During the beta, we received a limit of 25 GB of storage and 25 GB of monthly bandwidth – which is sufficient for testing purposes. The first 10,000 developers who pre-register for a Tardigrade.io account will receive 1 TB of free storage for 30 days at launch.
Quick Start for Setting Up Tardigrade as NextCloud Primary Storage
Step 1 – Create a Tardigrade account on the us-central-1 Satellite using the invitation link you received by email. If you don’t have an invitation, request one here. Verify your credit card.
Step 2 – Create a new project. In this case we named our project “decentralizedstorage”. “Pioneer” beta accounts are limited to one project, but you can have numerous buckets under each project, which NextCloud can leverage in a multi-bucket configuration.
Step 3 – Create an API key for the project. This API key is used by the S3 gateway on your NextCloud server to authenticate to the Tardigrade Satellite. Keep it safe for Step 5.
Step 4 – Install the S3 gateway for Linux in the /opt directory, or another directory of your choice.
curl -L https://github.com/storj/storj/releases/latest/download/gateway_linux_amd64.zip -O && unzip gateway_linux_amd64.zip
Step 5 – Configure the S3 gateway by selecting the Satellite and entering your Tardigrade API key.
Step 6 – Using a utility such as screen, run the S3 gateway daemon in the background. For real-world usage, a systemd service should be set up so that the daemon starts again after each reboot.
When the Minio access key and secret are displayed in the terminal, record them for Step 7.
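For the systemd route mentioned in Step 6, a unit file along the following lines could be a starting point. Treat it as a sketch under stated assumptions: the unit name tardigrade-gateway.service, the storj user, the /opt/gateway_linux_amd64 path and the run subcommand are our choices or guesses for illustration, so verify each against your own installation before relying on it.

```shell
#!/bin/sh
# Write a systemd unit for the S3 gateway to the path given as $1.
# ASSUMPTIONS (adjust to your setup): the binary lives at
# /opt/gateway_linux_amd64, it is started with a `run` subcommand, and it
# runs as the same user who configured it in Step 5 (called `storj` here).
write_gateway_unit() {
    unit_path=$1
    cat > "$unit_path" <<'EOF'
[Unit]
Description=Tardigrade S3 gateway (sketch)
After=network-online.target
Wants=network-online.target

[Service]
User=storj
ExecStart=/opt/gateway_linux_amd64 run
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}

# Preview the unit in a scratch location before installing it.
preview="${TMPDIR:-/tmp}/tardigrade-gateway.service"
write_gateway_unit "$preview"
cat "$preview"
```

Once the contents look right, copy the file to /etc/systemd/system/ as root, run systemctl daemon-reload, and then systemctl enable --now tardigrade-gateway so the gateway starts immediately and on every boot.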
Step 7 – Follow the usual steps to install NextCloud after saving the following configuration as config/storage.config.php. We tested this configuration with NextCloud 17.0.2.
Setting 'use_path_style' => true is crucial when configuring Tardigrade with NextCloud; otherwise you will encounter an Internal Server Error during setup, with errors like “Could not create object urn:oid” in data/nextcloud.log.
Also note that 'use_ssl' => false is set because the Minio S3 gateway listens on localhost port 7777, so SSL is not necessary. These are the two main points that differ when setting up Tardigrade Cloud Storage versus Amazon S3, Wasabi, or other object stores with NextCloud.
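<?php
$CONFIG = [
  'objectstore_multibucket' => [
    'class' => 'OC\\Files\\ObjectStore\\S3',
    'arguments' => [
      // number of buckets that user data is spread across
      'num_buckets' => 64,
      // bucket name prefix
      'bucket' => 'storj',
      'autocreate' => true,
      // access key and secret displayed by the S3 gateway in Step 6
      'key' => 'AAAAAAAAAAAAAAAAAAAAAAAA',
      'secret' => 'AAAAAAAAAAAAAAAAAAAAAAAA',
      // the gateway listens locally, so plain HTTP is sufficient
      'hostname' => '127.0.0.1',
      'port' => 7777,
      'use_ssl' => false,
      'region' => '',
      // required for some non Amazon S3 implementations
      'use_path_style' => true,
    ],
  ],
];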
Step 8 – Be patient (and do not close or refresh the browser window) after you click Finish in the NextCloud setup wizard while it creates the objects for the initial user account on Tardigrade Cloud Storage. It can take a few minutes due to the decentralized nature of Tardigrade storage. Performance should improve as Tardigrade ramps up its network ahead of the production launch.
This is what the objects stored in the S3 backend by NextCloud using Tardigrade look like.
You can start uploading files into your personal storage vault with NextCloud – now decentralized with the power of Tardigrade Cloud Storage and the Storj coin.
As with other object storage providers, all future NextCloud user accounts created on this instance will have their user data stored on Tardigrade Cloud Storage. Only in this case, the storage backing is decentralized and resilient, based on sharding the data to 30 storage nodes using Reed-Solomon erasure coding and Ethereum blockchain technology.