Migrate Large Volumes of Data with lsyncd in Real-Time

lsyncd, the Live Syncing Daemon, is a robust and time-tested tool for cloud migrations that combines the delta-syncing algorithm of rsync with inotify to monitor file system events. If you need to transfer data from a source that will be written to during the sync, lsyncd can monitor for changes in real time and replicate them to the destination.

Example use cases include:

  • Migrating a web application with minimal to no downtime
  • Copying a file share (OwnCloud, NextCloud, etc.) from one server to another
  • Keeping multiple, load-balanced web servers in sync

Contact the team for a consultation on whether lsyncd or other tools can help with your cloud migration or high availability project. lsyncd can comfortably support terabytes of data or more.

Normally with rsync, a file list of changed files is generated when the rsync process is initiated. If a file is changed at the source before rsync begins copying that file, the new version of the file will be copied. If a file is added at the source, it will not be picked up until the next time rsync is executed. There is also the small possibility that a file is being changed while rsync is copying it. If so, the checksum will fail and rsync will retry the transfer once before skipping the file with an error.

You could schedule rsync to run as a cron job, for example, to take a regular backup of a directory to a remote destination. However, this is not a good way to back up directories that require a consistent state (a snapshot of a particular moment in time), such as a MySQL data directory. It is much better to take a consistent dump with mysqldump, using --single-transaction for InnoDB tables or a read lock for engines that do not support transactions, and then rsync the resulting .sql file to the backup destination.
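The dump-then-copy approach can be sketched as a small shell script. This is a minimal sketch, not a production backup tool: BACKUP_DIR, REMOTE, the run helper, and the backup_database function are hypothetical names chosen for illustration, and it relies on --single-transaction alone, which assumes InnoDB tables. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: dump the database first, then rsync the dump file.
# BACKUP_DIR and REMOTE are assumed locations; adjust them for your setup.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/mysql}"
REMOTE="${REMOTE:-backup@192.0.2.102:/srv/backups/mysql/}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_database() {
    dump_file="$BACKUP_DIR/all-databases.sql"
    # --single-transaction dumps InnoDB tables from one consistent snapshot.
    run mysqldump --single-transaction --all-databases \
        --result-file="$dump_file" || return 1
    # Copy only the finished dump, never the live data directory.
    run rsync -a "$dump_file" "$REMOTE"
}

# Call backup_database at the end of the script when installing it for cron.
```

Running it with DRY_RUN=1 first is a cheap way to confirm the paths before the job is scheduled.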

If the source file system is not an advanced file system such as ZFS or Btrfs that supports snapshot, send, and receive, a solid alternative is to have lsyncd monitor the file system through inotify and trigger an rsync whenever changes are detected.

The lsyncd daemon only needs to be installed at the source, not the destination. All the destination requires is rsync and a Linux user account that can accept SSH connections. Because lsyncd relies on rsync over SSH, the data is automatically encrypted in transit, and the SSH key pair is used to authenticate the connection, no different from any other SSH session.

lsyncd can typically be installed from your package manager’s repositories, but the latest version can be compiled and installed from GitHub. An older version of lsyncd, 2.1.5, contains a bug that results in this error when syncing directories: Error: in Lua: default-rsyncssh.lua:92: attempt to call local 'path2' (a string value)

To work around it, download the source of lsyncd 2.1.5 with apt-get source lsyncd, add the missing comma at the end of line 92 in lsyncd-2.1.5/default-rsyncssh.lua with a text editor, compile it with make, and install it to /usr/local/bin with make install.

Once lsyncd is properly installed, create a configuration at /etc/lsyncd/lsyncd.conf.lua to specify the source directories to be monitored with inotify, and synced with the destination. Create the log directories for lsyncd with sudo mkdir -p /var/log/lsyncd/.

settings {
   logfile = "/var/log/lsyncd/lsyncd.log",
   statusFile = "/var/log/lsyncd/lsyncd-status.log",
   statusInterval = 0
}

sync {
   default.rsyncssh,
   source="/var/www/html/",
   host="www-data@192.0.2.102",
   targetdir="/var/www/html/",
   rsync = {
      compress = false,
      archive = true,
      verbose = true,
      rsh = "/usr/bin/ssh -l www-data -i /root/.ssh/id_rsa -p 22 -o StrictHostKeyChecking=no"
   }
}

Here is a brief explanation of the options specified above:

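  • statusInterval = 0 sets the minimum number of seconds between writes of the status file; 0 means the status file is updated as soon as the state changes. It does not control how long lsyncd waits before starting rsync after inotify detects a change; that is the separate delay option, which is not set here.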
  • source refers to the local directory to be synced FROM.
  • host is the SSH username and IP address of the remote.
  • targetdir refers to the remote directory to be synced TO.
  • compress = false significantly lowers CPU usage but increases bandwidth usage.
  • archive = true is the equivalent of passing the -a flag to rsync. It preserves permissions, timestamps, symlinks, and user and group ownership (as long as the same UID and GID exist at the remote).
  • verbose = true is the equivalent of passing the -v flag to rsync.
  • rsh = "/usr/bin/ssh -l www-data -i /root/.ssh/id_rsa -p 22 -o StrictHostKeyChecking=no" specifies the SSH user and port at the remote site, and which SSH private key (at the source) to use to connect. SSH private keys should be set to a permission of 600.
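The permission requirement on the private key can be checked before lsyncd is started. The following is a minimal sketch; ensure_private_key_mode is a hypothetical helper name, and it assumes GNU stat as found on Linux.

```shell
#!/bin/sh
# Hypothetical helper: verify that a private key has mode 600, fixing it if not.
# Usage: ensure_private_key_mode /root/.ssh/id_rsa
ensure_private_key_mode() {
    key="$1"
    if [ ! -f "$key" ]; then
        echo "missing key: $key" >&2
        return 1
    fi
    mode="$(stat -c '%a' "$key")"   # GNU stat prints the octal mode, e.g. 644
    if [ "$mode" != "600" ]; then
        chmod 600 "$key" || return 1
        echo "fixed: $key was $mode, now 600"
    else
        echo "ok: $key is 600"
    fi
}
```

ssh refuses to use a private key that is readable by other users, so running a check like this against the key named in the rsh option can save a failed first sync.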

Normally the web server user (apache, nginx, or www-data) at the destination has /sbin/nologin set as its shell in /etc/passwd, so it is not possible to connect as that user over SSH. To allow it, change the shell to /bin/sh or /bin/bash and create the .ssh/authorized_keys file in the home directory for this user. For security reasons, make sure that the .ssh folder in a home directory such as /var/www/ is not accessible to the public through the web server.
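Preparing the destination account could look roughly like the sketch below. install_authorized_key is a hypothetical helper, and the www-data user, the /var/www home directory, and the key text in the comments are assumptions taken from the example configuration, not fixed values.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to <home>/.ssh/authorized_keys
# and set the permissions sshd expects on the directory and file.
install_authorized_key() {
    home_dir="$1"
    public_key="$2"
    mkdir -p "$home_dir/.ssh" || return 1
    chmod 700 "$home_dir/.ssh"
    touch "$home_dir/.ssh/authorized_keys"
    chmod 600 "$home_dir/.ssh/authorized_keys"
    # Append only if this exact key line is not already present.
    grep -qxF -- "$public_key" "$home_dir/.ssh/authorized_keys" \
        || printf '%s\n' "$public_key" >> "$home_dir/.ssh/authorized_keys"
}

# On the destination, as root (assuming user www-data with home /var/www):
#   usermod -s /bin/bash www-data
#   install_authorized_key /var/www "<contents of the source's public key file>"
#   chown -R www-data:www-data /var/www/.ssh
```

The helper is safe to run more than once, because the key is appended only when it is missing from the file.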

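Next, create an init script for lsyncd so that it can run in the background and start on boot (if desired). Save the example script below as /etc/init.d/lsyncd and make it executable with sudo chmod +x /etc/init.d/lsyncd.

Once the script is in place, you can start and stop lsyncd with the standard service commands, sudo service lsyncd start and sudo service lsyncd stop. The service command has no enable action; to start lsyncd at boot, use sudo systemctl enable lsyncd on systemd-based distributions or sudo update-rc.d lsyncd defaults on Debian-style SysV systems.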

#! /bin/sh
### BEGIN INIT INFO
# Provides:          lsyncd
# Required-Start:    $remote_fs
# Required-Stop:     $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: lsyncd daemon init script
# Description:       This script launches the lsyncd daemon.
### END INIT INFO

PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin
DESC="synchronization daemon"
NAME=lsyncd
DAEMON=/usr/local/bin/$NAME
CONFIG=/etc/lsyncd/lsyncd.conf.lua
PIDFILE=/var/run/$NAME.pid
DAEMON_ARGS="-pidfile ${PIDFILE} ${CONFIG}"
SCRIPTNAME=/etc/init.d/$NAME
NICELEVEL=10

# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0

# Exit if config file does not exist
[ -r "$CONFIG" ] || exit 0

# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME

# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions

#
# Function that starts the daemon/service
#
do_start()
{
        start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON \
        --test > /dev/null \
                || return 1
        start-stop-daemon --start --quiet --pidfile $PIDFILE \
        --nicelevel $NICELEVEL --exec $DAEMON -- \
                $DAEMON_ARGS \
                || return 2
}

#
# Function that stops the daemon/service
#
do_stop()
{
        start-stop-daemon --stop --quiet --pidfile $PIDFILE --name $NAME
        RETVAL="$?"
        [ "$RETVAL" = 2 ] && return 2
        start-stop-daemon --stop --quiet --oknodo --exec $DAEMON
        [ "$?" = 2 ] && return 2
        # Many daemons don't delete their pidfiles when they exit.
        rm -f $PIDFILE
        return "$RETVAL"
}

#
# Function that sends a SIGHUP to the daemon/service
#
do_reload() {
        start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE --name $NAME
        return 0
}

case "$1" in
  start)
        log_daemon_msg "Starting $DESC" "$NAME"
        do_start
        case "$?" in
                0|1) log_end_msg 0 ;;
                2) log_end_msg 1 ;;
        esac
        ;;
  stop)
        log_daemon_msg "Stopping $DESC" "$NAME"
        do_stop
        case "$?" in
                0|1) log_end_msg 0 ;;
                2) log_end_msg 1 ;;
        esac
        ;;
  status)
        status_of_proc $DAEMON $NAME && exit 0 || exit $?
        ;;
  restart|force-reload)
        log_daemon_msg "Restarting $DESC" "$NAME"
        do_stop
        case "$?" in
          0|1)
                do_start
                case "$?" in
                        0) log_end_msg 0 ;;
                        1) log_end_msg 1 ;; # Old process is still running
                        *) log_end_msg 1 ;; # Failed to start
                esac
                ;;
          *)
                # Failed to stop
                log_end_msg 1
                ;;
        esac
        ;;
  *)
        echo "Usage: $SCRIPTNAME {start|stop|status|restart|force-reload}" >&2
        exit 3
        ;;
esac

:

To monitor the progress of lsyncd, the watch command can be used in conjunction with tail to print the contents of the log files to the console.

watch -n 1 tail /var/log/lsyncd/lsyncd.log
watch -n 1 tail /var/log/lsyncd/lsyncd-status.log