Any plans to support snapshots on modern filesystems (ZFS et al.)?


#1

With filesystems like BTRFS, ZFS and APFS natively supporting snapshots, I’m curious to know if there are any plans to support them in Duplicati.


#2

I’m not aware of any near-term plans to support snapshot functionality beyond stuff like Windows Volume Shadow Copy (VSS) features allowing access to in-use files.

I’m not well versed on snapshots, but my understanding is that they are available on COW (copy-on-write) file systems and basically make a copy of the file metadata (kind of like a FAT table) for files at a particular time. They are quite fast because only the metadata needs to be copied; the actual file contents are already on disk as part of the COW design.

Assuming I’ve got that mostly right, how would you picture that working - just backing up the metadata of a snapshot? Or maybe being able to back up the files of a particular snapshot?


#3

AFAIK, creating snapshots in CoW filesystems presents you with a read-only “snapshot” of the filesystem as it existed at the time of the snapshot’s creation, kinda like VSS in Windows. This includes the metadata as well as the raw data, which is to say that modifying any file does not affect the original file in the snapshot. This would allow much more consistent backups with Duplicati (probably the same reason why VSS and LVM snapshots were supported).

I believe that under the hood the fs simply saves a pointer to the current state, akin to a bookmark. This makes the operation quite cheap and consequently, very fast (almost instantaneous in my experience).

While it is certainly easy to create snapshots with a script called via --run-script-before, managing those snapshots (naming, deleting, etc.) would best be done by Duplicati itself.
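For reference, a pre-backup script along those lines could be as small as the sketch below. The dataset name, the snapshot naming scheme, and the `ZFS_CMD` indirection are all my own assumptions for illustration, not anything Duplicati defines:

```shell
#!/bin/sh
# Minimal sketch of a --run-script-before helper for a ZFS dataset.
# "tank/home" and the duplicati-<date> naming scheme are hypothetical.

# create_snapshot DATASET SNAPNAME
# Creates DATASET@SNAPNAME and prints the full snapshot name on success.
# ZFS_CMD defaults to the real zfs binary; set ZFS_CMD="echo zfs" to dry-run.
create_snapshot() {
    dataset="$1"
    snapname="$2"
    ${ZFS_CMD:-zfs} snapshot "${dataset}@${snapname}" || return 1
    echo "${dataset}@${snapname}"
}

# Dry run; on a real system, drop ZFS_CMD so the real zfs binary is used:
ZFS_CMD="echo zfs" create_snapshot "tank/home" "duplicati-$(date +%Y%m%d)"
```

The downside is exactly what this post points out: the script creates snapshots, but nothing tracks or deletes them afterwards.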


#4

I think I understand - if there were a way to tell Duplicati to back up a particular snapshot, the most recent one, or a custom created-then-deleted snapshot, it would remove some of the concurrency and in-use file issues in the same way that VSS does.

Assuming that’s similar to what you’re asking for, I have a few thoughts that may or may not be valid:

  1. I assume there’s a way to flag that a snapshot is “in use” so it doesn’t get deleted during a backup

  2. It seems to me the easiest way to go would be like with VSS - create a snapshot exclusively for Duplicati, do the backup, kill the snapshot

  3. If a backup takes a long time (such as across multiple days / reboots during initial run) the snapshot could get stale…but I suppose that’s not really an issue

  4. In the case of a single backup taking multiple runs rescanning files probably wouldn’t be necessary (thanks to the snapshot)

  5. There’s no need to include snapshot info in the backup since it might be transient (backup only) and isn’t useful during a restore
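The create/backup/delete lifecycle from point 2 might look roughly like this with ZFS. The dataset and mountpoint names are made up, and `ZFS` defaults to a dry-run echo so the sketch is safe to run as-is:

```shell
#!/bin/sh
# Sketch of point 2: create a snapshot just for Duplicati, back up from
# it, then destroy it. Names are hypothetical.
ZFS="${ZFS:-echo zfs}"   # dry-run by default; set ZFS=zfs on a real system

# backup_via_snapshot DATASET MOUNTPOINT SNAPNAME
backup_via_snapshot() {
    dataset="$1"; mountpoint="$2"; snapname="$3"

    $ZFS snapshot "${dataset}@${snapname}" || return 1   # 1. create

    # 2. a real run would point the backup at the read-only snapshot view,
    #    which ZFS exposes under <mountpoint>/.zfs/snapshot/<name>/
    echo "backing up from ${mountpoint}/.zfs/snapshot/${snapname}"

    $ZFS destroy "${dataset}@${snapname}"                # 3. delete
}

backup_via_snapshot tank/home /home duplicati-tmp
```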

So I guess it should also be considered whether or not the benefits of backing up from a snapshot are worth the development time…


#5

Yes, that’s exactly why I want this being managed by Duplicati itself instead of a separate script.

That’s a nice idea! I wonder if Duplicati has something similar in place when using VSS/LVM snapshots.

The benefits would depend entirely on the percentage of Duplicati users actually using one of these CoW filesystems. It is probably a fraction of a percent. But with APFS becoming Apple’s new default filesystem, I’m pretty sure a large percentage of Mac users will soon be using it.

As for the development time: from what I gather, Duplicati uses 3 scripts (find-volume.sh, create-lvm-snapshot.sh and delete-lvm-snapshot.sh) to manage LVM snapshots on Linux. By leveraging the existing routines that deal with LVM and writing similar scripts for ZFS, BTRFS, et al., I would guess the C# codebase would not require a lot of changes.

This is of course based on my conjecture, and somebody with proper knowledge of the codebase would be in a much better position to say if this is indeed the case.
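To make the conjecture concrete, a create-zfs-snapshot.sh could look something like the sketch below. The interface here (take a dataset name, print a `key="value"` line like find-volume.sh does) is my assumption — the real script would have to match whatever Duplicati’s LVM code actually expects:

```shell
#!/bin/sh
# Hypothetical create-zfs-snapshot.sh mirroring the LVM helpers.
# Assumed (not taken from Duplicati): $1 = dataset name, output is a
# snapshotpath="..." line pointing at the read-only snapshot directory.

# ZFS can be pointed at a stub for testing; defaults to the real binary.
ZFS="${ZFS:-zfs}"

create_zfs_snapshot() {
    dataset="$1"
    snapname="${2:-duplicati-$$}"       # naming scheme is made up

    "$ZFS" snapshot "${dataset}@${snapname}" || return 1

    # ZFS exposes every snapshot read-only under <mountpoint>/.zfs/snapshot/
    mountpoint=$("$ZFS" get -H -o value mountpoint "$dataset") || return 1
    echo "snapshotpath=\"${mountpoint}/.zfs/snapshot/${snapname}\""
}

# Usage on a real system: create_zfs_snapshot tank/home
```

The matching delete script would just call `zfs destroy` on the snapshot it created.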


#6

Yes, that is how I would propose implementing it. This should be easy to do for someone familiar with the ZFS tools.

If the preferred way to do ZFS snapshots is via a library, it is also easy to call any C API from C#. It would need pretty much the same functions:

  1. Create snapshot for folder/drive(s)
  2. Given a path, give the snapshot (path) that represents it
  3. Delete the snapshot

The second point can be more dynamic, as each call to open a file or read metadata is passed to the snapshot implementation. (I.e., the code could call something like zfs_open_snapshot_file(path) and return a System.IO.FileStream using the file handle from the call, if this is required.)
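For point 2 specifically, ZFS already exposes each snapshot as a read-only directory tree under `<mountpoint>/.zfs/snapshot/<name>/`, so mapping a live path to its snapshot path can be plain string rewriting — a rough sketch:

```shell
#!/bin/sh
# Sketch of point 2: map a live file path to the corresponding path inside
# a ZFS snapshot. ZFS exposes snapshots read-only under
# <mountpoint>/.zfs/snapshot/<snapname>/, so this is pure path rewriting.

# snapshot_path MOUNTPOINT SNAPNAME FILEPATH
snapshot_path() {
    mountpoint="$1"; snapname="$2"; filepath="$3"
    # strip the mountpoint prefix, then splice in the snapshot directory
    rel="${filepath#"$mountpoint"}"
    echo "${mountpoint%/}/.zfs/snapshot/${snapname}${rel}"
}

snapshot_path /home duplicati-1 /home/user/doc.txt
# prints: /home/.zfs/snapshot/duplicati-1/user/doc.txt
```

If that is enough, the C# side could open the rewritten path with an ordinary System.IO.FileStream, with no special snapshot API at all.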


#7

FYI, here are my versions of these 3 scripts for ZFS.

zfs-scripts.zip (2.3 KB)


#8

Great work!

Would you mind if we include this in the Duplicati source code?

Also, assuming yes to the question above, does it make sense to autodetect the filesystem (check if there is ZFS, otherwise use LVM), or should we add another option like --snapshot-type=zfs?


#9

FYI, I added this to find_volume.sh:

FSTYPE=`zfs get -H -o value type "$MOUNTPOINT"`
EXIT_CODE=$?
if [ "$EXIT_CODE" -ne 0 ] || [ -z "$FSTYPE" ]
then
    [ "$EXIT_CODE" -eq 0 ] && EXIT_CODE=1
    echo "Error: unable to determine mount point type for $NAME"
    exit $EXIT_CODE
fi

I added it before echoing the mountpoint / device, because on one of my hosts my /boot was actually not a ZFS device.

But sure, feel free to include these in the source.

That said, it should not be too difficult to alter the scripts to detect whether a device is an LVM or ZFS device (or, say, btrfs). I might look at that tomorrow, or someone else can (technically I don’t need LVM or btrfs, but I guess the utility of this might make it worth it).


#10

So I’ve been looking at this. I actually think it would be easier to modify find-volume.sh to add an fstype variable to what it returns (e.g. lvm, zfs, btrfs). Then have it invoke create-${type}-snapshot.sh and remove-${type}-snapshot.sh.

I modified find-volume as such here:

#!/bin/bash

# This script returns the device on which the volume is mounted
#
# Input is always:
#  $1 = name of the folder to locate the LVM device for
#
# The script MUST output a line with device="<path>", where path is the lvm id.
# The script MUST output a line with mountpoint="<path>", where path is the device root.
# This ensures that any tools invoked can write info to the console,
#  and this will not interfere with the program functions.


#
# Rename the input
#
NAME=$1

#
# Get the reported mount point for the current folder
#
VOLUME=`df -TP "$NAME" | tail -1 | awk '{ print $1}'`
if [ "$?" -ne 0 ] || [ -z "$VOLUME" ]
then
        [[ "$?" -ne 0 ]] && EXIT_CODE=$? || EXIT_CODE=-1
        echo "Error: unable to determine device for $NAME"
        exit $EXIT_CODE
fi

MOUNTPOINT=`df -TP "$NAME" | tail -1 | awk '{ print $NF}'`
if [ "$?" -ne 0 ] || [ -z "$MOUNTPOINT" ]
then
        [[ "$?" -ne 0 ]] && EXIT_CODE=$? || EXIT_CODE=-1
        echo "Error: unable to determine mount point for $NAME"
        exit $EXIT_CODE
fi

FSTYPE=`df -TP "$NAME" | tail -1 | awk '{ print $2}'`
if [ "$?" -ne 0 ] || [ -z "$FSTYPE" ]
then
        [[ "$?" -ne 0 ]] && EXIT_CODE=$? || EXIT_CODE=-1
        echo "Error: unable to determine filesystem type for $NAME"
        exit $EXIT_CODE
fi

if [ "x$FSTYPE" = "xzfs" ]; then
        DEVICE=$VOLUME
elif [ "x$FSTYPE" = "xbtrfs" ]; then
        #
        # Get the BTRFS path for the mapped volume
        #
    
        function get_label {
                LABEL=`btrfs filesystem label "$1" 2>/dev/null`
                if [ "$?" -ne 0 ]
                then
                        EXIT_CODE=$?
                        echo "Error: Unable to determine label (btrfs) for mapped volume $VOLUME"
                        exit $EXIT_CODE
                fi
                SUBVOLUME=`btrfs subvolume show "$1" 2>/dev/null | head -n 1 | awk '{ print $1 }'`
                if [ "$?" -ne 0 ]
                then
                        EXIT_CODE=$?
                        echo "Error: Unable to determine subvolume (btrfs) for mapped volume $VOLUME"
                        exit $EXIT_CODE
                fi

                DEVICE="$LABEL/$SUBVOLUME"
                export DEVICE
        }

        get_label $MOUNTPOINT

        if [ -z "$DEVICE" ]
        then
                EXIT_CODE=-1
                echo "Error: unable to determine volume identifier (btrfs) for $NAME"
                exit $EXIT_CODE
        fi
else
        #
        # Get the LVM path for the mapped volume
        #

        function get_lvmid {
                DEVICE=`lvs "$1" --options vg_name,lv_name --noheadings 2>/dev/null | tail -1 | awk '{ print $1 "/" $2}'`
                if [ "$?" -ne 0 ]
                then
                        EXIT_CODE=$?
                        echo "Error: Unable to determine volume group (VG) for mapped volume $VOLUME"
                        exit $EXIT_CODE
                fi
                export DEVICE
        }

        get_lvmid $VOLUME

        #
        # Get the LVM path for the mapped volume (second try)
        #
        if [ -z "$DEVICE" ]
        then
                VOLUME=`mount | awk '($3 == "'$MOUNTPOINT'") {print $1}'`
                get_lvmid $VOLUME
        fi
 
        if [ -z "$DEVICE" ]
        then
                EXIT_CODE=-1
                echo "Error: unable to determine volume group (VG) for $NAME"
                exit $EXIT_CODE
        fi

        FSTYPE=lvm
fi

echo "mountpoint=\"$MOUNTPOINT\""
echo "device=\"$DEVICE\""
echo "fstype=\"$FSTYPE\""

exit 0

It would be easier to expand to any snapshot-capable filesystem this way: simply distribute scripts for lvm, zfs and btrfs, and let find-volume.sh figure out which scripts to invoke.
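The dispatch step itself could then be as small as a case on the reported fstype. The helper function below is only an illustration; the script names follow the create-${type}-snapshot.sh convention proposed above:

```shell
#!/bin/sh
# Sketch of the proposed dispatch: given the fstype reported by
# find-volume.sh, pick the matching helper scripts (names follow the
# create-${type}-snapshot.sh convention from the post above).

# pick_snapshot_scripts FSTYPE -> prints "create-... remove-..." or fails
pick_snapshot_scripts() {
    case "$1" in
        lvm|zfs|btrfs)
            echo "create-$1-snapshot.sh remove-$1-snapshot.sh"
            ;;
        *)
            echo "Error: no snapshot scripts for fstype '$1'" >&2
            return 1
            ;;
    esac
}

pick_snapshot_scripts zfs
# prints: create-zfs-snapshot.sh remove-zfs-snapshot.sh
```

Adding support for another filesystem would then only mean dropping in two more scripts and extending the case.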