Source data contains hard links and junctions (Windows) - very large number of files

I use UrBackup to back up my Windows and Linux systems to a central backup server, and I am trying to get that backup data offsite. For the most part, Duplicati works great, but one of the systems I back up is a file server. UrBackup uses a combination of hard links (for files) and directory junctions on NTFS to save disk space. According to Duplicati, my backup of the file server contains ~50TB of data and ~15 million files, since every hard-linked path is treated as a unique file. WizTree and other disk space tools that read the MFT directly show 1.6 million files and folders, consuming 1.7TB of allocated space and about 6TB of “space.”
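If you want to sanity-check those two numbers yourself, here is a rough Python sketch (the demo tree and all names are made up) that counts files the way a link-unaware walker would, versus deduplicating on (device, inode), which is roughly what the MFT-based tools report:

```python
import os
import tempfile

def count_files(root):
    """Return (naive_count, unique_count): naive counts every path the
    way a link-unaware backup tool sees them; unique keys each file on
    (device, inode) so hard links are only counted once."""
    naive, seen = 0, set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            naive += 1
            seen.add((st.st_dev, st.st_ino))
    return naive, len(seen)

# Demo: one real file hard-linked into three "snapshot" directories,
# loosely mimicking UrBackup's incremental layout.
root = tempfile.mkdtemp()
src = os.path.join(root, "snap1")
os.mkdir(src)
with open(os.path.join(src, "data.bin"), "wb") as f:
    f.write(b"x" * 1024)
for snap in ("snap2", "snap3"):
    d = os.path.join(root, snap)
    os.mkdir(d)
    os.link(os.path.join(src, "data.bin"), os.path.join(d, "data.bin"))

naive, unique = count_files(root)
print(naive, unique)  # naive walker sees 3 files, but only 1 is real
```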

The space used on the remote side definitely reflects Duplicati’s deduplication capabilities: for another machine with ~900k files and folders, 1.4TB “size”, and 450GB allocated on disk, Duplicati reports 6.23TB, yet the remote backup is only 88GB. Still, rolling through 15 million files on a spinning disk is very slow going when every file is checked in 1MB chunks (my blocksize).

While I have enabled USN Journal, that will only help after the first backup is complete; the initial backup has been running for 3.5 days and still has 12 million files and 42TB left. I am not sure how long a new backup with USN Journal enabled (and working) will take, or if it will even be helpful. Are there any options that will recognize the hard links and directory junctions (essentially a soft link to a directory) and allow the backup to not take so long?

I don’t think Windows has much here. Other platforms get a bit more, but would any of the options below actually work with UrBackup’s file layout even if they did work on Windows? To me, hard links seem rather backup-unfriendly.


--hardlink-policy = All
Use this option to handle hardlinks (only works on Linux/OSX). The option first will record a hardlink ID for each hardlink to avoid storing hardlinked paths multiple times. The option all will ignore hardlink information, and treat each hardlink as a unique path. The option none will ignore all hardlinks with more than one link.
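To make the three policies concrete, here is a Python sketch of what I understand each one to do during a walk (my reading of the doc text above, not Duplicati’s actual code):

```python
import os
import tempfile

def paths_to_store(root, policy="first"):
    """Sketch of the documented hardlink policies (not Duplicati source):
    first - store each (device, inode) once and skip later links to it;
    all   - store every path as if it were a unique file;
    none  - skip any file whose link count is greater than one."""
    seen, out = set(), []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if policy == "all":
                out.append(path)
            elif policy == "none":
                if st.st_nlink == 1:
                    out.append(path)
            else:  # "first"
                key = (st.st_dev, st.st_ino)
                if key not in seen:
                    seen.add(key)
                    out.append(path)
    return out

# Tiny demo tree: a.txt hard-linked as a2.txt, plus an unlinked b.txt.
root = tempfile.mkdtemp()
with open(os.path.join(root, "a.txt"), "w") as f:
    f.write("a")
os.link(os.path.join(root, "a.txt"), os.path.join(root, "a2.txt"))
with open(os.path.join(root, "b.txt"), "w") as f:
    f.write("b")

counts = {p: len(paths_to_store(root, p)) for p in ("all", "first", "none")}
print(counts)
```

With `first`, the 15-million-path scan above would collapse back toward the 1.6 million real files, which is why the Linux/OSX-only caveat stings here.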

symlink-policy is a little more Windows-friendly, I think, but you’d have to test whether it helps your use case.

--symlink-policy = Store
Use this option to handle symlinks differently. The store option will simply record a symlink with its name and destination, and a restore will recreate the symlink as a link. Use the option ignore to ignore all symlinks and not store any information about them. The option follow will cause the symlinked target to be backed up and restored as a normal file with the symlink name. Early versions of Duplicati did not support this option and behaved as if follow was specified.
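Same idea for the symlink policies. Here is a Python sketch of how store, ignore, and follow differ (again my interpretation, not Duplicati source; the demo is POSIX-flavored since creating symlinks on Windows may need elevated privileges):

```python
import os
import tempfile

def back_up(root, policy="store"):
    """Sketch of the documented symlink policies (not Duplicati source).
    Returns {path: payload}: 'store' records the link target as metadata,
    'follow' reads through the link, 'ignore' drops links entirely."""
    backup = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                if policy == "ignore":
                    continue
                if policy == "store":
                    backup[path] = ("symlink", os.readlink(path))
                    continue
                # "follow" falls through and reads the target's contents
            with open(path, "rb") as f:
                backup[path] = ("file", f.read())
    return backup

# Demo: one real file and one symlink pointing at it.
root = tempfile.mkdtemp()
real = os.path.join(root, "real.txt")
with open(real, "wb") as f:
    f.write(b"hello")
os.symlink(real, os.path.join(root, "link.txt"))

stored = back_up(root, "store")
followed = back_up(root, "follow")
ignored = back_up(root, "ignore")
```

If junctions show up to Duplicati as symlinks, `store` would keep UrBackup’s snapshot directories from being walked twice, which is the behavior you’d want to test for.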


changed-files would let you write a custom scanner, but what would the scanner tell Duplicati to back up?
There’s no file type specifically for hard links, so your options might be to either back the file up as a new file, or skip it.
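For what it’s worth, such a scanner might look like the sketch below: list everything modified since a saved timestamp, the kind of list a pre-backup script could hand over. The helper name and the mtime approach are mine, not anything Duplicati ships, and the comment calls out the hard-link blind spot:

```python
import os
import time
import tempfile

def changed_since(root, stamp):
    """List files modified after `stamp` (epoch seconds).
    Hard links are a blind spot for this approach: writing through one
    path updates the mtime seen through every other link to that file,
    so a single change can flag many paths."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_mtime > stamp:
                hits.append(path)
    return hits

# Demo: one backdated file and one fresh one, with the stamp in between.
root = tempfile.mkdtemp()
old = os.path.join(root, "old.txt")
new = os.path.join(root, "new.txt")
for p in (old, new):
    with open(p, "w") as f:
        f.write("x")
stamp = time.time() - 3600
os.utime(old, (stamp - 3600, stamp - 3600))  # backdate old.txt

hits = changed_since(root, stamp)
```

That still leaves your question open: the scanner can say *which* paths changed, but not that two of them are the same file.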