Hardlinks handling

Hi,

Could someone please elaborate on hardlinks handling by Duplicati?
I have a backup made by Timeshift on my internal drive, and I want to keep encrypted copies of it, made by Duplicati: one on an external drive and another in the cloud. Since Timeshift uses hardlinks, I’m setting Duplicati’s hardlink-policy option to ‘first’. Its description reads: ‘The “first” option will record a hardlink ID for each hardlink to avoid storing hardlinked paths multiple times.’ Am I getting it right that with this option the hardlinks will be recreated when a backup is restored?

Also, does it matter if Duplicati’s backups are encrypted? I’m asking because in the Timeshift GitHub issue ‘Backup encryption’ (Issue #118, teejee2008/timeshift) there is a message saying that for Timeshift ‘Encryption is not supported as it will break hardlinks.’

OS: Fedora 28, Duplicati version: 2.0.3.3_beta_2018-04-02

Many thanks!

Not quite. Duplicati is built as a “restore my files” backup system, as opposed to “restore my system” (technically, a file-layer backup, not a block-level backup).

The hardlink handling simply exists to avoid infinite loops. With the --hardlink-policy=first option, Duplicati will ignore multiple hardlinks that point to the same place. When restoring, it does not have any hardlink information, so it will just restore the files, but not the additional hardlinks. If you choose all, it will treat each hardlink as a normal file/folder and thus store multiple copies of the same content.

No, the encryption in Duplicati is done before uploading volumes and has no impact on any other features.

If --hardlink-policy is set to all, the storing of multiple copies will be deduplicated, so it won’t eat a ton of storage on the backend.

However, am I correct in assuming that multiple independent copies would be restored, rather than one copy and a bunch of hard links?
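
To illustrate the deduplication point, here is a minimal sketch of content-based block deduplication. It is not Duplicati's actual code: the `store_file` function, the tiny `BLOCK_SIZE`, and the use of SHA-256 are all assumptions chosen for the example. It only shows why storing the same content twice need not add new blocks.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size for illustration only (assumption)


def store_file(data: bytes, block_store: dict) -> list:
    """Split data into fixed-size blocks, keep each unique block once,
    and return the list of block hashes that describes the file."""
    hashes = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        # setdefault only stores the block if this hash is not known yet
        block_store.setdefault(digest, block)
        hashes.append(digest)
    return hashes


block_store = {}
first = store_file(b"same file content", block_store)
blocks_after_first = len(block_store)
# A second path with the same content (e.g. another hardlink) adds no blocks.
second = store_file(b"same file content", block_store)
blocks_after_second = len(block_store)
```

Both files end up described by the same hash list, and the block store does not grow on the second call.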

Thank you!
One more thing, there is also this option: --hardlink-policy=none. ‘The option “none” will ignore all hardlinks with more than one link’. To me this is quite similar to what you’ve said about the ‘first’ option:

What’s the difference between ‘first’ and ‘none’ then? Sorry, I’m probably missing something obvious.

Yes, correct.

Yes, also correct. Duplicati does not store any hardlink information, so the restore treats everything as individual files. If we want to change this, we need to store some more information about hardlinks. This could be done similarly to symlinks, such that when hardlink number 2 is discovered, it just emits something similar to a symlink, meaning some metadata that explains how to recreate the hardlink.

When restoring, this is easy to detect, but we need some logic to support partial restores, and deal with restoring to somewhere that is not the original path.
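
As a rough sketch of that idea (purely hypothetical, nothing like this exists in Duplicati according to the posts above): a scan could remember the first path seen for each (device, inode) pair and record later paths as "link to that first path". The `scan` and `restore_links` names and the relative-path record format are assumptions made for this example. Using relative paths is one way to cope with restoring to a different location; a link whose target was not restored is reported back so the caller can fall back to restoring a regular file.

```python
import os
import shutil
import tempfile


def scan(root):
    """Return (files, links). files: relative paths stored as normal files.
    links: relative path -> relative path of the first link seen."""
    seen = {}  # (st_dev, st_ino) -> first relative path
    files, links = [], {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.lstat(full)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                links[rel] = seen[key]  # hardlink number 2, 3, ...
            else:
                seen[key] = rel
                files.append(rel)
    return files, links


def restore_links(dest_root, links):
    """Recreate hardlinks under dest_root; return links that were skipped
    because their target is missing (e.g. in a partial restore)."""
    skipped = []
    for rel, target_rel in links.items():
        target = os.path.join(dest_root, target_rel)
        path = os.path.join(dest_root, rel)
        if os.path.exists(target):
            os.makedirs(os.path.dirname(path), exist_ok=True)
            os.link(target, path)
        else:
            skipped.append(rel)
    return skipped


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dest:
    with open(os.path.join(src, "bzip2"), "wb") as f:
        f.write(b"data")
    os.link(os.path.join(src, "bzip2"), os.path.join(src, "bzcat"))
    files, links = scan(src)
    for rel in files:  # "restore" the regular files to another location
        shutil.copy(os.path.join(src, rel), os.path.join(dest, rel))
    skipped = restore_links(dest, links)
    relinked = os.path.samefile(os.path.join(dest, "bzcat"),
                                os.path.join(dest, "bzip2"))
```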

Good question! Duplicati uses the linkcount to check if a file or folder is a hardlink (linkcount > 1). If you set --hardlink-policy to none, then all items with a linkcount > 1 will be ignored, whereas first still backs up the first path it encounters for each hardlinked item.
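
The three policies as described in this thread can be sketched roughly like this. This is an illustration only, not Duplicati's implementation: `should_back_up` and `kept_files` are made-up names, and the sketch looks only at regular files in a single directory (directories often report a link count above 1 on Linux for unrelated reasons).

```python
import os
import tempfile


def should_back_up(path, policy, seen):
    """Decide whether a regular file is backed up under a policy."""
    st = os.lstat(path)
    if st.st_nlink <= 1 or policy == "all":
        return True  # not hardlinked, or every path is treated as a file
    if policy == "none":
        return False  # every item with linkcount > 1 is ignored
    if policy == "first":
        key = (st.st_dev, st.st_ino)
        if key in seen:
            return False  # an earlier path to the same content was kept
        seen.add(key)
        return True
    raise ValueError(f"unknown policy: {policy}")


def kept_files(root, policy):
    """Names in root (sorted) that would be backed up under the policy."""
    seen = set()
    return [name for name in sorted(os.listdir(root))
            if should_back_up(os.path.join(root, name), policy, seen)]


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a"), "w") as f:
        f.write("linked")
    os.link(os.path.join(root, "a"), os.path.join(root, "b"))  # b -> same inode as a
    with open(os.path.join(root, "c"), "w") as f:
        f.write("plain")
    results = {p: kept_files(root, p) for p in ("first", "all", "none")}
```

With two hardlinked names (a, b) and one plain file (c), first keeps a and c, all keeps everything, and none keeps only c.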

When restoring, this is easy to detect, but we need some logic to support partial restores, and deal with restoring to somewhere that is not the original path.

I can see that restore - especially when to an alternative location - could be tricky to decide.

With the prevalence of symlinks - how much use is there of hard links nowadays (other than rsync targets)?

If there’s much use, it might be worth putting hardlink handling on the future enhancements list.

On Linux systems it is nearly impossible to make a useful backup of the system without correct handling of hardlinks. I just tried it. This is the result: a lot of system binaries are missing because they are hardlinked to different locations:

Only in /./bin: bzcat
Only in /./bin: bzip2
Only in /./bin: uncompress
Only in /./sbin: jfs_fsck
Only in /./sbin: mkfs.jfs

Welcome to the forum, @Nanda_Bhikkhu

As mentioned above, Duplicati is not really designed as a full system backup product. I’d recommend using it for user data only, not system files/binaries.

Linux distros vary. On mine (a Linux Mint), several of the files you show are not hard linked:

$ ls -li /bin/uncompress /sbin/jfs_fsck /sbin/mkfs.jfs
262336 -rwxr-xr-x 1 root root   2301 Oct 27  2014 /bin/uncompress
917704 -rwxr-xr-x 1 root root 404408 Jul 17  2013 /sbin/jfs_fsck
917756 -rwxr-xr-x 1 root root  55904 Jul 17  2013 /sbin/mkfs.jfs
$ 

Some others are hard linked, but only within the same directory. The 3 in the third column is the number of hard links, per ls:

$ ls -li /bin/bzcat /bin/bzip2 /bin/bunzip2
293106 -rwxr-xr-x 3 root root 31352 Jul  4  2019 /bin/bunzip2
293106 -rwxr-xr-x 3 root root 31352 Jul  4  2019 /bin/bzcat
293106 -rwxr-xr-x 3 root root 31352 Jul  4  2019 /bin/bzip2
$ 

If you have the stat command, it’s a bit more informative (goes beyond just inode number).

What does your system have, and what test (with which --hardlink-policy option, if set) is this?
The output looks like what diff might produce, but I don’t know the directories or their source.

It sounds from earlier posts like nothing should be missing, but a restore will not re-link files.
In other words, for my example, bunzip2, bzcat, and bzip2 would be individual identical files.
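
If it helps to check a restored tree, a small sketch like the following can tell "same file via hardlink" apart from "separate identical copies". The `relation` helper is a made-up name for this example, and the file names just mirror my listing above. Note that `os.path.samefile` follows symlinks, so a symlink to the same file would also count as the same file here.

```python
import filecmp
import os
import shutil
import tempfile


def relation(a, b):
    """Classify two paths as hardlinked, identical copies, or different."""
    if os.path.samefile(a, b):  # same device and inode
        return "hardlinked"
    if filecmp.cmp(a, b, shallow=False):  # byte-for-byte comparison
        return "identical copies"
    return "different"


with tempfile.TemporaryDirectory() as root:
    bzip2 = os.path.join(root, "bzip2")
    bzcat = os.path.join(root, "bzcat")
    bunzip2 = os.path.join(root, "bunzip2")
    with open(bzip2, "wb") as f:
        f.write(b"binary content")
    os.link(bzip2, bzcat)        # a real hardlink, as on the source system
    shutil.copy(bzip2, bunzip2)  # an independent copy, as after a restore
    linked = relation(bzip2, bzcat)
    copied = relation(bzip2, bunzip2)
```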

Linux system directories can get quite complex. Sometimes there are directory symlinks too.
I’m just curious about the details of the files. The earlier posts read as if the default should be to back up all files.