Hardlinks handling


#1

Hi,

Could someone please elaborate on hardlinks handling by Duplicati?
I have a backup made by Timeshift on my internal drive, and I want to keep encrypted copies of it made by Duplicati: one on an external drive and another in the cloud. Since Timeshift uses hardlinks, I’m setting Duplicati’s hardlink-policy option to ‘first’. ‘The “first” option will record a hardlink ID for each hardlink to avoid storing hardlinked paths multiple times.’ Am I understanding this correctly: with this option, will the hardlinks be recreated when a backup is restored?

Also, does it matter if Duplicati’s backups are encrypted? I’m asking because the Timeshift GitHub issue “Backup encryption” (Issue #118, teejee2008/timeshift) has a message saying that for Timeshift ‘Encryption is not supported as it will break hardlinks.’

OS: Fedora 28, Duplicati version: 2.0.3.3_beta_2018-04-02

Many thanks!


#2

Not quite. Duplicati is built as a “restore my files” backup system, as opposed to “restore my system” (technically, a file-layer backup, not a block-level backup).

The hardlink handling simply exists to avoid infinite loops. With the --hardlink-policy=first option, Duplicati backs up the first path it encounters for a hardlinked item and ignores any further hardlinks that point to the same place. When restoring, it does not have any hardlink information, so it will just restore the files, but not the additional hardlinks. If you choose --hardlink-policy=all, it will treat each hardlink as a normal file/folder and thus store multiple copies of the same content.

No, encryption does not matter here. The encryption in Duplicati is done before uploading volumes and has no impact on any other features.


#3

If --hardlink-policy is set to all, the storing of multiple copies will be deduplicated, so it won’t eat a ton of storage on the backend.

However, am I correct in assuming that multiple independent copies would be restored, rather than one copy and a bunch of hard links?
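
On the deduplication point, here is a minimal sketch of why identical content need not cost extra storage. It assumes a content-addressed block store keyed by SHA-256; the tiny block size and the names `store_file` and `block_store` are made up for illustration and are not taken from Duplicati.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the demo is easy to follow


def store_file(data: bytes, block_store: dict) -> list:
    """Split data into blocks, store each unique block once, return the block hashes."""
    hashes = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        # A block that is already present is not stored a second time.
        block_store.setdefault(digest, block)
        hashes.append(digest)
    return hashes


if __name__ == "__main__":
    store = {}
    first = store_file(b"abcdefgh", store)   # first hardlinked path
    second = store_file(b"abcdefgh", store)  # second path, same content
    print(len(store), first == second)
```

Both paths end up with the same list of block hashes, so the second one adds only a file entry, not new blocks.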


#4

Thank you!
One more thing: there is also this option, --hardlink-policy=none. ‘The option “none” will ignore all hardlinks with more than one link’. To me this is quite similar to what you’ve said about the ‘first’ option.

What’s the difference between ‘first’ and ‘none’ then? Sorry, I’m probably missing something obvious.


#5

Yes, correct: with --hardlink-policy=all the multiple copies are deduplicated.

Yes, also correct. Duplicati does not store any hardlink information, so the restore treats everything as individual files. If we want to change this, we need to store some more information about hardlinks. This could be done similarly to symlinks, such that when hardlink number 2 is discovered, it just emits something similar to a symlink, meaning some metadata that explains how to recreate the hardlink.

When restoring, this is easy to detect, but we need some logic to support partial restores, and deal with restoring to somewhere that is not the original path.
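
To make the idea concrete, here is a rough sketch of what recording and replaying hardlink metadata could look like. This is not how Duplicati works today (it stores no hardlink information); `build_manifest`, `restore`, and the manifest layout are hypothetical, and `source_root` merely stands in for the stored backup data. Paths are kept relative so a restore to a different location still works, and a partial restore falls back to copying the data when the first link was not selected.

```python
import os
import shutil


def build_manifest(root):
    """Map each relative file path to the first path sharing its inode, or None."""
    first_seen = {}  # (device, inode) -> first relative path seen
    manifest = {}    # relative path -> relative path of first link, or None
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.lstat(full)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in first_seen:
                manifest[rel] = first_seen[key]  # "hardlink number 2": metadata only
            else:
                first_seen[key] = rel
                manifest[rel] = None             # stored as a normal file
    return manifest


def restore(source_root, target_root, manifest, wanted=None):
    """Restore the wanted paths under target_root, recreating hardlinks where possible."""
    wanted = set(manifest) if wanted is None else set(wanted)
    restored = {}  # first-link relative path -> file actually written in this run
    # Handle normal files before link entries so link targets exist first.
    for rel in sorted(wanted, key=lambda r: (manifest[r] is not None, r)):
        dest = os.path.join(target_root, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        primary = manifest[rel] or rel
        if primary in restored:
            os.link(restored[primary], dest)  # recreate the hardlink
        else:
            # Partial restore: the first link was not selected, so copy the data.
            shutil.copy2(os.path.join(source_root, primary), dest)
            restored[primary] = dest
```

For example, `restore(src, dst, build_manifest(src), wanted=["b.txt", "c.txt"])` would copy the data once for `b.txt` and hardlink `c.txt` to it, even if both were originally links to an unselected `a.txt`.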

Good question! Duplicati uses the link count to check if a file or folder is a hardlink (link count > 1). If you set the hardlink policy to none, then all items with a link count > 1 will be ignored, including the first one encountered, so that content is not backed up at all.
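
As an illustration only, the three policies can be sketched in a few lines of Python. This is not Duplicati's code; `select_paths` is a made-up helper that takes regular file paths and mimics the behaviour described in this thread, using `st_nlink` for the link count check and the (device, inode) pair to recognise paths that share content.

```python
import os

POLICIES = ("first", "all", "none")


def select_paths(paths, policy="first"):
    """Return the paths that would be backed up under the given policy (sketch)."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    seen = set()   # (device, inode) pairs already selected under "first"
    selected = []
    for path in paths:
        st = os.lstat(path)
        # Link count check; note that directories often report nlink > 1
        # on Linux filesystems, so this sketch is meant for regular files.
        hardlinked = st.st_nlink > 1
        if not hardlinked or policy == "all":
            selected.append(path)      # treated as a normal file
        elif policy == "first":
            key = (st.st_dev, st.st_ino)
            if key not in seen:        # keep only the first path to this inode
                seen.add(key)
                selected.append(path)
        # policy == "none": every item with link count > 1 is skipped
    return selected
```

With `a` and `b` hardlinked to each other and `c` an ordinary file, `first` yields `a` and `c`, `all` yields all three, and `none` yields only `c`.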


#6

When restoring, this is easy to detect, but we need some logic to support partial restores, and deal with restoring to somewhere that is not the original path.

I can see that restoring, especially to an alternative location, could be tricky to get right.

Given the prevalence of symlinks, how much use is there of hard links nowadays (other than as rsync targets)?

If there’s much use, it might be worth putting hardlink handling on the future enhancements list.