Using Duplicati as an archiving tool

lopixo9083 · May 22, 2021, 6:39am

I have the following use case and I’m wondering whether it can be done with Duplicati:

I have a set of files from a completed work project. Those files likely won’t change anymore, but I want to archive them. The flow I have in mind is:

Run a Duplicati job to back up the entire folder. Store the bulk of the backup data in cold storage (e.g. Glacier Deep Archive), but keep a snapshot file (maybe the Duplicati database) locally.
After, say, a year, come back to the folder and check whether anything has changed since the last backup. I don’t really expect changes, but maybe somebody added some notes, an invoice, or some other small file. Create a delta archive and store that delta in cold storage as well.

A simple way to do this is tar --listed-incremental (GNU tar 1.34: 5.2 Using tar to Perform Incremental Dumps), but that isn’t available natively on Windows. I could do the first step with 7zip, but 7zip has no equivalent of the snapshot file: it needs the full first archive to produce a delta file.
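To make the snapshot-file idea concrete, here is a minimal Python sketch of the same principle as tar’s --listed-incremental: a small local file remembers each file’s size and modification time, so a later run can list only the new or changed files without touching the archived data. The JSON snapshot format and the function names scan() and incremental_list() are hypothetical; this is not GNU tar’s actual snapshot format.

```python
import json
from pathlib import Path


def scan(root: Path) -> dict[str, list[int]]:
    """Map each file (path relative to root) to [size, mtime_ns]."""
    state = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            st = path.stat()
            state[path.relative_to(root).as_posix()] = [st.st_size, st.st_mtime_ns]
    return state


def incremental_list(root: Path, snapshot: Path) -> list[str]:
    """List files that are new or changed since the snapshot, then update it.

    Without a snapshot every file is listed (the full first backup);
    later runs list only the delta.
    """
    previous = json.loads(snapshot.read_text()) if snapshot.exists() else {}
    current = scan(root)
    changed = [p for p, info in current.items() if previous.get(p) != info]
    snapshot.write_text(json.dumps(current))
    return changed
```

Keep the snapshot file outside the folder being archived; otherwise it would show up in its own delta.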

Is there a way to make this with Duplicati?

Kahomono · May 22, 2021, 2:30pm

Wait, are you sure 7z doesn’t have a flag to add files modified since some arbitrary date, or all files newer than a specified file? You can plant that time-anchor file by saving the log of the original archive run.

lopixo9083 · May 22, 2021, 2:58pm

7zip doesn’t seem to support this natively, but I could indeed use an external tool or script to generate a list of new files and feed that to 7zip.
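A minimal sketch of such a script, assuming the log written at the end of the original archive run (or any other file) serves as the time anchor. It lists files modified after the anchor and writes them one per line, so the list can be handed to 7-Zip as a list file, e.g. `7z a delta.7z @list.txt` run from inside the project folder, since the paths are relative. The function names are hypothetical.

```python
from pathlib import Path


def files_newer_than(root: Path, anchor: Path) -> list[str]:
    """List files under root whose mtime is later than the anchor file's."""
    cutoff = anchor.stat().st_mtime_ns
    return [
        p.relative_to(root).as_posix()
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.stat().st_mtime_ns > cutoff
    ]


def write_listfile(names: list[str], listfile: Path) -> None:
    """Write one relative path per line for use as a 7-Zip list file."""
    listfile.write_text("".join(name + "\n" for name in names), encoding="utf-8")
```

Because only timestamps are compared, anything that touches every file would pull the whole folder into the next delta.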

It’s a bit more error-prone, since it relies on file attributes rather than file contents: if something comes along and updates all the timestamps, the next backup round will be very large. But I guess that’s a small risk.

ts678 · May 22, 2021, 4:22pm

This sounds like a normal backup. At first I was worried that this was an archive-and-delete-from-source setup.
I don’t like those for several reasons: the UI is not designed to find such files, and backup is not 100% reliable.
Backup is meant to be a second (or, for safety, a third, etc.) copy, not the one-and-only copy of important data.

From the Duplicati Features page, under Incremental backups:
“Duplicati performs a full backup initially. Afterwards, Duplicati updates the initial backup by adding the changed data only.”

See also: The backup process explained (How the backup process works).
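A toy Python sketch of that “full backup first, then changed data only” behavior, assuming files are cut into fixed-size blocks identified by a SHA-256 hash and that only blocks not seen before get uploaded. The 4-byte block size and the backup() function are illustrative choices for readability; this shows the principle, not Duplicati’s code.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for readability; real tools use far larger blocks


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def backup(files: dict[str, bytes], known: set[str]) -> list[str]:
    """Return hashes of blocks that must be uploaded, recording them in `known`."""
    uploaded = []
    for name in sorted(files):
        for block in split_blocks(files[name]):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in known:  # only unseen blocks cost upload space
                known.add(digest)
                uploaded.append(digest)
    return uploaded
```

Running backup() again on unchanged files uploads nothing; adding one small file uploads only that file’s blocks.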

Cold storage is a separate issue. For some providers (e.g. Google Cloud) it is mostly a difference in the pricing table.
Others genuinely require special actions to retrieve cold files, and that requires workarounds for Duplicati.

I believe Duplicati won’t get tricked by this. The updated timestamp will make it scan the file, but if the contents are the same, no file data gets uploaded. Only the metadata (including the timestamp) gets updated.
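A sketch of that two-step check, assuming a hypothetical per-file record of size, modification time and content hash: matching metadata means the file is skipped without being read, a changed timestamp triggers a re-read, and only a differing hash means new data. The Entry record and classify() are illustrative and are not Duplicati’s implementation.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """What a hypothetical local database remembers about one file."""
    size: int
    mtime_ns: int
    content_hash: str


def file_hash(path: str) -> str:
    """SHA-256 of the file contents, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def classify(path: str, previous: Optional[Entry]) -> str:
    st = os.stat(path)
    if previous is None:
        return "new-data"  # never seen before: everything is new
    if (st.st_size, st.st_mtime_ns) == (previous.size, previous.mtime_ns):
        return "unchanged"  # metadata matches, so the file is not even read
    if file_hash(path) == previous.content_hash:
        return "metadata-only"  # timestamp moved, contents identical
    return "new-data"
```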

I’m not seeing such a flag either, but I’m not sure that means you have to script something. I think it’s more like this:

Differential Backups with 7-zip shows how the differential (or “update”) mode can make the difference file.
One drawback is that the original files and the original 7-zip archive must always be present on the system.
Duplicati needs only the original files and the local database, which knows the hashes of the original file blocks.
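To show why the local database is enough, here is a sketch that keeps known block hashes in SQLite. Duplicati’s local database is SQLite, but the single-table schema and the function names below are a hypothetical simplification. A later run opens the same file and decides which blocks are new without downloading anything from cold storage.

```python
import hashlib
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    # Hypothetical schema for illustration; not Duplicati's actual schema.
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS block (hash TEXT PRIMARY KEY)")
    return con


def new_blocks(con: sqlite3.Connection, data: bytes, block_size: int = 4) -> list[str]:
    """Return hashes of blocks not yet recorded, and record them."""
    fresh = []
    for i in range(0, len(data), block_size):
        digest = hashlib.sha256(data[i:i + block_size]).hexdigest()
        seen = con.execute("SELECT 1 FROM block WHERE hash = ?", (digest,)).fetchone()
        if seen is None:
            con.execute("INSERT INTO block (hash) VALUES (?)", (digest,))
            fresh.append(digest)
    con.commit()
    return fresh
```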

The simplest solution is to just run Duplicati backups on a regular schedule to pick up whatever gets changed.
Cost can be kept down (if needed) by using less expensive always-available storage rather than true cold storage.

lopixo9083 · May 23, 2021, 6:11am

Thanks for your detailed response!

I was indeed not sure about the terminology for archive vs. backup. I chose “archive” to indicate that the data is likely to stay unchanged, and I also had a tape archiver in mind. It would always be in addition to my normal 3-2-1 backup setup.

For some reason it gives me peace of mind to have a virtually immutable copy of my data somewhere, one that doesn’t depend too much on tooling changes that can break, or on other ways my backups could get corrupted.

For 7zip, there are two different solutions:

A differential backup, but it uses the original 7zip backup. This requires a lot of local disk space, since I need to keep all backups locally as well.
A backup based on timestamps: an external tool provides the files that have changed after a certain timestamp.

ts678 · May 24, 2021, 1:21am

This is getting confusing. The original post wanted to use Duplicati, but now the intent is not to use it?

I don’t know your work project’s file counts, sizes, or compressibility, but would plain file copies do?

rclone sync or rclone copy can copy additional small files as files, without re-uploading everything.
If somebody or something edits an existing file, the whole file is uploaded again. You may not get versioning,
although some storage types, such as Backblaze B2, support versioning (and space usage goes up accordingly).
You can also somewhat limit damage (accidental or malicious) by not granting the deleteFiles capability.
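For comparison, a minimal Python sketch of copy-style syncing as described above: a file is copied when it is missing at the destination or its size or modification time differs, the whole file is copied on any change, and nothing at the destination is ever deleted. It is loosely modeled on rclone copy’s default size-and-modtime check; the function name and the one-second tolerance are assumptions, and this is not rclone itself.

```python
import shutil
from pathlib import Path


def copy_new_or_changed(src: Path, dst: Path) -> list[str]:
    """Copy files missing at dst or differing in size/mtime; never delete."""
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        s = path.stat()
        if target.exists():
            t = target.stat()
            if t.st_size == s.st_size and abs(t.st_mtime - s.st_mtime) < 1.0:
                continue  # same size and (nearly) same mtime: treat as unchanged
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copies contents and the modification time
        copied.append(rel.as_posix())
    return copied
```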