Ability to Merge two backups

I think it could be useful in the following scenario. I make a backup for every year, but after some years I want to merge several of those yearly backups into one. That would be faster for me and for D2 than making a new backup, adding the same folders as in the old backups, and deleting the old backups later.

Is it possible in the near future?

It’s a little unclear to me what you mean by “merge.”

If you are talking about two different backup jobs you have defined in Duplicati, then no, there is no way to merge the backup data (the files that Duplicati places in remote storage).

If you are talking about a single backup job as defined in Duplicati, and there is more than one backup version/snapshot, well the data is already “merged” in a sense because of deduplication. Just set the retention on your backup so older versions are removed automatically. I don’t see a benefit to “starting over” with a new backup.

I do mean that, and this is my suggestion :slight_smile:

Please give an example with dates and files, and say how you want the Duplicati Restore screen to look.
A backup version has a time and a copy of the files at that time. A different backup may differ on both.

You really don’t want to merge. It’s another point where failure can happen. To minimize failure, you need to make an entirely new backup.

I think what this person is asking for is simply a way to make a new backup but bring over the files already unchanged in the last backup. Those files would not be sent again; the last backup would simply be merged into the new one.

But that just isn’t good if the last backup has problems, because those problems could carry over into the new one. It would be like keeping some old oil in an engine and adding new oil on top of it. Not the best thing to do.

People already see problems with “merging” as backups gain new changes; some have experienced database issues.

If your backup setup isn’t working as well as you would like, it would be better to change your method to fit the time you want, e.g. a better internet connection, faster local network connections, a flash drive instead of the internet, etc.

You understood me correctly. I agree with you that it’s not a necessary or primary feature, but I don’t agree with your oil comparison.
DB issues would only happen if the developers introduce bugs.
My problem is not the SSD, the internet connection, and so on; it’s that I want to keep my backup history and merge two backups to reduce the number of backups.

Maybe it will be useful for someone, even if the backups are encrypted with different keys.

Let me explain my problem. I have several backups with different passwords, each with several versions, and I want to merge these backups to reduce my backup list.

I worry about the current data: it may be corrupted or incorrect, but I can’t check that, because to do so I would have to restore these backups and compare the last backup’s files with the current ones. That takes a lot of time. And I’d lose my backup history, though that’s not a big problem, BTW.

So, instead of that, these could be useful: grouping backups, or the ability to get MD5 or some other hashes for all files in a backup version, or the ability to compare a backup version with the data on my disk(s).

Allow changing the archive passphrase? gets into some relevant issues.

You don’t HAVE to restore everything, but a simple small check would help ensure things work…

Minimal footprint restore test has some relevant comments, including some on hash checking.
Hashes for each source file are actually in the DB already, and used to verify a restore is good.

Restore for test won’t do that, but merge might. That’s why I asked earlier about the intended result.

Thank you. So the question is how I can get the hashes from the DB for all files in all versions. That would be very, very useful.

It means work from you, using tools and scripts that you write. For a gentle introduction, you can look at
How the restore process works, specifically the “Verification of the restored files” section, holds the idea…
Documentation for the local database format will help, though it covers more tables than you actually need to use.

Let me show some of the tables pictorially, using DB Browser for SQLite.

The Fileset table has versions and their Timestamp:

[screenshot of the Fileset table]

The ID there goes to the FilesetEntry table which points to files and their Timestamp:

[screenshot of the FilesetEntry table]

The FileID there goes to the File view:

[screenshot of the File view]

The BlocksetID there goes to the Blockset table:

[screenshot of the Blockset table]

FullHash is the SHA256 hash, Base64 encoded.
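
For example, if you want to check a single file on disk against FullHash, a small Python sketch along these lines (the path is just a placeholder) should produce the same Base64-encoded value:

import base64
import hashlib

def file_fullhash(path):
    # Base64-encoded SHA-256 of a file, same form as FullHash in the database.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return base64.b64encode(h.digest()).decode("ascii")

print(file_fullhash(r"C:\some\file.txt"))  # placeholder path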


Thank you very much!!! I don’t think it will be a difficult job for me.

But there is a lot of information I have to understand.

I think I’ll try to do this in PowerShell and post it here.

See here (and search for filelist.json) for a method that goes directly to the dlist files for the file info. This is a little harder because you use encryption, so your filenames end in .zip.aes and you decrypt them using SharpAESCrypt.exe from the Duplicati folder, or a third-party tool like AES Crypt. After that, unzip and read the JSON.
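
As a rough sketch, assuming the dlist has already been decrypted to a plain .zip, reading it in Python could look something like this (the filename and the JSON field names are my assumptions, so check them against your own filelist.json):

import json
import zipfile

# Assumes the dlist was already decrypted from .zip.aes to a plain .zip
# (e.g. with SharpAESCrypt.exe or AES Crypt); the filename here is made up.
dlist_zip = "duplicati-20230101T000000Z.dlist.zip"

with zipfile.ZipFile(dlist_zip) as z:
    with z.open("filelist.json") as f:
        entries = json.load(f)

for entry in entries:
    # Field names ("type", "path", "hash") are my assumption about the
    # filelist.json layout; inspect your own file to confirm them.
    if entry.get("type") == "File":
        print(entry.get("hash"), entry.get("path"))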

Ask as you like. That was just the high-level sketch of it.

I don’t do PowerShell, but someone else might help. Ultimately, the question is whether you can get it to work…

The next alternative is to leverage SQL. Mine’s not that good, but here’s some:

SELECT "FullHash"
	,"Path"
FROM "Blockset"
JOIN "File" ON "Blockset"."ID" = "File"."BlocksetID"
JOIN "FilesetEntry" ON "File"."ID" = "FilesetEntry"."FileID"
WHERE "FilesetEntry"."FilesetID" = 2

[screenshot of the query results]

I arbitrarily chose FilesetID = 2 as an example. For other versions of the backup, use other values.
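
If you want to script a comparison against the files on your disk, a rough Python sketch could run the same query against the local database and check each file. The database path and FilesetID below are placeholders you’d adjust, and it may be safest to run this against a copy of the database so nothing is locked while Duplicati is running.

import base64
import hashlib
import sqlite3

# Both values below are placeholders: point DB_PATH at your job's local
# database (ideally a copy of it) and pick the FilesetID you want to check.
DB_PATH = r"C:\Users\me\AppData\Local\Duplicati\ABCDEFGHIJ.sqlite"
FILESET_ID = 2

QUERY = """
SELECT "FullHash", "Path"
FROM "Blockset"
JOIN "File" ON "Blockset"."ID" = "File"."BlocksetID"
JOIN "FilesetEntry" ON "File"."ID" = "FilesetEntry"."FileID"
WHERE "FilesetEntry"."FilesetID" = ?
"""

def disk_hash(path):
    # Base64-encoded SHA-256 of the file on disk, same form as FullHash.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return base64.b64encode(h.digest()).decode("ascii")

with sqlite3.connect(DB_PATH) as db:
    for full_hash, path in db.execute(QUERY, (FILESET_ID,)):
        try:
            status = "OK" if disk_hash(path) == full_hash else "MISMATCH"
        except OSError:
            status = "MISSING"
        print(status, path)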
