Did Duplicati back up everything?

I just installed Duplicati (using the linuxserver.io Docker image on Debian Stretch) and have one source folder with about 75 GB of data.

I ran the backup and pointed it at a WebDAV server, using GPG encryption. This seemed to work fine: I got no error messages (and the tests I ran before actually backing up the folder worked well; I verified that everything was there).

However, only about 38 GB of data is used on the remote folder. Duplicati shows this as well; it says:

Source: 75.38 GB
Backup: 38.30 GB / 3 Versions

The folder consists largely of JPG image files (plus a couple of GB of AVI files), which alone amount to more than the 38 GB backed up, so I assume the discrepancy in space used is not because the data compressed that well.

All files and folders in the folder are owned by the same user that runs Duplicati, so it should have access to everything (and if not, that should produce an error in the logs).

How do I figure out what happened?

Oh, and before I forget: thanks for all the hard work on this project. It’s really cool!

This indicates that it has indeed processed 75 GB of data and somehow compressed it to 38 GB. This can happen if some of the files are identical, as identical data is stored only once.
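
You can see why duplicates are cheap with a quick experiment (the file names below are just made-up examples): identical contents produce identical hashes, so a duplicated folder adds almost nothing to the backup size.

```
# Hypothetical example: two files with the same contents hash to the same digest,
# so a block-based deduplicating backup stores that data only once.
mkdir -p photos-copy
cp photos/IMG_0001.jpg photos-copy/IMG_0001.jpg
sha256sum photos/IMG_0001.jpg photos-copy/IMG_0001.jpg
# -> both lines show the same digest: one stored block, two file entries
```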

If you are using the WebUI, you should go to the “Restore” menu and check that all files are found. I also recommend trying to restore a few, so that you are confident they are in fact present. You can also have a look at the log, which should contain a summary of the last backup, and see if any files were skipped due to errors.

If you are using the command line, you can use the “list” and “restore” commands to see what files are in the backup and do a test restore.
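
Something along these lines should work; this is just a sketch, with a placeholder storage URL, passphrase, and file path, and assuming the Linux duplicati-cli wrapper (the exact option names may vary by version):

```
# List the files contained in the most recent backup version:
duplicati-cli list "webdav://example.com/backup" --passphrase="my-secret"

# Test-restore a single file into a scratch folder:
duplicati-cli restore "webdav://example.com/backup" "/data/photos/IMG_0001.jpg" \
  --passphrase="my-secret" --restore-path="/tmp/restore-test"
```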

Thanks!

Oh, this is the solution! I somehow duplicated a big folder; that would explain it. Thanks!

Hm, I’m not sure I like to hear this advice from the lead developer of my backup solution… I suppose you’re saying this because we’re still in beta?

Beta or not, any backup solution is only as good as its restorability! I figure one should ALWAYS be doing test restores from time to time.

Personally, when working with a new backup tool (or destination) I like to do a full backup, let a few versions get recorded, then do a full restore to an external drive and use Beyond Compare (or a similar tool) to do a binary comparison of everything restored.
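
If you don’t have Beyond Compare handy, a plain checksum comparison does the same job; here’s a rough sketch, with placeholder source and restore paths:

```
# Hash every file in the source tree and in the restored tree, then compare.
( cd /data/photos && find . -type f -exec sha256sum {} + | sort -k 2 ) > /tmp/source.sums
( cd /mnt/external/restore && find . -type f -exec sha256sum {} + | sort -k 2 ) > /tmp/restored.sums
diff /tmp/source.sums /tmp/restored.sums && echo "All restored files match the source"
```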

Sure it can take a long time, but as long as you’re not hitting any bandwidth limits it’s not like you have to sit there and watch the progress bar the whole time. Well, unless that’s something you’re into, I guess… :slight_smile:

The end result is that not only do I feel better knowing my backup is valid and restorable, but I also learn (or refresh) how to actually DO a restore instead of making mistakes in the heat of the moment when something really does go kablooey (or hurricanee, storm surgee, lightningee, theftee, etcetera-ee).

(To be honest I’ve been thinking about how to put together a “How-To” on doing test restores. I just haven’t quite gotten to it yet.)


I was not implying that I think there are problems, but it is really sad to make backups and then figure out in the end that you forgot some files, or that the restore works differently than you expect. Better safe than sorry, especially when it comes to backups!

Well … same as what @JonMikelV said.


The corporate backup standard is to do periodic restore tests. Anyone who says restore tests do not need to be run is not being realistic.

I’ve worked with sites that do weekly tests, and ones that do quarterly disaster-recovery tests; in both cases they see backups as an insurance policy that needs to be verified as working.
Some products have even developed features for automated restore testing; Veeam is one example in the corporate/paid software world.

Yes, of course, but I thought Duplicati already has that built in (it downloads a random block after every backup and checks it).
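
(As I understand it, the same check can also be run by hand with the “test” command, if you want more than the automatic post-backup sample. A sketch with placeholder URL and passphrase; the sample-count argument and options may differ by version:)

```
# Download and verify 5 random sample sets from the remote storage:
duplicati-cli test "webdav://example.com/backup" 5 --passphrase="my-secret"
```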

See here:

@tophee That is really cool, I must have missed that. I haven’t looked into the dedup engine yet, so how things are CRC’ed and the like is beyond my current knowledge.
Thanks very much for sharing!

Yup. I’m doing restore tests every month. But still… things can be overlooked or long forgotten, and you only realize the problems after years… E.g., after more than a decade, I realized that my wife’s diploma thesis no longer exists… anywhere. Still unbelievable to me (I was always doing backups and had backups in mind), but it is what it is; there was a hole…
