Possibility to identify large directories in backup

smo · October 21, 2022, 11:11am

Hello,

Is there any way to identify which directories/files occupy the most space in a backup?

Obviously I could just check my source folders with a tool like baobab (Linux). But I have already defined a lot of filters to exclude things I don’t want in my backup. Do these filters all work, or have I made mistakes? No idea.

I could restore my backup and then check with baobab what’s in it, but that would not be very efficient.

Cheers
smo

gpatel-fr · October 21, 2022, 11:56am

Hello

As far as I know there is no such feature. I don’t expect it ever to be developed: since Duplicati de-duplicates backed-up data, a per-directory size figure could never be exact.

In the meantime, you don’t need it to see whether your filters are working: just begin a restore of the backup and Duplicati will show you the file tree of the latest backup version.

BTW, using your tool to estimate backup size from the local size could be very wrong, because the backup size of easily compressed files will be greatly overestimated. For example, a directory with 250 MB of JPEG files will take about 250 MB in the first backup version, while a directory with 500 MB of text files could compress to 20 MB.
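
To make the compression point concrete, here is a minimal Python sketch. It uses zlib from the standard library as a stand-in for the backup's compression, repetitive log lines as a stand-in for text files, and random bytes as a stand-in for already-compressed JPEG data. All of these are assumptions for illustration, not how Duplicati itself is implemented.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive text stands in for a directory of text or log files (assumption).
text_like = b"2022-10-21 11:56:00 INFO backup finished without errors\n" * 20000

# Random bytes stand in for already-compressed data such as JPEG files (assumption).
jpeg_like = os.urandom(len(text_like))

if __name__ == "__main__":
    print(f"text-like data ratio: {compressed_ratio(text_like):.3f}")
    print(f"jpeg-like data ratio: {compressed_ratio(jpeg_like):.3f}")
```

Real text compresses less well than this artificially repetitive sample, so the numbers only show the direction of the effect.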

smo · October 21, 2022, 3:13pm

Thanks for your reply.

I don’t think that such a feature would be developed either.
I was just wondering if there is any place (maybe the sqlite database?) where I can get the size of the source file, maybe even with the directory it lives in. But I took a quick look at my local database and couldn’t find such information.

My motivation is to reduce the number of files that are backed up in order to speed up recovery. Even if all files are compressed and deduplication reduces the disk space used, it still takes time to extract the compressed data. I pay that cost every time I test the recovery. The faster the recovery, the better.

Thanks for the hint to just begin a restore to check if my filters are working. I’ll take a look at that.

smo

ts678 · October 22, 2022, 12:00am

Welcome to the forum @smo

For a better but slower test, use no-local-blocks; otherwise the restore gets most blocks from the source, not from the backup.

Duplicati will attempt to use data from source files to minimize the amount of downloaded data. Use this option to skip this optimization and only use remote data.

If you’re really willing to read the database, Database rebuild has info on the tables. Source file sizes are in the Blockset table. Fitting your wish, the File table is now a view, with PathPrefix storing the unique folder prefixes.

But there’s a simpler way to know file sizes: just list the files. See Visualize backup data usage for more.
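
A rough Python sketch of reading those tables could look like the following. The column names (File.Path, File.BlocksetID, Blockset.ID, Blockset.Length) are assumptions on my part, so check them against the actual schema first. The demo_connection helper is purely hypothetical: it builds a tiny in-memory stand-in so the sketch runs without a real database. Paths are assumed to be Linux-style.

```python
import sqlite3
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed schema: File(Path, BlocksetID) and Blockset(ID, Length).
# Verify the real column names in your own database before relying on this.
QUERY = """
SELECT File.Path, Blockset.Length
FROM File
JOIN Blockset ON Blockset.ID = File.BlocksetID
"""


def size_per_directory(conn: sqlite3.Connection) -> dict:
    """Sum the recorded source file sizes per parent directory."""
    totals = defaultdict(int)
    for path, length in conn.execute(QUERY):
        totals[str(PurePosixPath(path).parent)] += length
    return dict(totals)


def demo_connection() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for the local database, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Blockset (ID INTEGER PRIMARY KEY, Length INTEGER);
        CREATE TABLE File (Path TEXT, BlocksetID INTEGER);
        INSERT INTO Blockset VALUES (1, 100), (2, 250), (3, 40);
        INSERT INTO File VALUES
            ('/home/smo/a.txt', 1),
            ('/home/smo/b.txt', 2),
            ('/etc/hosts', 3);
        """
    )
    return conn


if __name__ == "__main__":
    totals = size_per_directory(demo_connection())
    # Largest directories first.
    for directory, size in sorted(totals.items(), key=lambda item: -item[1]):
        print(size, directory)
```

To try this on real data, work on a copy of the database and open it read-only, for example with sqlite3.connect("file:copy.sqlite?mode=ro", uri=True), where copy.sqlite is a placeholder name. If the view lists several versions of the same file, each one is counted, so treat the totals as a rough guide to source sizes rather than to space used in the backup.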

The TEST-FILTERS command can help test your filters, and (contrary to the documentation) it appears to accept multiple folders. You could also pair that up with stat --format="%s" or something else that can show the sizes.
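
As one possible "something else", here is a small Python sketch that totals file sizes per directory with os.stat, whose st_size is the same byte count that stat --format="%s" prints. It assumes you already have a plain list of included file paths; how you extract that list from the filter test output is left open, and the function name sizes_by_directory is just an illustrative choice.

```python
import os
import tempfile
from collections import Counter


def sizes_by_directory(paths):
    """Total the size in bytes of the given files, grouped by their parent directory."""
    totals = Counter()
    for path in paths:
        try:
            size = os.stat(path).st_size  # same value as stat --format="%s"
        except OSError:
            continue  # skip files that are missing or cannot be read
        totals[os.path.dirname(os.path.abspath(path))] += size
    return totals


if __name__ == "__main__":
    # Demo on throwaway files; replace the list with your own included paths.
    with tempfile.TemporaryDirectory() as root:
        small = os.path.join(root, "small.txt")
        large = os.path.join(root, "large.bin")
        with open(small, "wb") as handle:
            handle.write(b"x" * 10)
        with open(large, "wb") as handle:
            handle.write(b"x" * 2000)
        for directory, size in sizes_by_directory([small, large]).most_common():
            print(size, directory)
```

Keep in mind that these are source sizes, so compression and deduplication will make the space used in the backup differ.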

smo · October 22, 2022, 5:34pm

Thanks for the detailed tips, @ts678. I will have a look at them very soon.

And thanks for all your work here on the forum!