Visualize backup data usage

Since Duplicati is cataloging everything, it would be really helpful if it could show me how much space each directory and file in the data is contributing to the size of my backups.

My current source data set is over 35GB, and I could probably trim it quite a bit if only I could easily tell what some of the bigger contributors are.


Hello @HunterZ and welcome to the forum!

Block-level deduplication sometimes makes it impossible to say how much space a file adds to the size. Additional copies of the same file add roughly nothing to the size of the backup; partially-identical files add some.

Compression is sometimes a factor, but for many people the largest files are audio or visual media, and Duplicati doesn’t try to compress those again when it builds the zip file if they’re already in a compressed format.

Versioning can also matter: 10 edits of a photo might take 10 times the space, while appends to a text file might deduplicate well.
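If you want a rough feel for how much block-level deduplication could collapse a folder before backing it up, a crude sketch like the one below (run in WSL or any Linux shell with GNU coreutils) hashes every fixed-size chunk of every file and compares total chunks against unique chunks. The 100 KB block size and the output path are just assumptions for illustration; Duplicati’s real savings will differ because of its own block size, compression, and encryption.

#!/bin/bash
# Rough block-level dedup estimate (illustration only): hash every 100 KB
# chunk of every file under a folder and compare total vs. unique chunks.
# Requires GNU coreutils (split --filter); Duplicati's real savings differ
# because of its own block size, compression and encryption.
SRC="${1:-.}"
BLOCK=102400
find "$SRC" -type f -print0 |
while IFS= read -r -d '' f; do
  # one SHA-256 line per 100 KB chunk of this file
  split -b "$BLOCK" --filter='sha256sum | cut -d" " -f1' "$f"
done > /tmp/all_blocks.txt
total=$(wc -l < /tmp/all_blocks.txt)
unique=$(sort -u /tmp/all_blocks.txt | wc -l)
echo "total blocks:  $total"
echo "unique blocks: $unique"
echo "approx. duplicate data: $(( (total - unique) * BLOCK / 1048576 )) MB"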

Having said all that, there is a wide selection of tools for your computer that can guide you at least a little.

Here are some examples; as local tools, I suspect they can visualize better than Duplicati’s web UI can.

I highly recommend WizTree - it’s super quick (nearly instantaneous) compared to WinDirStat.

I think that “impossible” is not the right word; for example, Windows NTFS dedup can at least tell us the dedup ratio and savings:

(screenshot: Windows deduplication ratio and savings)

Anyone can also run ddpeval.exe to get an evaluation of the potential dedup savings for a folder.
Example: Using the Deduplication Evaluation Tool (ddpeval.exe) to Calculate Space Savings Gained before deploying the Windows 2012 Deduplication Feature | Pipe2Text.com
It’s a portable exe and it works on Windows 7 and higher.
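A typical run would look roughly like this from a command prompt (the folder path is just an example):

:: Estimate potential Windows dedup savings for a single folder (example path)
DDPEval.exe "D:\SourceData"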

but it wasn’t the only word. :smile: The rest was “... to say how much space a file adds to the size”, which was the request, and which I don’t see done. For whole folders this could be a useful approximation, although Duplicati’s deduplication works differently. Despite having no visuals, ddpeval.exe may help with the analysis.

I’m sure we can make a query that figures it out, but I’m also sure it will take forever to finish running.

The computational complexity is very high.

Although we could probably do something simpler, like reporting how much space each file takes up without considering dedup.

At a volume level you can figure this out, but not at a file or directory level.

Say you have Directory1 and Directory2. Both contain the exact same 5GB file. Which folder gets to claim the dedupe savings?


I’m just interested in being able to spot when I’m backing up a gig or two of data from some app that I don’t care about. Whether this is in terms of the live data on my local machine or in terms of the backup archive size isn’t terribly important.

@drakar2007: The problem with using an external tool like WinDirStat is that it shows me my whole disk and cannot easily show me just what I’ve effectively selected as my backup set in Duplicati. I’ve got dozens of filters (mostly exclusions) in my Duplicati config, and I’m interested in figuring out whether what’s left includes any big files that I might want to add more exclusion filters for.

Thanks for the WizTree suggestion though, I’ll check it out.

This is far from the “visualize” need, but while the GUI restore tree doesn’t reveal sizes, the find command does.

Running commandline entry
Finished!

Listing contents 0 (3/27/2019 1:09:32 PM):
C:\stop test source\ 
C:\stop test source\length1.txt (1 bytes)
C:\stop test source\linuxmint-18-cinnamon-64bit.iso (1.58 GB)
Return code: 0

Big files would presumably have " GB)" in the list for a dumb search, but a regular expression can do better. My suspicion is that these are the original sizes, without compression or deduplication, but it’s something.
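For example, something like this against the saved listing (the file name is just an example) should also catch files in the hundreds of megabytes, assuming the size format shown above:

# Match entries of 1+ GB, or 100 MB and up (size format assumed from the listing)
egrep ' \(([0-9.]+ GB|[1-9][0-9]{2}(\.[0-9]+)? MB)\)$' listing.txt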


You can also use the COMPARE command, which will give you the differences and their size between any 2 versions of your backup set.
Comparing version 1 to version 0 will show what changed between the second-most-recent and the most recent backup.
Using a binary search (start by comparing version 10 to version 1, then compare 10 to 5 or 5 to 0, etc.) can help track down a backup version where a lot of data was added.
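From the GUI you can run this on the Commandline page; with the standalone command line the invocation would look roughly like this (the backend URL, passphrase, and version numbers are placeholders):

:: Compare backup version 1 against version 0 (URL and passphrase are placeholders)
Duplicati.CommandLine.exe compare "file://D:\Backups\MyBackup" 1 0 --passphrase=mysecret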

You can also subscribe to the free Duplicati Monitoring service. The web interface can show a nice graph of the allocated backend storage per backup version:

(graph: allocated backend storage per backup version)

That’ll do, thanks!

Piped the output to a file, then fired up WSL and ran dos2unix on it. I was then able to use egrep to find any files with sizes >= 1 GB:

ben@Helios:/mnt/c/Users/bensh/Desktop$ cat duplicati.txt | egrep 'GB\)$'
C:\Users\bensh\.android\avd\Nexus_5X_API_28.avd\snapshots\default_boot\ram.img (1.50 GB)
C:\Users\bensh\.android\avd\Nexus_5X_API_28.avd\userdata-qemu.img (6.00 GB)
C:\Users\bensh\.android\avd\Nexus_5X_API_28.avd\userdata-qemu.img.qcow2 (2.01 GB)
C:\Users\bensh\Documents\soundfonts\CrisisGeneralMidi3.01.sf2 (1.57 GB)

Edit: Yep, I was able to reduce my source data size from around 35GB to around 25GB through this exercise, as I suspected. Thanks all.