Staggered versioning / thinning out versions

Hi all,

Being yet another victim of CrashPlan's discontinuation of their Home plans, I stumbled upon this lovely piece of software, and so far I really like it.

One feature I seem to miss, though, is staggered versioning, or version thinning as for example ArqBackup calls it. That is, I would like Duplicati to keep hourly backups for a day or so, then daily backups for the current month, weekly backups for a year, and so on. You get the idea :slight_smile:
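To make the idea concrete, here is a minimal sketch (in Java, my home turf) of the thinning logic I have in mind. All class, method, and rule names here are my own illustration, not anything from Duplicati's codebase:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class RetentionThinning {

    /** One thinning rule: within timeFrame of "now", keep at most one backup per interval. */
    record Rule(Duration timeFrame, Duration interval) {}

    /**
     * Decide which backups to delete. Backups must be sorted newest first,
     * rules sorted from shortest to longest time frame.
     */
    static List<Instant> backupsToDelete(List<Instant> backups, List<Rule> rules, Instant now) {
        List<Instant> toDelete = new ArrayList<>();
        Instant lastKept = null;
        for (Instant backup : backups) {
            Duration age = Duration.between(backup, now);
            // Pick the tightest rule whose time frame still covers this backup's age.
            Rule rule = rules.stream()
                    .filter(r -> age.compareTo(r.timeFrame()) <= 0)
                    .findFirst()
                    .orElse(null);
            if (rule == null) {
                toDelete.add(backup);                    // older than every rule: drop it
            } else if (lastKept != null
                    && Duration.between(backup, lastKept).compareTo(rule.interval()) < 0) {
                toDelete.add(backup);                    // too close to the last kept version
            } else {
                lastKept = backup;                       // keep this one
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(Duration.ofDays(1),   Duration.ofHours(1)), // hourly for a day
                new Rule(Duration.ofDays(30),  Duration.ofDays(1)),  // daily for a month
                new Rule(Duration.ofDays(365), Duration.ofDays(7))); // weekly for a year
        System.out.println(backupsToDelete(List.of(Instant.now()), rules, Instant.now()));
    }
}
```

Roughly this, just applied to Duplicati's version list after each backup run.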

While I know that Duplicati only saves blocks that differ from a previous backup, the size of my backup location still grows by about 500 MB with each run, since so many files change between runs. So far this amounts to about 20 GB per week, depending on how long my PC is turned on. With my backup source being only 45 GB, I will soon have 10x as much backup data as original data.

As far as I see my options currently are only:

  • Keeping fewer versions (--keep-versions, --keep-time; sketched below), at the risk of not being able to recover a file that got corrupted much earlier without my noticing it
  • Running the backup less frequently, potentially losing several hours of work on a given day in case of a hardware failure
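
For context, this is roughly how those two options look on the command line, as far as I understand them (the target URL and source path are placeholders):

```
REM Keep only the 10 most recent versions:
Duplicati.CommandLine.exe backup <target-url> "C:\Users\<Username>" --keep-versions=10

REM Or keep everything newer than three months:
Duplicati.CommandLine.exe backup <target-url> "C:\Users\<Username>" --keep-time=3M
```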

So I hope that staggered versioning would keep the growth of my backup down to a somewhat manageable level without either of these problems.

Being a Java developer, I'm intrigued by the idea of finally getting into C# and trying to add this feature myself. Before doing so, though, I wonder whether this is a feature the devs would consider adopting into the public builds. That probably determines how much time I'd also have to spend on getting into the whole GitHub workflow and on integrating the feature into the advanced options or even into the GUI. ^^

Tekki

2 Likes

Have a look at this GitHub issue:

There is even an open bounty for this :slight_smile:

2 Likes

I do like the idea of a feature like this, but as an aside: what sort of data are you backing up that causes your backup set to grow that much from routine use? Are you backing up operating system files that change all the time, for example? I usually skip these, as having backups of most of them wouldn't really be helpful in a catastrophic recovery. If I'm off the mark, though, forgive me; I'm just curious :slight_smile:

Yes … call me lazy, but I simply back up my whole C:\Users\<Username> folder ^^
Annoyingly, %APPDATA% and %LOCALAPPDATA% are often used both for storing configuration data (which I want to back up) and for cache / temp files (which of course I don't care about).
So I'd have to go through each and every program and check where it keeps its cache / temp files to specifically exclude them …

For me, 95% of that stuff wouldn't help in the event of a catastrophic crash anyway. I'd be reinstalling my programs and, in most cases, just manually updating the settings again. I cherry-pick the ones where I know exactly which backups would be helpful (Plex DB, Firefox bookmark saves, Notepad++ temp files, etc.), and leave the rest basically alone. The real stuff I'm worried about backing up is my big binaries: digital camera photos, scans, DVD and Blu-ray rips, and my extensive music collection. And of course most of that stuff is 99% static.

In your case I'd think that having Duplicati store only the last handful of versions might be sufficient. And you'd probably get a good amount of optimization by at least cherry-picking certain temp / cache folders to blacklist from your backups, since those are data-dense and almost completely useless when it comes time to restore from any sort of crash.
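
Something along these lines should already knock out the biggest offenders; this is a hedged sketch assuming Duplicati's --exclude option accepts wildcard path filters like this (the paths are illustrative, not a complete list):

```
Duplicati.CommandLine.exe backup <target-url> "C:\Users\<Username>" ^
  --exclude="*\AppData\Local\Temp\*" ^
  --exclude="*\AppData\Local\Microsoft\Windows\INetCache\*" ^
  --exclude="*.tmp"
```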

Another quick workaround until a better solution becomes available, at the cost of some extra storage up front (though still better than what you have right now), would be to create two backup jobs with the same source files/folders.

Configure one to run every hour and keep backups until they're older than 7 days, and the other one to run once a week and keep backups for a year (add jobs / tweak durations to fit your needs), as sketched below.
Of course you don't benefit from deduplication between the two backups, but in the long term it probably still needs much less storage space.
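
As a rough sketch of what I mean, assuming two separate target URLs for the two jobs (the scheduling itself would be done via the GUI scheduler or your OS task scheduler):

```
REM Job 1: run every hour, prune anything older than a week
Duplicati.CommandLine.exe backup <target-url-hourly> "C:\Users\<Username>" --keep-time=7D

REM Job 2: run once a week, prune anything older than a year
Duplicati.CommandLine.exe backup <target-url-weekly> "C:\Users\<Username>" --keep-time=1Y
```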

3 Likes

@drakar2007 I guess I could at least exclude some of the well-known temp / cache locations, but I really don't feel like doing that for each and every new program I install just to keep my backup small. It should mostly be a "set and forget" thing, not something I have to tune every few weeks. And backing up the exact settings of all my programs is just as important to me as keeping my actual data, so that I feel "home" again on a freshly set up PC after a catastrophic event. But maybe that's just how I feel about it, and I certainly understand people who are happy with just keeping their documents and photos :slight_smile:

@Nelvin Thanks for the suggestion. If I fail to implement this myself, I'll certainly consider it!

@agrajaghh Thanks for the link. I posted a comment there with a few questions and a request for suggestions. I'll still keep an eye on this thread as well, since I assume the forum reaches a wider audience.

From experience I've learned not to expect a simple backup of the user directory / AppData folders to actually represent a backup of settings for individual programs, especially as so much is kept in disparate and scattered locations such as the original program folder, the registry, etc., depending on the program in question. That's why I only cherry-pick those that I know well and where reconfiguration after a system crash would be more painful than the meticulous effort to restore; and these days, for me, not many applications fit into that category anymore.

For those who might still be interested in reducing backup density over time: the latest canary build has initial support for a --retention-policy parameter, as described here.
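
Going by that description, the parameter takes comma-separated timeframe:interval pairs. A hedged example matching the scheme from the original post (double-check the exact unit letters against the linked write-up):

```
--retention-policy="1D:1h,1M:1D,1Y:1W"
```

Read as: for backups up to one day old, keep at most one per hour; up to one month, one per day; up to one year, one per week. If I read it correctly, anything older than the longest time frame is then removed.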