How does retention work?

I’m looking into Duplicati to replace CrashPlan for my NAS, now that CrashPlan has dropped end-user support.

My intention is to rent a 2TB VPS as the destination and use Duplicati.

I am trying to understand the retention options and their impact on needed space at the destination.

For some of my files (raw photos), I only want one copy kept. They never change, and this is a secondary or tertiary backup of those files. For me to need to restore them, multiple hard drives in multiple locations would have to die at the same time.

For other files I probably want multiple versions, to allow a quicker restore after user error without having to get the offsite disks back.

From what I’ve been able to understand so far, if I set keep-versions to 1 for the photos, it’ll do exactly that and need roughly 1:1 source-to-destination storage (excluding any compression).
Is that correct?

For the other files, will a keep-versions of 3 keep just three versions of each file, and not three versions of the whole backup set?
So if I have a 100GB dataset where 10GB of it changes regularly, I’ll need 100 + 3 × 10GB = 130GB of storage at the destination?
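
The storage arithmetic in the question can be sketched as follows. This is a rough model under stated assumptions, not how Duplicati computes anything: it ignores compression and de-duplication, and assumes every retained older version holds an entirely different copy of the 10GB that changes (a worst case). Under those assumptions the newest version is already part of the 100GB, so three versions come to about 100 + 2 × 10GB = 120GB, which makes 130GB a comfortable upper bound.

```python
def worst_case_storage_gb(full_gb: int, changing_gb: int, versions: int) -> int:
    """Rough upper bound on destination size, ignoring compression and de-duplication.

    Assumes each retained *older* version holds a completely different copy of the
    changing data (worst case). The newest version is already counted in full_gb.
    """
    if versions < 1:
        raise ValueError("at least one version is always kept")
    return full_gb + (versions - 1) * changing_gb


print(worst_case_storage_gb(100, 10, 1))  # 100: one version, roughly 1:1 with the source
print(worst_case_storage_gb(100, 10, 3))  # 120: two older versions of the 10GB that changes
```

In practice only the changed parts of files are stored again, so real usage should be at or below this estimate.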

I’ve used duplicity in the past, which works on a different principle of keeping backup sets, so destination storage ends up being a multiple of the source, which will get expensive.

As I said before, I’m happy to read up myself if someone has a good link.

Thanks

Yes. But be aware that the “versions” are snapshots, not individual files.
If you have --keep-versions=1 and make a backup, you will only have that one snapshot (whichever files were present at the time). If files are missing at the next backup, they will be removed from the backup.
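
A toy model of what “versions are snapshots” means, assuming a version is simply the set of files present when that backup ran. The names Snapshot and run_backup are hypothetical, not Duplicati internals. With keep_versions=1, a file deleted from the source before the next backup is no longer in any retained version.

```python
Snapshot = dict[str, bytes]  # hypothetical: file path -> contents at backup time


def run_backup(versions: list[Snapshot], source: Snapshot, keep_versions: int) -> list[Snapshot]:
    """Record a snapshot of `source`, then keep only the newest `keep_versions` snapshots."""
    versions = versions + [dict(source)]  # a version captures every file present right now
    return versions[-keep_versions:]      # retention drops whole snapshots, not single files


source = {"a.raw": b"photo A", "b.raw": b"photo B"}
versions = run_backup([], source, keep_versions=1)

del source["b.raw"]  # b.raw is deleted (or missing) before the next run
versions = run_backup(versions, source, keep_versions=1)

print(any("b.raw" in snap for snap in versions))  # False: b.raw is no longer restorable
```

Running the same steps with keep_versions=2 keeps the first snapshot, so b.raw would stay restorable until that snapshot ages out.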

I think you can easily set this to something larger, as Duplicati 2 is designed to support exactly this use case: large files that do not change. It does this by applying de-duplication, meaning that duplicated data is only stored once. So even if you have 3 backups of the same file, it will only take up the space needed to store 1 copy.
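
A minimal sketch of block-level de-duplication, assuming files are split into fixed-size 100KB blocks identified by a SHA-256 hash. DedupStore and its methods are hypothetical names for illustration, not Duplicati's code. Three backups of an unchanged file produce three versions, but each block is stored only once.

```python
import hashlib
import os

BLOCK_SIZE = 100 * 1024  # assumption: 100KB blocks


class DedupStore:
    """Hypothetical content-addressed store: each unique block is kept exactly once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}              # block hash -> block data
        self.versions: list[dict[str, list[str]]] = []  # one file list per backup

    def backup(self, files: dict[str, bytes]) -> None:
        file_list = {}
        for name, data in files.items():
            hashes = []
            for start in range(0, len(data), BLOCK_SIZE):
                block = data[start:start + BLOCK_SIZE]
                digest = hashlib.sha256(block).hexdigest()
                self.blocks.setdefault(digest, block)  # only stored if not seen before
                hashes.append(digest)
            file_list[name] = hashes
        self.versions.append(file_list)

    def stored_bytes(self) -> int:
        return sum(len(block) for block in self.blocks.values())


photo = os.urandom(1_000_000)  # stand-in for a raw photo that never changes
store = DedupStore()
for _ in range(3):
    store.backup({"IMG_0001.raw": photo})

print(len(store.versions), store.stored_bytes())  # 3 1000000
```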

Duplicati does not support the sync-like setup that would give you true 1:1 copies.

Yes, that is correct.

Duplicati 1.3.x was based on the same idea as duplicity (hence the similar names) but with 2.0 we changed everything to avoid the long chains and expensive (in terms of storage and bandwidth) multiple copies.

Thank you for the very clear answer.
I think as soon as I’ve managed to get the first backup and a restore test done, CrashPlan can say bye-bye to my money.

I just read through this again and have one follow-up question.

I notice a “keep everything” option as well. With de-dupe, for ‘archival’ storage of images, it sounds like this won’t use up extra space but will provide ongoing protection against any idiotic behaviour with deletions?

That is the idea. Each backup will only have the overhead of actual changed data, and the file list (one dlist file). If nothing has changed nothing will be uploaded (can be changed with --upload-unchanged-backup).
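
The per-run behaviour described above can be sketched like this, assuming each run uploads only the blocks the destination does not already have plus one file list, and uploads nothing when the file list is unchanged unless something like --upload-unchanged-backup is set. plan_upload, FileList and the 100KB block size are illustrative assumptions, not Duplicati's internals.

```python
import hashlib
from typing import Optional

BLOCK_SIZE = 100 * 1024  # assumption: 100KB blocks

FileList = dict[str, list[str]]  # hypothetical stand-in for a dlist: path -> block hashes


def plan_upload(known: set[str], files: dict[str, bytes], previous: Optional[FileList],
                upload_unchanged: bool = False) -> Optional[tuple[list[str], FileList]]:
    """Return (new block hashes, file list) for this run, or None if nothing is uploaded."""
    file_list: FileList = {}
    new_blocks: list[str] = []
    seen = set(known)
    for name, data in sorted(files.items()):
        hashes = []
        for start in range(0, len(data), BLOCK_SIZE):
            digest = hashlib.sha256(data[start:start + BLOCK_SIZE]).hexdigest()
            if digest not in seen:  # only blocks the destination lacks are uploaded
                seen.add(digest)
                new_blocks.append(digest)
            hashes.append(digest)
        file_list[name] = hashes
    if file_list == previous and not upload_unchanged:
        return None  # nothing changed: no blocks and no new file list
    return new_blocks, file_list


files = {"notes.txt": b"hello"}
blocks, listing = plan_upload(set(), files, None)
print(len(blocks))                                                        # 1
print(plan_upload(set(blocks), files, listing))                           # None
print(plan_upload(set(blocks), files, listing, upload_unchanged=True)[0]) # []
```

The last call models the unchanged-but-forced case: no new blocks, only a fresh file list.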

So “keep everything” works fine for large files that rarely change. But what about smaller files that change frequently? With “keep everything” enabled, my backup would grow slowly but indefinitely, right?


For anyone else trying to understand how retention works, here are some other relevant topics:

Yes. It will grow forever, in proportion to the file additions and changes (the size of the changes, not the full size of each changed file), rather than to the number of backups (as full backups would).
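
To make the growth concrete, here is a back-of-the-envelope comparison for a 50GB source backed up 100 times. It assumes (hypothetically) that each later run brings 100MB of genuinely new data, and it ignores compression and the small per-backup file list.

```python
def keep_everything_mb(initial_mb: int, changed_mb_per_run: list[int]) -> int:
    """De-duplicated store that never deletes: the first upload plus every run's changed data."""
    return initial_mb + sum(changed_mb_per_run)


def full_backups_mb(source_mb: int, runs: int) -> int:
    """For comparison: a scheme that stores a complete copy on every run."""
    return source_mb * runs


runs = 100
changes = [100] * (runs - 1)  # assumption: 100MB of genuinely new data in every later run

print(keep_everything_mb(50_000, changes))  # 59900
print(full_backups_mb(50_000, runs))        # 5000000
```

A run where nothing changed contributes nothing to the first total, which is why growth tracks changes rather than the number of backups.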

@deasmi, if you’re still interested in reducing backup density over time, the latest canary build has initial support for a --retention-policy parameter, as described here.

Okay, and if I ever decide that enough is enough and I’ll scrap everything older than X, I can change the setting and the next time the backup runs, it will delete the old stuff? (Yes, I know, technically it won’t be deleted quite yet but rather marked for deletion, but let’s ignore that for the moment.)

That’s correct.
Alternatively (and better), you can invoke the delete command. This does exactly the same thing, without your having to change the backup configuration and revert the changes afterwards.

Duplicati.CommandLine.exe help delete

Usage: delete <storage-URL> [<options>]

  Marks old data deleted and removes outdated dlist files. A backup is deleted when it is older than <keep-time> or
  when there are more newer versions than <keep-versions>. Data is considered old, when it is not required from any
  existing backup anymore.

  --keep-time=<time>
    Marks data outdated that is older than <time>.
  --keep-versions=<int>
    Marks data outdated that is older than <int> versions.
  --version=<int>
    Deletes all files that belong to the specified version(s).
  --allow-full-removal
    Disables the protection against removing the final fileset
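
As a reading aid, here is one way to express the selection rule from that help text: a version is removed if it is older than keep_time, or if keep_versions newer versions already exist, and the final version is protected unless allow_full_removal is set. This reflects an interpretation of the wording (in particular, that keep_versions=N keeps the N newest versions), not Duplicati's source; versions_to_delete is a hypothetical name, and keep_time is taken as a Python timedelta rather than Duplicati's time format.

```python
from datetime import datetime, timedelta
from typing import Optional


def versions_to_delete(backups: list[datetime], now: datetime,
                       keep_time: Optional[timedelta] = None,
                       keep_versions: Optional[int] = None,
                       allow_full_removal: bool = False) -> list[datetime]:
    """Pick the backup versions a delete run would remove, under the reading above."""
    newest_first = sorted(backups, reverse=True)
    doomed = []
    for newer_count, when in enumerate(newest_first):  # index = number of newer versions
        too_old = keep_time is not None and now - when > keep_time
        too_many = keep_versions is not None and newer_count >= keep_versions
        if too_old or too_many:
            doomed.append(when)
    # Without allow_full_removal, never delete the last remaining (newest) version.
    if not allow_full_removal and newest_first and len(doomed) == len(newest_first):
        doomed.remove(newest_first[0])
    return doomed


now = datetime(2018, 1, 31)
backups = [now - timedelta(days=d) for d in (0, 10, 40, 200)]
print(len(versions_to_delete(backups, now, keep_time=timedelta(days=180))))  # 1
print(len(versions_to_delete(backups, now, keep_versions=2)))                # 2
```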

Why would I have to revert the changes afterwards? Do you mean if I want to continue with “keep unlimited versions” after the delete?

@kenkendk would I be too demanding/lazy a user if I expected a hint (or example) here about what format I should use for the time?

Yes, I guessed that your goal was to disable automatic deletion of backup versions and that you wanted to delete old backups every now and then manually.

It’s in the help text:

Duplicati.CommandLine.exe help date

Duplicati supports absolute and relative dates and times:
  now --> The current time
  1234567890 --> A timestamp, seconds since 1970.
  "2009-03-26T08:30:00+01:00" --> An absolute date and time. You can also use the local date and time format of your system like e.g. "01-14-2000" or "01 jan. 2004".
  Y, M, D, W, h, m, s --> Relative date and time: year, month, day, week, hour, minute, second. Example: 2M10D5h is
  now + 2 months + 10 days + 5 hours.
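
The relative format can be illustrated with a small evaluator. It follows the help text's example (2M10D5h is now + 2 months + 10 days + 5 hours), but the details are assumptions: units are applied left to right, and a month step that lands past the end of a shorter month is clamped to that month's last day. Duplicati's own parser may behave differently at those edges, and add_months/apply_relative are hypothetical names.

```python
import re
from datetime import datetime, timedelta

TOKEN = re.compile(r"(\d+)([YMDWhms])")
FIXED_SECONDS = {"W": 7 * 86400, "D": 86400, "h": 3600, "m": 60, "s": 1}


def add_months(when: datetime, months: int) -> datetime:
    """Move by calendar months, clamping the day to the end of the target month."""
    index = when.month - 1 + months
    year, month = when.year + index // 12, index % 12 + 1
    first_of_next = datetime(year + month // 12, month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return when.replace(year=year, month=month, day=min(when.day, last_day))


def apply_relative(now: datetime, spec: str) -> datetime:
    """Evaluate a relative spec such as '2M10D5h' as now + 2 months + 10 days + 5 hours."""
    if not re.fullmatch(r"(?:\d+[YMDWhms])+", spec):
        raise ValueError(f"not a relative time: {spec!r}")
    result = now
    for amount, unit in TOKEN.findall(spec):  # applied left to right
        if unit in ("Y", "M"):
            result = add_months(result, int(amount) * (12 if unit == "Y" else 1))
        else:
            result += timedelta(seconds=int(amount) * FIXED_SECONDS[unit])
    return result


print(apply_relative(datetime(2018, 1, 31, 8, 30), "2M10D5h"))  # 2018-04-10 13:30:00
```

Note the case sensitivity: M is months and m is minutes.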

Use the export to commandline feature in the GUI to learn how a specific setting in the GUI can be translated to the commandline.

Note that there is another great feature for thinning out backup versions in the advanced options:

  --retention-policy
    Use this option to reduce the number of versions that are kept with increasing version age by deleting most of the
    old backups. The expected format is a comma separated list of colon separated time frame and interval pairs. For
    example the value "7D:0s,3M:1D,10Y:2M" means "For 7 days keep all backups, for 3 months keep one backup per day
    and for 10 years one backup every 2nd month".
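
To see what a policy such as "7D:0s,3M:1D,10Y:2M" does, here is a simplified model: each backup falls into the first time frame that covers its age; within a frame, a backup is kept only if it is at least the interval newer than the last one kept; backups older than the longest frame are dropped; and the newest backup is always kept. This approximates the described behaviour with months and years treated as 30 and 365 days; it is not Duplicati's implementation, and parse_span, parse_policy and thin are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumption for this sketch: months and years are approximated as 30 and 365 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "D": 86400, "W": 7 * 86400,
                "M": 30 * 86400, "Y": 365 * 86400}


def parse_span(text: str) -> timedelta:
    """Parse '7D', '0s' or combined spans such as '1D12h'."""
    total, digits = 0, ""
    for ch in text:
        if ch.isdigit():
            digits += ch
        elif ch in UNIT_SECONDS and digits:
            total += int(digits) * UNIT_SECONDS[ch]
            digits = ""
        else:
            raise ValueError(f"bad span: {text!r}")
    if digits or not text:
        raise ValueError(f"bad span: {text!r}")
    return timedelta(seconds=total)


def parse_policy(policy: str) -> list[tuple[timedelta, timedelta]]:
    """'7D:0s,3M:1D' -> [(time frame, interval), ...], shortest time frame first."""
    pairs = []
    for item in policy.split(","):
        frame, interval = item.split(":")
        pairs.append((parse_span(frame), parse_span(interval)))
    return sorted(pairs)


def thin(backups: list[datetime], policy: str, now: datetime) -> list[datetime]:
    """Return the backups this simplified retention policy would keep."""
    ordered = sorted(backups)          # oldest first
    keep = set(ordered[-1:])           # always keep the newest backup
    lower = timedelta(0)
    for frame, interval in parse_policy(policy):
        last_kept = None
        for b in ordered:
            if not (lower <= now - b < frame):
                continue               # age not in this time frame
            if last_kept is None or b - last_kept >= interval:
                keep.add(b)            # at least `interval` after the last kept backup
                last_kept = b
        lower = frame                  # anything older than the last frame is dropped
    return sorted(keep)


now = datetime(2018, 1, 31)
backups = [now - timedelta(hours=12 * i) for i in range(40)]  # twice a day for 20 days
print(len(thin(backups, "7D:0s,3M:1D", now)))  # 27
```

Here all 14 backups from the last 7 days survive, and the 26 older ones are thinned to 13 (one per day).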

Thanks for that. But that is a different help text. Above, we were looking at Duplicati.CommandLine.exe help delete and while I had no doubt that the information could be found somewhere, my point was precisely: can we not make life easier for people and give them a hint right where they need it, i.e. when looking at help delete?

Seeing that I need to query help date in order to learn the format, I would argue even more strongly for that, because the delete help speaks about <time> (not <date>), so it’s not very intuitive, even when you know how to use Duplicati.CommandLine.exe help.

Yes, but that is only in canary at the moment, right?

help time and help date return the same help text, so Duplicati.CommandLine.exe help time would also work.

Times and dates can be used with a lot of commands, so the same text would have to be repeated for every command it applies to. If every advanced option that can be applied to a specific command were explained in its help, the help text would be unreadable.

Luckily there is a sort of index that you can use when looking for help about a specific feature. Date and time are listed there too:

Duplicati.CommandLine.exe help

See duplicati.commandline.exe help <topic> for more information.
  General: example, changelog
  Commands: backup, find, restore, delete, compact, test, compare, purge, vacuum
  Repairing: repair, affected, list-broken-files, purge-broken-files
  Debugging: debug, logging, create-report, test-filters, system-info, send-mail
  Targets: aftp, amzcd, azure, b2, box, cloudfiles, dropbox, file, ftp, googledrive, gcs, hubic, jottacloud, mega,
  onedrive, openstack, s3, od4b, mssp, sia, ssh, tahoe, webdav
  Modules: aes, gpg, zip, 7z, console-password-input, mssql-options, hyperv-options, http-options, sendhttp, sendmail,
  runscript, sendxmpp, check-mono-ssl
  Formats: date, time, size, encryption, compression
  Advanced: mail, advanced, returncodes, filter, default-filter-sets, <option>

Command-line features are being integrated into the GUI bit by bit, so I guess deleting, listing, comparing etc. will get a nice place in the GUI someday.

The --retention-policy option is indeed only available in the latest Canaries, but I expect a new Experimental/Beta in the very near future.

I think the above is the answer to a question that has been worrying me, since the storage space where my backups go is costing me a lot. I got worried when Duplicati 2 told me yesterday it had “10 versions” and after today’s backup run, “11 versions”, and I became alarmed that this might mean I am paying to store multiple copies of files. I recall the earlier version of Duplicati that I was happily using had allowed me to set that only a specific number of full backups be retained, which I could set to 2 (if feeling nervous) or 1 if feeling like saving costs. In setting up the backup in Duplicati 2, I found nowhere for me to tell it how many to retain, so am I right in interpreting the quote above as meaning it will only retain one copy of each file, just the most recent version?

When editing a job in step 5 (Options) there’s a section called “General options” with a select box labeled “Keep this number of backups” which lets you select a limit based on age, number of versions, or unlimited.

If you set the number of versions to 1 then yes - Duplicati will retain only the most recent version of a file.

With any other setting, multiple versions of the CHANGED PARTS of the file will be stored. For example (assuming default Duplicati settings), if you have a 23MB Word document, Duplicati will store all 23MB of that file. If you then change (or add or delete) one word in that file, Duplicati will store an additional 100KB of ONLY the changed content.

If you made NO changes to that 23MB file, but another file changed and a backup ran, the Restore options would show multiple versions of the 23MB file, but in reality only the single initial version and that second 100KB change are actually stored at your destination.
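
The 23MB example can be sketched with fixed-size blocks, assuming 100KB blocks hashed with SHA-256; block_hashes and blocks_to_upload are hypothetical helper names. Changing a byte in place leaves all but one block identical, so only that block would need uploading, and an unchanged file adds nothing.

```python
import hashlib
import os

BLOCK_SIZE = 100 * 1024  # assumption: the 100KB default block size mentioned above


def block_hashes(data: bytes) -> list[str]:
    """Split data into fixed-size blocks and hash each one."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def blocks_to_upload(stored: set[str], data: bytes) -> set[str]:
    """Hashes of blocks in `data` that the destination does not already hold."""
    return set(block_hashes(data)) - stored


document = os.urandom(23 * 1024 * 1024)  # stand-in for the 23MB document
stored = set(block_hashes(document))      # first backup: every block is uploaded
print(len(stored))                        # 236

edited = bytearray(document)
edited[5_000_000] ^= 0xFF                 # overwrite one byte in place
print(len(blocks_to_upload(stored, bytes(edited))))  # 1: only one 100KB block is new
print(len(blocks_to_upload(stored, document)))       # 0: an unchanged file adds nothing
```

In this fixed-block model, inserting or deleting bytes (rather than overwriting them) shifts every later block boundary, so the blocks after the edit point would all look new; how much a real document edit adds therefore depends on the file format and where the change lands.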

I suppose this is something one has to set when looking at the Settings for an existing backup: “Options” appears at the foot of the screen, but only offers some alarming-looking complex commands from a long pick-list.

Everything else you kindly explain is just what I hoped for.

What browser are you using that you’re seeing Options at the “foot of the screen”?

Thanks for the feedback on the “alarming looking complex commands”! As a developer it can be difficult to view things from a new user point of view. Based on the current text, what did you imagine the different choices would do?

If you feel you now know what they will actually do, do you have any suggestions for text that might make sense to a new user (while still fitting in the interface layout)?

I have Firefox 57.0.4 (64 bit) as my default browser.
When I select “Settings” from the menu at the left hand side of the main Duplicati screen, shown on opening the program, and see then the screen headed “Settings”, by scrolling down, the last section I see has the subheading “Default options”. That was what I meant by “at the foot of the screen” - sorry for being so imprecise.

In the section of the ‘Settings’ page under the heading ‘Options’, I was hoping to see a simple way to check how many backups I had originally told the program to retain, and a chance to change it if I had changed my mind. Whilst I was pleased to see the opportunity there for experienced users to work with advanced options, in the section labelled “Add advanced option”, I hoped there would also be a way of adjusting a few settings that were not advanced, in this instance, the number to retain.

… I am happy to provide feedback as a very novice user, who learnt a bit about programming on a Spectrum 48K and a BBC-B and has since only been an avid computer user. I have not delved into the meaning of any of the commands listed under the advanced options. Since they are clearly labelled as “advanced”, I am not convinced they need altering. I expect I could try to learn what they mean from the manual if the need arose, but I hope I do not need to do that.

It sounds like you’re describing this section of the main menu Settings page, is that correct?

If so, then it might be that what you’re looking for is in step 5 (Options) of the job Edit menu, which looks a bit like this: