Only back up files newer than X days

Hi,

I am new to Duplicati.

I have a situation where the backup drive is smaller than the amount of data that needs to be backed up.

So, I would like to exclude files that match a certain criterion - specifically, I don't want to back up anything older than 3 months.

How can I do this?

Thanks,
Sridhar

Hi sridharb, welcome to the forum!

Duplicati does not currently have any filters that look at timestamps.
There is an option to exclude large files if that helps.

There is also an option to exclude files with specific attributes.
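If it helps, here is a rough sketch of how those two options might be passed to the command-line client from a script. The executable path, source, destination, and option values are placeholders for your own setup, so double-check the option names against the help output of your Duplicati version:

```python
# Hypothetical wrapper: run a Duplicati backup while skipping very large
# files and files carrying a given attribute. Paths and values below are
# placeholders, not a recommended configuration.
import subprocess

cmd = [
    r"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe",  # adjust path
    "backup",
    "file://D:/backup-target",               # placeholder destination
    r"C:\Users\me\Documents",                # placeholder source
    "--skip-files-larger-than=500MB",        # the size-based exclude
    "--exclude-files-attributes=temporary",  # the attribute-based exclude
]
subprocess.run(cmd, check=True)
```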

Thanks for your reply.

How can I solve my problem? I don't have space to back up all files. That's why I would like to limit the backup to files newer than 3 months.

I know that in other threads people have created scripts to mark specific files, e.g. by setting the Archive bit, and then used Duplicati to filter on that.
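For example, a pre-backup script along those lines might look roughly like this on Windows. It is only a sketch, under the assumption that you then pair it with an attribute-based filter in Duplicati; the root path and 90-day cutoff are placeholders:

```python
# Sketch of such a pre-backup script (Windows-only): clear the Archive bit
# on files older than ~3 months so that only newer files still carry it.
# Windows flips the bit back on whenever a file is created or modified.
import ctypes
import os
import stat
import time

FILE_ATTRIBUTE_ARCHIVE = stat.FILE_ATTRIBUTE_ARCHIVE  # 0x20
kernel32 = ctypes.windll.kernel32

CUTOFF = time.time() - 90 * 24 * 3600  # roughly 3 months ago (placeholder)

def clear_archive_bit_on_old_files(root):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_mtime < CUTOFF and st.st_file_attributes & FILE_ATTRIBUTE_ARCHIVE:
                new_attrs = st.st_file_attributes & ~FILE_ATTRIBUTE_ARCHIVE
                if not kernel32.SetFileAttributesW(path, new_attrs):
                    print("could not update:", path)

clear_archive_bit_on_old_files(r"C:\data-to-back-up")  # placeholder root
```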

Ok, let's say that this happens. Now, after about a week, I run the backup again. There is a week's worth of new files that needs to be backed up and a week's worth of old files to be removed from the backup. I can remove the Archive attribute from the old files and set it on the new ones.

Will Duplicati remove those files from the previous backup and save the new files? I ask this because if it doesn't, I will run out of space again.

Thanks,
Sridhar

Not directly.

Duplicati will start by making a new backup, uploading the new volumes.
If you choose to delete old versions, Duplicati will mark them as deleted (but not remove anything).
Afterwards, if some of the remote volumes have a high percentage of deleted data, then those remote volumes will be merged and deleted.

This makes it hard to use in your scenario, because you cannot directly control how much remote space is used. You can set the compact threshold to something low, but it will give you a large overhead due to repeated compact runs.
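To illustrate the trade-off with a toy model (this is not Duplicati's actual code): a remote volume becomes a candidate for compacting once the fraction of deleted data in it crosses the threshold, so a lower threshold frees space sooner but rewrites and re-uploads volumes far more often.

```python
# Toy model of the compacting decision, for illustration only.
def needs_compacting(volume_size_bytes, wasted_bytes, threshold_percent):
    """A volume is compacted once its wasted-space ratio crosses the threshold."""
    return wasted_bytes / volume_size_bytes * 100 >= threshold_percent

# A 50 MB volume with 5 MB (10%) of deleted data:
print(needs_compacting(50_000_000, 5_000_000, threshold_percent=25))  # False
print(needs_compacting(50_000_000, 5_000_000, threshold_percent=5))   # True: compacted early
```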

I am not aware of another program that could solve your issue, but maybe someone from the forum has an idea/recommendation.

@sridharb Hello
Maybe you could use How to backup files older than X days
Just in reverse :slight_smile:

I'm surprised that something this simple isn't already in Duplicati. Can't we have an --exclude-older-than=1M or an --include-newer-than=2W using the modified date?

Welcome to the forum!

Can you explain what your end goal is? Besides the initial backup, Duplicati already avoids reprocessing files that haven't been touched.

If your goal is to keep backup versions only for a certain amount of time, you can achieve that with retention settings.

I do a full backup to an external hard drive that I then take off-site. My internet is crap, so big cloud backups aren't an option. I'd like to use Duplicati to make a backup of only the files that have changed since that last full backup. This reduces the space and time required to do the backup, especially for the cloud. Why back up files that are already safe somewhere else?

The other use is to keep a backup of only those photos that I've taken in the last 60 days. This is in case I accidentally delete something or something happens and I need to recover those files. I don't need anything older than that, so it can roll off the back end of the backups.

The problem is that while the file exists (and it will continue to exist), Duplicati will continue to back it up even though it doesn't need to be backed up anymore.

Unless I'm misunderstanding, retention settings control how long a file is kept in the backup after it's been deleted. Versions control how many copies are kept in the backup. If the file exists but is three months old, how do I keep it from continuing to be backed up?

This is how Duplicati already works. Each backup only backs up the changes since the prior backup. As long as you are using the same external drive you should be good. You can then take it back off-site after the updated backup is performed.

This isn't correct. Duplicati only retains or purges entire backup versions, not individual files within a backup version.

Edit to add: if you delete a file, you can restore it until the retention settings prune off the last backup taken just before the file was deleted. So if that's what you mean, then yes, that is correct. But Duplicati doesn't retain files based on when they are deleted on the source system. CrashPlan (and maybe others) had that feature, but Duplicati does not.

As mentioned, Duplicati won't process files that haven't changed since the last backup. Static files will not consume any more storage on your backup device.

I'm not being clear. I'm talking about two separate backups. One is a hard drive. Every week I do a straight file sync of what's on my NAS to that drive. I'm not using Duplicati for that. I'm using FreeFileSync.

The second backup is a cloud backup of anything that's changed since that last physical hard drive backup. I would wipe the database and the block files and start over every time, so that the only thing in the cloud was what I would need to recover if my house burned down after I took that full hard drive backup off-site. As I said before, I don't have the bandwidth or data in my internet package to allow managing lots of files in a cloud backup location, so I only want to store critical files that have changed since the last physical backup.

If I have 60,000 photos and I'm only concerned about the most recent 3 months, then why do I have to back up the 57,356 images that were taken before that (and are safely archived already)?

It doesn't seem like an option to only evaluate files for backup that are newer than 2W, or even newer than a particular date, should be that hard. Duplicati is already considering modification dates to decide if a file has been updated since the last version in the backup. Why can't it compare that date against 2 weeks (or months) ago to see if it should include the file in the backup?

I really like Duplicati and have donated to support it (twice), but this minor feature omission seems strange.

Ah, OK... That clears it up for me. Interesting use case.

While Duplicati currently doesn't have an option to do exactly what you want, some have gotten it to work by using a pre-backup script that sets a flag on files they want to back up (the Archive bit, for instance).

Whew! I'm glad I managed to make it clear. It is a strange situation. I use a hotspot for my home internet, so speeds are limited, but more importantly I have a 100 GB/month data cap. I'm always thinking about that data cap when I plan anything, as we are usually pretty close to (and sometimes over) that cap.

I am already doing something like what you mention. I'm using XCOPY to bring the files updated since the last physical backup into a folder, then using Duplicati to back up that folder. It works, but it doubles the work, since the files have to be copied from the original location into a temp location and then backed up. It's not pretty.
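For what it's worth, the same staging step can be scripted directly. Here is a minimal Python sketch of that XCOPY-style copy, where the source, staging folder, and one-week cutoff are placeholders:

```python
# Minimal XCOPY-style staging copy: files modified after the cutoff are
# copied into a staging folder, preserving the directory layout, and
# Duplicati is then pointed at the staging folder. Paths are placeholders.
import os
import shutil
import time

SOURCE = r"C:\nas-data"            # placeholder source
STAGING = r"C:\duplicati-staging"  # placeholder staging folder
CUTOFF = time.time() - 7 * 24 * 3600  # e.g., only the last week of changes

for dirpath, _dirnames, filenames in os.walk(SOURCE):
    for name in filenames:
        src = os.path.join(dirpath, name)
        if os.stat(src).st_mtime >= CUTOFF:
            dst = os.path.join(STAGING, os.path.relpath(src, SOURCE))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves timestamps
```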

I've tried playing with Archive bits (which, oddly enough, Duplicati CAN use), but I've got a bunch of files that I can't seem to reset the Archive bit on. I need to work on it because it might solve my particular problem. I could reset all the Archive bits when I do the physical backup and then tell Duplicati to only back up the files with the Archive bit set, which should only be files modified since the last full physical backup.

Thanks for your help and clarification. I definitely need to do some experimenting to fully understand how retention works, since I didn't get it. Any helpful articles?

That sounds like a great solution.

In the future, if enough people request native support in Duplicati for backing up files based on modification date, I think it could be added pretty easily.

I'm not sure if there are any good articles written up. But start by reading some of the other threads in this forum. Just search for 'retention' and you'll find many.

The most important thing to know is that "versions" refers to backup snapshots, not individual file versions. When Duplicati prunes backups, it does so at the backup-snapshot level, not the individual-file level.

Good luck!

I would definitely like to see this option.

The way the Archive bit works is this: when a file is created or modified in Windows, the Archive bit is flipped on ("File is ready for archiving").

Maybe I'm old school... When I was an enterprise server admin years ago, I set up backup jobs to run in ARCServe and Backup Exec. The way I set up the backup jobs is this:

  • Weekly full backup - back up all files regardless of Archive bit status, and flip each file's Archive bit to off as ARCServe or Backup Exec backs it up
  • Daily differential backup - back up all files that have the Archive bit set on, but don't flip the Archive bits to off

That way, each week I have a full backup, plus, on each of the other days, a differential backup of the changes since the last full backup. If I ever need to restore the server files, I restore the last full backup plus the latest differential backup.

If you flip the Archive bit to off during the daily backup, you'll create an incremental backup instead. That works too, but in case of a full server restore you will need to restore the full backup plus all the incremental backups for each day since that full backup, which will take longer.

So that's how I leverage the Archive bit for backups.
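A minimal sketch of that selection logic, assuming a Windows filesystem where Python's os.stat exposes the Archive bit (clearing the bit afterwards would use SetFileAttributesW, as in the earlier sketch):

```python
# Which files a full vs. differential run would pick up, based on the
# Archive bit. Windows-only; the root path below is a placeholder.
import os
import stat

def files_for_backup(root, differential):
    """Full run: every file. Differential run: only files whose Archive
    bit is set, i.e. created or modified since the last full backup."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            attrs = os.stat(path).st_file_attributes
            if not differential or attrs & stat.FILE_ATTRIBUTE_ARCHIVE:
                yield path

# Example: a daily differential pass over a placeholder root.
for f in files_for_backup(r"C:\server-data", differential=True):
    print(f)
```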

You're speaking my language. I worked with Backup Exec a lot in the late '90s and into the 2000s. The Archive bit was definitely the way to enable differential or incremental backups (of course, to tape).

We really don't have to use this approach nowadays with Duplicati or most other modern backup applications. They can detect file changes using other methods: metadata changes since the previous backup, the filesystem change journal, etc.

Also, with block-level deduplication (something definitely not available in the Backup Exec-to-tape days), if a file changes, the entire file doesn't have to be backed up again - just the changed blocks.

With Duplicati, every backup is both an incremental backup (only changes are processed) AND a full backup (all files can be restored from a single point-in-time snapshot). You get the best of both worlds.

Would love to see this feature! I have another use case for it: I would like to back up my latest surveillance camera videos to some online storage (e.g., B2). The idea is that in case the NVR is stolen, I still have a copy of my last few hours of surveillance footage. I am not interested in backing up anything older than a few days.