Back up files only if modification date is after a certain date in the past

Hi,

I would like to back up files in a single folder structure, but only if the modification date of a file is after a given date. There are files that won’t fit that criterion, and Duplicati must simply ignore those files.
Is this possible? I can’t seem to find an option for that.

Thanks,

Bart

No, I don’t believe there is any option to exclude or include files based on date. You could submit this as a feature request “issue” here -

In your situation, what date are you talking about? Creation date or modification date? Would your usage be a fixed date or would it be a file age? I can think of a few options that might be useful to different people:

  • Exclude files that are older than X days
  • Exclude files that are younger than X days
  • Exclude files that are created (or modified?) before X date
  • Exclude files that are created (or modified?) after X date

I think they could all be excludes - what would work as an include should also work as an exclude by just reversing the comparison.

Agreed. It’s one of the recurring requests (either for filter-out-older or filter-out-newer direction), such as:

Filter for new files

which got inspired by

Filter: older than x days

which points to some ways you could try to script this yourself (along with caveats about things like files repeatedly aging out and then coming back into the backup, which might result in some excess uploads). Possibly you’ll get lucky: when a file goes “deleted due to age”, its data blocks may linger awhile.

Hi,
Is this also possible on a non-Windows OS?
I am running this on my NAS (Synology). The exclude-files-attributes option could help, but none of the possible attributes seems to exist on the NAS.
For instance, I could use the “read-only” attribute (the files are not supposed to change anyway once they are on the NAS), but if this attribute does not exist in the NAS OS it won’t help.
Unless Duplicati considers files that nobody has write access to as having the read-only attribute. But I’m not sure whether that’s the case…

Thanks,

Bart

Yes, the commands should be adaptable to a Linux-type system (which the Synology is). I would have to do some testing on attribute matching – “read-only” is more complex on Linux.

Let me know exactly what you’re looking for and I can help out. Do you want a script that marks files (as read-only or otherwise) based on their last modification time, or on their creation time? Creation time may be better, because it should indicate when the files were placed on the NAS.

And once you decide which timestamp to focus on, should the script flag files that are OVER a certain number of days old (for exclusion by Duplicati)? If so, what is that number?

I would like the script to look at the modification date. Should any file change, it no longer gets filtered out.
About the limit date: the limit date itself is fixed (so not “today minus X days”, but really a date in the past). The aim is to filter out the files that are already backed up to the cloud via another system - it’s a heavy volume that I’d rather not repeat if not necessary.
How do you plan on having Duplicati recognize the files (i.e. determine whether a file must be ignored or not)? Via a certain attribute? Which one?

Thanks,

Bart

You may want to be careful with how symbolic links are treated (if you’re using the follow option) and whether the reported modification date pertains to the target or the link itself.

OK, so it could be configured with a specific hard-coded date, and any file with a modification time after that date would be included in the Duplicati backup. Is that what you are looking for?

Not sure yet. ReadOnly was mentioned above, so one option is to have a pre-backup script set all files that were modified BEFORE that cutoff date to read-only. Then the filter in Duplicati would exclude read-only files.
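A minimal sketch of that pre-backup script, assuming GNU find (which supports -newermt); the paths and the date are placeholders:

    #!/bin/sh
    # Hypothetical sketch: make everything last modified before the fixed
    # cutoff date read-only, so that an --exclude-files-attributes=ReadOnly
    # filter could (in theory) skip it. Paths and date are placeholders.
    CUTOFF="2019-01-01"     # the fixed cutoff date, not "today - X days"
    SRC="/volume1/data"     # hypothetical source folder on the NAS

    # "! -newermt" selects files NOT modified after the cutoff;
    # "chmod a-w" removes write permission for user, group, and other.
    find "$SRC" -type f ! -newermt "$CUTOFF" -exec chmod a-w {} +

Whether Duplicati’s ReadOnly detection actually matches files made read-only this way is exactly what would need testing, as it turns out below.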

We would have to consider unintended filtering and check for edge cases where a file may be considered read-only by Duplicati but not meet the modification time criterion.

I also don’t really know how Duplicati’s read-only detection works. On Linux there are three permission classes (user, group, other), and any number of them (including none) may lack write permission. I did a quick test on a folder with varying permissions set:

(screenshot: directory listing showing test.1 through test.5 with varying permissions)
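A sketch of how such a test set could be created (the names match my test, but the exact modes here are illustrative):

    # Five files with progressively fewer write bits set.
    touch test.1 test.2 test.3 test.4 test.5
    chmod 666 test.1    # writable by user, group, and other
    chmod 664 test.2    # other is read-only
    chmod 644 test.3    # group and other are read-only
    chmod 444 test.4    # read-only for everyone
    chmod 400 test.5    # read-only for user, no access for group/other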

I configured a test backup job and set --exclude-files-attributes=ReadOnly. In the above list I definitely expected test.4 and test.5 to be excluded. Unfortunately it didn’t work as I expected: Duplicati backed up all 5 files:

(screenshot: Duplicati job result showing all 5 files backed up)

It’s possible this is because I’m running Duplicati as root, since the root account can bypass these permissions.

To toss in another option for Linux: thanks to the find command, you could enumerate the files that meet your criteria and feed the paths into the --changed-files option. If the command line chokes on the size, put it in --parameters-file instead.
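A minimal sketch of the idea, assuming GNU find and the duplicati-cli wrapper; the paths, date, and target URL are placeholders:

    #!/bin/sh
    # Build a ':'-separated list of files modified after the fixed cutoff
    # (the Linux separator for --changed-files) and hand it to Duplicati.
    CUTOFF="2019-01-01"
    SRC="/volume1/data"

    CHANGED=$(find "$SRC" -type f -newermt "$CUTOFF" -printf '%p:')
    CHANGED=${CHANGED%:}    # strip the trailing separator

    duplicati-cli backup "<target-url>" "$SRC" --changed-files="$CHANGED"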

Real-time backup isn’t really what you’re seeking, but the idea of carefully creating a file list is found there.

I like that idea even better - no messing with permissions.

There are some questions that would need testing, for example: what happens when a file drops from the list? Ordinarily in a backup, I think the disappearance of a file that was there before is treated as a file deletion, but whether or not that’s the case with --changed-files is less clear. My guess is that it works that way too.

Some find options that might help here are:

-newer file
File was modified more recently than file.

This could find only those files newer than a date-keeping file that a backup has touched, e.g. in --run-script-before for Duplicati; maybe the same file can also be touched somehow before running the other backup (or even ONLY there – Duplicati can figure out what files have changed since its last backup without external help). See the sketch after these options.

-printf, -fprintf

These may simplify producing output in a format that --changed-files can swallow. The format is described in the prior link.
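Combining the two, a sketch of the marker-file idea (paths are placeholders, and the marker file must already exist, created once by hand):

    # List files under the source that were modified since the marker
    # file was last touched, joined with ':' on one line, which is the
    # separator --changed-files expects on Linux.
    find /volume1/data -type f -newer /volume1/backup-marker -printf '%p:'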

As a side note, a bit of Google searching finds people using other file attributes as approximations of the Windows archive attribute. Whether one can actually use the advanced ones depends on the Linux filesystem; however, I also saw people using what stat calls the “time of last status change” to hold extra information. Duplicati doesn’t (I think…) use it, but find has a -ctime test, as well as the probably more usual -mtime. This is a side note because it might call for another script to examine the extra info and produce the file list. It seems simpler to just build the list directly, because find is so good at finding files based on date information.

If you can really put explicit file paths in --changed-files, this seems like a good solution. But what is the syntax? I can’t really get it clearly out of the documentation. And regarding --parameters-file: if the number of files provided to --changed-files is too big, it would be necessary to have the --changed-files parameter multiple times in the parameters file. Is this possible?
Also, thanks a lot. Learning a lot here! :smile:

@ts678 - I was thinking about this a bit more, and I don’t know how we could use --changed-files (or --parameters-file) while still using the built-in scheduler.

I was looking at the source and believe both parameters need to be set at the time the backup job is triggered. There is no opportunity for --run-script-before to modify either of them.

The parameters file can be re-generated in a pre-backup script triggered by the backup job itself. The filename need not change, just the contents.
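For illustration, a sketch under these assumptions: the job sets --run-script-before and --parameters-file to fixed paths (placeholders here), and the script rewrites the file’s contents on each run. Whether Duplicati then re-reads the rewritten file is the open question below.

    #!/bin/sh
    # Hypothetical pre-backup script; the job itself would carry:
    #   --run-script-before=/volume1/scripts/build-changed-list.sh
    #   --parameters-file=/volume1/duplicati-params.txt
    LIST=$(find /volume1/data -type f -newermt '2019-01-01' -printf '%p:')
    echo "--changed-files=${LIST%:}" > /volume1/duplicati-params.txt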

The find command, by the way, is also available for Windows users via Cygwin.

Yep, I understand that the parameters file can be adjusted by the pre-backup script… I just didn’t know whether Duplicati would re-read the file if it sees it was modified. I’ll test it out when I get a chance…

The prior link has a description from the likely original developer that goes beyond what the manual has:

path separated list (using ; on Windows and : on others).

I doubt it’s possible to repeat the option, but I also doubt there is a limit (except maybe memory, if you have little). Processing of the parameters file is probably all Duplicati and C# code, which I think tends to have few hardcoded limitations, unlike the plan of passing an enormous argument list at command startup, which may hit:

Is there a maximum length for a shell script command?

How long can the command line be? (looks really old, but the point is that argument length has a limit)
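For reference, on Linux the actual limit can be checked directly:

    # Maximum total length of arguments plus environment for exec(), in bytes.
    getconf ARG_MAX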

I’m not sure it does, so that’s a good question about operation ordering: it might require the file list to be in the parameters file before the job runs, which means --run-script-before may run too late. If all else fails, building the file list could be done outside of Duplicati, then read in either at CLI start or with (ugly) careful timing.

Another timing point - the earlier recipe glossed over this, but you want to be able to handle a failed backup somehow, because the design involves moving a time marker of some sort saying that the backup handled everything up to that point (and maybe a bit beyond, because a backup is not instant). That needs to be ensured. Rolling the time marker back on failure would be awkward, but not moving it forward until a backup succeeds is easier:

Say the last backup starts at the beginning of Monday and runs for 1 hour. Tuesday starts another, just for files newer than the start of Monday. Some might have changed between 12 AM and 1 AM and been backed up already. No problem…

If Tuesday’s backup fails and nothing is done to reset the time marker, Wednesday will start from Tuesday, so the last 23 hours of Monday will have been missed. The solution may be to create the time marker before the backup, but not actually put it into effect until after a successful backup. Putting a file-based time marker into effect can be done using an mv rename, which leaves the modification time alone.
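A sketch of that commit-on-success pattern, with placeholder paths; the DUPLICATI__PARSED_RESULT check is my assumption of how to detect success, to be verified against run-script-example.sh below:

    #!/bin/sh
    # before.sh (--run-script-before): stage the next marker now, but
    # back up only files newer than the *currently committed* marker.
    touch /volume1/marker.next
    LIST=$(find /volume1/data -type f -newer /volume1/marker -printf '%p:')
    echo "--changed-files=${LIST%:}" > /volume1/duplicati-params.txt

    #!/bin/sh
    # after.sh (--run-script-after): commit the staged marker only on
    # success; mv renames without touching the modification time.
    if [ "$DUPLICATI__PARSED_RESULT" = "Success" ]; then
        mv /volume1/marker.next /volume1/marker
    fi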

Scripting options

run-script-example.sh

It seems like there ought to be some way to stitch this together, but clearly it’s not completely trivial to get.

EDIT:

might be a way to set the --changed-files without worrying about whether --parameters-file is read again.