I just stumbled over something regarding filters (I'm mostly using the GUI, with the "tickboxes" and the filter list in "text mode").
When I filter some subfolders/files out of a backup job, these filters are added to the filter list. Great!
But when I delete these folders/files physically on the disk, the filters are still there. They just don't have anything to do anymore.
If I do this frequently, I get an ever-growing filter list with more and more filters that (can) do nothing.
But nonetheless, they will be handed over and applied every time the job is started.
Looking into the future, I will end up with a lot of useless filters filtering non-existent things, right? Perhaps slowing things down for no reason, because they are processed anyway?
Could this somehow be prevented?
Perhaps a regular check whether the things that are included/excluded exist in the first place?
Can you give an example of what you do? I'm trying to think of a use case where you repeatedly add, filter, and remove folders in a backup?
I noticed this on two occasions so far:
- I excluded four Apple device backup folders created by iTunes. I just got rid of one of the devices, and iTunes deleted the backup folder for that device on the filesystem. The exclude filter in Duplicati still exists and would forever, had I not stumbled upon it.
- I back up a work folder for a client of mine. In this folder I create (and I do this regularly) a temp folder with a distinct name for a project. Because it is a temp folder, I exclude it from backups. After the project is finished, I move everything into the right places and delete the temp folder. The exclude filter still exists (forever).
I'd like to keep my working habits and have the backup follow them, rather than having to change my habits because of the backup rules. Don't get me wrong: I like Duplicati, I like my data safe, and I like to understand what my backup software is doing.
But it doesn't matter much how often you do this: after a long period of using Duplicati, you could end up with lots of useless filters that do nothing (I just found two of them). That's my thought.
I’m guessing most users don’t have that many filters in play. I, however, am NOT one of those users so I can definitely see your point.
Excessive filter rules (especially regular expressions) CAN have an effect on scanning time, particularly with large file counts. However, automatically removing rules that "no longer apply" can come around and bite you. For example, if your backup includes a symlink or mount point that isn't ALWAYS available, then a filter rule could be removed simply because a USB drive wasn't mounted or the network was down.
Similarly, if I've got a soon-to-be-failing drive and I move all my MP3 files over to my new drive, then when a backup runs (pointing to the old drive) it could say "hmmm… no mp3 files found, guess I can get rid of this 'include *.mp3' rule…".
I could get behind a command / GUI button to “evaluate filters” and spit out a list of what has no effect at this particular moment. This allows the user to make a conscious decision about what filters should be removed.
If implemented, that same process could be part of a backup with the results included in summary log / email messages.
…spit out a list of what has no effect at this particular moment.
That's an idea I do like!
As this is something for special occasions, I like the idea of triggering it every now and then. So you can wake up someday (after 10 years of successfully backing up with Duplicati) and go, "Hmmm, let's see what the filters do…"
That way, it sounds even easier to implement?
I don’t know specifics of how to code this but I suspect it would be something like:
- an evaluate-filters command (not parameter) so it could be run from the GUI "Command line …" page
- on the server side, add code to the existing test-filters command that keeps a counter of how many times each filter actually matches something
- make those counter results available in the output
Note that I’m picturing full RULE testing, not individual parts. So if you’ve got
`--exclude=*.tmp --exclude=*.mp3` you'd get TWO usage counts, but if you've got `--exclude=[.*\.(tmp|mp3)]` you'd get ONE usage count.
While breaking regular expression rules up into pieces is possible, it greatly increases complexity.
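The rule-level counting could be sketched like this (a hypothetical illustration in Python using plain regexes, not Duplicati's actual filter engine or syntax): each rule gets exactly one counter, and rules whose counter stays at zero are the candidates the user could then consciously decide to remove.

```python
import re

# Hypothetical sketch, NOT Duplicati's implementation: count how many
# times each exclude RULE matches, treating every rule as one unit.
def evaluate_filters(paths, exclude_rules):
    """Return {rule: match_count} for a list of regex exclude rules."""
    compiled = {rule: re.compile(rule) for rule in exclude_rules}
    counts = {rule: 0 for rule in exclude_rules}
    for path in paths:
        for rule, rx in compiled.items():
            if rx.fullmatch(path):
                counts[rule] += 1
    return counts

paths = ["notes.tmp", "song.mp3", "report.pdf"]
# Two separate rules -> two separate counts...
print(evaluate_filters(paths, [r".*\.tmp", r".*\.mp3"]))
# ...but one combined regex rule -> only one count.
print(evaluate_filters(paths, [r".*\.(tmp|mp3)"]))
```

This mirrors the point above: the combined regex reports a single usage count even though it covers two file types, which is why splitting rules apart would be needed for finer-grained reporting.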
I would use this mainly as a check every now and then to see if there are useless filters (at the time of execution): a Duplicati maintenance job to keep it running at full speed.
After a first impression one could dig deeper.
I have made a small test: some different regular expressions (from [a-z]* to more complex ones) matched against test strings with lengths between 150 and 200 characters (like the file paths Duplicati would match) in a C# program.
1,000,000 match tests took about 2-3 seconds. So you can test 50,000 different filenames, each against 50 different regexes, in less than 10 seconds.
So I would not be worried about the performance!
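A similar micro-benchmark is easy to reproduce (a rough Python sketch with made-up patterns; the absolute timings will differ from the C# numbers above, but the order of magnitude is the point):

```python
import random
import re
import string
import time

# Rough sketch of the test described above: match a few regexes against
# path-like strings of roughly 150-200 characters each.
patterns = [re.compile(p) for p in
            [r"[a-z]*", r".*\.tmp", r".*/temp_[0-9]+/.*"]]

def random_path(n):
    # Build a slash-separated path of approximately n characters.
    return "/".join("".join(random.choices(string.ascii_lowercase, k=8))
                    for _ in range(n // 9))

paths = [random_path(random.randint(150, 200)) for _ in range(10_000)]

start = time.perf_counter()
matches = sum(1 for p in paths for rx in patterns if rx.fullmatch(p))
elapsed = time.perf_counter() - start
print(f"{len(paths) * len(patterns)} match tests in {elapsed:.3f}s")
```

On typical hardware this finishes in well under a second, which supports the conclusion that regex evaluation is unlikely to be the bottleneck.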
Ok, thanks for testing this.
I have around 10 backup jobs with around 20 regular expressions each and about 700k files total. Duplicati seems to handle this well at the moment.
I just like an orderly room here.