Skip-files-larger-than ignored

bug

#1

Hi,

I’m running Duplicati 2.0.3.9_canary_2018-06-30.

I have a rather slow uplink and a large quantity of photos to back up, so I thought I’d start with the smallest photos first and slowly raise the size limit over time. I’ve set up this backup job in the GUI and tried ‘skip-files-larger-than’ there as well as in the CLI, and the result is the same: the file sizes seem to be ignored.

Example: here is a directory I wish to back up. I want only the 1st file (3608644 bytes) to be backed up, and not the 2nd (4049000 bytes):

$ ls -al /mnt/data/pics/mypics/
total 7516
drwxr-xr-x   2 myuser myuser    4096 May 31 17:21 .
drwxr-xr-x 359 myuser myuser   20480 Jul 13 02:11 ..
-rw-rw-r--   1 myuser myuser 3608644 Feb 12 14:57 20180212-145742.jpg
-rw-rw-r--   1 myuser myuser 4049000 Feb 12 14:57 20180212-145745.jpg

So I set my skip-files-larger-than to 36086700 to ensure that ONLY 20180212-145742.jpg gets backed up:

# mono /usr/lib/duplicati/Duplicati.CommandLine.exe backup "googledrive://duplicati/backuptest/?authid=xxx" /mnt/data/pics/mypics/ --backup-name=test-backup --dbpath=/root/.config/Duplicati/test-backup.sqlite --encryption-module=aes --compression-module=zip --dblock-size=50mb --passphrase=xxx --retention-policy="30D:1D,16W:1W,36M:1M"  --disable-module=console-password-input --skip-files-larger-than=36086700

Backup started at 17/07/2018 10:19:24
Checking remote backup ...
  Listing remote folder ...
Scanning local files ...
  3 files need to be examined (7.30 MB)
  2 files need to be examined (7.30 MB)
  Uploading file (7.32 MB) ...
  Uploading file (6.31 KB) ...
  Uploading file (1.11 KB) ...
  Deleting file duplicati-20180717T085325Z.dlist.zip.aes (1.11 KB) ...
  Deleting file duplicati-becf96572df9445049e4fbfa68f43a73e.dblock.zip.aes (7.49 MB) ...
  Deleting file duplicati-ibcd9c808dc7a4fdea6f9d94a28146e07.dindex.zip.aes (6.39 KB) ...
Checking remote backup ...
  Listing remote folder ...
Verifying remote backup ...
Remote backup verification completed
  Downloading file (1.11 KB) ...
  Downloading file (6.31 KB) ...
  Downloading file (7.57 MB) ...
  0 files need to be examined (0 bytes)
  Duration of backup: 00:07:00
  Remote files: 6
  Remote size: 14.91 MB
  Total remote quota: 90.28 TB
  Available remote quota: 80.28 TB
  Files added: 2
  Files deleted: 2
  Files changed: 0
  Data uploaded: 7.33 MB
  Data downloaded: 7.58 MB
Backup completed successfully!

As can be seen above, 2 files were added and not the expected 1.

This is reflected in the WebGUI as well, and a ‘find’ shows that the second file has just been backed up:

# mono /usr/lib/duplicati/Duplicati.CommandLine.exe find "googledrive://duplicati/backuptest/?authid=xxx" /mnt/data/pics/mypics/20180212-145745.jpg --backup-name=test-backup --dbpath=/root/.config/Duplicati/test-backup.sqlite --encryption-module=aes --compression-module=zip --dblock-size=50mb --passphrase=xxx
Listing files and versions:
/mnt/data/pics/mypics/20180212-145745.jpg
0       : 17/07/2018 10:19:28 3.86 MB
1       : 16/07/2018 22:54:52  -

At first I assumed it was related to file allocation sizes locally; however, even setting the limit to 1 byte / 1k / 1mb has the same effect.

Pointers as to why this is happening will be appreciated.


#2

Odd - I wasn’t able to replicate this issue, though I only checked using the test-filters command on Windows.

If you run test-filters in the CLI and add --console-log-level=profiling and --console-log-filter=+*.Controller*;+*.TestFilterHandler*;-* does it show the files being excluded as expected?


#3

Thanks @JonMikelV,

I just replicated it with a brand new backup, I’m afraid. I would expect only testfile_1k to be matched here:

# ls -al /root/backmeup/
total 24
drwxr-xr-x  2 root root 4096 Jul 19 12:19 .
drwx------ 10 root root 4096 Jul 19 12:16 ..
-rw-r--r--  1 root root 1024 Jul 19 12:17 testfile_1k
-rw-r--r--  1 root root 4096 Jul 19 12:18 testfile_4k
-rw-r--r--  1 root root 5120 Jul 19 12:19 testfile_5k

$ mono /usr/lib/duplicati/Duplicati.CommandLine.exe test-filters /root/backmeup/ \
--backup-name=test --dbpath=/root/.config/Duplicati/88897383746870906568.sqlite \
--encryption-module= --compression-module=zip --dblock-size=50mb \
--retention-policy="1W:1D,4W:1W,12M:1M" --no-encryption=true \
--disable-module=console-password-input --skip-files-larger-than=1025 \
--console-log-level=profiling --console-log-filter='+*.Controller*;+*.TestFilterHandler*;-*'
The operation TestFilters has started
Starting - Running TestFilters
Including source path: /root/backmeup/
Including path as no filters matched: /root/backmeup/testfile_1k
Including path as no filters matched: /root/backmeup/testfile_4k
Including file: /root/backmeup/testfile_1k (1.00 KB)
Including path as no filters matched: /root/backmeup/testfile_5k
Including file: /root/backmeup/testfile_4k (4.00 KB)
Including file: /root/backmeup/testfile_5k (5.00 KB)
Running TestFilters took 0:00:00:00.346
Matched 3 files (10.00 KB)

Odd!


#4

I can confirm that test-filters on my 2.0.3.9 Linux installation also doesn’t seem to recognize any --skip-files-larger-than rules.

I’m not sure I’ve got a 2.0.3.8 floating around to test with, but I’ll check.

Note that in my testing I added --skip-files-larger-than=1024 to a test job and now when I edit the options I can no longer see it, but it DOES appear in the “Export” -> “As Command-line…” list.

I suspect there’s an issue with the formatting of the parameter (like maybe it needs a b, k, M, G, T after the number) causing it to not be parsed correctly.


#5

I was wondering about units too, especially after seeing code:
return Library.Utility.Sizeparser.ParseSize(m_options["skip-files-larger-than"], "mb");
which I think will result in MB being the default without a suffix.

The commandline help for sizes says you can use these suffixes, and I don’t think the case matters:

B - Bytes
kB - Kilobytes
MB - Megabytes
GB - Gigabytes

I was trying to test my theory, but hit a different issue so I’ll just post this note as-is.
If it turns out that the default unit here is truly MB, the Web UI needs to suffix bytes.
It doesn’t now, which means it may give a very different filtering than was intended.
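To illustrate the theory, here is a minimal Python sketch of a size parser with a per-option default unit. It is not Duplicati's actual Sizeparser code: the function name parse_size, the 1024-based multipliers, and the regular expression are assumptions made only for this example.

```python
import re

# Assumed multipliers (powers of 1024); illustrative, not taken from Duplicati.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}

def parse_size(text, default_unit):
    """Return a size in bytes; a number without a suffix takes default_unit."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    number = int(match.group(1))
    unit = match.group(2).lower() or default_unit
    if unit not in UNITS:
        raise ValueError(f"unknown unit in {text!r}")
    return number * UNITS[unit]

if __name__ == "__main__":
    print(parse_size("1025", "mb"))   # bare number, read as megabytes
    print(parse_size("1025B", "mb"))  # explicit suffix, read as bytes
```

Under these assumptions a bare 1025 becomes 1025 MB, so none of the small test files would be excluded, while 1025B gives the intended 1025-byte limit.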


#6

@ts678 Excellent!

That is definitely it - the documentation can definitely be more clear on this!

$ mono /usr/lib/duplicati/Duplicati.CommandLine.exe test-filters /root/backmeup/ \
--backup-name=test --dbpath=/root/.config/Duplicati/88897383746870906568.sqlite \
--encryption-module= --compression-module=zip --dblock-size=50mb \
--retention-policy="1W:1D,4W:1W,12M:1M" --no-encryption=true \
--disable-module=console-password-input --skip-files-larger-than=1025B \
--console-log-level=profiling  --console-log-filter='+*.Controller*;+*.TestFilterHandler*;-*'
The operation TestFilters has started
Starting - Running TestFilters
Including source path: /root/backmeup/
Including path as no filters matched: /root/backmeup/testfile_1k
Including path as no filters matched: /root/backmeup/testfile_4k
Including file: /root/backmeup/testfile_1k (1.00 KB)
Including path as no filters matched: /root/backmeup/testfile_5k
Excluding file due to size: /root/backmeup/testfile_4k (4.00 KB)
Excluding file due to size: /root/backmeup/testfile_5k (5.00 KB)
Running TestFilters took 0:00:00:00.323
Matched 3 files (10.00 KB)

Thanks!


#7

Thanks for letting us know what worked for you!

I’m flagging this as a #bug, though I think it may be a regression as I seem to recall this existing then being fixed at some point in the past.


#8

@JonMikelV

I finished confirming that the GUI in both Configuration --> Edit and Advanced --> Commandline does not add a “B” suffix when you set the drop-down to “byte”. I couldn’t find (though didn’t rummage in old source) where the default was bytes, but to me it’s more natural (avoiding documentation need) for no suffix to mean no multiplier.

I’d also note that ParseSize sometimes looks like it uses KB rather than MB as the default, though the options that default to KB seem to be speeds, not sizes.

I’m not sure how Duplicati manages changes that would disrupt current jobs and users, but this would be one…


#9

Ditto. GUI aside, since the smallest unit is bytes, I assumed that a number without a suffix has no multiplier and that no unit needs to be given.

Thanks all for the help and confirmations!


#10

I can see the logic in that.
I tried to pick the multiplier that made most sense for each option, such that it would be “easiest” to use.

I see now that it makes it even less transparent what happens, as you cannot guess what that multiplier is.

For the GUI at least, I will fix it so it adds the B suffix to the values.
For the commandline, a change here would probably break existing backups, but maybe it is a good idea to deprecate “naked” numbers, and give a warning if size multipliers are not set explicitly.
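As a rough idea of what such a warning could look like, here is a small Python sketch. It is only an illustration of the approach, not Duplicati code: parse_size_with_warning, the 1024-based multipliers, and the warning text are all hypothetical.

```python
import re
import warnings

# Assumed multipliers (powers of 1024); illustrative, not taken from Duplicati.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}

def parse_size_with_warning(text, default_unit):
    """Parse a size in bytes, warning when the unit is left implicit."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    number, suffix = int(match.group(1)), match.group(2).lower()
    if not suffix:
        # Keep the old behaviour so existing backups are not broken,
        # but tell the user which unit was assumed.
        warnings.warn(
            f"size {text!r} has no unit; assuming {default_unit.upper()}. "
            "Please add an explicit suffix such as B, kB, MB or GB."
        )
        suffix = default_unit
    if suffix not in UNITS:
        raise ValueError(f"unknown unit in {text!r}")
    return number * UNITS[suffix]
```

Existing jobs would keep working with the same default, and only values without a suffix would trigger the message.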