[Solved] Improve performance for unchanged files

Hi Forum,

I am new to Duplicati. Great software, thanks to all the developers!

Here is my question:

I have 1 TB of data, mainly large files. I ran a backup to a WebDAV share on the same network, which took more than 48h; that is OK for me.

Now I am running a second backup with only 2 files (2 x 2.5 GB) added. This backup is going to take approx. 24h, which is much too slow for my needs. The CPU load is at about 95%.

Can someone

  • tell me whether this is normal or whether something is going wrong?
  • give me some tips to tune things?

This is what I have already done (a rough command-line sketch of these settings follows the list):

  • Increased blocksize to 2MB
  • Increased dblock size to 1GB
  • changed file-hash-algorithm to MD5 (not sure if this may be counterproductive)
  • set zip-compression-method to “none”
  • turned encryption off
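
For reference, roughly the same settings as command-line options (just an untested sketch; I set everything in the GUI, and the URL/credentials are shortened here):

    duplicati-cli backup \
      "webdav://192.168.X.X:8080//duplicati/light/medien?auth-username=...&auth-password=..." \
      "/mnt/data/shares/medien/" \
      --blocksize=2MB \
      --dblock-size=1GB \
      --no-encryption=true \
      --file-hash-algorithm=MD5 \
      --zip-compression-method=None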

Is there a way to make Duplicati skip the hashing if the metadata (mtime, file size) is unchanged? (The data in that fileset is not so important, so I perhaps do not need all the safety Duplicati can provide.)

I use Duplicati 2.0.2.1 on Debian Jessie, Celeron CPU J1900 @ 1.99GHz

I paste the exported config below.

Thank you for any advice.
Gerd

------ config ------
{
  "CreatedByVersion": "2.0.2.1",
  "Schedule": {
    "ID": 1,
    "Tags": [
      "ID=3"
    ],
    "Time": "2018-01-25T20:00:00Z",
    "Repeat": "1D",
    "LastRun": "2018-01-24T01:00:00Z",
    "Rule": "",
    "AllowedDays": null
  },
  "Backup": {
    "ID": "3",
    "Name": "medienunenc",
    "Tags": [],
    "TargetURL": "webdav://192.168.X.X:8080//duplicati/light/medien?auth-username=XXXXXXX&auth-password=XXXXXXXXX",
    "DBPath": "/root/.config/Duplicati/QKSQLICPJU.sqlite",
    "Sources": [
      "/mnt/data/shares/medien/"
    ],
    "Settings": [
      {
        "Filter": "",
        "Name": "encryption-module",
        "Value": "",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "compression-module",
        "Value": "zip",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "dblock-size",
        "Value": "1GB",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "keep-time",
        "Value": "6M",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--no-encryption",
        "Value": "true",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--blocksize",
        "Value": "2MB",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--tempdir",
        "Value": "/mnt/tmp2",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--file-hash-algorithm",
        "Value": "MD5",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--zip-compression-method",
        "Value": "None",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--run-script-before-required",
        "Value": "/usr/local/bin/weck-nas.sh",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--run-script-timeout",
        "Value": "200s",
        "Argument": null
      }
    ],
    "Filters": [
      {
        "Order": 0,
        "Include": false,
        "Expression": "/mnt/data/tmp2/"
      },
      {
        "Order": 1,
        "Include": false,
        "Expression": "/mnt/data/shares/medien/stuffnotinbackup/"
      },
      {
        "Order": 2,
        "Include": false,
        "Expression": "**/.Trash-1000*/"
      }
    ],
    "Metadata": {
      "LastErrorDate": "20180124T123703Z",
      "LastErrorMessage": "The remote server returned an error: (401) Unauthorized.",
      "LastDuration": "00:06:02.9563580",
      "LastStarted": "20180124T124027Z",
      "LastFinished": "20180124T124629Z",
      "LastBackupDate": "20180124T124341Z",
      "BackupListCount": "2",
      "TotalQuotaSpace": "0",
      "FreeQuotaSpace": "0",
      "AssignedQuotaSpace": "-1",
      "TargetFilesSize": "1185487153487",
      "TargetFilesCount": "2214",
      "TargetSizeString": "1,08 TB",
      "SourceFilesSize": "1205697367501",
      "SourceFilesCount": "61513",
      "SourceSizeString": "1,10 TB",
      "LastBackupStarted": "20180124T124027Z",
      "LastBackupFinished": "20180124T124629Z"
    },
    "IsTemporary": false
  },
  "DisplayNames": {
    "/mnt/data/shares/medien/": "medien"
  }
}

I use Duplicati 2.0.2.1 on Debian Jessie

Where are you getting this information from?

Hi drakar, it has been running for about 12h now and it is half done…

I forgot to mention that the size of the files uploaded so far corresponds to the size of the added files. Duplicati does not upload everything a second time, so that is OK.

Gerd

OK. I was just going to add: if you have 1TB of already backed-up files and add (for example) 10GB of new files, Duplicati has a tendency to upload the newly-added files first, and during that time its total “remaining” will NOT account for already-completed data (this has been one of my criticisms since I was new here). Therefore you usually cannot extrapolate how long a backup will take from the currently running job, unless you start actually timing uploaded dblocks via the Information log view.

It’s possible in your case that your changed settings are causing Duplicati to remove old dblocks and upload new ones in compliance with the new settings - I’m not 100% sure whether it would always do this, but you should check.

For your reference, I have a ~800GB backup set on Backblaze B2, and when my regular scheduled Duplicati backup job runs with no / minimal changes, it takes less than 10 minutes.

No, that’s not the case; I checked on the target, and only 4 dblocks changed. And I did not change anything in the config between the 1st and 2nd backup…

Thanks for your numbers. That tells me that there must be something wrong with my setup.
Gerd


Hi Gerd,

I started using Duplicati this week. It is really great software, thanks to all the developers and people supporting this project!

Here are my backup times:

My first successful backup saved 270GB of data - many 2-4GB ISO images (about 100GB) and about 65,000 smaller files. The source machine is a laptop with the files on an SSD, CPU Intel i5-7300 @ 2.60 GHz, running RHEL 7.4 and Duplicati 2.0.2.1_beta_2017-08-01.
Duplicati config: “No encryption”, “No compression” (but the data seems to be compressed??), MD5 for file hashes, 100MB block size. The target is a QNAP NAS (RAID5 SATA disks); the network is gigabit on the same switch. The transport protocol is WebDAV, no SSL.

This backup took less than 4h, at about 74GB/h. That is great speed for my usage.

Now the second backup - 2 ISO images were added, about 5-6GB in total.
This took about 13 minutes. The CPU was 30-40% idle.

My very first try was to back up the 270GB via FTP with encryption on the NAS. It was very, very slow, and I stopped it after 32 hours (100GB remaining). The CPU was 80-90% busy and 75-77 °C hot :)

Here are the results (in the GUI) from the second backup:

[GUI screenshots]

Regards
Fred

I did some more testing and I think I have found my error:

I may have changed ACLs recursively between the 1st and 2nd run and forgotten about it. That seems to cause Duplicati to check the hashes of all dblocks. It finds out that they are all unchanged and does not upload them again, but the checking takes its time.
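
In case it helps anyone checking for something similar: changing permissions/ACLs updates a file's ctime (but not its mtime), so a quick look with standard find shows which files had their metadata touched recently (just an example, adjust the path and age):

    # list files whose inode metadata (permissions/ACLs/ownership) changed in the last 3 days
    find /mnt/data/shares/medien -ctime -3 | head -n 20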

Yours
Gerd

The files with hashes as names that you’re seeing with “No encryption” and “No compression” are blocks. They should be 100KB in size unless you changed the blocksize argument on your backup.

These blocks are just your files split into smaller pieces, which allows for deduplication within volumes (folders) and for storing files larger than the maximum volume size (default 50MB) across multiple volumes.
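
Just to illustrate the idea (this is not what Duplicati actually does internally, only a rough analogy using standard shell tools, and the file path is a placeholder): splitting a file into fixed-size blocks and hashing each block shows how identical blocks could be detected and stored only once.

    # toy illustration: split a file into 100KB blocks and hash each block;
    # blocks with identical hashes would only need to be stored once
    mkdir /tmp/blockdemo && cd /tmp/blockdemo
    split -b 100K /path/to/some-file.bin block_
    md5sum block_* | sort | uniq -c -w 32 | sort -rn | head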

There’s an article that explains it here:

If you like, there’s an option to skip all other validation and only check file times: check-filetime-only

This flag instructs Duplicati to not look at metadata or filesize when deciding to scan a file for changes. Use this option if you have a large number of files and notice that the scanning takes a long time with unmodified files.
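
On the command line the option would be added roughly like this (just a sketch with shortened URL/credentials; in the GUI it can be picked from the advanced options list):

    duplicati-cli backup \
      "webdav://192.168.X.X:8080//duplicati/light/medien?auth-username=...&auth-password=..." \
      "/mnt/data/shares/medien/" \
      --check-filetime-only=true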