Backup over VPN with low transfer *speeds*

Hi all,

tl;dr… When running the backup over a VPN, should I run Duplicati on the source or the destination?

I need a little understanding of the backup process so I can figure out how to set up my backups. As I understand it, Duplicati scans the prior backup files and then scans the source files and folders to find ones that are new or have been edited, then backs up those files.

I have two machines connected via VPN. The internet speeds between the two are slow and I have a lot to back up. I know I can seed the initial backup by physically transporting an external HD between the two machines. What I want to know is how best to run the ongoing backups throughout the day.

Which is best? Is it best to run the backup on the source machine, which puts the scanning of updated files on the same side as the data to be backed up but has to read the prior backup info across the slow VPN tunnel? Or is it best to run the backup on the destination machine, which has quick access to the prior backup information but has to scan the new files across the VPN? Obviously the backup data has to travel across the VPN in both scenarios, but I’m trying to figure out whether it matters for the rest of the backup process.

Thanks,

For a little understanding, please see How the backup process works. Knowing a bit about the three internal file types, you can then see what goes over your link by using the web UI’s Reporting -> Show log, which also shows sizes.

For an experiment, you could probably do a listing from the remote of all your local files, to see how slow it is. That time (which probably depends on file count) can then be compared to an estimate of backup time, which depends on the size of new files or the amount of change within existing files (after Duplicati’s block algorithm tries to help).
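If you want to time that listing outside of Duplicati, here is a rough Python 3 sketch you could run on the destination machine (this assumes Python is available there; the UNC path below is just a placeholder for your own share):

```python
# Rough timing sketch: walk a tree, count files and bytes, and report how long it took.
# Run it once against the remote share (over the VPN) and once locally on the source,
# then compare the two times. Paths below are hypothetical placeholders.
import os
import time

def time_listing(root):
    """Recursively list `root`, counting files and total bytes, and time the walk."""
    start = time.perf_counter()
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # skip files that vanish or deny access mid-walk
            file_count += 1
    elapsed = time.perf_counter() - start
    print(f"{root}: {file_count} files, {total_bytes / 1e9:.2f} GB in {elapsed:.1f} s")

time_listing(r"\\source-machine\share")   # across the VPN
# time_listing(r"C:\data")                # locally, on the source machine itself
```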

Thanks for the reply ts678.

How do I do a listing of all the files?

Just to experiment, I set up a backup from the remote and began the backup. It has been running for 5 hours now and the status bar says it is still just Counting (16500 files found, 8.54 GB). I believe this is going to be too slow to be practical. I will have to figure out a different way.

Yes, it is best to run the backup on the source machine.

It actually doesn’t have to read the prior backup info across the VPN, because there will be a local record of what data is on the remote site :slight_smile:

Running the backup on the destination machine is a very bad idea. It will give terrible performance. Any “changed” file has to be hashed, which means Duplicati then has to download the entire file just to read it. And sometimes the file hasn’t even changed, but Duplicati double-checks (if, for example, some metadata was updated).
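Just to illustrate that point (this is not Duplicati’s actual code, and the path is made up): a hash can only be computed by reading every byte, so when the file sits on the other side of the VPN, the whole file has to cross the tunnel even if it turns out nothing changed:

```python
# Illustrative only: hashing a file means reading all of it. If `path` points at a
# file reached over the VPN, every read below pulls data across the slow link,
# even when the final hash shows the file is unchanged. Path is hypothetical.
import hashlib

def hash_file(path, chunk_size=64 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # each chunk crosses the VPN for a remote path
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

print(hash_file(r"\\source-machine\share\bigfile.bin"))
```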

I always advise keeping Duplicati on the source machine unless you have very specific use cases or it’s simply not possible (e.g. data on a “dumb” network share from a router or something).


TL;DR The argument for running on the source sounds convincing, but you can run tests if you want more convincing.

Thanks to @Pectojin for the more experienced answer. I was worrying about get/update/put operations on the internal format files across the VPN. Still wondering how much unexpected slowdown a “compact” might cause.

One test I did was to lengthen a tiny (now 19 byte) file to see what “get” and “put” operations showed up in the log.
Here, I got two 1KB dblock and two 1KB dindex files sent over. It would have been easier to just copy the file; however, this is quite a special case, e.g. one of the “very specific use-cases” that Pectojin mentioned earlier.

I also suspect a remote scan for source changes may be slow. To approximate that with ordinary commands, open cmd.exe (not PowerShell) and see how long “dir /s” takes. You might also compare that to a “tree” command.

If you’re feeling very ambitious, and especially if Linux is involved, rsync can efficiently send file deltas over to the remote (then you can back that up there). Its drawback is that (like the tree walk earlier) the two sides have to compare notes (e.g. on timestamps) to decide whether some update needs to be sent across, whereas in the Duplicati design there is local information to compare against the source files. The test I described needed no file “gets” at all.
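To make that “compare notes” point a bit more concrete, here is a rough Python sketch of that kind of metadata comparison, done from one machine with the other side’s tree reached over the VPN. The paths are made up, and real rsync is far smarter (rolling checksums for partial changes); this only shows the walk-and-compare step that has no local shortcut:

```python
# Illustrative sketch of "comparing notes": without a local record of the other side,
# both trees have to be walked and their metadata compared before deciding what to send.
# One of the two walks happens across the VPN. Paths are hypothetical placeholders.
import os

def snapshot(root):
    """Map relative path -> (size, mtime) for every file under root."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
                result[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
            except OSError:
                pass  # skip files that vanish or deny access
    return result

local = snapshot(r"C:\data")                   # source tree, read locally
remote = snapshot(r"\\backup-machine\mirror")  # other side, walked across the VPN
changed = [p for p, meta in local.items() if remote.get(p) != meta]
print(f"{len(changed)} files would need to be sent")
```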

Compacting does not depend on local files and actually doesn’t depend on remote files either… Unless a volume needs to be compacted :slight_smile:

In the case that volumes need to be compacted they’ll be downloaded and re-bundled into fewer volumes.

This would be quicker on the remote system, but it’s not usually a difficult task, as compacting will usually only affect a small number of volumes (whatever small changes were “removed” by, for example, the retention policy).

To verify context and review, the assumptions so far might be:

  1. Backup resides remotely, and third-party host isn’t wanted.

  2. If source disaster occurs, remote backup gets carried back.
    This is needed either way, but configuration change will vary.

  3. No other requirements of remote machine but to host backup.
    Please ignore my rsync note if source files aren’t replicated.
    If they are replicated, then there is much more to consider.

Picture I’m assuming:

1 Source file machine
2 Destination machine
D Duplicati
B Backend backup data

Scenario 1
1 D B <--> 2 shows the initial backup made locally, then carried to 2 to continue.
1 D <--> 2 B shows D scanning 1 and backing up to B remotely.

Scenario 2
1 <--> 2 D B shows D scanning 1 remotely to update its local backup.
The view is that this is likely the slower of the two options.