Questions about BIG backups

Hi,

I’m new to Duplicati and still discovering how it works internally. Please excuse me if my questions are already covered in other posts, but searching the forum didn’t turn up the answers I was looking for.

My scenario: I’m backing up my data drive, which holds about 6TB of data right now, with Duplicati to a JottaCloud Unlimited account.

I’ve set the volume size to 1GB, since even with 1GB volumes the backup will create thousands of volumes. I started the backup job yesterday, and after several hours of it running, I’m calculating it will take about 19 days to complete (rough math at the end of this post). That’s… OK, I didn’t expect it to be quick given the scenario, but it leads me to some questions:

-What will happen if/when some of the files being backed up change before the backup ends? I guess they will be backed up as they are at the moment Duplicati processes each file, and subsequent incremental backups will pick up the changes. But… here’s my second question:

-Lots of things can happen in 19 days. What if my computer has a hard reboot? When Duplicati starts again and tries to resume the backup, some files will have changed compared to what was already backed up. How will Duplicati handle this situation? I mean, there will be tons of volumes already packed and uploaded, and some of the files in those volumes will have changed in size, others will have been created or deleted, and so on. Will Duplicati be smart enough to resume the backup job without having to discard already uploaded volumes, and without losing integrity?

-Finally, do you have any recommendations for my scenario? I have a fast computer (i7-6700K with 16GB RAM) and a fast connection (1Gbit down / 300Mbit up fibre). The data being backed up consists of more than a million files, many of them very small (a few KB) and others very big (a few GB).
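
For reference, here’s the rough back-of-the-envelope math behind my ~19-day estimate; the sustained upload rate is just what I’m observing so far, not a measured figure:

```python
# Rough math behind my estimates; the observed rate is an assumption, not
# something Duplicati reports.
total_gb = 6 * 1024          # ~6 TB of source data
volume_size_gb = 1           # my "Upload volume size" (dblock) setting
observed_mb_per_s = 3.75     # sustained upload I'm seeing (~10% of my 300 Mbit uplink)

volumes = total_gb / volume_size_gb
days = total_gb * 1024 / observed_mb_per_s / 86400

print(f"~{volumes:.0f} dblock volumes")         # ~6144 volumes
print(f"~{days:.0f} days for the initial run")  # ~19 days
```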

Congratulations on this great open-source project, and thanks for your attention.

Hi @cubatilles, welcome to the forum!

While a 1GB “Upload volume size” / dblock setting should work just fine, you should understand its implications beyond “fewer destination files”. I’d suggest giving the documentation on choosing sizes in Duplicati a quick read.
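
To make the trade-off concrete, here’s a purely illustrative comparison (it assumes ~6TB of backed-up data and ignores compression and deduplication):

```python
# Illustrative only: how the dblock size changes the number of files at the
# destination and the worst-case download needed to restore one small file.
data_mb = 6 * 1024 * 1024   # ~6 TB expressed in MB

for dblock_mb in (50, 1024):  # default 50 MB vs. the 1 GB setting
    remote_files = data_mb / dblock_mb
    print(f"{dblock_mb:>4} MB volumes -> ~{remote_files:,.0f} dblock files; "
          f"restoring one tiny file can mean downloading up to {dblock_mb} MB")
```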

When Duplicati checks those files again it will see the changes and back up just those changes. It shouldn’t be a problem at all.
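
If it helps to picture it, here’s a conceptual sketch of block-level change detection. This is not Duplicati’s actual code, just the general idea: files are split into fixed-size blocks, each block is hashed, and only blocks whose hashes aren’t already known get uploaded.

```python
import hashlib

# Conceptual sketch only -- not Duplicati's implementation. A changed file
# costs just its changed blocks, because unchanged blocks hash to values
# that are already recorded.
BLOCK_SIZE = 100 * 1024   # Duplicati's default --blocksize is 100KB
known_hashes = set()      # Duplicati tracks this in the job's local database

def blocks_to_upload(path):
    """Yield (hash, data) for blocks not already present in the backup."""
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in known_hashes:
                known_hashes.add(digest)
                yield digest, block
```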

Duplicati tries very hard to be resilient to things like this. Most likely when it runs again it will just start scanning files and uploading any changes it sees just like any other backup run.

Just note that some people get confused by the progress bar, which first counts UP the number of files it finds (so everything in your source folders) and then counts DOWN the number of files left to process (checking if there are any changes). It might look like it’s backing up everything all over again - but it’s not. :slight_smile:

With that many files you may find that the local database gets a bit big, and it may cause your jobs to run more slowly than they would if you had the same amount of data spread across fewer files.

There are some updates coming to help resolve this issue, so if you can stand some potentially slow backups for a while and are willing to update when the changes are published, you should be fine.

If you would rather not wait or don’t want to update to the latest canary versions when they come out, consider breaking your backup into a few smaller jobs. You’ll get less deduplication benefit, but since each job will have its own database you won’t have to worry as much about performance issues in that area.
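
If you go that route, something like this is what I mean; the storage URLs, source paths, database locations, and the split itself are placeholders you’d adjust to match your own layout:

```python
import subprocess

# Hypothetical split into two smaller jobs, each with its own local database
# (--dbpath). All paths and URLs below are placeholders.
jobs = [
    ("jottacloud://Backups/media",     r"D:\Media",     r"C:\DuplicatiDB\media.sqlite"),
    ("jottacloud://Backups/documents", r"D:\Documents", r"C:\DuplicatiDB\documents.sqlite"),
]

for url, source, dbpath in jobs:
    subprocess.run(
        ["Duplicati.CommandLine.exe", "backup", url, source,
         f"--dbpath={dbpath}", "--dblock-size=1GB"],
        check=True,
    )
```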


I am just another user but I figured I would share my experience…
I have a backup running solely on the local network, with the backups stored locally. The initial backup was 186GB, approximately 350,000 files, some of which were likely locked and open on employees’ desktops. That one took 18 hours, so approximately 10GB per hour. Again, this was all local network and local storage, not online.
We ran another that was 602GB with 170,000 files (which almost no one has access to but administration, who would not be in). It was started on a Friday evening after business hours (6PM), and finished Sunday morning at 1AM, so 31 hours.
So for 6TB, you’re looking at a 10-day to 3-week timeframe for the full initial backup. The bonus is that subsequent checks and backups should only take a few hours, since everything after the initial run is incremental.
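
For what it’s worth, here’s the rough math if you scale those two runs straight up to 6TB; it assumes throughput stays constant and ignores per-file overhead, so treat it as a ballpark only:

```python
# Very rough extrapolation from the two runs above to a 6 TB initial backup.
runs = {
    "186 GB in 18 h": 186 / 18,   # ~10 GB/h
    "602 GB in 31 h": 602 / 31,   # ~19 GB/h
}

total_gb = 6 * 1024
for label, gb_per_hour in runs.items():
    days = total_gb / gb_per_hour / 24
    print(f"{label}: ~{gb_per_hour:.0f} GB/h -> ~{days:.0f} days for 6 TB")
```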


Thanks for doing that!

Can you confirm what version of Duplicati you are using (some performance updates are in the latest canary) and, if you’re willing, share some machine specs like CPU cores, RAM, OS, and whether you were using spinning disks or SSDs?

Thank you all for your explanations and real-life cases. I think I’m going to let Duplicati finish the initial backup (which will take about 21 days by my calculations) and then do incremental backups every day or two.

I’m on the standard beta build, and I won’t switch to canary even if my backups are slow because of the large number of files. In the end, it’s fine with me as long as restoring remains possible, even if it takes a lot of time.

Thanks!

In my case the setup is:

duplicati-2.0.3.6_canary_2018-04-23-x64
Although the initial backups were done with 2.0.3.4-canary

Testing/development server (it’s an older, low-end machine):
Windows Server 2008 R2 SP1 x64
Xeon E5405 x2 (8 cores @ 2.0GHz)
32GB RAM

Originating files live on local network servers using a bunch of large SSDs (1.6TB each) spread into large RAID10 arrays, connected to the network via 10Gb and 1Gb connections.
Duplicati backups are handled on the above-mentioned server and written to a local partition (4x2TB enterprise 7200RPM drives in RAID5, so 6TB of space).
Lowest/slowest denominator would be the 1Gb Ethernet connection.


My recommendation in general, without knowing the exact details of your total backup set, would be to consider 2 revisions to your approach:

  1. Consider splitting your backup job into 2 or more separate backup sets; for example, very large files (e.g. media/movie/music) are likely to take up loads of space but not change very often, so certain backup job settings could be optimized for these, versus a separate backup job consisting of small / often-changed / dynamic files.
  2. It might be too late for this approach for you, but when initially backing up my ~1TB of data to B2, I started with a smaller subset, got that backup completed, added some more, got that one completed, and so on (a rough sketch of this is below). This mainly benefited my peace of mind in knowing that I had certain blocks of data backed up 100% before moving on to others, and it can be done in order of importance (hardest-to-replace stuff first, like personal pictures, then on to large but technically-replaceable stuff like common media).
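
Here’s a rough sketch of what I mean by the staged approach in point 2; the folder names, storage URL, and ordering are just placeholders:

```python
import subprocess

# Hypothetical staged initial backup: run the same job repeatedly, adding more
# source folders once the previous stage has completed.
stages = [
    [r"D:\Pictures"],                                # hardest to replace, first
    [r"D:\Pictures", r"D:\Documents"],
    [r"D:\Pictures", r"D:\Documents", r"D:\Media"],  # replaceable bulk, last
]

for sources in stages:
    # subprocess.run blocks, so each stage finishes before the next adds folders.
    subprocess.run(
        ["Duplicati.CommandLine.exe", "backup", "b2://my-bucket/backup", *sources],
        check=True,
    )
```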