I just started using Duplicati’s new Canary version and of course it’s awesome. In my testing I installed it as a service so I could test the shadow copy features, and again…AWESOME!
I wanted to know, though…how in the heck do the MS SQL backups actually work? Someone else posted this question once but never got a technical answer.
1 - Does it do a full database dump (for giggles, let’s say 100 GB) for the first backup?
2 - Does it do an incremental backup based on the transaction log differential?
3 - Does it use VSS snapshots to compute the differences, so that after the initial 100 GB it only uploads the changed data, say 1 GB, to your backup store?
I haven’t played around with this yet, but I have a sneaking suspicion that Duplicati basically dumps the entire DB and then takes snapshots. BUT do I have to take a full DB backup again at some point, followed by more snapshots? What’s the schedule and how does it work?
I’m not positive on this (perhaps @kenkendk or @kees-z can confirm), but I expect Duplicati is using VSS snapshots to get access to the raw .mdf and .ldf files and backing them up, essentially treating them like any other file. I don’t believe it’s creating an actual .bak file like you’d get from running a backup from within MS SQL.
If that’s the case, then restoring from one of these backups would likely be more along the lines of attaching an existing database than importing from a .bak file.
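If that’s how it works, a restore might look something like the following hedged sketch, re-attaching the recovered files with sqlcmd. The server, database name, and paths here are placeholders, not anything from this thread:

```shell
# Hypothetical example: after Duplicati has restored the raw files,
# re-attach them to SQL Server. Database name and paths are made up.
sqlcmd -S localhost -E -Q "CREATE DATABASE [MyDb] \
  ON (FILENAME = 'D:\Restore\MyDb.mdf'), \
     (FILENAME = 'D:\Restore\MyDb_log.ldf') \
  FOR ATTACH;"
```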
If you ever need to restore your database and it has to be done quickly, you might be better served by using normal MS SQL backup jobs to create .bak files, then using Duplicati as a “safety layer” to back up the .bak files. This would allow for faster restores by importing an already-existing local backup, while still providing Duplicati-handled offsite storage in case of total server disaster (in which case you’ll probably have other things to deal with while Duplicati is restoring the .bak file, which can then be restored into MS SQL).
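That layering could be sketched roughly like this; the sqlcmd syntax is real, but the server, database name, and paths are placeholders:

```shell
# Native SQL backup to a local .bak file (names/paths are placeholders);
# point a Duplicati job at the dump folder for offsite copies.
sqlcmd -S localhost -E -Q "BACKUP DATABASE [MyDb] TO DISK = 'D:\Backups\MyDb.bak' WITH INIT;"

# A fast local restore later is just:
sqlcmd -S localhost -E -Q "RESTORE DATABASE [MyDb] FROM DISK = 'D:\Backups\MyDb.bak' WITH REPLACE;"
```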
Please note that this is just my personal opinion which should probably not be considered “the right way to do things”.
Hi Mike - the .bak method is what I’m doing now, but there’s no clean, meaningful way to get rsync’s rdiff-style delta transfer working with it. I’ve tried using Cygwin’s rsync on Windows and gzip’ing the .bak files with the --rsyncable option, and rsync derps: it doesn’t see any diff. So…I’m hoping the rdiff tech in Duplicati will see the differentials in things like the OS snapshots, SQL snapshots, etc.
I’ll find out by trial what Duplicati is going to do with SQL: whether it uploads everything or just the changed parts via its rdiff tech.
Thanks for your feedback. Your .bak approach sounds like a viable option too.
As long as the alignment of the blocks doesn’t shift too much with each .bak file, I expect it will work well, though you might want to keep a tight rein on the “keep this many versions” count so you aren’t storing so much history that things get slow.
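The alignment concern can be demonstrated with a toy shell experiment (GNU coreutils assumed). Fixed-block deduplication, which is the scheme Duplicati uses, keeps old blocks intact when data is appended, but a single byte prepended shifts every block boundary so nothing dedupes:

```shell
# Toy demo: hash the 4 KiB blocks of a file, an appended copy, and a
# one-byte-prepended copy, then count how many blocks each shares
# with the original.
workdir=$(mktemp -d)
cd "$workdir"

head -c 65536 /dev/urandom > orig.bin            # 16 blocks of 4 KiB
{ cat orig.bin; printf 'X'; } > appended.bin     # change at the end
{ printf 'X'; cat orig.bin; } > prepended.bin    # change at the start

# Hash each 4 KiB block of a file, one hash per line, sorted.
hash_blocks() { split -b 4096 "$1" "$1."; sha256sum "$1".* | awk '{print $1}' | sort; }

hash_blocks orig.bin      > o.txt
hash_blocks appended.bin  > a.txt
hash_blocks prepended.bin > p.txt

echo "appended shares  $(comm -12 o.txt a.txt | wc -l) of 16 blocks"   # 16
echo "prepended shares $(comm -12 o.txt p.txt | wc -l) of 16 blocks"   # 0
```

If each nightly .bak lays the pages out in roughly the same order, most blocks stay aligned and dedupe well; if the layout shifts, the upload size grows accordingly.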
I have rather large DBs…about 40 different files; the largest is about 200 GB, and the rest total about 120 GB. So…320 GB of .bak files.
I’m favoring about a 4 GB size for each chunk in the backup; the default is 50 MB, but I read through their document on file blobbing and I think my bandwidth is solid enough to handle it. So far so good.
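For reference, the remote chunk size corresponds to Duplicati’s --dblock-size option. A hedged command-line sketch (the backend URL and source path below are placeholders):

```shell
# Placeholder backend and paths; --dblock-size sets the remote volume
# (chunk) size, 50 MB by default.
Duplicati.CommandLine.exe backup \
  "ssh://nas.example.com/backups/sql" \
  "C:\Backups\SQL" \
  --dblock-size=4GB
```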
I think with ANY backup software the first run takes a LONG time (potentially), and that seems to be the case here too. The initial OS snapshot of a 66 GB OS took about 13 hours; it compressed on disk down to 49 GB, and I was pleased with that.
The MS SQL one has been running since roughly when I posted this and still has 100 GB left to go. And that’s backing up over the LAN to a server share pointing at a folder on a USB disk, which I’ll sneaker-net (drive over) to my office’s NAS and manually copy to the destination mount point and folder. I’ll then switch the target from the local network share to SSH with the mount point and folder path, and we’ll see how she goes!
I think I’ll let it do two backups (this one and another tomorrow) and see what I get for total size. Again, I suspect the rdiff tech will only copy the diff, and with it being the holiday, volume on the DB is low for my client.
Thanks again and I’ll update this when I get more results.
@JonMikelV has answered the question. Basically, Duplicati uses AlphaVSS to query the SQL VSS writer to find out what databases are present on the system. It then treats them like normal files and backs them up. They can be restored either through SSMS (Management Studio) or through traditional SQL restore commands. In short, Duplicati creates consistent copies of the database files (if the recovery model is simple) or the database plus logs (for other recovery models).
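To sanity-check that the SQL VSS writer Duplicati relies on is present and healthy, you can query VSS directly (run from an elevated Windows prompt; the writer is registered by SQL Server itself):

```shell
# Lists all VSS writers and filters for SQL Server's writer;
# its state should report as "Stable".
vssadmin list writers | findstr /i "SqlServerWriter"
```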
For speed, rather than changing the dblock size, you’ll find that changing the hash blocksize to something like 500KB and switching the hashing algorithm to something less taxing on your CPU helps more. You may want to do a number of test runs to find what works best for the size of the databases you’re trying to back up.
Also, the backup speed doesn’t get better after the initial run. And please make sure you set the “disable-filetime-check” option: mounted databases don’t update their timestamps until they are dismounted, and by default Duplicati identifies changed files by timestamp, so it would otherwise skip them. Setting this option means Duplicati will hash the files anyway.
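Pulled together, the tuning options mentioned above might look like this hedged sketch (the backend URL and source path are placeholders; MD5 stands in for “something less taxing than the SHA-256 default”):

```shell
# Placeholder backend/paths; the three options below are the ones
# discussed above: larger hash blocks, a cheaper hash, and hashing
# files even when their timestamps haven't changed.
Duplicati.CommandLine.exe backup \
  "ssh://nas.example.com/backups/sql" \
  "C:\Backups\SQL" \
  --blocksize=500KB \
  --block-hash-algorithm=MD5 \
  --disable-filetime-check=true
```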
My best practices around backing up MS SQL Databases
- Backup SQL databases in a separate job (don’t combine SQL and file system data in a single job)
- Change the advanced options on the job to set the blocksize to something larger than the default if your databases are over 20 GB in size.
- Change the advanced option on the job to enable “disable-filetime-check”
- Pre-backup script to dump the DBs into flat files in a folder (back these up as well, since they will be de-duplicated)
- Do periodic recovery testing; I perform my restore testing once every 2-3 months.
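The pre-backup dump step above could be sketched like this. The sqlcmd syntax is real, but the server name and dump folder are placeholders, and on Windows you’d likely rewrite this as a batch or PowerShell script wired to the job via Duplicati’s --run-script-before option:

```shell
# Hedged sketch: dump every user database (database_id > 4 skips the
# system DBs) to a flat .bak file before the Duplicati job runs.
for db in $(sqlcmd -S localhost -E -h -1 -W \
              -Q "SET NOCOUNT ON; SELECT name FROM sys.databases WHERE database_id > 4;"); do
  sqlcmd -S localhost -E \
    -Q "BACKUP DATABASE [$db] TO DISK = 'D:\SQLDumps\\$db.bak' WITH INIT;"
done
```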
The above should serve you well for your SQL backups. Good luck and thanks for using Duplicati.