How to back up the backup?

I’m not sure where this is heading. The main backup may not be OK. The secondary may not be OK either, because it may pick up a bad main backup (versioning can cover that) or be attacked directly.

Orchestration of the two-stage backup is an issue. If security were not a concern, you might consider having the third-party duplicati-client drive two copies of Duplicati, but giving Duplicati remote control exposes it to attacks on its web server (which is not hardened) or simply to an attacker stealing credentials. Risk can be reduced at the Duplicati level using its IP controls, but external firewalling is safer.

Another thing you can do is use --upload-verification-file and utility-scripts/DuplicatiVerify.py to check backup integrity before doing the secondary backup. This can also catch integrity problems that have nothing to do with security.
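For example, something along these lines could gate the secondary copy on a successful verification pass. This is only a sketch: the paths are placeholders, it assumes the backup job runs with --upload-verification-file so duplicati-verification.json lands in the destination, and it assumes your copy of DuplicatiVerify.py sets a non-zero exit code on failure (if it doesn’t, parse its output instead).

```python
# Sketch: verify the primary destination before copying it to the secondary one.
import subprocess
import sys

DEST = r"D:\Backups\Duplicati"            # primary backup destination (placeholder)
VERIFY = r"C:\Tools\DuplicatiVerify.py"   # copy of utility-scripts/DuplicatiVerify.py

# DuplicatiVerify.py reads duplicati-verification.json in DEST and re-hashes files.
result = subprocess.run([sys.executable, VERIFY, DEST])
if result.returncode != 0:
    sys.exit("Verification failed - not copying a possibly bad backup downstream")

# Only now run rsync/robocopy/rclone (or whatever) to the secondary location.
```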

A simple attack would do something like encrypt the original source files. The next step up might go after the backup itself by pulling access credentials out of Duplicati (or whatever tool is used). If a tool can reach the destination and an attacker compromises the tool, the attacker can reach it too. Defense must be server-side, e.g. keeping files immutable in normal operation, with the ability to change that policy well protected and its credentials never stored on the uploading PC.
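As one illustration of server-side immutability (a sketch only: the bucket name and retention period are placeholders, and it assumes an S3-compatible service that supports Object Lock):

```python
# Sketch: make uploaded backup objects immutable server-side with S3 Object Lock,
# so the uploader's credentials cannot delete or shorten retention on what they wrote.
import boto3

s3 = boto3.client("s3")

# On AWS S3, Object Lock must be enabled when the bucket is created.
# (Add CreateBucketConfiguration for regions outside us-east-1.)
s3.create_bucket(Bucket="my-duplicati-backups", ObjectLockEnabledForBucket=True)

# Default retention: objects cannot be deleted or overwritten for 30 days.
# COMPLIANCE mode cannot be lifted early, even by the uploading credentials.
s3.put_object_lock_configuration(
    Bucket="my-duplicati-backups",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}},
    },
)
```

Note that retention and compact on the Duplicati side need to delete files, so a setup like this only works together with keeping versions around (or disabling auto-compact, more on that later in the thread).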

It really depends on how far you want to go. Ordinary ransomware is probably the easiest to stop. A lengthy assault by skilled attackers already inside your computers is harder, as they will try to move laterally through your systems. Unless your data is very high-value, I’d worry more about the simpler forms of attack, but it’s your call.

Offline backups may be an option, but for online all I can say is make sure remote access can’t destroy backups. Don’t even give the uploading systems (which may be totally compromised) a way to do that.

Better still would be to restrict the web interface to localhost, ssh to the remote machine, and browse to the localhost GUI. Using HTTPS protects against eavesdropping and gives you some assurance you’re on the correct site; otherwise someone could steal credentials via MITM. HTTPS doesn’t block attacks on the web server, or even simple password guessing. Something like SSH is more hardened and has attack-mitigation tools, although mitigating rapid password guessing can sometimes leave you open to denial-of-service attacks. Also note the earlier recommendation not to leave the web server accessible; best to firewall it if possible. Better still, firewall SSH too if possible, and stick to localhost. It depends on how seriously you take security. Several forum users have worried about specific SSH crypto algorithms they consider weak. Any security setup can have weak spots, so please keep the overall system in view and use layers of protection.

EDIT: “ssh to remote” refers to port forwarding. Basically, create an encrypted tunnel and do your browsing through it.
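Roughly like this, if you prefer to script it rather than remember the ssh -L flags (a sketch only: it uses the third-party sshtunnel package, assumes the remote web UI is on Duplicati’s default port 8200 bound to localhost, and the hostname and key path are placeholders):

```python
# Sketch: forward the remote, localhost-only Duplicati web UI over SSH.
import os
import webbrowser
from sshtunnel import SSHTunnelForwarder   # pip install sshtunnel

with SSHTunnelForwarder(
    ("backup-server.example.com", 22),                     # placeholder SSH host
    ssh_username="admin",
    ssh_pkey=os.path.expanduser("~/.ssh/id_ed25519"),      # key auth, nothing to password-guess
    remote_bind_address=("127.0.0.1", 8200),               # Duplicati UI on the remote box
    local_bind_address=("127.0.0.1", 8200),
):
    # Equivalent in spirit to: ssh -L 8200:localhost:8200 admin@backup-server
    webbrowser.open("http://localhost:8200")
    input("Press Enter to close the tunnel...")
```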

The firewall only allows access from a single IP.

And I’ll take a look at the options you suggested. Thank you!

I do use rsync, robocopy and rclone to back up the backups, plus snapshotting of the backup directory with versioning. But due to serious Duplicati reliability issues I would recommend running separate backup sets instead of copying possibly non-restorable backups to multiple destinations. At least that way there’s a chance you can restore from another source when the restore from one set fails. - And do test your backups with a full restore. The Duplicati test option unfortunately isn’t reliable.
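Something like this is roughly what a scripted full-restore test could look like (an untested sketch, not a recipe: the paths, storage URL and passphrase handling are placeholders, and check “Duplicati.CommandLine.exe help restore” for the exact options in your version):

```python
# Sketch: periodic disaster-recovery style restore into a scratch directory.
import subprocess

cmd = [
    r"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe",
    "restore",
    "s3://my-bucket/duplicati",            # placeholder storage URL
    "*",                                   # restore everything in the latest version
    r"--restore-path=D:\restore-test",     # scratch area, not the original location
    "--no-local-db=true",                  # rebuild a temporary DB from the remote files
    "--no-local-blocks=true",              # don't shortcut by reading local source data
    "--passphrase=...",                    # don't hard-code this in real use
]
subprocess.run(cmd, check=True)
# Afterwards, compare the restored files against known-good checksums.
```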

Edit: Repair getting rewritten is great news. Let’s hope it finally ends the extremely serious non-restorable backup issues. I’m sure the issue isn’t technically big, but the ramifications of non-restorable backups are just huge.

Can I be sure that I can restore that backup using Duplicati on a third computer without any problems?

No, you can’t, unless you test it often. And if you do, you’ll find that it works most of the time, but sometimes it doesn’t. See my post linked above.

Edit:
Database recreate - It can be slow, sometimes finds issues, but it exists.

Yeah, it can take weeks and fail after that.

The main point of MY observations and questions here is the possibility of advanced ransomware, or someone with remote access, attacking the backup destination of the infected computer.

This is exactly why I’ve been asking for remote compact. It would allow the backup clients to work with create-only access, without modify/delete permissions; compaction would be done by a secondary system that does have delete/modify access. It would of course also reduce the bandwidth needed, because there’s no need to download and re-upload data when compacting, but the primary motivation was security: anything that has been uploaded can’t get deleted.

Of course you can also do this by keeping versioned backups of the backups, but the approach above would be slightly more elegant.
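For illustration, a create-only grant on S3-style storage could look roughly like this (a sketch only: the bucket name and user ARN are placeholders, the exact action list may need tuning for your provider, and without delete rights Duplicati’s own retention/compact has to be disabled or handled by that second system):

```python
# Sketch: bucket policy granting the backup client upload/read/list but no delete.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DuplicatiUploaderCreateOnly",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:user/duplicati-uploader"},
        "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-duplicati-backups",
            "arn:aws:s3:::my-duplicati-backups/*",
        ],
        # No s3:DeleteObject - cleanup runs elsewhere with separate credentials.
    }],
}

boto3.client("s3").put_bucket_policy(
    Bucket="my-duplicati-backups", Policy=json.dumps(policy)
)
```

Note that a plain PutObject can still overwrite an existing object, so pairing this with bucket versioning (or object lock) is what actually makes uploaded data stay put.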

Interesting idea, but it would be a big design change. I think one of Duplicati’s strengths (and in some situations a weakness) is that the back end where data is stored is just dumb storage. There is no remote engine at all.

If there WERE a remote engine, then Duplicati could talk to it instead of the remote storage. The remote engine would be responsible for writing data, handling compaction, pruning, etc. This is closer to CrashPlan’s design. There are advantages and disadvantages to it.

I prefer Duplicati the way it is now - client software + dumb remote storage. I’m not having the issues you describe in that other thread, so maybe that is why I’m more confident in its current design.

and even better (but worse for performance) is to restore to a different system, or use --no-local-blocks=true.

I said some of this a different way:

and I suppose I should also add that at least one of those ways shouldn’t involve “beta” quality software. The newest Canary (not yet in the next Beta) has fixed quite a number of integrity bugs, BTW, but has had little testing.

The same can be said for keeping backups safe in other ways, not just from attack, except that with attacks, making them work through a chain of defenses is good because any step might stop them. For regular use, a chain is unwanted because any step might stop YOU. This unfortunately sounds like it’s leading into a tradeoff…

The fix for “Empty source file can make Recreate download all dblock files fruitlessly with huge delay” #3747 helps the speed issue, but again it is not yet in a Beta. There’s a conflicting wish between fast Betas and more fixes…

I didn’t mean having a remote engine. I’ve actually written about this in an earlier post.

But speaking from my old perspective as a software product manager: sure, both of these options easily add a lot of complexity and things that could go wrong. Therefore it’s totally pointless to address this kind of “nice to have” feature before the core features are rock solid.

One way of doing that would be “logging the changes”: either the client could send a file that tells what to compact and how, or, in the case of “remote / local alternate compact”, the log would describe what has been compacted and how to update the local database accordingly. - But that just adds more complexity and makes things more brittle for a small optimization / security improvement.

Have you tried restoring your database from the storage without the local database? If you haven’t done that, you might not know that you’ve got a bomb waiting for you. - But enough about that. It’ll get fixed when it gets fixed. And as I’ve said, from a technical perspective it’s probably a small issue, which could and should get fixed at several stages of the processing.

But as far as this thread goes, snapshotting the backups and backing up the backups (with Duplicati or other versioning software) both work, without the complexity of providing write-only access.

I just wanted to warn people that making backups of the backups might still leave you in a non-restorable backup situation.

Yep, I have done test restores multiple times, with and without local databases. I have also run through database recreation process on all my machines as a sanity check. Most recently I started doing full database recreations just this past week.

For some reason one of my machines is not able to recreate the database using just dlist and dindex files, and says it needs to download almost 500 dblock files. Each dblock takes 15 minutes or so to process, so a full recreate takes a number of days. It seems to point to some sort of bug. I’m not familiar enough with that part of Duplicati’s architecture to troubleshoot it myself, though.

If I knew then what I know now, I’d use larger dedupe blocks on these large backups. 100 KB blocks on a backup with 800 GB of source data create a LOT of blocks. And I know many here have even larger backup sets.
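Just to put numbers on that (quick arithmetic; the --blocksize values shown are simply ones worth comparing):

```python
# Rough block counts the local database has to track for an 800 GB source
# at different --blocksize settings.
SOURCE_BYTES = 800 * 1024**3

for blocksize_kb in (100, 1024, 5 * 1024, 64 * 1024):
    blocks = SOURCE_BYTES / (blocksize_kb * 1024)
    print(f"{blocksize_kb:>6} KB blocks -> ~{blocks:,.0f} blocks")

# 100 KB -> ~8,388,608 blocks; 64 MB -> ~12,800 blocks
```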

I just performed a test in Windows: there was a Source folder, a Destination folder (where the source would be backed up to), and a Remote folder (the second location, where the Duplicati files from Destination would be copied to). I downloaded 3 installation files of around 30 MB and put them on the desktop. Then I created a task to back up from Source to Destination, with a volume size of 3 MB, set to keep only 1 backup, and with the --upload-verification-file option.

The first run had 1 file on the source, and ended up with 24 files on the destination, that were copied to Remote.

The second run had 2 files on the source, and ended up with 48 files on the destination, that were copied to Remote, that ended up with 49 files (23 overwritten).

The third run had 3 files on the source, and ended up with 72 files on the destination, that were copied to Remote, that ended up with 74 files (47 overwritten).

The fourth run had 2 files on the source, and ended up with 50 files on the destination, that were copied to Remote, that ended up with 77 files (47 overwritten).

The fifth run had 1 file on the source, and ended up with 26 files on the destination, that were copied to Remote, that ended up with 80 files (23 overwritten).

Then I used the “Restore” function of Duplicati (not from inside the task) to analyze Remote, and I could see all 5 versions. First I restored the third one, with three files, and they seemed to be OK: I started to run the installation programs but didn’t follow through. Then I restored the most recent one, with one file, and got a warning, “Failed to apply metadata to file:”, but the file mentioned was the destination folder for the restored file; the restored file was there and seemed to be OK, and I also started to run that program.

Can I conclude that it can be done like this, even if it is messy? Having unnecessary files on the remote destination (the backup of the backups) is a small price to pay, and maybe I can find a way of using the JSON file you showed me (--upload-verification-file) to do a cleanup. But should I also think about the possibility of the original Duplicati files being tampered with, instead of just deleted?

P.S.: I didn’t make it clear: I was simulating a scenario where the Duplicati files, the original daily backup, were copied to a third location on a daily basis, adding and overwriting files on the third location, but never deleting anything (unless there was a good, safe method of isolating and deleting the unnecessary, old files, maybe using the JSON file you mentioned).
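In case it helps to see the idea written down, this is roughly the copy rule I was simulating (a sketch with placeholder paths; rclone or robocopy can do the same thing with the right flags):

```python
# Sketch: add and overwrite files at the third location, but never delete there.
import shutil
from pathlib import Path

DEST = Path(r"D:\Backups\Duplicati")      # Duplicati's own destination folder
REMOTE = Path(r"\\nas\backup-of-backups") # third location, never pruned by this script

for src in DEST.rglob("*"):
    if src.is_dir():
        continue
    dst = REMOTE / src.relative_to(DEST)
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Copy new or changed files; anything that disappears from DEST stays in REMOTE.
    if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
        shutil.copy2(src, dst)
```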

Yes, I’m using de-dupe blocks of 64 MB and dblocks of 4 GB for many backups which are mostly static and huge.

Yeah, that’s the way it works in a disaster recovery situation, and that’s something everyone should test for. Other tests are more or less cheating and not doing the testing properly.

If it works. And as you’ll notice in my post, doing the test once isn’t nearly good enough. In my case there are thousands of backup runs, after which some of the backups are slightly corrupted but still restorable, and a few are completely destroyed.

In my case I also automatically test the files after a restore for data integrity and data freshness, which is something Duplicati of course doesn’t test for, using database validation tools and some scripts that check the content of key data files.
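Roughly along these lines, for anyone wondering what such a check could look like (a sketch: the file list and the one-day freshness threshold are placeholders, and real content validation would be specific to your data):

```python
# Sketch: after a test restore, sanity-check key data files for presence,
# non-trivial size and freshness (content checks would be app-specific).
import time
from pathlib import Path

RESTORE_ROOT = Path(r"D:\restore-test")
KEY_FILES = ["accounting/ledger.db", "projects/current.xlsx"]  # placeholders
MAX_AGE_SECONDS = 24 * 3600   # backed-up data should be no older than one day

problems = []
for rel_path in KEY_FILES:
    f = RESTORE_ROOT / rel_path
    if not f.exists():
        problems.append(f"missing: {rel_path}")
    elif f.stat().st_size == 0:
        problems.append(f"empty: {rel_path}")
    elif time.time() - f.stat().st_mtime > MAX_AGE_SECONDS:
        problems.append(f"stale: {rel_path}")

print("\n".join(problems) or "restored key files look OK")
```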

I don’t think this is a supported workflow, so would it be reliable? I don’t know enough about the inner workings of Duplicati to know for sure.

Also, I don’t know if it would ultimately help in a ransomware situation (where your Duplicati backup on the local computer gets encrypted) unless you are only syncing NEW files to your remote destination, not changed files. Then again, maybe that doesn’t happen… someone pointed out elsewhere that Duplicati doesn’t change existing files - it only deletes files and creates new ones. So if an existing file CHANGES, it’s not something you would want to sync. Not sure.

Anyway your setup would make me nervous!

Yes, I know! I’m not sure about that at all.

Well, if the ransomware on the user’s computer deleted its Duplicati backups on the server, there would be no problem. If it somehow changed the backup files and renamed them in the process, there would be no problem either (I would only have a lot of junk on the third location, along with the good files from the day before). But if it tampered with the backup files while keeping their original names, then it would destroy my remote backups as well.

What about this alternative system I’m testing? The users’ computers sync selected folders and files with my server: files on the server are created, overwritten or deleted in a total mirror sync. The actual daily backup is then made with Duplicati installed on the server, using the directory containing the synced users’ folders as the source and a directory on the same server (not accessible by the users’ computers) as the destination. Then another identical backup is run on the same server, but with a remote server as the destination (S3, a server at another branch, etc…).

This way, if users’ files are deleted or damaged by ransomware, they will be synced to the server (the ransomware doesn’t even need to be “smart” and target the backups, right?), and the server will make a Duplicati backup of them (or of their absence), but the damage will only be in the latest version, and older versions remain according to the retention rules I specify.

I don’t know if I had an awesome original idea (even if I’m not the first one to have it and I’m just not well informed), or if I’m doing something stupid or unnecessary.

P.S.: A downside is the extra work of keeping track of whether the syncing part is working properly.

I have seen ransomware that encrypts files without changing the filenames. More often it seems to add a suffix, but not always.

Yeah. I don’t think it would be viable.

It’s not stupid, you aren’t the first to talk about backups of backups! No one wants something to happen that destroys both your main data AND the backup.

Several of us are doing backups of backups using a variety of methods. How far you go depends on your level of paranoia :slight_smile:


But in this case I don’t even think of this experimental approach as a “backup of a backup”: I don’t consider the syncing part a backup, although if some computer just crashes or catches fire I can treat the mirrored files as one.

And in one scenario I use Nextcloud for the syncing part, which has a recycle bin and versioning that the users and their computers just cannot access (they cannot be tampered with using the NC client app’s credentials). Even if the users accidentally screw up new files between Duplicati backup tasks, NC can help them.

The fact that I can use Duplicati on the server to make all the actual backups, without all that hassle we’ve been discussing here, is awesome.

It’s definitely non-standard, probably not tested, not guaranteed to be supported, and may have problems; however, there’s also a chance it will work – probably not as smoothly as in your test, and it depends on the damage.

Repair command deletes remote files for new backups #3416 is an example where unnecessary files got deleted inadvertently – they were actually necessary, but Duplicati’s old DB didn’t know about them, so it deleted them… Your scheme would actually help in that case, but I gave it more as an example of an odd-usage accident.

Your test is also over-simplified. In real life, compact would pack still-in-use blocks into new dblock files so that partially-filled ones (with mostly deleted blocks) could be deleted. This can happen over and over, so your scheme gets hugely redundant by possibly keeping many copies of a still-in-use block in multiple dblocks.

Block-based storage engine
How the backup process works
How the restore process works

Keeping multiple copies of every data or metadata block ever seen in theory gives you all the bytes you’d need to put everything back together, but it’s not something you want to do by hand. At the default 100 KB block size, a 1 TB backup is like a 10 million piece jigsaw puzzle, without any guidance on how to rebuild, assuming your DB and dlist were lost. My question would be – how much do redundant blocks help you?

Redundant backups (especially if made different ways) are a much more friendly format than data blocks.

A supported way of wasting space by keeping deleted blocks around is --no-auto-compact=true, which at least won’t make multiple copies of active blocks over and over, and I “think” may even bring wasted ones back into active service if need be. Of course encrypting ransomware can wipe it all out. More said below.

That would actually be a lot like the @Sami_Lehtinen idea of “remote compact”, but with a different split: the cleanup of Remote (the 2nd-level backup) would handle deletions, and the 1st level does only new file additions.
You might need to find a storage system with that level of control, if you’re going to rely on access control.

Most ransomware tampers. The whole idea is to be able to recover the files after the ransom gets paid.

Ransomware describes some methods. Intent matters. Wiper (malware) intends to destroy, not get paid.

How insurance companies are fueling a rise in ransomware attacks

Ransomware that became known to not follow through with file restore would soon be getting no ransom.

Regarding the newer ideas, one downside is that some sync programs don’t handle locked files, whereas Duplicati (with help from VSS) does better. Other than that, you’re adding some distance, so the amount of damage ransomware on a source computer can do is limited, assuming all it can do is corrupt files that have multiple versions of backups. Make sure you don’t have login credentials sitting on source computers.


In one case I’m using SyncBackPro, which handles VSS. In the other I’m using Nextcloud: there are no tasks to schedule, so as soon as the users close a document, it gets synced. The only computer holding any credentials has them inside encrypted files, and they aren’t even the ones relevant to this subject.

Another problem with Nextcloud is the need for a second tool to back up some extra items (which would be backed up along with the rest if Duplicati were used instead, or, in the other case, SyncBackPro), like browser favorites files (instead of syncing the browsers’ profile folders), PST files that I keep out of Nextcloud folders, etc… I’m still using Cobian Backup for these menial helper tasks, with the destination being a folder that is synced by Nextcloud (and then sent to the server). By the way, my next step is to fix the situation of huge PST files that users insist on keeping even though they have Office 365 Exchange accounts: I’m proposing to start making backups from Office 365 (we would move the contents of the PSTs into the available mail storage, and then they would be backed up).