How to back up the backup?

Don't ever mix backups from different computers together in the same folder unless you use the --prefix option to make them distinguishable. Mixing old backup files from the same computer is totally standard, provided Duplicati does the mixing. In the example I gave, the 106 files after the second backup sit side by side.

We're losing the terminology trail again. rsync is not a backup, so incremental and differential in the backup sense don't exist; a mirror is a mirror. If retention settings delete a backup version, its blocks become wasted space (EDIT: provided nothing that remains uses a given block), and that waste is reduced by periodic compact if available. Regardless, if the original backup is OK, then a mirror of the original backup is OK. As noted, this leaves open the possibility of a ransomware attack on the files of the first backup, but a more sophisticated backup would still carry the damage into the second backup if the timing landed so that that could happen. The difference is that (assuming the second backup wasn't itself wiped out by ransomware) an earlier version of the first-level backup could be restored to get the original source files. As mentioned, this is cumbersome and much slower, but you can do it if you wish.

I’m not following the retention settings concern at all. Can you explain the processing you have in mind, assuming you disagree with the processing I just described (all Duplicati process + mirror everything)?

What is the operating system on your second backup target (computer C)? If it supports snapshots, that may be your best bet. Ransomware running remotely cannot affect snapshots.

Alternatively, if it’s an operating system that supports filesystem level deduplication (Windows Server, for instance), then maybe you can even get clever and do something more like your original idea where you’d sync the entire Duplicati backup folder on computer “B” to a different folder each day on computer “C”.

This is probably only workable on a LAN unless you have a super-fast internet connection. Each daily folder will be quite large, BUT with filesystem-level dedupe you will mitigate that issue.

Either way, if your computer "B" backup gets corrupted and it gets copied to computer "C", you can either roll back to a previous snapshot or just look at an earlier dated folder.
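If computer "C" turned out to be, say, a Linux box with btrfs or ZFS, the snapshot side could be scripted along these lines. This is only a minimal sketch, assuming the received Duplicati folder is a btrfs subvolume at made-up paths; on ZFS you'd call `zfs snapshot` instead:

```python
#!/usr/bin/env python3
"""Minimal sketch: take a dated, read-only snapshot of the folder on
computer "C" that receives the Duplicati files. Assumes a Linux target with
btrfs and that /backups/duplicati is a btrfs subvolume -- both paths are
assumptions; adjust tooling (e.g. zfs snapshot) for your own filesystem."""
import subprocess
from datetime import date

SUBVOLUME = "/backups/duplicati"   # assumed subvolume holding the synced backup
SNAP_DIR = "/backups/.snapshots"   # assumed location for read-only snapshots

snap_name = f"{SNAP_DIR}/duplicati-{date.today().isoformat()}"
# -r makes the snapshot read-only, so whatever holds the sync credentials cannot alter it
subprocess.run(["btrfs", "subvolume", "snapshot", "-r", SUBVOLUME, snap_name], check=True)
print(f"created read-only snapshot {snap_name}")
```

Run from cron (or a systemd timer) on "C" itself, the read-only snapshots stay out of reach of anything that only has credentials to the synced folder.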

You can copy and paste using Windows' File Explorer and have a backup. You can also use rsync with the proper options to create incremental or differential backups on the destination, comparing against the files from previous runs (you actually need a script for that).
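For the script part, here is a minimal sketch of the dated-folder, hard-link style of incremental with rsync (the paths are assumptions, and rsync must be available on both ends):

```python
#!/usr/bin/env python3
"""Minimal sketch of the "incremental on the destination" idea with rsync:
each run goes into a new dated folder, and files unchanged since the
previous run are hard-linked via --link-dest, so they cost no extra space.
SOURCE and DEST_ROOT are assumptions -- adjust for your setup."""
import subprocess
from datetime import date
from pathlib import Path

SOURCE = "/backups/duplicati/"             # assumed: the Duplicati destination folder on "B"
DEST_ROOT = Path("/mnt/remote/duplicati")  # assumed: the second-level backup location

DEST_ROOT.mkdir(parents=True, exist_ok=True)
today = DEST_ROOT / date.today().isoformat()
previous = sorted(p for p in DEST_ROOT.iterdir() if p.is_dir() and p != today)

cmd = ["rsync", "-a"]
if previous:
    # hard-link files that are identical to the latest previous run
    cmd.append(f"--link-dest={previous[-1]}")
cmd += [SOURCE, str(today)]
subprocess.run(cmd, check=True)
```

Because unchanged files are hard-linked to the previous run, each dated folder looks complete on its own, but only the changed Duplicati volumes actually consume new space.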

But the single point for me is that the original backup may not be OK. I'm not talking about the original backup containing versions of files that were encrypted by ransomware on the source computer (or lacking files that were maliciously deleted), but about the original Duplicati backup files themselves being wiped out or even encrypted remotely by ransomware on the source computer.

I understand that you are trying to help me (and you are indeed helping me), but the point is to avoid doing something cumbersome and slow that would also increase the cost.

In two environments I'm trying this model: mirroring the files on the users' computers to a local NAS, doing the actual daily backups with Duplicati at night, inside the NAS, inside the same pool, plus a secondary daily backup, also with Duplicati, to a remote location. In one case I'm using Nextcloud (the server is on the NAS), in the other I'm using SyncBackPro (to an SFTP server on the NAS). Nextcloud is good at syncing changes automatically to its server, but I'm still working on a way of checking whether clients are connecting to the server properly (I will use the "lastseen" command in a script; see the sketch below) and whether they are tampering with the settings somehow.

P.S.: I forgot to say: in that same environment with Nextcloud plus Duplicati being tested, there are also backups from each computer to an SFTP server on the NAS using Duplicati, and that's the scenario I'm focusing on here (forwarding the latter backups somewhere else). I started to use Duplicati recently, and it's so neat, I even have HTTPS on every single computer's web interface, and it would be great if I could stick to it only.
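For the "lastseen" check mentioned above, a minimal sketch, assuming a standard Nextcloud install under /var/www/nextcloud and a hand-maintained list of client accounts (real date parsing and alerting are left out):

```python
#!/usr/bin/env python3
"""Minimal sketch of the "lastseen" check: ask Nextcloud's occ tool when
each client account last connected. The occ path, the user list, and the
output handling are assumptions for illustration only."""
import subprocess

OCC = ["sudo", "-u", "www-data", "php", "/var/www/nextcloud/occ"]  # assumed install path
USERS = ["alice", "bob"]                                           # assumed client accounts

for user in USERS:
    out = subprocess.run(OCC + ["user:lastseen", user],
                         capture_output=True, text=True, check=True).stdout.strip()
    # occ prints a human-readable "last login" line; a real script would parse the
    # date out of it and alert if it is older than, say, two days.
    print(out)
```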

I'm not sure where this is heading. The main backup may not be OK. The secondary may not be OK because it may pick up a bad main backup (this can be covered by versioning) or be attacked directly.

Orchestration of the two-stage backup is an issue. If security were not a concern, you might consider having the third-party duplicati-client drive two copies of Duplicati, but having Duplicati accept remote control exposes it to attacks on its web server (which is not hardened) or simply to an attacker stealing credentials. Risk could be reduced at the Duplicati level using its IP controls, but external firewalling is safer.

Another thing you can do is to use --upload-verification-file and utility-scripts/DuplicatiVerify.py to check backup integrity before doing the secondary backup. This might also find non-security integrity issues.
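As a minimal sketch of that ordering, assuming DuplicatiVerify.py from Duplicati's utility-scripts folder has been copied to the machine doing the secondary copy, and assuming it signals failures through a non-zero exit code (worth confirming with the copy you have, since some versions only print the failures):

```python
#!/usr/bin/env python3
"""Minimal sketch: verify the first-level backup before mirroring it.
Assumes --upload-verification-file is enabled on the backup job, that
DuplicatiVerify.py is available at the path below, and that it exits
non-zero when a file fails verification -- all paths are assumptions."""
import subprocess
import sys

BACKUP_DIR = "/backups/duplicati"                             # assumed first-level destination
VERIFY = "/opt/duplicati/utility-scripts/DuplicatiVerify.py"  # assumed script location

result = subprocess.run(["python3", VERIFY, BACKUP_DIR])
if result.returncode != 0:
    sys.exit("verification failed -- not copying a possibly bad backup onward")

# only now run the secondary copy (rsync here; rclone or robocopy would work the same way)
subprocess.run(["rsync", "-a", BACKUP_DIR + "/", "/mnt/remote/duplicati/"], check=True)
```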

A simple attack would do something like encrypt the original source files. The next step up might try to get at the backup by obtaining access credentials from Duplicati (or whatever tool is used). If a tool can get there, and an attacker attacks the tool, then the attacker can get there too. Defense must be server-side, e.g. keeping files immutable under normal operation, with the ability to alter that policy well protected, and without those credentials stored on the uploader PC.
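One hedged illustration of that kind of server-side defense, assuming the receiving server is Linux on a filesystem that honors the immutable attribute (chattr +i): a root-only job marks the received files immutable, so the account the uploader authenticates as (or an attacker holding its credentials) cannot modify or delete them, and only a local root action can ever clear the flag again:

```python
#!/usr/bin/env python3
"""Minimal sketch, run as root on the receiving server: mark newly received
Duplicati files immutable. The folder path is an assumption; clearing the
flag (chattr -i) is the "well-protected policy change" and must only be
possible locally as root, never with the uploader's credentials."""
import subprocess
from pathlib import Path

BACKUP_DIR = Path("/backups/duplicati")   # assumed folder the clients upload into

for f in BACKUP_DIR.glob("duplicati-*"):  # Duplicati's dblock/dindex/dlist files
    subprocess.run(["chattr", "+i", str(f)], check=True)
```

Note this also blocks Duplicati's own compact and retention deletes against that folder, so it fits a second-level copy (or a job with compacting disabled) better than the primary destination.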

It really depends on how far you want to go. Ordinary ransomware is probably easiest to stop. A lengthy assault by skilled attackers inside your computers is harder, as they will try to move through your systems. Unless your data is very high-value, I'd worry more about the simpler forms of attack, but it's your call.

Offline backups may be an option, but for online all I can say is make sure remote access can’t destroy backups. Don’t even give the uploading systems (which may be totally compromised) a way to do that.

Better still would be to restrict the web interface to localhost, SSH to the remote machine, and browse to the localhost GUI. Using HTTPS protects against eavesdropping and gives you some assurance that you're on the correct site; otherwise someone could steal credentials via MITM. HTTPS doesn't block attacks on the web server, or even simple password guessing. Something like SSH is more hardened and has attack-mitigation tools, though mitigating rapid password guessing can sometimes leave you open to denial-of-service attacks. Also note the earlier recommendation not to leave the web server accessible: best to firewall it if possible, better still to firewall SSH too if possible and stick to localhost. It depends on how seriously you want security. There have been several forum users who worry about specific SSH crypto algorithms they find weak. Security can have weak spots, so please keep the overall system view in mind and use layers of protection.

EDIT: "ssh to remote" refers to port forwarding. Basically, create an encrypted tunnel to do your browsing through.
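As a minimal sketch of that tunnel (the host, user, and the default UI port 8200 are assumptions to adjust):

```python
#!/usr/bin/env python3
"""Minimal sketch: forward a local port to a Duplicati web UI that only
listens on localhost of the remote machine, then browse to
http://localhost:8200 on your own machine while the tunnel is up."""
import subprocess

# -N: no remote command, just the forwarding; -L local-port:remote-host:remote-port
# This call blocks until the tunnel is closed (Ctrl+C).
subprocess.run(["ssh", "-N", "-L", "8200:localhost:8200", "user@backup-server"])
```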

The firewall only allows access from a single IP.

And I’ll take a look at the options you suggested. Thank you!

I do use rsync, robocopy and rclone to back up the backups, plus backup-directory snapshotting with versioning. But due to serious Duplicati reliability issues I would recommend running separate backup sets instead of copying the possibly non-restorable backups to multiple destinations. At least in that case there's a possibility that you can restore from another source when the restore fails with one set. And do test your backups with a full restore; the Duplicati test option unfortunately isn't reliable.

Edit: Repair getting rewritten is great news. Let’s hope it finally ends the extremely serious non-restorable backup issues. I’m sure the issue isn’t technically big, but the ramifications of non-restorable backups are just huge.

Can I be sure that I can restore that backup using Duplicati on a third computer without any problems?

No, you can't, unless you test it often. And if you do, you'll find out that it mostly works, but at times it doesn't. See my post linked above.

Edit:
Database recreate - It can be slow, sometimes finds issues, but it exists.

Yeah, it can take weeks and fail after that.

The main point of MY observations and questions here is the possibility of advanced ransomware, or someone with remote access, attacking the destination of the infected computer's backups.

This is exactly why I've been asking for remote compact. It would allow the backup clients to work with create-only access, without modify/delete permissions. Compaction would be done by a secondary system with delete/modify access. Of course it would also reduce the bandwidth needed, because there's no need to download and re-upload data when compacting. But the primary reason for it was security considerations: anything that has been uploaded can't get deleted.

Of course you can also do this by having versioned backups of the backups, but the approach above would be slightly more elegant.
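As a hedged illustration of the versioned-backups-of-backups route, assuming the secondary copy goes to an S3-compatible bucket: with bucket versioning enabled, an overwrite or delete only hides the previous version rather than destroying it (the bucket name is made up):

```python
#!/usr/bin/env python3
"""Minimal sketch: turn on bucket versioning so that overwriting or deleting
a copied Duplicati file keeps the previous version recoverable. Bucket name
is an assumption; for non-AWS S3-compatible storage pass endpoint_url=...
to boto3.client()."""
import boto3

s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="duplicati-secondary",                   # assumed bucket name
    VersioningConfiguration={"Status": "Enabled"},
)
```

Old noncurrent versions can then be expired with lifecycle rules, and the credentials used for uploading should not be allowed to delete object versions or change the versioning setting.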

Interesting idea, but it would be a big design change. I think one of Duplicati’s strengths (and in some situations a weakness) is that the back end where data is stored is just dumb storage. There is no remote engine at all.

If there WERE a remote engine, then Duplicati could talk to it instead of the remote storage. The remote engine would be responsible for writing data, handling compaction, pruning, etc. This is more along the design of CrashPlan. There are advantages and disadvantages to it.

I prefer Duplicati the way it is now - client software + dumb remote storage. I’m not having the issues you describe in that other thread, so maybe that is why I’m more confident in its current design.

And even better (but worse for performance) is to do it to a different system, or use --no-local-blocks=true.

I said some of this a different way:

And I suppose I should also add that at least one of those ways shouldn't involve "beta"-quality software. The newest Canary (not yet the next Beta) has fixed quite a number of integrity bugs, BTW, but has had little testing.

The same can be said for the security of the backup in other ways, not just from attack, except that with attacks, making them work through a chain of defenses is good because any step might stop them. For regular use, a chain is unwanted because any step might stop YOU. This unfortunately sounds like it's leading into a tradeoff…

The "Empty source file can make Recreate download all dblock files fruitlessly with huge delay" #3747 fix helps the speed issue, but again is not yet in a Beta. There's a conflicting wish between fast Betas vs. more fixes…

I didn’t mean having a remote engine. I’ve actually written about this in some post earlier.

But that's from my old perspective as a software product manager. Sure, both of these options easily add a lot of complexity and things that could go wrong, and therefore it's totally pointless to address this kind of "nice to have" feature before the core features are rock solid.

One way of doing that would be "logging the changes": either the client could send a file which tells what to compact and how, or, in the case of "remote / local alternate compact", the log would contain information on what has been compacted and how to update the local database accordingly. But that's just adding more complexity and making things more brittle for a small optimization / security improvement.

Have you tried restoring your database from the storage without a local database? If you haven't done that, you might not know that you've got a bomb waiting for you. But enough about that. It'll get fixed when it gets fixed. And as I've said, from a technical perspective it's probably a small issue, which could and should get fixed at several stages of the processing.

But if we talk about this thread, snapshotting the backups and backing up the backups (with Duplicati or alternate versioning software) both work, without the complexity of providing write-only access.

I just wanted to warn people that making backups of the backups might still leave you with a non-restorable backup situation.

Yep, I have done test restores multiple times, with and without local databases. I have also run through database recreation process on all my machines as a sanity check. Most recently I started doing full database recreations just this past week.

For some reason, one of my machines is not able to recreate the database using just dlist and dindex files, and says it needs to download almost 500 dblock files. Each dblock takes 15 minutes or so to process, so it takes a number of days to fully recreate. This seems to point to some sort of bug. I'm not familiar enough with that part of Duplicati's architecture to be able to troubleshoot it myself, though.

If I had known then what I know now, I'd have used larger dedupe blocks on these large backups. 100KB blocks on a backup with 800GB of source data creates a LOT of blocks. And I know many here have even larger backup sets.
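For anyone setting up a similar large backup from scratch, a minimal sketch of choosing those sizes up front (the target URL, source path and exact sizes are made-up examples; note that --blocksize cannot be changed after the first backup has run):

```python
#!/usr/bin/env python3
"""Minimal sketch: start a large, mostly-static backup with a bigger dedupe
block size than the 100KB default. All paths/URLs/sizes are assumptions."""
import subprocess

subprocess.run([
    "duplicati-cli", "backup",
    "file:///mnt/backup/bigset",   # assumed target; any storage URL works the same way
    "/data/archive",               # assumed large, mostly static source
    "--blocksize=1MB",             # dedupe block size (default is 100KB)
    "--dblock-size=500MB",         # remote volume size (default is 50MB)
    # encryption settings (--passphrase=... or --no-encryption) omitted here on purpose
], check=True)
```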

I just performed a test in Windows: there was a Source folder, a Destination folder (this would be where the source would be backed up to), and a Remote folder (this would be the second location, where the Duplicati files from Destination would be copied to). I downloaded 3 installation files of around 30 MB and put them on the desktop. Then I created a task to back up from Source to Destination, with a volume size of 3 MB, set to keep only 1 backup, and with the "--upload-verification-file" option.

The first run had 1 file in the source and ended up with 24 files on the destination, which were copied to Remote.

The second run had 2 files in the source and ended up with 48 files on the destination, which were copied to Remote, which then had 49 files (23 overwritten).

The third run had 3 files in the source and ended up with 72 files on the destination, which were copied to Remote, which then had 74 files (47 overwritten).

The fourth run had 2 files in the source and ended up with 50 files on the destination, which were copied to Remote, which then had 77 files (47 overwritten).

The fifth run had 1 file in the source and ended up with 26 files on the destination, which were copied to Remote, which then had 80 files (23 overwritten).

Then I used the "Restore" function of Duplicati (not from inside the task) to analyze Remote, and I could see all 5 versions. First I restored the third one, with three files, and they seemed to be OK: I started to run the installation programs, but didn't follow through. Then I restored the most recent one, with one file, and got a warning, "Failed to apply metadata to file:", but the file mentioned was the destination folder for the restored file; the restored file was there and seemed to be OK, and I also started to run that program.

Can I conclude that it can be done like this, even if it is messy? Having unnecessary files on the remote destination (the backup of the backups) is a small price to pay, and maybe I can find a way of using the JSON file you showed me (--upload-verification-file) to do a cleanup. But should I also think about the possibility of the original Duplicati files being tampered with, instead of just deleted?

P.S.: I didn't make it clear: I was simulating a scenario where the Duplicati files, the original daily backup, were copied to a third location on a daily basis, adding and overwriting files in the third location but never deleting anything (unless there was a good, safe method of isolating and deleting the unnecessary old files, maybe using the JSON file you mentioned).
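On the cleanup idea, a minimal sketch that only reports candidates and deletes nothing. It assumes the duplicati-verification.json produced by --upload-verification-file is copied along with the backup files, and that its entries carry a "Name" field; check your own verification file before relying on that:

```python
#!/usr/bin/env python3
"""Minimal sketch: compare the third location against the file list in
duplicati-verification.json and report anything the current backup no
longer references. The path and the JSON field name are assumptions."""
import json
from pathlib import Path

REMOTE = Path("/mnt/third-location/duplicati")   # assumed third location
verification = json.loads((REMOTE / "duplicati-verification.json").read_text())

wanted = {entry["Name"] for entry in verification}   # assumed field name -- verify first
for f in REMOTE.glob("duplicati-*"):
    if f.name not in wanted:
        print(f"stale candidate (review before deleting): {f.name}")
```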

Yes, I’m using de-dupe blocks of 64 MB and dblocks of 4 GB for many backups which are mostly static and huge.

Yeah, that's the way it works in a disaster-recovery situation, and that's something everyone should test for. Other tests are more or less cheating and not testing properly.

If it works. And in my post you'll notice that doing the test once isn't nearly good enough. In my cases there are thousands of backup runs, and after those some of the backups are slightly corrupted but still restorable, and a few are completely destroyed.

In my case, I'll also automatically test the files after restore for data integrity and data freshness, which is of course something Duplicati doesn't test for, using database validation tools and some scripts to check the content of key data files.
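As a minimal sketch of that kind of post-restore check, assuming one of the key restored files is an SQLite database (the path and the freshness threshold are made up):

```python
#!/usr/bin/env python3
"""Minimal sketch of a post-restore sanity check: open a key restored
database, run SQLite's integrity check, and confirm the file is reasonably
fresh by modification time. Path and threshold are assumptions."""
import sqlite3
import time
from pathlib import Path

restored = Path("/restore-test/accounting/ledger.db")   # assumed key data file
MAX_AGE_DAYS = 2                                        # assumed freshness threshold

age_days = (time.time() - restored.stat().st_mtime) / 86400
con = sqlite3.connect(str(restored))
(result,) = con.execute("PRAGMA integrity_check").fetchone()
con.close()

print(f"integrity: {result}, age: {age_days:.1f} days")
if result != "ok" or age_days > MAX_AGE_DAYS:
    raise SystemExit("restore test failed: stale or corrupted key data file")
```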

I don’t think this is a supported workflow, so would it be reliable? I don’t know enough about the inner workings of Duplicati to know for sure.

Also, I don't know if it would ultimately help in a ransomware situation (where your Duplicati backup on the local computer gets encrypted) unless you are only syncing NEW files to your remote destination, not changed files. Then again, maybe that doesn't happen… someone pointed out elsewhere that Duplicati doesn't change existing files; it only deletes files and creates new ones. So if a file is CHANGED, it's not something you would want to sync. Not sure.

Anyway your setup would make me nervous!

Yes, I know! I’m not sure about that at all.

Well, if the ransomware on the user's computer deleted its Duplicati backups on the server, there would be no problem. If it somehow changed the backup files and renamed them in the process, there would be no problem either (I would only have a lot of junk on the third location, along with the good files from the day before). But if it tampered with the backup files while keeping their original names, then it would destroy my remote backups as well.

What about this alternative system I'm testing? The users' computers sync selected folders and files with my server: files on the server are created, overwritten or deleted in a total mirror sync. Then the actual daily backup is created with Duplicati installed on the server, using the directory containing the synced users' folders as the source and a directory on the same server (not accessible by the users' computers) as the destination. Then another identical backup is run on the same server, but with a remote server as the destination (S3, a server at another branch, etc.).

This way, if users' files are deleted or damaged by ransomware, they will be synced to the server (the ransomware doesn't even need to be "smart" and target the backups, right?), and the server will make a Duplicati backup of them (or of their absence), but they will just be the files in the latest version, according to the retention rules I specify.

I don't know if I had an awesome original idea (even if I'm not the first one to have it and I'm just not well informed), or if I'm doing something stupid or unnecessary.

P.S.: A downside is the extra work of keeping track of whether the syncing part is working properly.

I have seen ransomware that encrypts files without changing the filenames. More often it seems to add a suffix, but not always.

Yeah. I don’t think it would be viable.

It’s not stupid, you aren’t the first to talk about backups of backups! No one wants something to happen that destroys both your main data AND the backup.

Several of us are doing backups of backups using a variety of methods. How far you go depends on your level of paranoia :slight_smile: