It would be great if Duplicati had a new kind of task, schedulable like the others, that would let us make backups of Duplicati backups.
Imagine client A makes daily backups on server B using Duplicati, and server B also has Duplicati installed: we would be able to create this Duplicati task on server B to “forward” client A’s backups to a third location. We would provide the location of client A’s backups, along with the encryption keys if applicable, and server B’s Duplicati would access that backup the same way it does when we restore one, and then make complete, differential or incremental backups of the backups. If we wanted to restore a single document from the backup on the third location, we would just use the Duplicati interface to do it, the usual way.
One thing I always have in mind now is the fact that ransomware is more and more often trying to reach and destroy the backup destinations of infected computers, so I decided to deal with the possibility that client A’s backup on server B gets corrupted or deleted by ransomware infecting client A. Server B would make a backup of the clients’ backups to a third location anyway, in case of a fire etc.
Just copying client A’s backups to the third location on a daily basis, keeping them in different folders, would be very time and space consuming. Maybe having client A keep only five backups (Monday to Friday) and then having server B copy those files every Friday night, keeping weekly folders on the third location, would be less time and space consuming, but if ransomware on a client destroyed its files and its backups on, let’s say, a Thursday, I would only have the previous week’s backup on the third location.
Using some tool on server B, like rsync, to make incremental copies of client A’s backups to the third location in different folders would be much less time and space consuming, but if I had to restore a single document from the backup on the third location, I would need to download those complete and incremental Duplicati files and only then use Duplicati to restore from them.
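For the rsync route, a common technique is hard-link-based snapshot folders via `--link-dest`: each daily folder only consumes space for the files that actually changed. A minimal sketch (all paths are hypothetical examples, not anything Duplicati-specific):

```shell
#!/bin/sh
# snapshot_backup SRC DST: copy SRC into a dated folder under DST,
# hard-linking unchanged files against the newest previous snapshot,
# so each daily folder only costs the space of the changed files.
snapshot_backup() {
    src=$1; dst=$2
    today=$(date +%F)
    # newest existing snapshot folder (YYYY-MM-DD), if any
    prev=$(ls -1d "$dst"/????-??-?? 2>/dev/null | tail -n 1)
    if [ -n "$prev" ]; then
        rsync -a --delete --link-dest="$prev" "$src/" "$dst/$today/"
    else
        rsync -a "$src/" "$dst/$today/"
    fi
}

# Example (hypothetical paths):
# snapshot_backup /srv/duplicati/clientA /mnt/offsite/clientA
```

Since Duplicati’s dblock/dindex/dlist files are written once and not modified in place (except during compaction), `--link-dest` deduplicates them almost perfectly across snapshot folders.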
Bottom line: Duplicati is awesome, and now I want to use Duplicati for everything.
I would personally say that this is something you can trivially script if you want to. Plus, the requirement for server B to have the backup sets’ data-encryption keys makes this moot: that’s totally unacceptable from a security standpoint.
It’s just better to run the backup of the backup without decrypting the data. Sure, you’ll also end up with the “stale” data blocks in the final backup, but so what. It’s a small price to pay for the extra security of having a versioned backup of the backup, in the unlikely (?) situation where ransomware or an APT attacker is able to detect Duplicati running and decides to destroy / corrupt / encrypt Duplicati’s backup storage.
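A sketch of what that looks like in practice: server B treats client A’s still-encrypted Duplicati files as ordinary source data and backs them up with its own, independent passphrase, never touching client A’s keys. The paths, URL, schedule and passphrase below are hypothetical examples; `duplicati-cli` with `--passphrase` and `--keep-versions` follows the Linux package’s command line, but check against your installed version.

```shell
# /etc/cron.d/backup-of-backup  (sketch; all values are examples)
# Nightly at 02:30: back up client A's still-encrypted Duplicati files
# to a third location, versioned, with server B's own passphrase.
30 2 * * * root duplicati-cli backup "ftp://third-location.example/clientA-bb" "/srv/duplicati/clientA" --passphrase="serverB-own-secret" --keep-versions=10
```

Because the second backup is versioned, a destination poisoned by ransomware on client A can still be rolled back to an earlier, intact state of the backup files.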
Another way of achieving something like this is the method I asked about a long time ago: a separate compact process (run without the database), which would allow configuring the backend so that the actual backup clients have create / write access only, with no ability to modify / delete existing data, preventing sabotage of the existing backup sets through that access path.
Are you talking about using a script on server B, with Duplicati’s command-line options, to make Duplicati restore the content of only the latest version of the backup files (the Duplicati backup from computer A stored on server B) to a temporary directory, and then using this content for the Duplicati backup to the third location? Would it involve restoring the actual original files to this temporary directory and then backing them up to the third location? Would Duplicati know whether it should be an incremental or a differential task? Or would this temporary directory not actually be temporary, but rather a kind of permanent mirror containing all the backed-up and then restored files from computer A, to then be backed up to the third location?
Or does Duplicati have command-line options that would allow me to restore and then back up not the actual files, but only the bytes pertaining to the last version of the backup?
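The “restore latest, then re-backup” pipeline could look roughly like the sketch below. It only prints the two commands it would run (a dry run), so the shape is clear without anything being executed; every path, URL and passphrase is a hypothetical example. The options used (`--restore-path`, `--version=0` for the newest version, `--overwrite`, `--passphrase`) exist in Duplicati’s CLI as of recent versions, but verify against your installation.

```shell
#!/bin/sh
# Dry-run sketch: print the two duplicati-cli commands for
# "restore only the latest version, then back that up elsewhere".
forward_latest() {
    store=$1      # client A's backup files on server B, e.g. /srv/duplicati/clientA
    stage=$2      # staging dir; keep it between runs so only changes move
    third=$3      # third-location URL

    # --version=0 is the newest backup version in Duplicati's numbering.
    # This step needs client A's passphrase.
    echo duplicati-cli restore "file://$store" "*" \
        --restore-path="$stage" --version=0 --overwrite=true \
        --passphrase=CLIENT_A_SECRET

    # Re-backup the staged mirror under server B's own passphrase; since the
    # staging dir persists, this behaves as a normal incremental backup.
    echo duplicati-cli backup "$third" "$stage" --passphrase=SERVER_B_SECRET
}

forward_latest /srv/duplicati/clientA /var/tmp/clientA-stage ftp://third.example/clientA
```

Note that this answers the “temporary or permanent” question: the staging directory works best as a permanent mirror, because that lets the restore reuse local data and lets the second backup deduplicate unchanged blocks instead of re-sending everything.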
What if a company uses AD roaming profiles and the backup is done with Duplicati on the server containing the users’ remote folders: would that be unacceptable from a security standpoint, since the backup sets’ data-encryption keys would be on the server? Why is it safer to have the data-encryption keys stored on the computers people use every day than to have them stored on a server? I understand that it’s a bad idea to keep the encryption keys on the very computer used as the destination of the backups, but this server B would in fact be responsible for backing things up to external locations, and it’s hard to do that with the files created by Duplicati without incurring a lot of traffic and space consumption to keep many versions.
Yes, I know. But if I used Duplicati or a similar tool for this, I would need to restore the entire original Duplicati file set for a computer first, and only then be able to restore the actual original files (imagine if I only needed a couple of files). But it’s like you said: there is a price to pay with every option.
I don’t understand (and I don’t understand Duplicati that well in the first place): would this compact process, without computer A’s database, be run on server B, with the third location as the destination of the result of the compact process?
If you keep the temporary directory content in place between runs, you can use the local-blocks option to make the restore faster, afaik. I haven’t tested that, but that’s how it’s supposed to work. Also, restoring a backup needs the encryption keys, and it’ll do a database rebuild. Depending on the backup size, this task can be extremely slow.
But everything is a trade-off; it’s up to your own preferences. Like what kinds of situations you’re protecting the data against, and how your system’s CPU, disk I/O, storage capacity, network I/O etc. are balanced.
There’s no option to do just the last backup version. But if you want that, then just take a backup of the backup files; you’re effectively doing exactly that. It’s good to remember that in normal conditions the data of the “latest version” is actually scattered all around the files, unless you’ve got some very specific use case where that doesn’t happen. This is what makes the backing up very efficient (in terms of disk space and network bandwidth), but it can make the restore a really slow process. Personally, I consider this a good trade-off in most use cases.
Well thought out, again. Sure, this is something you’ll have to consider case by case. Also, one thing to do between different security zones: the “backup server” which executes the backups doesn’t need to be the source of the data being backed up, nor the destination the data is backed up to. This allows locking down the server with the encryption keys to a much higher security level than the file server which is the source of the files and which many users have credentials / network access to.
No, really, it’s not hard to deal with the files created by Duplicati. This is exactly why Duplicati is so awesome. It’s very efficient to sync or back up (i.e. with or without version history) the files created by Duplicati to secondary locations. This is exactly the benefit we get from the de-duplication and static container files used by Duplicati. The drawback, of course, is that restore and compaction are a bit slower, and the de-duplication process itself is resource-consuming (CPU, RAM, I/O on the server creating the backup).
Sure, yes, in this specific case. But I see the need to restore the backup of the backups as something that should be an extremely rare occurrence, only necessary when something else has gone seriously wrong. Again, balancing between trade-offs. I’ve done it once, after one of the RAID drives got a literal head crash (I took it apart to see it) and the operating system corrupted the FS on the remaining disk. Then I restored the backups from the backup of the backups.
Yet this caused a secondary problem. The Duplicati repair task wasn’t up to the situation where the remote data storage got rolled back by “one version”. A well-working repair task should be able to handle that. Sure, it means that one version is lost, and if anything was being compacted, that data might have been relocated into new files in the storage directory, and naturally some old files might be missing. But in logical terms it should only mean losing one version, plus some probably time- and bandwidth-consuming reconstruction work. It shouldn’t mean that the whole backup set gets broken, as happened in this case. Sure, the old backups were restorable, but… you couldn’t get the system to continue backing up to those sets anymore due to the mismatch between remote storage and local storage. With newer versions, hopefully the repair & database rebuild can handle that kind of situation.
Technically it doesn’t matter “which server” does the compaction process. It’s all about running the compaction without having the database, with the storage location getting a few new files and some files deleted in the process. After that, the real question is getting the database on server A synced after the compaction so that everything doesn’t get messed up. But there’s a different thread about all this stuff here.
In the current networked world these things can be combined in so many different ways that talking about a specific server is just kind of confusing. It’s so easy to forget to properly generalize if you’re talking about your own specific use case. (This of course happens to everyone.)
In all my thoughts here I consider the possibility of ransomware or remote-control malware affecting not only the files on the users’ computers, but also their backup destinations (while, I admit, kind of neglecting the possibility of the same threats on the servers).
And if we are to consider this danger of the backups being tampered with, we must consider the possibility of it happening without any need to access the backup destinations, their locations or their credentials. All the ransomware or remote attacker needs to do, after encrypting and removing the original files, is use the backup program’s command-line options to retrieve data from its database, especially the number of versions in the retention policy, and then make the backup program run its tasks that same number of times (or more), maybe adding dummy files between runs, and being careful to also disable any kind of notifications (e-mail, syslog etc.). Even if the backup program encrypts the tasks’ settings, there is still a good chance of being able to run all the tasks many times: only the first run of each task would take longer, depending on the amount and size of the files.
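The mechanism above can be sketched in a few lines, purely to show why count-based retention alone is weak (nothing here is Duplicati-specific; the function only prints what a real attack would execute):

```shell
#!/bin/sh
# Illustration: running the victim's own configured backup job KEEP times
# rotates every pre-attack version out of the retention window, leaving
# only versions that contain the attacker-encrypted files.
flush_retention() {
    keep=$1   # number of versions the retention policy keeps
    i=0
    while [ "$i" -lt "$keep" ]; do
        # a real attack would invoke the configured backup job here;
        # this sketch only prints what would run
        echo "run backup job ($((i + 1))/$keep)"
        i=$((i + 1))
    done
}

flush_retention 5
```

This is why a version-count retention policy alone does not protect against an attacker who can trigger the backup job itself.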
If such an approach is successful, the Duplicati backup files on the destination are rendered useless to the owners and guardians of the files (there would be only hijacked, attack-encrypted files left to restore), but not to Duplicati itself, which would keep forwarding the problem, unless there is some mechanism to interrupt the process when a defined proportion or quantity of the original files changes. That’s why I thought of a system that would let the admins define the number of versions being “forwarded”, with one as the default. In case of deletion of the backup files, or their encryption at the source, whether their names are changed or not, Duplicati would fail on the server anyway.
I think it’s like you said: there is always a price to pay. I was just thinking of reducing the price, while being able to use Duplicati in all the steps.
There has been a recent discussion about this in an older topic, “How to backup the backup?”; it would be great if you could read what I wrote there and give more of your helpful thoughts.
Yes, that’s one of the risks I want to have covered, at least to a reasonable level. And that’s exactly why I said “with or without version history”. If you read between the lines there, I meant backing up the backups with version history, which means you can restore an older version of the backup of the backups if and when required, in case there’s a point in the past where the data got corrupted intentionally or accidentally.