Do the sqlite files contain confidential data?

Dear all,

I learned from the Duplicati documentation that for a disaster recovery it is beneficial to have …
(1) Duplicati-server.sqlite,
(2) xxxxxxxxxx.sqlite,
(3) the backup configuration file
… although (1) and (2) can be recreated from the backup data on the destination, and for (3) I mainly just need the backup password (and the SSH key, if applicable).

I am thinking about automatically copying these three files to the file hoster that I use as my backup destination after each backup. For the backup data itself I have no concerns, as it is encrypted. But what about the sqlite files? Do they contain any confidential information that must be kept private? The backup configuration file can be encrypted on export, so I assume I can copy it to the file hoster without concern?
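
Just to make the idea concrete, what I have in mind is roughly a small copy step that runs after each backup, for example hooked in via Duplicati's --run-script-after option. This is only a sketch; all paths and file names are placeholders, not my real setup:

    # Rough sketch of a post-backup copy step, e.g. hooked in via --run-script-after.
    # All paths and file names below are placeholders.
    import shutil
    from pathlib import Path

    FILES_TO_COPY = [
        Path(r"C:\Users\me\AppData\Local\Duplicati\Duplicati-server.sqlite"),  # (1)
        Path(r"C:\Users\me\AppData\Local\Duplicati\XXXXXXXXXX.sqlite"),        # (2) job database
        Path(r"C:\exports\backup-config-export.aes"),                          # (3) encrypted config export
    ]
    STAGING_DIR = Path(r"D:\duplicati-staging")  # folder that gets synced to the file hoster

    STAGING_DIR.mkdir(parents=True, exist_ok=True)
    for src in FILES_TO_COPY:
        if src.exists():
            shutil.copy2(src, STAGING_DIR / src.name)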

Thank you!

Greetings,
Christian

Oh, sorry for my posting; I just checked the files myself using a sqlite database reader.

The “Duplicati-server.sqlite” file contains the backup destination configuration, including username and password, and the “xxxxxxxxxx.sqlite” file contains a list of all backed-up files. As all of this is stored in plain text, it is better not to have these files on a server that is not fully under my own exclusive control, or to encrypt them before uploading.
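
For anyone who wants to check their own installation: Python's built-in sqlite3 module is enough for a quick read-only peek at what the files store. This only lists tables and columns generically; it does not document the exact Duplicati schema:

    # Read-only peek at what a Duplicati sqlite file stores: list tables and columns.
    import sqlite3

    DB_PATH = "Duplicati-server.sqlite"  # or the xxxxxxxxxx.sqlite job database

    conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        for table in tables:
            columns = [col[1] for col in conn.execute(f"PRAGMA table_info('{table}')")]
            print(f"{table}: {', '.join(columns)}")
    finally:
        conn.close()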

Greetings,
Christian

Hi Christian,

You can encrypt the sensitive fields in the database with:

Or also with this new feature:

Hi Marcel,

thank you for pointing me to these pages. Encrypting the server sqlite file seems an excellent feature to secure the critical data it contains. Is there also a way to encrypt the second sqlite database? It seems to contain no passwords, but it does list all filenames, and that is also data I'd like to have encrypted.

Otherwise I will follow the suggestion of a forum user who created a second backup job which saves only the two sqlite files. As they are only a few hundred megabytes in size, the local database for this second job can easily be recreated.
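
Something like this is what I have in mind for that second job, assuming the standard command-line interface (the CLI path, source folder, destination URL and passphrase are placeholders; the same job can of course be set up in the web UI instead):

    # Hypothetical second backup job that saves only the Duplicati sqlite files.
    # CLI path, source folder, destination URL and passphrase are placeholders.
    import subprocess

    DUPLICATI_CLI = r"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe"
    SOURCE = r"C:\Users\me\AppData\Local\Duplicati"   # folder holding the sqlite files
    TARGET = "ssh://example.your-storagebox.de/duplicati-databases"
    PASSPHRASE = "a-long-passphrase"                  # placeholder

    subprocess.run(
        [
            DUPLICATI_CLI, "backup", TARGET, SOURCE,
            f"--passphrase={PASSPHRASE}",
            # keep this job's own database outside the source folder so the job
            # does not try to back up its own open database (path is a placeholder)
            r"--dbpath=D:\duplicati-second-job\db.sqlite",
        ],
        check=True,
    )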

Greetings,
Christian

No, the idea with this database is that it exists on the same filesystem as the files themselves. Since that filesystem is accessible, so are the filenames, so it does not make sense to encrypt them there.

If you upload a copy of the database to some place, you should certainly encrypt the whole database.
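
As one possible approach, a whole-file encryption step before the upload could look roughly like this, sketched here with the third-party Python cryptography package. File names and the passphrase handling are only illustrative; any other encryption tool works just as well:

    # Sketch: encrypt a sqlite database file before uploading it anywhere.
    # Uses the third-party "cryptography" package; names and paths are placeholders.
    import base64
    import os
    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

    def encrypt_file(src_path: str, dst_path: str, passphrase: str) -> None:
        salt = os.urandom(16)  # stored in front of the ciphertext
        kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt,
                         iterations=600_000)
        key = base64.urlsafe_b64encode(kdf.derive(passphrase.encode("utf-8")))
        with open(src_path, "rb") as src:
            token = Fernet(key).encrypt(src.read())
        with open(dst_path, "wb") as dst:
            dst.write(salt + token)

    encrypt_file("XXXXXXXXXX.sqlite", "XXXXXXXXXX.sqlite.enc", "a long passphrase")

Decryption then reads the 16-byte salt back, derives the same key, and calls Fernet(key).decrypt() on the remainder of the file.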

FYI

Although it looks like you’re on your way to backing up the server and job databases, I don’t think Duplicati-server.sqlite can be recreated from the destination, but the job database usually can be.

Sure, but I want to avoid having to recreate the job database in case of a disaster recovery. Several search hits led me to the conclusion that this may take a long time with large backup sets. So it could be helpful to have this database stored with the backup, to be able to copy it back instead of recreating it.

Right, I am trying to set it up this way. Okay, so the developers apparently expected that the server.sqlite might be stored offsite (since there is a way to encrypt it with a passphrase). But I thought this might be a good feature for the job database as well.

The database contains sensitive information (passphrases and credentials) which should not be stored in plain text on disk. It is also often picked up by accident when backing up the app-data folder or user homes.

The intention is that a backup of that database is not required. If it is still the case that rebuilding the local database is time-consuming, I think we should look into the process and optimize it. It should not be time-consuming, apart from the need to download the .dindex files.

Maybe somebody should do some benchmarks of database recreate versus database restore.

One thing I think people have found is that backing up the database can get quite slow. Although a Duplicati backup only uploads the changes it finds, the database has info on everything ever seen. Depending on how scattered the database changes are, they may deduplicate very poorly…

But even if the best case can get back in business faster than a DB restore, what about the bad cases, where the .dindex files don't do the full job? Speed may get a whole lot slower once .dblock files have to be read.

Generally I phrase this in terms of how much downtime hurts in a disaster. Personal users can probably grumble and wait, should they be unlucky. Any user can test recreate periodically, as
Backing Up Duplicati notes. There is a user who stopped backing up their DBs as they previously did.

The intention is that a backup of that database is not required. If it is still the case that rebuilding the local database is time-consuming, I think we should look into the process and optimize it. It should not be time-consuming, apart from the need to download the .dindex files.

When I have some time (maybe on Saturday) I will set up a virtual machine and try to restore some files from my 170 GB offsite backup without providing the job sqlite database. I will post the results here as a basis for further discussion.


So, here are the results of my tests.

I use a Hetzner Storage Box (located in Germany) with 1 TB, accessible via SFTP, as my offsite backup destination. I created a job to back up 175 GB of data. The only value I changed was the remote volume size (100 MB instead of 50 MB). I am located in Austria, thus not too far away from the location of the server. My internet connection speed is typically 25 MB/s download and 2 MB/s upload. The antivirus software was switched off to avoid any side effects or alarms interrupting the backup/restore process.

Backup of 175 GB of data
The full backup took 31.5 hours (!), which was mainly caused by my upload bandwidth limit. The size of the backup on the remote storage is 159 GB.

Today I restored (a) a part of the data and (b) all of the data. I set up a virtual machine with Windows 10 (64-bit) to ensure I had a fresh install with no old Duplicati settings. I connected an external hard disk to have a realistic read/write environment. In both cases I did not copy any server or job database, nor did I import the job configuration file, but let Duplicati recreate everything from the backup. I only had to specify the backup location and the backup password. I completely reset the virtual machine after each test.

(a) Restore of 19.3 GB of data

  • create the list to select which files to restore: less than 1 minute
  • re-create the database: 9 minutes
  • restore and verify 19.3 GB of data: 25 minutes
  • total: 35 minutes

(b) Restore of 175 GB of data

  • create the list to select which files to restore: less than 1 minute
  • re-create the database: 11 minutes
  • restore 175 GB of data: 150 minutes (2.5 hours)
  • verify 175 GB of data: 140 minutes (2.3 hours)
  • total: 302 minutes (5 hours)

This leads me to the following conclusions:

  • my backup is working and data can be restored 🙂
  • it is easy to do a disaster recovery even without a backup of the job configuration and the databases (of course I need the login data for the offsite storage and the backup password)
  • the time to re-create the databases is insignificant compared to the time to restore the data
  • the reports on the internet about it taking days to re-create the database either refer to an old version of Duplicati or are the result of some bottleneck (e.g. slow remote storage bandwidth)
  • for large offsite backups, a sufficient (and working!) internet connection is essential (see the rough throughput check below)
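
As a rough back-of-envelope check that both runs were limited by my connection rather than by Duplicati, just dividing the amounts of data by the times above:

    # Back-of-envelope throughput check for the numbers reported above.
    GB = 1024  # MB per GB

    backup_mb_per_s = 175 * GB / (31.5 * 3600)  # ~1.6 MB/s, close to the 2 MB/s upload limit
    restore_mb_per_s = 175 * GB / (150 * 60)    # ~19.9 MB/s, close to the 25 MB/s download limit

    print(f"backup:  {backup_mb_per_s:.1f} MB/s")
    print(f"restore: {restore_mb_per_s:.1f} MB/s")

Compared to that, the 9 to 11 minutes for re-creating the database hardly matter.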

Greetings,
Christian


Some might also be very large backups at a small blocksize. The old default was 100 KB, resulting in an enormous database to track a multi-terabyte set of blocks (a rough block-count illustration follows at the end of this post). Also bad is the final 10% of the progress bar:

The third pass finishes the search of all dblock files. Arguably, though, a full restore may also fetch them all.
If watching the progress bar, the final three passes are 10% each, starting at 70% of the full bar.
Generally it is the 90% to 100% stretch that people notice, due to the complete search for the blocks.

But bandwidth can make up for a slow recreate, and a slow recreate is rarer now that Duplicati is better at not creating backups that cause it. Those were bugs, but storage systems have bugs too.

It would still be good to occasionally test Recreate, but there's no easy safe button. Moving the old database to a different name (just in case the Recreate fails) and then running Repair is one good test.
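
Scripted, such a test might look roughly like this (a sketch only; the CLI path, database path, destination URL and passphrase are placeholders):

    # Sketch of a periodic recreate test: move the job database aside, then let
    # the "repair" command rebuild it from the destination.
    import subprocess
    from pathlib import Path

    DUPLICATI_CLI = r"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe"
    DB_PATH = Path(r"C:\Users\me\AppData\Local\Duplicati\XXXXXXXXXX.sqlite")
    TARGET = "ssh://example.your-storagebox.de/backup"
    PASSPHRASE = "my-backup-passphrase"  # placeholder

    # Keep the old database under a different name so nothing is lost if the
    # recreate fails.
    DB_PATH.rename(DB_PATH.with_name(DB_PATH.name + ".bak"))

    subprocess.run(
        [
            DUPLICATI_CLI, "repair", TARGET,
            f"--dbpath={DB_PATH}",
            f"--passphrase={PASSPHRASE}",
        ],
        check=True,
    )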

As noted, time-to-restore is more important to some users. Others might just take their chances.
A very serious time-is-money business situation really ought to have multiple backups anyway…
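
And, as promised above, a rough illustration of why a small blocksize blows up the database. The 4 TB source size here is made up purely to show the scale:

    # Rough scale illustration: how many blocks the local database must track.
    # The 4 TB source size is hypothetical.
    SOURCE_BYTES = 4 * 1024**4  # 4 TB of source data

    for blocksize in (100 * 1024, 1024**2, 5 * 1024**2):  # 100 KB (old default), 1 MB, 5 MB
        blocks = SOURCE_BYTES // blocksize
        print(f"{blocksize // 1024:>5} KB blocksize -> ~{blocks:,} blocks to track")

Tens of millions of block entries at the 100 KB blocksize is where the database size, and a recreate of it, start to hurt.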

Thanks for the testing.

Thanks so much for reporting the numbers!