Complete loss of backup files, incomplete databases

I have been trying Duplicati for a few days. I have had several intolerable errors.

Platform: Dell laptop
Intel(R) Core™ i5-8265U CPU @ 1.60GHz 1.80 GHz
8.00 GB (7.85 GB usable)
Recent installation of Windows 11 Pro 24H2
Firefox 137.0.1 (64-bit)
Backup location: specific folder on a Local Toshiba 2 TByte USB hard drive.
I configure the backups to keep 2 versions.

Twice I have had all of my backup files disappear. Gone. Vanished. IIRC, this has happened when I try to make a second backup of one I had previously saved successfully.

Several times I have created a new backup, told it to run, then seen it complain about more than 80 files missing in the database. Why were those filenames not entered into the database when I created it just a moment earlier? On at least one of those occasions I have repaired the database and then seen it complain about 11 more missing.

I have tried to edit or delete a backup entry, but the page in Firefox was completely unresponsive.

On at least one occasion my computer has seemingly frozen, unresponsive to any mouse or keyboard input except Ctrl-Alt-Del. Then, by terminating DuplicatiTray in the System Tray, I was able to regain control of the computer.

I am going to try again with different folders for my various backups. However, these problems are disquieting.

I assume you have the current version from the download page, but the About screen can confirm whether it's 2.1.0.5.

Disappear from what? The USB hard drive folder? The Duplicati Restore tree view?

Sort of like below? Exact messages (or screenshots) can help describe situations well.

I got that by deleting my job database on the Database screen, then asking for a backup.

If you don’t recall the exact message, try Show log for your job. Any logs add clues.
Sometimes a normal job log won’t be created, but About → Show log will have a log.

Does the error appear instantly, or after the backup ends? While it's running, what information does the GUI show?
There should be information in the status bar at the top, as well as details in the job's section.

Are you talking about some sort of manual create, or the results from a backup run?

Did you actually see filenames mentioned? Names would likely be files on the drive.
I’m still not sure what message you saw. It can also complain about extra files there.

At least on Windows 10, Ctrl-Alt-Del brings up a Windows full screen with Task Manager as one option. If you did that and ended Duplicati, it would also be nice to look at the Performance tab there to see what's using what amount of resources. It sounds like the system overloaded…

The system I’m typing on sometimes overloads, but it’s usually from too many Firefox tabs. Does yours have a drive-busy light? When mine overloads, the drive is usually quite busy because memory filled. I can see approaching danger on the Memory stat in Performance.

[screenshot: Task Manager Performance tab]

This system has 32 GB of physical RAM, and virtual memory using the disk is the rest.
Your situation may differ, but something’s probably overloaded, so check resource use.

You always want a different folder for each backup. If you don’t, that’s probably the issue.

The new documentation doesn't call this out (prior one did), but the database works like this:

The local database

The database is essentially a compact view of what data is stored at the remote destination

Each job has a Database screen with a different database, matching different destination.

Prior to running a backup, Duplicati will do a quick scan of the remote destination to ensure it looks as expected. This check is important, as making a backup with the assumption that data exists, could result in backups that can only be partially restored. If the check fails for some reason, Duplicati will exit with an error message explaining the problem.

It sounds like this is what happened, but I’m not sure of the sequence or exact message.

Let’s get the important item out of the way first.

You always want a different folder for each backup. If you don’t, that’s probably the issue. … The new documentation doesn’t call this out (prior one did), but

Really??? I did not see that requirement described anywhere. If it was described, it certainly wasn’t obvious. Given that it can cause the catastrophic loss of a previously successful backup, that’s a major flaw. Not only should that requirement have been stated in a way that couldn’t be missed, but the program itself should have checked whether a previous backup had been saved to that destination folder and prevented that destination’s re-use. If the program can keep a database of what files have been processed, surely it can keep a record of where its backups exist. Other backup programs I have used always asked for a destination but then managed its contents without further specificity from the user. With so many zipfiles created in the destination folder I assumed the program would keep them properly identified. [FWIW, I am a senior software engineer, have been doing science application programming for over 40 years, have BS Physics Va Tech 1971, PhD Geophysics MIT 1983. If I wrote something that allowed a user to make that big a mistake it would never be accepted by anyone. I might even lose my job because of it.]

Now to answer some of your questions. My usual process was to create a Backup entry for select folders on the C: drive, no encryption, give it a destination, typically F:\DuplicatiBackups, no schedule (since a detachable disk may not always be there, and the contents of those selected folders almost never change anyway), and tell it to save a fixed number of backups, either 1 or 2. When it said the Backup entry had been created I would click Run Now. I would then repeat the process to back up a different set of C: folders. You are now telling me I should have assigned a different backup location for each new entry.

The files lost were the backup zipfiles on the USB disk drive. The error messages typically said that files were missing in the database and to please run Repair. It looked like the database hadn't been completed, but apparently I was creating a database conflict. The error messages should have told me what the real problem was, even if they didn't lock me out from making that mistake.

For completeness' sake, the About page said:
You are currently running Duplicati - 2.1.0.5_stable_2025-03-04

A follow-up. I made two backups in different destinations on my USB drive. That seems to have worked okay.

I checked restoration by deleting a folder on my C: drive from the first backup, then had Duplicati restore it. The restoration worked and restored the folder I had deleted. However, the message bar at the top of the screen said it had restored the second backup, i.e. the name of the most recent backup I had saved, not the one I had just restored. I repeated the test but with a deeper subfolder in that first backup, and again it restored the deleted folder but displayed the name of the second backup as the one it had restored. You have an off-by-one bug or an uninitialized variable or display string.

As I said:

Maybe the developer can at a minimum update the new documentation that was created.

The old manual had more detail, including the warning below, but documentation isn't a great replacement for protective code. Strangely, I can't find any GitHub issues just on this.

When the dust settles in the discussion here, maybe someone should file one, since discussion in forum is easy to lose track of, whereas issues get tracked more formally.

Creating a new backup job is in the older long version of the manual. The new one lacks it.

Yes and no, IMO. I’m not the dev, and there’s probably some early development history.

The Command Line Interface (CLI) came before the server and GUI, and it is still in use today.

Each command also requires the option --dbpath=<path to local database>, but if this is not supplied, Duplicati will use a shared JSON file in the settings folder to keep track of which database belongs to each backup. Since there is no state given, the remote url is used as a key, because it is expected to uniquely identify each backup. If no entry is found, a new entry will be created and subsequent operations will use that database.
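To make that concrete, here is a small sketch (illustrative Python, not Duplicati's actual code; the shared JSON file is the dbconfig.json mentioned below, and its internal structure and the database naming here are my assumptions) of the URL-keyed lookup the quote describes:

    # Illustrative sketch of a URL-keyed database lookup, as the quote describes.
    # Not Duplicati source; file contents and naming are assumptions.
    import json, os, uuid

    def database_for(remote_url, settings_folder):
        mapping_file = os.path.join(settings_folder, "dbconfig.json")
        mapping = {}
        if os.path.exists(mapping_file):
            with open(mapping_file) as f:
                mapping = json.load(f)
        if remote_url not in mapping:
            # No other state is kept here; the remote URL alone picks the database,
            # so two runs against one destination share one database.
            mapping[remote_url] = os.path.join(settings_folder, uuid.uuid4().hex + ".sqlite")
            with open(mapping_file, "w") as f:
                json.dump(mapping, f, indent=2)
        return mapping[remote_url]

The point is simply that the destination URL is the key, which is why the same folder reached by the same URL always maps to the same database.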

By intent, command line runs can run independently of each other and of what the GUI does (however, if you want to run a GUI job from the command line, use ServerUtil to request that, which also keeps your CLI run of Export As Command-line from colliding with a GUI run).

Back to databases, the model above associates a destination folder (given in URL form) with a database assignment, so if one happened to put different source trees into a single destination with different Duplicati.CommandLine.exe (Windows name) runs, the database experiences whiplash in terms of what each version contains, but it keeps the destination straight.

You can cause the same change-of-mind in the GUI too, and it’s fine. Each version keeps what Source it’s told, and one folder of Destination files has data from the varying Source.

Different (intentionally or not) Source through one job database to one Destination works.

What doesn't work is multiple job databases thinking they are the DB for the same destination, because (right at the initial backup for the new job) the new job will see files it's not expecting. Pushing through that might work, but the new job will then lay surprises for the original job, probably including some extra files, and (I think) potentially some missing ones from doing a compact or something similar that deletes files the original job had put on the destination.
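As a toy illustration of how that plays out (plain Python, nothing Duplicati-specific; the volume names are made up):

    # Toy illustration: two jobs writing into one destination folder. Each job's
    # database only knows the remote volumes that job itself uploaded.
    job_a_expects = {"duplicati-20250401T120000Z.dlist.zip",
                     "duplicati-b01.dblock.zip",
                     "duplicati-i01.dindex.zip"}
    job_b_uploads = {"duplicati-20250402T090000Z.dlist.zip",
                     "duplicati-b02.dblock.zip",
                     "duplicati-i02.dindex.zip"}

    destination = job_a_expects | job_b_uploads   # what's actually on the drive

    # Job A's pre-backup verification: anything not in its database is "extra".
    print("Extra, from A's view:  ", destination - job_a_expects)

    # If job B pushes through, repairs, and later compacts away volumes it
    # considers obsolete, files job A still needs can vanish and become "missing".
    after_b_compact = destination - {"duplicati-b01.dblock.zip"}
    print("Missing, from A's view:", job_a_expects - after_b_compact)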

So returning to the question, one piece at a time:

The job database keeps track of Source files, and the Destination files from processing.

The “it” here is a bit vague, and I’ll dig in because you have the experience to follow this.

CommandLine does have the map of destination URLs to databases in dbconfig.json, and in a sense that's "its backups", although it doesn't know which have fallen into disuse, and (I think) making it forget a mapping requires a hand-edit of a text file, so it's not elegant.

The GUI has the server database Duplicati-server.sqlite and a definite idea of "its backups". The destination is expressed in URL form, and it could and should question destination reuse.

I'll now do a test with a recent Canary public-testing build to make sure it still doesn't look for this situation.

I exported a job and imported it to create a new one in the same Duplicati. I try to save it and see:

I fix the name, and it saves with no complaint, so now I have two jobs with two databases. Import doesn't import the old database path (which is probably a good idea), but this sets up the situation where two jobs with two databases think they both own one destination, which is bad.

One could try to catch the error based on destination URL, but a given destination could support multiple protocols, such as FTP/FTPS, SSH/SFTP, WebDAV, and SMB.

One could have the Destination Test connection button look more closely for files that might be Duplicati files, but false positives could occur, and Duplicati supports an option for this:

  --prefix (String): Remote filename prefix
    A string used to prefix the filenames of the remote volumes, can be used to store multiple backups in the same remote folder. The prefix cannot contain a hyphen (-), but can contain all other characters
    allowed by the remote storage.
    * default value: duplicati

which handles what I view as a rare case (but I think it's been used). This unknown is in Advanced options, so it's hard to handle as early as the Destination screen, but it could sort of try.
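As a rough idea of what "sort of try" could look like, here is a sketch of the filename test only (not from Duplicati's code; the pattern is a simplification of the usual volume names):

    # Approximate check of whether a destination listing already looks like it
    # holds Duplicati volumes for a given prefix. The pattern below is a
    # simplification of typical volume names, not taken from Duplicati source.
    import re

    def duplicati_looking_files(filenames, prefix="duplicati"):
        pattern = re.compile(
            rf"^{re.escape(prefix)}-.+\.(dlist|dblock|dindex)\.zip(\.aes)?$")
        return [name for name in filenames if pattern.match(name)]

    # A Test connection handler could warn the user if this returns any hits:
    hits = duplicati_looking_files(["duplicati-20250401T120000Z.dlist.zip", "photo.jpg"])
    if hits:
        print("Destination already appears to contain Duplicati files:", hits)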

Another potential unwanted outcome is picking some folder full of other existing files. Duplicati only looks at its own files, but the user might not want backups polluting, say, Documents.

At job Save time, additional checks happen both in JavaScript and in the Duplicati server. The server could look at the final combination of URL and prefix and flag any seeming conflict; however, it would still be fooled if a destination used different accesses in the different jobs.
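To show the shape of the idea (a sketch, not a patch; Duplicati is C#, and the normalization below is my assumption), a check along these lines would catch exact duplicates while still missing the different-access case just mentioned:

    # Sketch of a save-time duplicate-destination check. Not actual Duplicati
    # code; it only illustrates comparing URL + prefix across configured jobs.
    from urllib.parse import urlsplit

    def destination_key(url, prefix="duplicati"):
        parts = urlsplit(url)
        # Normalize the obvious things; this cannot detect the same folder
        # reached over different protocols (FTP/FTPS, SSH/SFTP, WebDAV, SMB).
        return (parts.scheme.lower(), parts.netloc.lower(),
                parts.path.rstrip("/"), prefix)

    def conflicting_jobs(new_job, existing_jobs):
        new_key = destination_key(new_job["url"], new_job.get("prefix", "duplicati"))
        return [job["name"] for job in existing_jobs
                if destination_key(job["url"], job.get("prefix", "duplicati")) == new_key]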

It also could get a surprise at actual backup time because it has not seen destination files which might have been put there by a Duplicati on another computer. No global tracking…

For a local check based on database information, it would seem nearly as good to check whether a previous backup had been configured to that destination folder, though I gave ways where that could fail. Actually looking is safer, but a look at Save time may surprise.

Destination access at other times (e.g. Backup) should be no surprise, and it's done right away at the start of the backup, with destination use prevented by failing the backup with a popup.

The popup error has accidental corruption in mind, and says to Repair, which might be the wrong advice here.

A backup is some dlist (file list and reassembly info), dblock (data), and dindex files. When things go perfectly, everything aligns as expected. If not (interruptions can do this), missing or extra files may be perceived. An extra file happening this way looks much like an extra file produced by any other backup run by this or some other Duplicati server.

They are properly identified in the job database, but that’s per-job not per-Duplicati server. Possibly some of this is historical to when the CLI database was per-destination, which is ideally treated like a job with some consistency, e.g. of the source files that it’s backing up.

So first question is on ways to avoid misconfiguration, and then second is how to limit the damage when it happens anyway. One way to kill a backup was to restore a stale job DB from some other backup (maybe a drive image), try to backup, and get told to do Repair, which will try to reconcile the Destination to the stale DB-of-record. I think that will be stopped by an existing check.

So that one does a sanity check on backup version times, I think (I'm not a C# developer).
If Backup funnels users into Repair when re-use is attempted, a similar time check could realize that mostRecentLocal doesn't exist, yet, after prefix filtering, the Destination has some versions.
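In rough terms, the guard I'm imagining looks like this (a sketch under my assumptions; mostRecentLocal is the name used above, the rest is my guess at the shape, not the real code):

    # Sketch of the suggested guard, not real Duplicati code.
    def looks_like_destination_reuse(most_recent_local, remote_dlist_times):
        # most_recent_local: newest backup time the job database knows, or None.
        # remote_dlist_times: times parsed from dlist volumes found at the
        # destination after prefix filtering.
        if most_recent_local is None and remote_dlist_times:
            # The database has never completed a backup, yet the destination has
            # versions: likely destination reuse or a stale/foreign database,
            # not simple corruption, so don't just suggest Repair.
            return True
        if most_recent_local is not None and remote_dlist_times:
            # A remote version newer than anything the database knows about is
            # also suspicious (someone else wrote here, or the database is stale).
            return max(remote_dlist_times) > most_recent_local
        return False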

This scenario is fine if one is using GUI Database Recreate (delete and repair), but repair has a two-way behavior I'm not sure I like, where it either tweaks things or does a DB recreate:

    Usage: Duplicati.CommandLine.exe repair []

    Tries to repair the backup. If no local db is found or the db is empty, the db is re-created with data from the storage. If the db is in place but the remote storage is corrupt, the remote storage gets repaired with local data (if available).

I think the help text supports my thinking that it has corruption in mind, not config dangers. Regardless, making Repair more sensitive to other situations may be an approach to this.

I’m thinking that there was a recent idea of having the DB get recreated automatically if destination exists but DB doesn’t (maybe as a convenience), but I can’t find that currently.

Regardless, the dev might know, unless I imagined it, and I've given enough talk already, and I'm not the person who knows the code or history. We'll see if the dev will give comments.

The web UI in 2.1 is a bit flaky for me, but I'm not seeing this, if I understand correctly.
AFAIK the job name isn't shown after the restore; it does show while the restore is in progress.

[screenshot: restore progress shown in the status bar]

and afterwards there's a sometimes-optimistic success message in the main screen area.
The status bar returns to its idle state, giving the schedule status for the next job, or saying there is none.

[screenshot: idle status bar after the restore]

The problem needs more information, but so far I can't repro any kind of wrong-name bug.
Maybe the dev will recognize it, but more important, I think, is comment on the damage done by configuring two jobs to one destination, then maybe running Repair when told to.

suggests that the dev was receptive to the idea, but then lost track of intent for a check. Previous work was spare-time. Now it’s paid, by Duplicati, Inc. Planning is more formal.

Anyway, it's on the table again. You don't need to use my ideas, but something needs work. Preferably this won't turn into design-by-committee, but you might also have design ideas, keeping in mind the existing design, as I've tried to describe it in quite some detail here.

For clarification of your restore issue, I'd suggest a new topic, leaving this one for the original topic.
It was nice to know that most things went well when you weren't falling into a bad config that's too easy to hit.

EDIT 1:

Add option to automatically create database if none exists #5932 is what I tried to recall, which maybe means that if that gets added and chosen, the second job on the same destination builds a second database for the same destination, instead of a popup error on the problem. A similar problem to manually configuring two jobs into the same destination results, though.

Thank you for the prompt and detailed responses. FWIW, I think I am okay with how Duplicati works now that I know a few of its foibles. I won't bother you with anything else unless I run into something new.


Well I’d still like to see if we can improve this at least a little bit someday. It’s a hazard.

A note in the manual should be easy, and beyond that it's in code, so it needs dev comment.

https://github.com/duplicati/duplicati/projects?query=is%3Aopen

looks like the planning for 2.2 and future, and there’s also stuff on Duplicati’s Roadmap.

As new issues can show up at any time, there’s probably some room to get more in 2.2, although maybe not the ultimate design. We’ll see if the dev will add any tracking issues.

Thanks for your understanding, and sorry about the unwanted surprise from a config that anyone could easily do. On some backup systems, it’s even advisable (for deduplication).

I’ll now point to your original post on this, but I’m glad you opened a new topic for support.

Thanks for the quick response. FWIW, the source of my problem was that nothing in either the documentation or the running UI mentioned the need for each backup entry to have its own destination, rather than just a location to place the backups. The User Interface seemed so intuitive that reading a manual didn't seem necessary. Even if the documentation had said this, I or any other new user would likely not have seen it.

I have no other recommendations at this time other than the improved messages I have already described.