Complete loss of backup files, incomplete databases

I have been trying Duplicati for a few days. I have had several intolerable errors.

Platform: Dell laptop
Intel(R) Core™ i5-8265U CPU @ 1.60GHz 1.80 GHz
8.00 GB (7.85 GB usable)
Recent installation of Windows 11 Pro 24H2
Firefox 137.0.1 (64-bit)
Backup location: specific folder on a Local Toshiba 2 TByte USB hard drive.
I configure the backups to keep 2 versions.

Twice I have had all of my backup files disappear. Gone. Vanished. IIRC, this has happened when I try to make a second backup of one I had previously saved successfully.

Several times I have created a new backup, told it to run, then seen it complain about more than 80 files missing in the database. Why were those filenames not entered into the database when I created it just a moment earlier? On at least one of those occasions I have repaired the database and then seen it complain about 11 more missing.

I have tried to edit or delete a backup entry, but the page in Firefox was completely unresponsive.

On at least one occasion my computer has seemingly frozen, unresponsive to any mouse or keyboard input except Ctrl-Alt-Del. Then, by terminating DuplicatiTray in the System Tray, I was able to regain control of the computer.

I am going to try again with different folders for my various backups. However, these problems are disquieting.

I assume you have the current version from the download page, but the About screen can confirm whether it's 2.1.0.5.

Disappear from what? The USB hard drive folder? The Duplicati Restore tree view?

Sort of like below? Exact messages (or screenshots) can help describe situations well.

I got that by deleting my job database on the Database screen, then asking for a backup.

If you don’t recall the exact message, try Show log for your job. Any logs add clues.
Sometimes a normal job log won’t be created, but About → Show log will have a log.

Does the error appear instantly, or after the backup ends? While it's running, what information does the GUI show?
There should be information in the status bar at the top, as well as details in the job's section.

Are you talking about some sort of manual create, or the results from a backup run?

Did you actually see filenames mentioned? Names would likely be files on the drive.
I’m still not sure what message you saw. It can also complain about extra files there.

At least on Windows 10, Ctrl-Alt-Del brings up a Windows full screen with Task Manager as one option. If you did that and ended Duplicati, it would also be nice to look at the Performance tab there to see what's using what amount of resources. It sounds like the system overloaded…

The system I’m typing on sometimes overloads, but it’s usually from too many Firefox tabs. Does yours have a drive-busy light? When mine overloads, the drive is usually quite busy because memory filled. I can see approaching danger on the Memory stat in Performance.

[screenshot: Task Manager Performance tab]

This system has 32 GB of physical RAM, and virtual memory using the disk is the rest.
Your situation may differ, but something’s probably overloaded, so check resource use.

You always want a different folder for each backup. If you don’t, that’s probably the issue.

The new documentation doesn't call this out (prior one did), but the database works like this:

The local database

The database is essentially a compact view of what data is stored at the remote destination

Each job has a Database screen with a different database, matching different destination.

Prior to running a backup, Duplicati will do a quick scan of the remote destination to ensure it looks as expected. This check is important, as making a backup with the assumption that data exists, could result in backups that can only be partially restored. If the check fails for some reason, Duplicati will exit with an error message explaining the problem.

It sounds like this is what happened, but I’m not sure of the sequence or exact message.

Let’s get the important item out of the way first.

You always want a different folder for each backup. If you don’t, that’s probably the issue. … The new documentation doesn’t call this out (prior one did), but

Really??? I did not see that requirement described anywhere. If it was described, it certainly wasn’t obvious. Given that it can cause the catastrophic loss of a previously successful backup, that’s a major flaw. Not only should that requirement have been stated in a way that couldn’t be missed, but the program itself should have checked whether a previous backup had been saved to that destination folder and prevented that destination’s re-use. If the program can keep a database of what files have been processed, surely it can keep a record of where its backups exist. Other backup programs I have used always asked for a destination but then managed its contents without further specificity from the user. With so many zipfiles created in the destination folder I assumed the program would keep them properly identified. [FWIW, I am a senior software engineer, have been doing science application programming for over 40 years, have BS Physics Va Tech 1971, PhD Geophysics MIT 1983. If I wrote something that allowed a user to make that big a mistake it would never be accepted by anyone. I might even lose my job because of it.]

Now to answer some of your questions. My usual process was to create a Backup entry for select folders on the C: drive, no encryption, give it a destination, typically F:\DuplicatiBackups, no schedule (since a detachable disk may not always be there, and the contents of those selected folders almost never change anyway), and tell it to save a fixed number of backups, either 1 or 2. When it said the Backup entry had been created I would click Run Now. I would then repeat the process to back up a different set of C: folders. You are now telling me I should have assigned a different backup location for each new entry.

The files lost were the backup zipfiles on the USB disk drive. The error messages typically said that files were missing in the database and to please run Repair. It looked like the database hadn't been completed, but apparently I was creating a database conflict. The error messages should have told me what the real problem was, even if they didn't lock me out from making that mistake.

For completeness' sake, the About page said:
You are currently running Duplicati - 2.1.0.5_stable_2025-03-04

A follow-up. I made two backups in different destinations on my USB drive. That seems to have worked okay.

I checked restoration by deleting a folder on my C: drive from the first backup, then had Duplicati restore it. The restoration worked and restored the folder I had deleted. However, the message bar at the top of the screen said it had restored the second backup, i.e. the name of the most recent backup I had saved, not the one I had just restored. I repeated the test but with a deeper subfolder in that first backup, and again it restored the deleted folder but displayed the name of the second backup as the one it had restored. You have an off-by-one bug or an uninitialized variable or display string.

As I said:

Maybe the developer can at a minimum update the new documentation that was created.

The old manual had more detail, including the warning below, but documentation isn't a great replacement for protective code. Strangely, I can't find any GitHub issues just on this.

When the dust settles in the discussion here, maybe someone should file one, since discussion in forum is easy to lose track of, whereas issues get tracked more formally.

Creating a new backup job is in the older long version of the manual. The new one lacks it.

Yes and no, IMO. I’m not the dev, and there’s probably some early development history.

The Command Line Interface (CLI) came before the server and GUI, and it is still in use today.

Each command also requires the option --dbpath=<path to local database>, but if this is not supplied, Duplicati will use a shared JSON file in the settings folder to keep track of which database belongs to each backup. Since there is no state given, the remote url is used as a key, because it is expected to uniquely identify each backup. If no entry is found, a new entry will be created and subsequent operations will use that database.
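To make that concrete, here is a small sketch (illustrative Python, not Duplicati's actual code; the shared JSON file is the dbconfig.json mentioned below, and its internal structure and the database naming here are my assumptions) of the URL-keyed lookup the quote describes:

    # Illustrative sketch of a URL-keyed database lookup, as the quote describes.
    # Not Duplicati source; file contents and naming are assumptions.
    import json, os, uuid

    def database_for(remote_url, settings_folder):
        mapping_file = os.path.join(settings_folder, "dbconfig.json")
        mapping = {}
        if os.path.exists(mapping_file):
            with open(mapping_file) as f:
                mapping = json.load(f)
        if remote_url not in mapping:
            # No other state is kept here; the remote URL alone picks the database,
            # so two runs against one destination share one database.
            mapping[remote_url] = os.path.join(settings_folder, uuid.uuid4().hex + ".sqlite")
            with open(mapping_file, "w") as f:
                json.dump(mapping, f, indent=2)
        return mapping[remote_url]

The point is simply that the destination URL is the key, which is why the same folder reached by the same URL always maps to the same database.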

By intent, command line runs can run independently of each other and of what the GUI does (however, if you want to run a GUI job from the command line, use ServerUtil to request that, which also keeps your CLI run of Export As Command-line from colliding with a GUI run).

Back to databases, the model above associates a destination folder (given in URL form) with a database assignment, so if one happened to put different source trees into a single destination with different Duplicati.CommandLine.exe (Windows name) runs, the database experiences whiplash in terms of what each version contains, but it keeps the destination straight.

You can cause the same change-of-mind in the GUI too, and it’s fine. Each version keeps what Source it’s told, and one folder of Destination files has data from the varying Source.

Different (intentionally or not) Source through one job database to one Destination works.

What doesn't work is multiple job databases thinking they are the DB for the same destination, because (right at the initial backup for the new job) the new job will see files it's not expecting. Pushing through that might work, but the new job will then lay surprises for the original job, probably including some extra files, and (I think) potentially some missing ones from doing a compact or something similar that deletes files the original job had put on the destination.
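As a toy illustration of how that plays out (plain Python, nothing Duplicati-specific; the volume names are made up):

    # Toy illustration: two jobs writing into one destination folder. Each job's
    # database only knows the remote volumes that job itself uploaded.
    job_a_expects = {"duplicati-20250401T120000Z.dlist.zip",
                     "duplicati-b01.dblock.zip",
                     "duplicati-i01.dindex.zip"}
    job_b_uploads = {"duplicati-20250402T090000Z.dlist.zip",
                     "duplicati-b02.dblock.zip",
                     "duplicati-i02.dindex.zip"}

    destination = job_a_expects | job_b_uploads   # what's actually on the drive

    # Job A's pre-backup verification: anything not in its database is "extra".
    print("Extra, from A's view:  ", destination - job_a_expects)

    # If job B pushes through, repairs, and later compacts away volumes it
    # considers obsolete, files job A still needs can vanish and become "missing".
    after_b_compact = destination - {"duplicati-b01.dblock.zip"}
    print("Missing, from A's view:", job_a_expects - after_b_compact)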

So returning to the question, one piece at a time:

The job database keeps track of Source files, and the Destination files from processing.

The “it” here is a bit vague, and I’ll dig in because you have the experience to follow this.

CommandLine does have the map of destination URLs to databases in dbconfig.json, and in a sense that's "its backups", although it doesn't know which have fallen into disuse, and (I think) making it forget a mapping requires a hand-edit of a text file, so it's not elegant.

The GUI has the server database Duplicati-server.sqlite and a definite idea of "its backups". The destination is expressed in URL form, and it could and should question destination reuse.

I'll now do a test with a recent Canary public-testing build to make sure it still doesn't look for this situation.

I exported a job and imported it to create a new one in the same Duplicati. I try to save it and see:

I fix the name, and it saves with no complaint, so now I have two jobs with two databases. Import doesn't import the old database path (which is probably a good idea), but this sets up the situation where two jobs with two databases think they both own one destination, which is bad.

One could try to catch the error based on destination URL, but a given destination could support multiple protocols, such as FTP/FTPS, SSH/SFTP, WebDAV, and SMB.

One could have the Destination Test connection button look more closely for files that might be Duplicati files, but false positives could occur, and Duplicati supports an option for this:

  --prefix (String): Remote filename prefix
    A string used to prefix the filenames of the remote volumes, can be used to store multiple backups in the same remote folder. The prefix cannot contain a hyphen (-), but can contain all other characters
    allowed by the remote storage.
    * default value: duplicati

which handles what I view as a rare case (but I think it's been used). This unknown is in Advanced options, so it's hard to handle as early as the Destination screen, but it could sort of try.
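As a rough idea of what "sort of try" could look like, here is a sketch of the filename test only (not from Duplicati's code; the pattern is a simplification of the usual volume names):

    # Approximate check of whether a destination listing already looks like it
    # holds Duplicati volumes for a given prefix. The pattern below is a
    # simplification of typical volume names, not taken from Duplicati source.
    import re

    def duplicati_looking_files(filenames, prefix="duplicati"):
        pattern = re.compile(
            rf"^{re.escape(prefix)}-.+\.(dlist|dblock|dindex)\.zip(\.aes)?$")
        return [name for name in filenames if pattern.match(name)]

    # A Test connection handler could warn the user if this returns any hits:
    hits = duplicati_looking_files(["duplicati-20250401T120000Z.dlist.zip", "photo.jpg"])
    if hits:
        print("Destination already appears to contain Duplicati files:", hits)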

Another potential unwanted outcome is picking some folder full of other existing files. Duplicati only looks at its own files, but the user might not want backups polluting, say, Documents.

At job Save time, additional checks happen both in JavaScript and in the Duplicati server. The server could look at the final combination of URL and prefix and flag any seeming conflict; however, it would still be fooled if a destination used different accesses in the different jobs.
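To show the shape of the idea (a sketch, not a patch; Duplicati is C#, and the normalization below is my assumption), a check along these lines would catch exact duplicates while still missing the different-access case just mentioned:

    # Sketch of a save-time duplicate-destination check. Not actual Duplicati
    # code; it only illustrates comparing URL + prefix across configured jobs.
    from urllib.parse import urlsplit

    def destination_key(url, prefix="duplicati"):
        parts = urlsplit(url)
        # Normalize the obvious things; this cannot detect the same folder
        # reached over different protocols (FTP/FTPS, SSH/SFTP, WebDAV, SMB).
        return (parts.scheme.lower(), parts.netloc.lower(),
                parts.path.rstrip("/"), prefix)

    def conflicting_jobs(new_job, existing_jobs):
        new_key = destination_key(new_job["url"], new_job.get("prefix", "duplicati"))
        return [job["name"] for job in existing_jobs
                if destination_key(job["url"], job.get("prefix", "duplicati")) == new_key]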

It also could get a surprise at actual backup time because it has not seen destination files which might have been put there by a Duplicati on another computer. No global tracking…

For a local check based on database information, it would seem nearly as good to check whether a previous backup had been configured to that destination folder, though I gave ways where that could fail. Actually looking is safer, but a look at Save time may surprise.

Destination access at other times (e.g. Backup) should be no surprise, and it's done right away at the start of the backup, with destination use prevented by failing the backup with a popup.

The popup error has accidental corruption in mind, and says to Repair, which might be the wrong advice here.

A backup is some dlist (file list and reassembly info), dblock (data), and dindex files. When things go perfectly, everything aligns as expected. If not (interruptions can do this), missing or extra files may be perceived. An extra file happening this way looks much like an extra file produced by any other backup run by this or some other Duplicati server.

They are properly identified in the job database, but that’s per-job not per-Duplicati server. Possibly some of this is historical to when the CLI database was per-destination, which is ideally treated like a job with some consistency, e.g. of the source files that it’s backing up.

So first question is on ways to avoid misconfiguration, and then second is how to limit the damage when it happens anyway. One way to kill a backup was to restore a stale job DB from some other backup (maybe a drive image), try to backup, and get told to do Repair, which will try to reconcile the Destination to the stale DB-of-record. I think that will be stopped by an existing check.

So that one does a sanity check on backup version times, I think (I'm not a C# developer).
If Backup funnels users into Repair when re-use is attempted, a similar time check could realize that mostRecentLocal doesn't exist, yet, after prefix filtering, the Destination has some versions.
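In rough terms, the guard I'm imagining looks like this (a sketch under my assumptions; mostRecentLocal is the name used above, the rest is my guess at the shape, not the real code):

    # Sketch of the suggested guard, not real Duplicati code.
    def looks_like_destination_reuse(most_recent_local, remote_dlist_times):
        # most_recent_local: newest backup time the job database knows, or None.
        # remote_dlist_times: times parsed from dlist volumes found at the
        # destination after prefix filtering.
        if most_recent_local is None and remote_dlist_times:
            # The database has never completed a backup, yet the destination has
            # versions: likely destination reuse or a stale/foreign database,
            # not simple corruption, so don't just suggest Repair.
            return True
        if most_recent_local is not None and remote_dlist_times:
            # A remote version newer than anything the database knows about is
            # also suspicious (someone else wrote here, or the database is stale).
            return max(remote_dlist_times) > most_recent_local
        return False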

This scenario is fine if one is using GUI Database Recreate (delete and repair), but repair has a two-way behavior I'm not sure I like, where it either tweaks things or does a DB recreate:

    Usage: Duplicati.CommandLine.exe repair []

    Tries to repair the backup. If no local db is found or the db is empty, the db is re-created with data from the storage. If the db is in place but the remote storage is corrupt, the remote storage gets repaired with local data (if available).

I think the help text supports my thinking that it has corruption in mind, not config dangers. Regardless, making Repair more sensitive to other situations may be an approach to this.

I’m thinking that there was a recent idea of having the DB get recreated automatically if destination exists but DB doesn’t (maybe as a convenience), but I can’t find that currently.

Regardless, the dev might know, unless I imagined it, and I've given enough talk already, and I'm not the person who knows the code or history. We'll see if the dev will give comments.

The web UI in 2.1 is a bit flaky for me, but I'm not seeing this, if I understand correctly.
AFAIK the job name isn't shown after the restore; it does show while the restore is in progress.

[screenshot: restore progress shown in the status bar]

and afterwards there's a sometimes-optimistic success message in the main screen area.
The status bar returns to its idle state, giving the schedule status for the next job, or saying there is none.

[screenshot: idle status bar after the restore]

The problem needs more information, but so far I can't repro any kind of wrong-name bug.
Maybe the dev will recognize it, but more important, I think, is comment on the damage done by configuring two jobs to one destination, then maybe running Repair when told to.

suggests that the dev was receptive to the idea, but then lost track of intent for a check. Previous work was spare-time. Now it’s paid, by Duplicati, Inc. Planning is more formal.

Anyway, it's on the table again. You don't need to use my ideas, but something needs work. Preferably this won't turn into design-by-committee, but you might also have design ideas, keeping in mind the existing design, as I've tried to describe it in quite some detail here.

For clarification of your restore issue, I'd suggest a new topic, leaving this one for the original topic.
It was nice to know that most things went well when you weren't falling into a bad config that's too easy to hit.

EDIT 1:

Add option to automatically create database if none exists #5932 is what I tried to recall, which maybe means that if that gets added and chosen, the second job on the same destination builds a second database for the same destination, instead of a popup error on the problem. A similar problem to manually configuring two jobs into the same destination results, though.

Thank you for the prompt and detailed responses. FWIW, I think I am okay with how Duplicati works now that I know a few of its foibles. I won't bother you with anything else unless I run into something new.


Well I’d still like to see if we can improve this at least a little bit someday. It’s a hazard.

A note in the manual should be easy, and beyond that it's in code, so it needs dev comment.

https://github.com/duplicati/duplicati/projects?query=is%3Aopen

looks like the planning for 2.2 and future, and there’s also stuff on Duplicati’s Roadmap.

As new issues can show up at any time, there’s probably some room to get more in 2.2, although maybe not the ultimate design. We’ll see if the dev will add any tracking issues.

Thanks for your understanding, and sorry about the unwanted surprise from a config that anyone could easily do. On some backup systems, it’s even advisable (for deduplication).

I’ll now point to your original post on this, but I’m glad you opened a new topic for support.

Thanks for the quick response. FWIW, the source of my problem was that nothing in either the documentation or the running UI mentioned the need for each backup entry to have its own destination, rather than just a location to place the backups. The User Interface seemed so intuitive that reading a manual didn't seem necessary. Even if the documentation had said this, I or any other new user would likely not have seen it.

I have no other recommendations at this time other than the improved messages I have already described.