The backup storage destination is missing data files, but repair did not succeed

Hello all,

I have a new problem which I can’t fix at the moment. I’ve tried to look for similar questions/threads in this forum and on the web, but I did not find a solution that fits.

Host system: Unraid, Duplicati in a Docker container
I am running a backup of my photo collection, which involves 2 TB of data. Since it is a very large amount of data and would take about 9-10 days to upload everything at once, I have to interrupt the job occasionally, as other important backups need to run in between as well. The photo collection is also continuously changing, since images are synced into this folder from our smartphones/tablets.
Now the error has been occurring for some time:
“2023-03-29 12:48:47 +02 - [Error-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-MissingFileDetected]: Remote file referenced as duplicati-b88fe10b51xxxxdblock.zip.aes by duplicati-i1d5f9d8xxxx.dindex.zip.aes, but not found in list, registering a missing remote file.”

  1st attempt: When I perform a repair of the database (or delete and recreate it directly), I get the following error:
    “The backup storage destination is missing data files. You can either enable --rebuild-missing-dblock-files or run the purge command to remove these files. The following files are missing: duplicati-bfef4f7axxxxx.dblock.zip.aes”.

  2nd attempt: Command “rebuild-missing-dblock-files”.
    Backup started at 3/30/2023 3:43:52 PM
    Checking remote backup …
    Listing remote folder …
    Missing file: duplicati-bfef4f7abe2e64a06b04dc655f83385fe.dblock.zip.aes
    Found 1 files that are missing from the remote storage, please run repair
    Fatal error => Found 1 files that are missing from the remote storage, please run repair

ErrorID: MissingRemoteFiles
Found 1 files that are missing from the remote storage, please run repair
Return code: 100

  3rd attempt: Command “purge”
    ErrorID: CannotPurgeWithOrphans
    Unable to start the purge process as there are 31726 orphan file(s)
    Return code: 100

However, it could be that I don’t know how to execute commands correctly.

Hello

I think that the proper way to upload a large amount of data in batches is to select part of the data and let the partial job finish, rather than trying to upload everything and interrupting the job (how? Not by killing it, I hope…). In your case, that means creating a job with 1/10 of your source data, adding the next 1/10 after it finishes, and so on; see the sketch after this paragraph.
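For anyone doing this from the command line rather than the GUI, a rough sketch of the same idea (target URL, paths and passphrase are placeholders; duplicati-cli may be Duplicati.CommandLine.exe depending on the install):

    # Iteration 1: back up only part of the collection and let it finish
    duplicati-cli backup "<storage-url>" /photos/2015 /photos/2016 --dbpath=/config/photos.sqlite --passphrase=<passphrase>

    # Iteration 2: add the next slice and run again; already-uploaded blocks are deduplicated
    duplicati-cli backup "<storage-url>" /photos/2015 /photos/2016 /photos/2017 --dbpath=/config/photos.sqlite --passphrase=<passphrase>

In the GUI the equivalent is simply adding more source folders to the job between runs.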

Hi @gpatel-fr

this is quite a good idea for the next time I set up a new backup job, but for now it will not solve my problem. It would be great to keep the uploaded data and just “re-activate” the existing job.

I don’t know that such a thing is possible in current Duplicati.

That’s what happens on the next backup. It may scan the files again, but uploaded data remains.
So how are you stopping the job? The next best thing (to letting it finish) is the Stop button, then waiting a bit.

Ideally, of course, the incremental backup would also work the next time I start the backup. But exactly when I try this, all the error messages described in my first post come up. Hence my question of how I can fix them.

The most certain way, given the lack of information, is to start over again. This is especially relevant if the backup is running at the default blocksize of 100 KB, which is suitable for about 100 GB of backup before things get slow. Because you have a “very large amount of data”, maybe set 2 MB, or more if you want to allow for growth. Going high is not much of an issue for photos, because blocksize is for deduplication of repeated data visible at the file-block level, and with photos, two shots of the same thing will look very different in the files.

Because you can’t change blocksize on an existing backup, this may be an incentive to start fresh.
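If you do start fresh, a minimal sketch of setting the larger blocksize from the command line (placeholders throughout; in the GUI it is the blocksize advanced option on the new job):

    # New backup with a 2 MB block size instead of the 100 KB default
    duplicati-cli backup "<storage-url>" /photos --blocksize=2MB --dbpath=/config/photos-new.sqlite --passphrase=<passphrase>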

If you don’t want to do that, you can consider providing information such as a link to a database bug report.

If you don’t want to do that, it may turn into a Q&A session, e.g. are the cited files some of the newer ones?
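If you do decide to provide a bug report, I believe the command line has a create-report command that produces a sanitized copy of the job’s local database (the GUI has an equivalent “Create bug report” action under the job); the URL and output path below are placeholders:

    # Produce an obfuscated bug-report copy of the local database
    duplicati-cli create-report "<storage-url>" /tmp/photos-bugreport.zip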

Having extra log-file options turned on would have helped in understanding how it got bad, but it’s too late for that now…
If the problem is easily reproduced by something you did, say what, and maybe it can be reproduced to eventually lead to a fix. Ideally Duplicati withstands lots of abuse, but there may still be holes to be identified. Some are current issues with steps to reproduce, but I’m not recalling known steps for this one. Got steps?
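For future runs, something like the following (a sketch; the log path is a placeholder) would leave a trail that survives database rollbacks:

    # Keep a persistent log outside the job database; Retry level shows upload/retry detail
    duplicati-cli backup "<storage-url>" /photos --log-file=/config/photos-backup.log --log-file-log-level=Retry

The same two options can be added as advanced options on the GUI job.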

In terms of what you tried (which are mainly aimed at losses you can’t recover from in any cleaner way):

The first question is whether you were trying to recreate the database, or maybe had auto-cleanup on.
Every dblock file should have a dindex file saying what’s in it, and every dindex should have its dblock.
Sometimes, especially when interrupted, things get mismatched. You have a dindex without its dblock. Was that dindex perhaps one of the last files uploaded before the interruption, visible as a break in the flow of uploaded files?
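As a side note, I think the affected command can map a missing remote file back to the backup versions and source paths that depend on it, which might hint at when it was uploaded (the file name below is the shortened one from your error; the URL is a placeholder):

    # Show which versions and source files reference the missing volume
    duplicati-cli affected "<storage-url>" duplicati-bfef4f7axxxxx.dblock.zip.aes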

Please clarify. “rebuild-missing-dblock-files” is not a command; it’s an option to the repair command, which is where it was mentioned. Although there’s a misleading status in the GUI top status bar, that quote looks like it’s from the Commandline screen, where you’re running backup (unless the command selected at the bottom was actually repair, which is possible…). Where does “rebuild-missing-dblock-files” fit in? Use the dropdown at the bottom to set advanced options.
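To illustrate (a sketch with a placeholder URL): from the command line the option rides along with repair, and in the GUI Commandline screen you would pick repair in the Command dropdown and add the option through the advanced-options dropdown at the bottom:

    # rebuild-missing-dblock-files is an option to the repair command, not a command of its own
    duplicati-cli repair "<storage-url>" --rebuild-missing-dblock-files=true

As far as I know, it can only rebuild a missing dblock from data that still exists in the local source files, so it may not recover everything.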

The PURGE command purges files according to specifications you didn’t state. Did you try to tell it something?
I forget exactly what orphan files are, but given the quantity, maybe a backup interruption messed up transactions.
A database bug report may reveal a lot (after a lot of studying by an expert) if you wish to provide one.
For a rough ballpark, is that count maybe all the files uploaded in the fraction-to-date of this 9-10 day backup?
Another way to get a smooth backup is Export As Command-line and run it from a shell, leaving it alone in the GUI. Looking after it’s done is fine. You just don’t want both in the database at the same time, because they clash.
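For reference, a hedged sketch of purge usage and the related broken-files commands (placeholders throughout; --dry-run previews without changing anything):

    # Purge a named source path from the backup set, previewing first
    duplicati-cli purge "<storage-url>" "/photos/some-broken-folder/*" --dry-run

    # Commands aimed specifically at damage from missing dblock volumes
    duplicati-cli list-broken-files "<storage-url>"
    duplicati-cli purge-broken-files "<storage-url>"

Whether any of these get past the orphan-file error is another question, which is why the bug report would help.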

EDIT:

Regarding orphan files, at least one definition seems to be files that are not in any backup version.

This is possibly explainable by what makes it into the database from a transaction commit versus what is rolled back on restart after an unexpected interruption. Still waiting for any comment on how the interruption was done.

As an unfortunate side note on that, rollback also rolls back evidence of some history, so external logs are more reliable (though rarely actually set up unless someone is working hard on diagnosing issues).