Why does restore spend so much time verifying remote data?

Hello,
my backup completed (122,000 files at 50 MB) without encryption and without compression. Four *.dlist.zip files were created (I reran the process). There aren't that many source files; most of them are over 3 GB in size.
Now, as a test, I want to restore only one file, but Duplicati only shows "Verifying remote data ..." (even after 1 hour). The log says "Backend event: List - Started: ()".
Why is verification needed during a restore, and why does it take so long?
Would it be faster to set the block size to 500-1000 MB, or how else can I speed it up? Does it only take this long the first time?

I also see that my database is 8 GB in size. What is it used for, and why is it so big? If it belongs to the backup, why isn't it stored inside the backup itself, instead of in home/AppData\Local\Duplicati?

Did you change the log level to verbose?

well, there is not much to go on there. Raise the log level.

idem

To save the structure of your backups and speed things up, for example when restoring. You did not start a restore from files, by any chance?

Probably because your backup is big. I can’t parse what you mean by ‘122,000 files at 50 MB’, by the way.

Because it’s the default location probably.

It should have started and completed a lot of those while you were setting up (assuming this is GUI).
Each one should get a “Started” and a “Completed”, with a time and count. Can you see the others?

If list got stuck, nothing’s going to move. It’s not processing time. Do you see signs of processing?
What is the destination type? I assume it’s remote, as a local list seems really unlikely to get stuck.

EDIT:

Here’s an example of a live log from a Restore. Below it were many unshown list operations done during setup.
Above it is the actual restore process. Are you saying yours stopped at the second line, without the first?

[screenshot: live log from a Restore]

To me it also looks like the List command got stuck. We had another case of that recently. What backend are you using? You should probably set a reasonable timeout depending on volume size and your connection speed (maybe 1-10 minutes), so a retry can be triggered. Unfortunately the default is unlimited.

The option for most backends is --http-operation-timeout in the destination advanced options.
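
For example, it might look something like this in the backup’s advanced options (the 5-minute value is just an illustration; pick something reasonable for your volume size and connection speed, as mentioned above):

```
--http-operation-timeout=5m
```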

That is possibly reporting the file count (122 thousand) and the dblock size (the default 50 MB Remote volume size).
If so, any list that completes would report about 122,000 bytes, because the unit is wrong – it’s actually a count here.

That does not make any sense at Restore time, if it’s talking about rearranging everything on the destination.
Choosing sizes in Duplicati talks about this a bit, and it could be raised slowly, but that’s not the problem.

Assuming that’s 122 thousand files using default remote volume size, this backup is over 3 TB, so large, thereby causing large database with slowness many places, but stuck list is more likely from network.

And because it’s generally not possible to get the needed access to a database on a typical cloud service. Those don’t provide the kind of fine-grained read and write capability that a local filesystem can support.
There are occasional wishes to copy it to the Destination after backups, but that’s a do-it-yourself option.

so if you want it remote for safety, that’s a lot of copying. You can usually just Recreate a lost database.

EDIT 1:

The “Choosing sizes” document unfortunately doesn’t talk about it, but the value to increase for a large backup is blocksize, because the default of 100 KB gets slow after 100 GB. Unclear what your backup size is, but it’s on the home screen. Can you tell us? Large source + default blocksize = large database.
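
As a rough illustration of the arithmetic (assuming the ~3 TB estimate from above), the database has to track every block individually:

```
3 TB / 100 KB (default blocksize) ≈ 32 million blocks to track
3 TB /   1 MB blocksize           ≈  3 million blocks to track
```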

EDIT 2:

Unfortunately you can’t change the blocksize on an existing backup, so it’s fresh start or an experiment.
This is all guessing though, until some clarification of the posted numbers can be provided by @hubert.
It’s also more of a general tune-up question, so probably a side issue from the problem reported here…

Amazing community, I never expected so many responses so quickly. Thanks.

My backup resulted in 122,206 files of 50 MB each. Yes, I started a restore where I only selected one file.

I run Duplicati 2.0.7.1_beta_2023-05-25 under Windows 10. My backup destination is a local disk drive (SATA HDD).

The restore has now finished.
Between 8:29 and 10:28 the PC was shut down.
The warning "The option compression-level is deprecated: Please use the zip-compression-level option instead" appears because I added both parameters (deprecated and new), just to be sure ;-).

Here is my verbose log:

```
Nov 15, 2023 10:44 PM: The operation Restore has completed
Nov 15, 2023 10:43 PM: Testing restored file integrity: D:\tmp\xxxx.mkv
Nov 15, 2023 10:43 PM: Patching metadata with remote data: D:\tmp\xxxx.mkv
Nov 15, 2023 10:43 PM: Recording metadata from remote data: D:\tmp\xxxx.mkv
Nov 15, 2023 10:43 PM: Backend event: Get - Completed: duplicati-b2d4886f2525344e2ac86bd9da8b67278.dblock.zip (49,99 MB)
Nov 15, 2023 10:43 PM: Backend event: Get - Started: duplicati-b2d4886f2525344e2ac86bd9da8b67278.dblock.zip (49,99 MB)
Nov 15, 2023 10:43 PM: 1 remote files are required to restore
Nov 15, 2023 10:43 PM: Target file is patched with some local data: xxxx.mkv
Nov 15, 2023 10:41 PM: Target file does not exist: xxxx.mkv
Nov 15, 2023 10:39 PM: Restore list contains 68782 blocks with a total size of 6,56 GB
Nov 15, 2023 10:39 PM: Mapping restore path prefix to "Y:\xxxx\" to "D:\tmp\"
Nov 15, 2023 10:39 PM: Needs to restore 1 files (6,56 GB)
Nov 15, 2023 10:39 PM: Searching backup 0 (15.11.2023 07:21:22) ...
Nov 15, 2023 10:39 PM: Backend event: List - Completed: (119,34 KB)
Nov 15, 2023 7:47 PM: Backend event: List - Started: ()
Nov 15, 2023 7:47 PM: The operation Restore has started
Nov 15, 2023 7:46 PM: The operation List has completed
Nov 15, 2023 7:46 PM: The operation List has started
Nov 15, 2023 7:46 PM: The operation List has completed
Nov 15, 2023 7:46 PM: The operation List has started
Nov 15, 2023 7:46 PM: The operation List has completed
Nov 15, 2023 7:46 PM: The operation List has started
Nov 15, 2023 7:46 PM: The operation List has completed
Nov 15, 2023 7:46 PM: The operation List has started
Nov 15, 2023 7:46 PM: The operation List has completed
Nov 15, 2023 7:46 PM: The operation List has started
Nov 15, 2023 7:45 PM: Cannot find any MS SQL Server instance. MS SQL Server is probably not installed.
Nov 15, 2023 7:45 PM: Cannot open WMI provider \\localhost\root\virtualization\v2. Hyper-V is probably not installed.
Nov 15, 2023 7:45 PM: The operation List has completed
Nov 15, 2023 7:45 PM: The operation List has started
Nov 15, 2023 7:45 PM: The option compression-level is deprecated: Please use the zip-compression-level option instead
Nov 15, 2023 7:44 PM: The operation List has completed
Nov 15, 2023 7:44 PM: The operation List has started
Nov 15, 2023 7:44 PM: The option compression-level is deprecated: Please use the zip-compression-level option instead
Nov 15, 2023 7:44 PM: The operation List has completed
Nov 15, 2023 7:44 PM: The operation List has started
Nov 15, 2023 7:44 PM: The option compression-level is deprecated: Please use the zip-compression-level option instead
Nov 15, 2023 7:44 PM: The operation List has completed
Nov 15, 2023 7:44 PM: The operation List has started
Nov 15, 2023 7:44 PM: The option compression-level is deprecated: Please use the zip-compression-level option instead
Nov 15, 2023 7:44 PM: The operation Test has completed
Nov 15, 2023 7:44 PM: Backend event: Get - Completed: duplicati-i60d65d7f339e45f7b47aa1a4d2944905.dindex.zip (135,66 KB)
Nov 15, 2023 7:44 PM: Backend event: Get - Started: duplicati-i60d65d7f339e45f7b47aa1a4d2944905.dindex.zip (135,66 KB)
Nov 15, 2023 7:44 PM: Backend event: Get - Completed: duplicati-20231113T181243Z.dlist.zip (3,60 MB)
Nov 15, 2023 7:44 PM: Backend event: Get - Started: duplicati-20231113T181243Z.dlist.zip (3,60 MB)
Nov 15, 2023 7:44 PM: Backend event: List - Completed: (119,34 KB)
Nov 15, 2023 7:42 PM: Cannot find any MS SQL Server instance. MS SQL Server is probably not installed.
Nov 15, 2023 7:42 PM: Cannot open WMI provider \\localhost\root\virtualization\v2. Hyper-V is probably not installed.
```

Was this a one-time thing, or is it reproducible? If you try to restore again (ideally with verbose log on from the start), does it take as long?

It is concerning that the HDD list takes so long, but maybe the blockage is somewhere else and does not show up in the logs.

If it was only this one time, it will be difficult to find out what caused it.

I will try again tomorrow :wink:

The easiest way to get a list of the destination is to push the Verify files button for the backup.
You can watch in live log at Information level, and you’ll probably get one list and three get events.
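
If you prefer a terminal, a rough equivalent is the CLI test command; this is only a sketch (the destination path and database path below are placeholders, and the GUI button is simpler because it already knows the job’s settings):

```
Duplicati.CommandLine.exe test "file://X:\Duplicati" 1 --dbpath="C:\Users\<you>\AppData\Local\Duplicati\XXXXXXXXXX.sqlite"
```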

I messed up my screenshot description. The lower list entries were probably from the internal list (a.k.a. find) operation that populates the tree view. It’s the same name but in a different context. Your slow one is the backend event.

The verification is also done by default at start of each backup. You might notice if it took three hours.

I started a new restore yesterday evening. The PC was down from 23:23 until 9:07. I now wanted to stop it and clicked "Stop now", but nothing happened.
I also tested my HDD speed by copying files in Explorer: 120 MB/s.

```

  • Nov 16, 2023 9:40 AM: Cancellation Requested

  • Nov 16, 2023 9:19 AM: Cancellation Requested

  • Nov 16, 2023 9:10 AM: Cancellation Requested

  • Nov 15, 2023 11:14 PM: Backend event: List - Started: ()

  • Nov 15, 2023 11:14 PM: The operation Restore has started

  • Nov 15, 2023 11:14 PM: The operation List has completed
```

I will restart the PC now and start Verify files.

Update:

Before restarting, the job finished:
```

  • Nov 16, 2023 9:46 AM: The operation Restore has completed

  • Nov 16, 2023 9:46 AM: Target file is not patched any local data: D:\tmp/xxxx.mkv

  • Nov 16, 2023 9:46 AM: Local source file not found: Y:\videos\movies\xxxx.mkv

  • Nov 16, 2023 9:45 AM: Target file does not exist: D:\tmp\xxxx.mkv

  • Nov 16, 2023 9:42 AM: Restore list contains 70012 blocks with a total size of 6,68 GB

  • Nov 16, 2023 9:42 AM: Mapping restore path prefix to "Y:\xxxx\" to "D:\tmp\"

  • Nov 16, 2023 9:42 AM: Needs to restore 1 files (6,68 GB)

  • Nov 16, 2023 9:42 AM: Searching backup 0 (15.11.2023 07:21:22) …

  • Nov 16, 2023 9:42 AM: Backend event: List - Completed: (119,34 KB)

  • Nov 16, 2023 9:40 AM: Cancellation Requested

  • Nov 16, 2023 9:19 AM: Cancellation Requested

  • Nov 16, 2023 9:10 AM: Cancellation Requested

  • Nov 15, 2023 11:14 PM: Backend event: List - Started: ()

  • Nov 15, 2023 11:14 PM: The operation Restore has started

  • Nov 15, 2023 11:14 PM: The operation List has completed
```

I now ran Verify files; it was running at ~14% CPU usage.
```

  • Nov 16, 2023 10:34 AM: The operation Test has completed

  • Nov 16, 2023 10:34 AM: Backend event: Get - Completed: duplicati-b96b537bf22f848dca3a82a7c27d47ed2.dblock.zip (49,99 MB)

  • Nov 16, 2023 10:34 AM: Backend event: Get - Started: duplicati-b96b537bf22f848dca3a82a7c27d47ed2.dblock.zip (49,99 MB)

  • Nov 16, 2023 10:34 AM: Backend event: Get - Completed: duplicati-if0e58b0bc17f481cae5cde810818182f.dindex.zip (35,48 KB)

  • Nov 16, 2023 10:34 AM: Backend event: Get - Started: duplicati-if0e58b0bc17f481cae5cde810818182f.dindex.zip (35,48 KB)

  • Nov 16, 2023 10:34 AM: Backend event: Get - Completed: duplicati-20231113T181243Z.dlist.zip (3,60 MB)

  • Nov 16, 2023 10:34 AM: Backend event: Get - Started: duplicati-20231113T181243Z.dlist.zip (3,60 MB)

  • Nov 16, 2023 10:34 AM: Backend event: List - Completed: (119,34 KB)

  • Nov 16, 2023 9:50 AM: Backend event: List - Started: ()

  • Nov 16, 2023 9:49 AM: The operation Test has started
```

Here it’s strangely slow.

Here it is incredibly slow.

Here it seems insanely slow.

There is something strange going on with your target hard disk. What is it exactly? What’s the interface? Is it a networked disk?

A 3.5" HDD connected by eSATA, formatted as exFAT, 5 TB.

How fast can it list the files in, say, a Command Prompt running a dir command to destination folder?
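
For a rough timing from a Command Prompt, something like this prints timestamps around a bare listing (the destination path is a placeholder):

```
echo %time% & dir /b "X:\Duplicati" > NUL & echo %time%
```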

During any list attempt, you could also watch Task Manager’s Performance section. Is the drive busy?

Do you ever look at S.M.A.R.T. stats for that drive? If not, there are tools, for example CrystalDiskInfo.

If there is a bad sector on the drive, first the drive will retry, and then Windows will. Failure gets logged.

Data corruption and disk errors troubleshooting guidance (Microsoft)

Some of their guidance talks about NTFS and some doesn’t. I’m not certain which ones go with exFAT.

Windows is persistent, so possibly you have a situation where a huge number of tries eventually works.

Whatever disk areas it reads are probably good, but that doesn’t say anything about the other sectors.

EDIT 1:

Directory file set in Wikipedia suggests exFAT directory access may be slower than NTFS due to linear searching (with some help). I suppose it’s possible this is slowing things, but you’d have noticed before perhaps? It might depend on how many backups you’ve done so far, and whether you watched closely.

EDIT 2:

looks to me like a lot of lookups by name, but I’m not a .NET developer and didn’t research very hard.
Regardless, this would probably show up in the Task Manager test as that drive being kept quite busy.

EDIT 3:

If you want to watch at what I think is the driver level (which looks different, but is a little recognizable), Sysinternals Process Monitor can do that. You would want to filter (maybe to that drive?) to limit noise.

I don’t see how it’s possible to store 122000 files of 50 MB on a 5 TB disk.

Sorry, my fault –
I didn’t see the dindex.zip files.
It’s about half that (61,000).

dir was fast; it took 15 seconds at 15% CPU usage.

I had some of these errors in the past, maybe because I used my drive with a SATA-USB adapter. That caused problems (after some time the drive would disconnect), so I switched to eSATA.
And in that time (the last 4 days) I haven’t had any of these errors.

What was the history of this speed problem (assuming backup has been roughly constant size)? Theoretically any slowness from the Duplicati code should have been about as slow throughout.

Nice tool.
I will test it in the coming days.

I happen to have a 5 TB exFAT USB drive. I can get just the list output from it using a CLI tool like:

Duplicati.CommandLine.BackendTool.exe list "E:\HP4\Duplicati"

and Process Monitor gets very busy. Possibly a linear search with a far larger folder is scaling poorly.
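
To put a number on it without watching Process Monitor, the same list can be bracketed with timestamps (a sketch, reusing my path from above and discarding the output):

```
echo %time% & Duplicati.CommandLine.BackendTool.exe list "E:\HP4\Duplicati" > NUL & echo %time%
```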

I would think someone would have hit this before, if exFAT lookup speed is the issue, but who knows?
Possibly I will see if I can get a file count similar to yours artificially just to see whether it will list slowly.

EDIT 1:

This shows the rate at which a Perl script is creating files, and BackendTool churning through a file list.
For comparison, the internal NTFS drive takes 15 seconds to list all 122000 files, redirected to NUL.
The large USB drive is SMR so writes slowly, but actually Task Manager shows it not particularly busy. Possibly Windows memory buffering is saving it from the full brunt of reading through its directory info.
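
The Perl script itself isn’t shown here; as a rough stand-in for the idea (hypothetical path, and zero-byte files, so it only exercises the directory entries, not the data), a Command Prompt loop could create a similar number of entries:

```
rem Create a test folder and 122000 empty dummy files (use %%i instead of %i inside a .bat file)
mkdir "E:\listtest"
for /L %i in (1,1,122000) do @type nul > "E:\listtest\dummy-%i.zip"
```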

EDIT 2:

Where is time going in Perl?

[screenshot: Process Explorer – time spent by the Perl file creation]

Where is time going in Duplicati list?

[screenshot: Process Explorer – time spent by the Duplicati list]

Info from Sysinternals Process Explorer.

The file creation took 1 hour 52 minutes. The list (started later) is still running, without competition.
Even as I type, it’s spent far longer than NTFS on the admittedly faster 7200 RPM NTFS SATA drive.
I don’t have an easy way to change a filesystem on the same drive. Current suspect is exFAT speed.

EDIT 3:

The list took 2 hours 7 minutes, but some of the time was spent while other things were using the drive.
CPU load towards the end was 25% and drive load was about 0%, so probably Windows did caching.
CPU is an old quad core, so that might be as much of the total CPU as one process is able to use up.