When are old blobs read?

I have a virtual Windows file system that is backed by a tape library (QStar). As data is written to the file system, it is cached locally and then eventually written to tape. When one attempts to read the contents of a file that has been written to tape and isn’t in the cache, the backup system pulls the file off tape and puts it in the cache, and then the file can be read. Metadata is stored in a database, so accessing the timestamp or size of a file will not cause it to be read back from tape.
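A toy Python model of the storage behaviour described above may help frame the questions. `TapeBackedFile`, `TapeBackedFS` and their methods are hypothetical names for illustration, not QStar's API. The assumption that a delete never recalls content is exactly what question 1 below is hoping for.

```python
from dataclasses import dataclass, field


@dataclass
class TapeBackedFile:
    size: int
    mtime: float
    in_cache: bool = True  # False once written to tape and evicted locally


@dataclass
class TapeBackedFS:
    """Toy model of a tape-backed file system (hypothetical, not QStar's API)."""
    files: dict = field(default_factory=dict)
    recalls: int = 0  # number of times content had to come back from tape

    def stat(self, name):
        # Metadata comes from a database, so no recall happens.
        f = self.files[name]
        return f.size, f.mtime

    def read(self, name):
        # Reading the content of an evicted file recalls it into the cache first.
        f = self.files[name]
        if not f.in_cache:
            self.recalls += 1
            f.in_cache = True
        return f.size

    def delete(self, name):
        # Assumed: deleting only touches metadata and never recalls content.
        del self.files[name]

    def evict(self, name):
        # Simulate the cache dropping a file that is now only on tape.
        self.files[name].in_cache = False
```

For example, after `evict("x")`, calling `stat("x")` leaves `recalls` at 0, while the first `read("x")` raises it to 1. Any backup operation that reads an evicted file pays that recall cost.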

I’m considering using this file system as the backup destination for Duplicati. These are the issues I’m concerned about:

  1. When Duplicati deletes old files, does it just delete the blob, or does it need to read it first? I’m hoping it’s just deleted, so the file doesn’t need to come back from tape.

  2. Does Duplicati read old blob files other than during a restore? Do I need to disable verification of files for this, or are only recent files verified?

  3. If I have a local database for the Duplicati job, does choosing files to restore require the blobs to be read? Or are the blobs only read once the restore process has started?

You can test all of this yourself, e.g. by watching About → Show log → Live → Information, but I think it works like this:

A delete only needs a delete. However, sometimes a read leads to a delete: compact, for example, does many reads, uploads a replacement dblock and dindex once the new dblock is full, and then deletes the older dblock and dindex files.

Compacting files at the backend
The COMPACT command
The no-auto-compact option can avoid these deletes, but wasted space will build up, which can also grow and slow down the database.
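Here is a simplified Python model of what compact asks of the backend, showing where the reads and deletes come from. The names (`Volume`, `plan_compact`) and the filenames are hypothetical, and the rules are assumptions based on my understanding: a dblock with no live data is simply deleted without being read, and a dblock whose wasted fraction exceeds a threshold (compare Duplicati's `threshold` option, default 25%) is downloaded so its live blocks can be repacked. Real compact has more conditions (for example, it may skip compacting when overall waste is low) that are left out here.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """A remote dblock, its dindex, and how much of it is still referenced."""
    name: str
    index_name: str
    size: int
    used: int  # bytes still needed by some backup version

    @property
    def wasted_fraction(self) -> float:
        return 1 - self.used / self.size


def plan_compact(volumes, threshold=0.25, volume_size=50, no_auto_compact=False):
    """Return the backend operations a compact run would issue (toy model)."""
    if no_auto_compact:
        return []  # no reads, uploads or deletes; waste just accumulates
    ops = []
    # Fully unused dblocks: nothing to keep, so a plain delete is enough (no read).
    empty = [v for v in volumes if v.used == 0]
    # Partly wasted dblocks above the threshold must be read so live blocks survive.
    partial = [v for v in volumes if v.used > 0 and v.wasted_fraction > threshold]
    for v in partial:
        ops.append(("get", v.name))
    # Live data is packed into new dblocks, each uploaded with its own dindex.
    live = sum(v.used for v in partial)
    for i in range(-(-live // volume_size)):  # ceiling division
        ops.append(("put", f"new-{i}.dblock"))
        ops.append(("put", f"new-{i}.dindex"))
    # Old dblock and dindex files are deleted after the replacements are uploaded.
    for v in empty + partial:
        ops.append(("delete", v.name))
        ops.append(("delete", v.index_name))
    return ops
```

With `no_auto_compact=True` the plan is empty, which is the trade-off described above: no tape recalls, but waste accumulates at the destination.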

Besides restore, there’s compact and a little default verification (usually 3 files, usually recent ones), both of which you can stop if needed.

Verifying backend files
The TEST command
Backup Test block selection logic
backup-test-samples
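A rough Python sketch of why the default verification usually touches recent files. `select_test_samples` is a hypothetical helper, not Duplicati's code. It assumes each remote file carries a count of how often it has been verified, that the least-verified file of each type (dlist, dindex, dblock) is preferred with random tie-breaking, and that new uploads start at zero. One sample is then one file of each type, i.e. 3 files. With `samples=0` nothing is read, which I believe matches setting backup-test-samples to 0.

```python
import random


def select_test_samples(verification_counts, samples=1, rng=None):
    """Pick remote files to verify, `samples` per file type (toy model).

    verification_counts: {file type: {filename: times verified}}.
    """
    rng = rng or random.Random()
    picked = []
    for counts in verification_counts.values():
        names = list(counts)
        rng.shuffle(names)                   # random order first...
        names.sort(key=lambda n: counts[n])  # ...stable sort keeps it for ties
        picked.extend(names[:samples])       # least-verified files win
    return picked
```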

Restoring files after your Duplicati installation is lost, or recreating the database, will read lots of files: ideally just the dlist and dindex files, but possibly also the large dblock files if information is still missing after reading the dindex files. Duplicati.CommandLine.RecoveryTool.exe reads the dlist files and the dblock files, dodging dindex troubles.
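A simplified Python model of the recreate decision described above. `plan_recreate` is hypothetical. It assumes every dlist and dindex file is read first, and that dblock files are fetched only when some needed block is not listed in any dindex; in that case it assumes the missing blocks are in dblocks that no dindex describes. Real recreate logic is more involved and may read more dblocks than this.

```python
def plan_recreate(dlists, dindexes, all_dblocks, needed_blocks):
    """Return the remote files a database recreate would download (toy model).

    dlists: dlist filenames (always read; they describe the backup versions).
    dindexes: {dindex filename: (dblock filename, set of block hashes it lists)}.
    all_dblocks: every dblock filename present at the destination.
    needed_blocks: block hashes referenced by the dlist files.
    """
    downloads = list(dlists) + list(dindexes)
    known, indexed = set(), set()
    for dblock, hashes in dindexes.values():
        known |= hashes
        indexed.add(dblock)
    if set(needed_blocks) <= known:
        return downloads  # the ideal case: no large dblock reads at all
    # Information is inadequate: also read dblocks that no dindex describes
    # (the assumed location of the missing blocks in this model).
    return downloads + [d for d in all_dblocks if d not in indexed]
```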

No.

Sometimes not even then, as dblock file downloading is avoided if blocks are already available locally.
no-local-blocks

Duplicati will attempt to use data from source files to minimize the amount of downloaded data. Use this option to skip this optimization and only use remote data.
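A small Python model of that optimization, to show why a restore may skip dblock downloads. The function names are hypothetical. Fixed-size blocks and base64-encoded SHA-256 block hashes are assumptions (SHA-256 is, as far as I know, Duplicati's default block hash), and `no_local_blocks=True` stands in for the no-local-blocks option.

```python
import base64
import hashlib


def block_hash(data: bytes) -> str:
    # Assumed hash form: SHA-256 digest, base64-encoded as text.
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def local_block_hashes(data: bytes, blocksize: int) -> set:
    """Hash each fixed-size block of a local source file's contents."""
    return {block_hash(data[i:i + blocksize]) for i in range(0, len(data), blocksize)}


def plan_restore_downloads(needed, block_to_dblock, local_blocks, no_local_blocks=False):
    """Return the dblock files a restore would download (toy model).

    needed: block hashes the restored files require.
    block_to_dblock: {block hash: dblock filename}, as known from the local database.
    local_blocks: block hashes that can be read from files already on disk.
    """
    dblocks = set()
    for h in needed:
        if not no_local_blocks and h in local_blocks:
            continue  # block is available locally, so no download is needed for it
        dblocks.add(block_to_dblock[h])
    return sorted(dblocks)
```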

That optimization probably works in your favor. Heading the other way, I think the restore downloads dblocks to get metadata (e.g. timestamps and permissions), which would be one of the last things set in the restore. Watching the live log at a modest level such as Information shows downloads, but restore details need Verbose.

Duplicati is not tape-aware, doesn’t track physical placement, and works best with fast random access. Caching software can probably hide some of that, but some of it may prove difficult to hide completely.

There’s more discussion in “tape backups, e.g. LTO LTFS #4200”, although I guess your caching software helps.

EDIT:

If you don’t like the live log, you can use log-file=<path> with log-file-log-level=Information (or similar), then search the file with a string finder such as find, findstr, or grep (if you have one).

2024-04-18 06:19:15 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b03e4226ef8eb40e9be85d1c4e2d1d282.dblock.zip.aes (49.91 MB)
2024-04-18 06:19:18 -04 - [Profiling-Duplicati.Library.Main.BackendManager-DownloadSpeed]: Downloaded and decrypted 49.91 MB in 00:00:03.4898511, 14.30 MB/s
2024-04-18 06:19:18 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b03e4226ef8eb40e9be85d1c4e2d1d282.dblock.zip.aes (49.91 MB)
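If no string finder is handy, a few lines of Python can do the same filtering. This is a sketch keyed to the Information-level lines shown above; the regular expression assumes that exact wording, and `backend_gets` and `downloaded_files` are hypothetical helper names.

```python
import re

# Matches Information-level backend events such as
# "Backend event: Get - Completed: <file> (49.91 MB)".
BACKEND_GET = re.compile(
    r"Backend event: Get - (?P<state>Started|Completed): (?P<name>\S+) \((?P<size>[^)]*)\)"
)


def backend_gets(lines):
    """Yield (state, filename, size) for every backend Get event in the lines."""
    for line in lines:
        m = BACKEND_GET.search(line)
        if m:
            yield m.group("state"), m.group("name"), m.group("size")


def downloaded_files(lines):
    """Return the names of files whose download completed, in log order."""
    return [name for state, name, _ in backend_gets(lines) if state == "Completed"]
```

An open log file works as `lines`, e.g. `with open(path, encoding="utf-8") as f: print(downloaded_files(f))`, where `path` is whatever you gave to log-file.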

The middle line is at Profiling level, which probably logs too much overall, but you can capture just the DownloadSpeed lines with log-file-log-filter, and then you won’t need an external string finder if all you care about are the downloads.
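Once the log holds the DownloadSpeed lines, a sketch like this can pull out sizes, durations and speeds. It assumes the exact wording of the Profiling line shown above and a duration in hh:mm:ss form; `download_stats` is a hypothetical helper, and the size unit is kept as logged rather than converted.

```python
import re

# Profiling-level line, e.g.
# "...-DownloadSpeed]: Downloaded and decrypted 49.91 MB in 00:00:03.4898511, 14.30 MB/s"
DOWNLOAD_SPEED = re.compile(
    r"DownloadSpeed\]: Downloaded and decrypted (?P<size>[\d.]+) (?P<unit>\w+) "
    r"in (?P<h>\d+):(?P<m>\d+):(?P<s>[\d.]+)"
)


def download_stats(lines):
    """Yield (size, unit, seconds, size per second) for each DownloadSpeed line."""
    for line in lines:
        m = DOWNLOAD_SPEED.search(line)
        if m:
            size = float(m.group("size"))
            seconds = int(m.group("h")) * 3600 + int(m.group("m")) * 60 + float(m.group("s"))
            speed = size / seconds if seconds else float("inf")
            yield size, m.group("unit"), seconds, speed
```

Summing the sizes and durations over a whole log gives a rough picture of how much data (and how many tape recalls) each operation caused.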