Backups taking extremely long despite minimal updates to database

Serotonin5141 · August 28, 2024, 3:12pm

I have a remote backup over SFTP that consistently takes roughly 8 hours despite the updated files only amounting to 100MB or less. The backup logs don’t explain the reason and there is a huge discrepancy between the figures in the breakdown vs the total backup time.

I’m hoping there might be a solution that doesn’t require redoing my backup from scratch. I’m on the latest build within a Proxmox VM on ubuntu 22 fully updated.

Remote backup log shows two important entries back to back:

Aug 28, 2024 3:54 AM: list
Aug 28, 2024 9:53 AM: put duplicati-bdh653746574857b08377afe44f36dh74.dblock.zip.aes

ts678 · August 28, 2024, 7:21pm

Welcome to the forum @Serotonin5141

The phase breakdown is of the specialty phases after the backup itself whose time is the rest.
I think some of the breakdown is based on the raw log design as visible in the Complete log.

Compacting files at the backend

Verifying backend files

The time in between is probably looking for enough changes to fill that dblock for an upload. Partially filled dblock files will, of course, upload at the end of the backup (not finished yet?).

You can watch About → Show log → Live → <level>. Verbose would be a good starter level.
That will show source files. Just examining all 1105795 files for signs of change takes awhile.
Candidate files are opened to see if anything actually changed. Any changes are backed up.

File analysis is block by block, and for a backup this size the default 100 KB blocksize is low.
If you’re at default, you’re stuck with it without a fresh start, but it might not hurt all that much.
Looking through 14 MB of files ought to be fast, however I’m wondering how long file walk is.

log-file=<path> and log-file-log-level=verbose is an alternative to looking at live log sometime.

Serotonin5141 · August 28, 2024, 7:39pm

Thank you.

I checked my config and block size is 128KB.

What would you suggest I change about my config to decrease the backup time:

A. If I can recreate the backup fresh
and
B. If I cannot?

And would the changes result in a significant time improvement, worthy of a fresh start?

ts678 · August 28, 2024, 8:01pm

Choosing sizes in Duplicati gives some guidance and tradeoffs. Some of us think 100 KB is enough for 100 GB, and one scales up, so 5 TB might be about 5 MB. But that won’t help file walk speed…

I suggest you do the log checks as advised, and see if it seems to be spending its time walking files.

Serotonin5141 · August 28, 2024, 10:26pm

In the remote logs there is an entry called list (same as shown in first post), which shows the various chunks as it checks each one for changes. This process makes up the entirety of the time between those two remote entries. Is this the file walk you speak of?

I believe it is. If so, what can be done to improve it?

ts678 · August 28, 2024, 10:52pm

I don’t know of anything that says what you say. Got any output from it supporting that idea?

is an example of a list at the start of the backup, which shows what the destination files are.

Do you have a start and an end time? If just one time, then it’s off doing something else after it.
That something would be backup work. To see the start and end time, get a log (live or file log)

is live log at Information level. The 7 is right, but bytes is not. It’s the number of destination files.

I doubt it, but you can click on your list to see if it expands to a list of files at the destination.

By walk, I mean every million-plus files in the Source has to be checked for clues of a change.
This is probably slower when in a VM, and if there’s a hard drive below, that slows it down too.
Possibly using an SSD would help. The suggested verbose log would also reveal file activities.

Possibly you can see how many files it does per second, to see how it extrapolates to a million.
The file counts are also on the GUI status bar. It counts up, until done, then works count down.

None of this is anything I can see, so please get some observations to support or refute theory.

Serotonin5141 · August 28, 2024, 11:07pm

The example list you posted is what I was referencing. I was misunderstanding its function and I was also mistaking the last accessed timestamps as current timestamps during backup.

I will enable a live log and report back tomorrow once the scheduled backup is complete.

Thank you.