Backup size (to go) grows after restart


#1

My first backup (roughly 2.2TB) is slow and is taking weeks to finish, so naturally I restart the PC every now and then.

My question is why, after a restart, the backup grew from 2.2TB to go to 2.45TB to go. The progress bar also dropped significantly, and not in proportion to those numbers.

The only other thing I noticed is that the number of files hasn’t grown accordingly: at the start it was 1.9 million files, while after the restart it is around 1.8 million.

Please advise whether something is wrong with the backup or whether this is only a reporting hiccup.

Thanks.


#2

It’s hard to say what could cause an increase in file SIZE but not COUNT, but one potential example is a virtual machine disk image.

The virtual disk image is a single file, but if you download 250G of files inside the virtual machine, the disk IMAGE that Duplicati sees will have grown by about 0.25T.

Of course that’s just an example; the same thing could happen with any file (database, log, etc.) that has grown by 250G.

Note that when Duplicati first starts, it scans all your Source files and gets the counts and sizes at that point. So if a backup takes a long time (as initial ones do), by the time Duplicati reaches a folder “further down” in its initial backup, the contents of that folder may have changed, causing the bar to appear “wrong”.
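As a rough illustration of that timing effect, here is a toy Python model. It is not Duplicati’s code: the `scan` and `remaining` functions and the `ProgressSnapshot` type are hypothetical, and the model simply assumes “to go” is the scan-time total minus the bytes processed so far. A single file (a VM image, as above) grows after the scan, so the file count stays the same while the size jumps once a restart triggers a fresh scan.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProgressSnapshot:
    """Hypothetical model: totals captured once, when the scan runs."""
    total_files: int
    total_bytes: int


def scan(sizes: dict[str, int]) -> ProgressSnapshot:
    # Count files and sum their sizes as they are at this moment only.
    return ProgressSnapshot(len(sizes), sum(sizes.values()))


def remaining(snapshot: ProgressSnapshot, processed_bytes: int) -> int:
    # "To go" = scan-time total minus what has been processed so far.
    return snapshot.total_bytes - processed_bytes


GB = 1024**3
source = {"docs": 100 * GB, "vm.img": 50 * GB}
snap = scan(source)                 # initial scan: 2 files, 150 GB

processed = source["docs"]          # the backup works through "docs" first
source["vm.img"] += 250 * GB        # meanwhile the VM image grows by 250 GB

print(remaining(snap, processed) // GB)          # 50, based on the stale scan
print(remaining(scan(source), processed) // GB)  # 300, after a restart rescans
```

In this model the file count is 2 in both scans; only the byte total changes, which is the “SIZE but not COUNT” pattern described above.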


#3

This initial backup is still going… I am trying to keep the system up, but for some reason the backup load has made the system halt twice. I will probably open another thread to troubleshoot that. In the meantime, I can now confirm that the file count increased from 1.5 million to 1.8 million right after the reboot. Does that make any sense at all?

Thanks.


#4

Not really. I can think of some scenarios that might cause that, but they’re pretty unlikely (for example, if you are including your destination in your backup and are using a small dblock size, which causes lots of files to be created).

What file count does your OS report for your source folders? If you’re not using any aggressive filters in your backup, the number should be fairly close to what Duplicati is showing.
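If you want an independent count to compare against Duplicati’s numbers, a small Python script can walk the source folders and total up files and bytes. This is a minimal sketch, and `count_source` is a hypothetical helper. It assumes you pass the same top-level folders that are in the backup job, and it does not apply any of Duplicati’s filters or its symlink handling. Unreadable directories are silently skipped by `os.walk` (its default behaviour), so running it as a standard user on something like `/Library` may undercount.

```python
import os
import sys


def count_source(paths):
    """Walk each source folder and return (file_count, total_bytes)."""
    files = 0
    total = 0
    for root_path in paths:
        # os.walk does not descend into symlinked directories by default.
        for dirpath, _dirnames, filenames in os.walk(root_path):
            for name in filenames:
                full = os.path.join(dirpath, name)
                try:
                    st = os.lstat(full)  # lstat: size of a symlink, not its target
                except OSError:
                    continue  # permission denied, or file vanished mid-walk
                files += 1
                total += st.st_size
    return files, total


if __name__ == "__main__":
    n, size = count_source(sys.argv[1:])
    print(f"{n} files, {size / 1024**4:.2f} TiB")
```

Usage would be along the lines of `python count_source.py /path/to/source1 /path/to/source2`. Because the totals can change while the walk runs, two runs a few hours apart on an active system may not match exactly.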


#5

I believe the main reason is this: between one backup and the next, assuming the backup process restarts (and some files have been added or changed in the meantime), Duplicati doesn’t necessarily process the already-completed files first. It also doesn’t check all files up front to determine which ones have already been added.

Example: I have 10GB backed up successfully. Then I add 5GB more to the backup job and start a new backup. Depending on the order it walks the files, Duplicati might (and probably will) handle some files from that 5GB before most of the 10GB that it doesn’t need to re-upload. So for a while during this backup job, you might see “Remaining: 14GB”, etc.
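The arithmetic of that example can be sketched as a toy Python model. This is an assumption about how such a counter could behave, not how Duplicati actually computes its figure: the hypothetical `progress` function treats “Remaining” as the total size minus every byte processed in this run, whether the file was uploaded or merely checked and skipped, and tracks separately how much really still needs uploading.

```python
def progress(files, order):
    """Return (remaining_shown, left_to_upload) in GB after each processed file.

    Each entry in `files` is (size_gb, already_backed_up).
    """
    shown = sum(size for size, _ in files)
    upload = sum(size for size, done in files if not done)
    trace = []
    for i in order:
        size, done = files[i]
        shown -= size        # every processed file counts against "Remaining"
        if not done:
            upload -= size   # only new files actually need uploading
        trace.append((shown, upload))
    return trace


# 10 x 1 GB already backed up, then 5 x 1 GB newly added to the job.
files = [(1, True)] * 10 + [(1, False)] * 5
new_first = list(range(10, 15)) + list(range(10))  # new files happen to come first
trace = progress(files, new_first)
print(trace[0])  # (14, 4): shows 14 GB remaining, only 4 GB left to upload
print(trace[4])  # (10, 0): all new data uploaded, yet 10 GB still "remaining"
```

Under this model the displayed number only converges with the real work left once the already-backed-up files have been checked, which is why it can look much larger than the actual remaining upload.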

Moral: take the “remaining” indicator with a HUGE grain of salt. I’ve begged in the past for this to be handled better, but so far it hasn’t been made a priority.


#6

The approach I am taking is to back up in small pieces and get successful runs, then add new files little by little. However, each run takes as long as if it were new. So far, 300GB of the initial 2.2TB has been successfully backed up.

Another reason for the restarts is the power issues I am facing with running one Thunderbolt drive and two USB drives. I managed to feed the Thunderbolt drive from an external power source to remediate that, but now the system halted between backup jobs, so Duplicati might not be the cause. I expect I will eventually hit a job that breaks the backup, so I am separating/grouping jobs to avoid impacting the successful runs. I will continue and report back. So far, the two issues reported in the logs are:

  1. Permissions (understandable if I try to back up /Library as a standard user).
  2. Metadata not backed up on external sources.

Will report back with further data.


#7

The dblock size is 50MB. I will report back with numbers. In the meantime, see the thread below for the new approach I’ve taken.


#8

That’s a solid approach; I’ve used it myself. Keep us updated! :slight_smile: