File:// destinations: a lot of unnecessary (?) IO load

Sami_Lehtinen · April 12, 2019, 7:26am

I’m wondering whether it’s really necessary to copy files back and forth between the “storage URL” and the “temp path” while backing up, and especially when restoring / testing.

Yes, I do understand that in the internal logic it’s usually a good idea to always use temp paths. But in this specific case it seems a bit excessive. Once again, especially when restoring.

I think it would be pretty much perfect if there were an option to skip using temp when the backup location is “sufficiently accessible”.

Any thoughts on whether this could be optimized? In my cases, at least, this kind of option would cut the restore / test time by at least 50%.

Onurtag · April 14, 2019, 9:58pm

This is probably why it is suggested to use --use-move-for-put, --disable-streaming-transfers and --tempdir options for local backups.
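For illustration, a minimal Python sketch that assembles such a command line for a local file:// backup. The executable name `duplicati-cli` (on Windows it is `Duplicati.CommandLine.exe`) and every path are assumptions made for the example; only the three option names come from the post above. The function just builds and prints the argument list, it does not run a backup.

```python
import shlex


def local_backup_command(source: str, target_dir: str, tempdir: str) -> list[str]:
    """Build a backup command line for a file:// destination (paths are hypothetical)."""
    return [
        "duplicati-cli",                       # assumed executable name; differs per platform
        "backup",
        f"file://{target_dir}",                # storage URL pointing at a local/secondary disk
        source,
        "--use-move-for-put=true",             # move finished volumes instead of copying them
        "--disable-streaming-transfers=true",
        f"--tempdir={tempdir}",                # where volumes are staged before the put
    ]


if __name__ == "__main__":
    argv = local_backup_command("/data", "/mnt/backupdisk/duplicati", "/mnt/backupdisk/tmp")
    print(shlex.join(argv))
```

Putting the tempdir on the same volume as the target folder, as in the example, is presumably what lets the move be a cheap rename rather than another full copy.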

Sami_Lehtinen · April 15, 2019, 4:56am

use --use-move-for-put, --disable-streaming-transfers and --tempdir

Excellent. I’m sure I’ve seen --disable-streaming-transfers, but I didn’t realize that it also affects file:/// references. --use-move-for-put I didn’t remember at all, but that’s because it’s only in the help file, which I clearly hadn’t read in enough detail.

But --use-move-for-put only helps in very specific cases: probably when the tempdir and the destination are on the same volume, and only while creating a backup. I tested it and, sure enough, it slightly sped up that specific case, where the backup of the source is written to a local secondary disk. But in my restore case there are a few extra factors, which is why I’m looking for a --direct-destination-access type of parameter.
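Why the benefit is limited to the same-volume case comes down to general filesystem behavior rather than anything specific to Duplicati: a move within one filesystem is a rename, while a move across volumes is a full copy followed by a delete. A minimal Python sketch of that distinction; `same_volume` and `stage_to_destination` are hypothetical helpers, not Duplicati code (Python’s own `shutil.move` applies a similar rename-or-copy fallback).

```python
import os
import shutil
import tempfile
from pathlib import Path


def same_volume(a: Path, b: Path) -> bool:
    """True if both paths are on the same filesystem (same device id)."""
    return os.stat(a).st_dev == os.stat(b).st_dev


def stage_to_destination(volume: Path, dest_dir: Path) -> Path:
    """Move a finished volume from the temp folder into dest_dir."""
    target = dest_dir / volume.name
    if same_volume(volume.parent, dest_dir):
        # Same filesystem: a rename only updates metadata, no file data is copied.
        os.replace(volume, target)
    else:
        # Different volumes: every byte is read and written again, then the source is removed.
        shutil.copy2(volume, target)
        volume.unlink()
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        temp_dir, dest_dir = Path(root, "tmp"), Path(root, "dest")
        temp_dir.mkdir()
        dest_dir.mkdir()
        vol = temp_dir / "example.dblock.zip"
        vol.write_bytes(b"volume data")
        print(same_volume(temp_dir, dest_dir), stage_to_destination(vol, dest_dir))
```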

The environment looks like this:

Source drive
Temp Drive
Destination / target backup file-path (over SMB network share)

Why isn’t this case clear-cut? Because I don’t know exactly what the access pattern for the temp path is. If the data there is only read linearly, once, then moving the files from the temp path to the target would be OK-ish. That works fine when the temp drive and the destination / target are the same drive / volume.

But what about restoring data? There the system works in reverse. Using move instead of put won’t help, because a restore doesn’t move the source files anywhere. There’s also a secondary question: are the files read more than once? If they are accessed several times, then using temp does make sense. But if the access pattern is only “read once”, there’s no need for temp at all.
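The read-once versus read-many question can be put in numbers with a deliberately simple, hypothetical model that counts the bytes moved for one backup volume. It ignores caching, compression and decryption, and the figures it produces are not measurements of Duplicati.

```python
from dataclasses import dataclass


@dataclass
class IoCost:
    network_bytes: int  # bytes read from the SMB share
    local_bytes: int    # bytes written to plus read back from the temp drive


def restore_io(volume_size: int, reads: int, staged: bool) -> IoCost:
    """Hypothetical IO cost of reading one backup volume `reads` times."""
    if staged:
        # One copy from the share into temp, then every read is served by the temp drive.
        return IoCost(network_bytes=volume_size,
                      local_bytes=volume_size + reads * volume_size)
    # Direct access: every read goes to the share and the temp drive is untouched.
    return IoCost(network_bytes=reads * volume_size, local_bytes=0)


if __name__ == "__main__":
    for reads in (1, 3):
        print(reads, restore_io(50, reads, staged=True), restore_io(50, reads, staged=False))
```

For a 50 MB volume read once, staging costs 50 MB over the network plus 100 MB of temp-drive IO, while direct access costs only the 50 MB network read. At three reads the trade-off flips: staging still pulls 50 MB over the network, direct access pulls 150 MB.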

It also needs to be taken into account that I’m restoring directly from the storage path, which means there’s always a database rebuild step involved. Does that change the access pattern? Probably it shouldn’t? Isn’t the database built from the list and index files alone?
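One way to frame that question is by remote file type. The sketch below assumes the usual naming, where each remote file carries a `.dlist.`, `.dindex.` or `.dblock.` part (for example `duplicati-20190412T072600Z.dlist.zip.aes`, a made-up name). `rebuild_inputs` is hypothetical and only encodes the guess from the post that a rebuild needs nothing but list and index files; whether the rebuild ever falls back to reading dblock (data) volumes is exactly what is being asked, and this code does not answer it.

```python
from collections import defaultdict

VOLUME_KINDS = ("dlist", "dindex", "dblock")


def classify_volumes(names: list[str]) -> dict[str, list[str]]:
    """Group remote file names by the volume kind named in their dotted suffixes."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        parts = name.split(".")
        kind = next((k for k in VOLUME_KINDS if k in parts), "other")
        groups[kind].append(name)
    return dict(groups)


def rebuild_inputs(names: list[str]) -> list[str]:
    """Files a rebuild would fetch IF it only needs list and index files (unconfirmed)."""
    groups = classify_volumes(names)
    return groups.get("dlist", []) + groups.get("dindex", [])


if __name__ == "__main__":
    remote = [
        "duplicati-20190412T072600Z.dlist.zip.aes",  # made-up example names
        "duplicati-b0a1.dblock.zip.aes",
        "duplicati-i0a1.dindex.zip.aes",
    ]
    print(rebuild_inputs(remote))
```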

So in this case, I would read the data files directly from the source and use the temp folder only for temporary data, like the recreated temporary database. That means the tempdir should be local, even if the data source (the backup target path) is on an SMB network share.
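The behavior proposed here (read data volumes in place, keep only scratch data in the tempdir) could look roughly like the following Python sketch. `volume_path_for_read` and its `direct` flag are hypothetical stand-ins for the suggested --direct-destination-access idea; Duplicati has no such function. The sketch only handles file:// URLs and ignores any host part of the URL.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def volume_path_for_read(storage_url: str, name: str, tempdir: Path, direct: bool) -> Path:
    """Return the path a restore would read the volume `name` from."""
    parsed = urlparse(storage_url)
    if parsed.scheme != "file":
        raise ValueError("this sketch only handles file:// storage URLs")
    source = Path(url2pathname(parsed.path)) / name
    if direct:
        # Proposed mode: read the volume where it is; tempdir is left for the database etc.
        return source
    # Staged mode: one extra full read of the source plus a full write to tempdir.
    staged = tempdir / name
    shutil.copyfile(source, staged)
    return staged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store, tempfile.TemporaryDirectory() as tmp:
        (Path(store) / "example.dblock.zip").write_bytes(b"volume data")
        url = Path(store).as_uri()
        print(volume_path_for_read(url, "example.dblock.zip", Path(tmp), direct=True))
        print(volume_path_for_read(url, "example.dblock.zip", Path(tmp), direct=False))
```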

Just thinking… Using a separate temp drive already sped up the restore process a lot, because it made disk access much more efficient. But during the database rebuild step, I’m not sure it’s a good idea to have the database and the tempdir data on the same drive. Can I point a separate dbpath at the database? Of course, this wouldn’t be a problem if the temp copy were sidestepped completely by using direct access.
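Duplicati’s --dbpath option selects where the local database file is stored, so a restore could in principle keep the tempdir and the database on different local drives. A minimal Python sketch that builds such a command line; the executable name and all paths (including the assumption that the SMB share is mounted at /mnt/nas) are made up, and whether a direct restore from the storage path places its rebuilt temporary database at --dbpath is the open question here, not something this sketch confirms.

```python
import shlex


def direct_restore_command(storage_url: str, restore_to: str,
                           tempdir: str, dbpath: str) -> list[str]:
    """Build a restore command line with temp data and the database on separate drives."""
    return [
        "duplicati-cli",                 # assumed executable name; differs per platform
        "restore",
        storage_url,                     # backup files, read from the network share
        f"--restore-path={restore_to}",  # where restored files are written
        f"--tempdir={tempdir}",          # scratch data on one local drive
        f"--dbpath={dbpath}",            # database on a different local drive
    ]


if __name__ == "__main__":
    argv = direct_restore_command("file:///mnt/nas/backup", "/restore",
                                  "/mnt/scratch/tmp", "/mnt/ssd/restore.sqlite")
    print(shlex.join(argv))
```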