I am currently using Duplicati to back up some local files from my PC’s SSD to an external HDD. Now I noticed in the Windows Task Manager that there are constant write operations to my local SSD (where the source folder sits), at rates of up to 30 MB/s.
Does anybody know what is happening here? Shouldn’t there only be read operations? In the long run this would wear out the SSD.
Ahh, now I see the database file growing; I just did not think of that. Of course there would be some temp files, but the write rate seemed very high to me. With the database it makes sense (and this is also my first backup, so there is a lot to write).
You can move the database to the external drive. That’s what I do, mostly to have a nice bundle if the PC drive ever dies; without that, you would have to recreate a database to do disaster recovery. It also balances the drive loads. Depending on the size of your backup (the current defaults are good for about 100 GB), database activity can grow to a higher level as the backup continues, so then you can decide whether you prefer slower operation, or faster operation plus SSD writes.
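For reference, moving the database can be done with Duplicati’s `--dbpath` advanced option (in the GUI, the same thing is done on the Database screen). The drive letter and filename here are placeholders, not a recommendation:

```
--dbpath=X:\Duplicati\MyBackup.sqlite
```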
Especially during the initial backup, the database will grow as it goes. This may mean more accesses (although many of them will probably be reads). Done on an HDD, it’s slower; an SSD is faster, but subject to wear. During one of my backups from an HDD onto a portable HDD that also held the database, Task Manager showed the load shift.
What’s best depends on which factor you prefer to optimize.
How big is the source? If it’s big enough, a blocksize bump may help.
This would require a fresh backup start though.
As a side note, drive letters can change. Windows Drive Letters provides special handling for the Destination; however, the more other things get moved to a drive letter, the more things must change if the letter does.
Currently my source is 550 GB and will probably grow by ~100 GB/year. Right now I am using the default settings for blocksize and remote volume size.
My external drive is just connected via normal USB.
Regarding the drive letters, I have already thought about that and manually set it to a higher one, so that letter will always be free and should not change (at least it has not so far, after a few unplugs/replugs). But in case it changes, I can just set the drive letter manually before the backup; then I would not have to change the configs.
Speed is not my main focus; I am probably happier if my SSD just lives longer (although I know that does not matter too much nowadays if you don’t write crazy amounts).
Let’s say I move the database to the external HDD and the performance is fine for me. Would you still recommend changing the blocksize and remote volume size in my case, when the backup is not going to a cloud?
Block size, yes, to keep database speed up, especially given the expected growth of the source area.
Remote volume size is less clear, and even depends on things like the drive format: exFAT (unlike NTFS) does a linear search, so it slows down with a lot of files. Regardless, your setup sounds similar to mine, and I scaled both values up by 10, so blocksize 1 MB and remote volume size 500 MB. Thinking about it more, though, it depends on the frequency of restores: for disasters, big is probably fine, while frequent restores might find things like single-file performance better with a smaller remote volume size. This hasn’t been benchmarked, and in general I would focus on a blocksize boost for a larger backup IMO.
is the GitHub issue, which you can trace back to the forum where we figured out the issue with exFAT. Backup was probably larger than yours in this case though. Also FYI Remote volume size can change (preferably slowly) if you don’t like it, whereas blocksize generally must be set right from initial backup.
Before data is uploaded to the backup target, each chunk of data is written multiple times to local storage. The database transactions are probably just a small part of the write activity you see in Task Manager.
Especially during the initial backup, I guess more data is written to local storage than the total size of the source data, because quite a lot of operations have to be applied to the source data before the backup data is uploaded:
All source data has to be split into blocks of a fixed size (the default is 100 KB). During the initial backup, deduplication does not occur, so all source data will be split into 100 KB blocks and archived in a Zip file. Before this Zip file is created, the raw blocks are stored in .TMP files. This is the first write operation on local storage and is about the same size as the complete selection of source data. These temporary files are stored at the location specified by the tempdir setting.
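The splitting step can be sketched like this (a simplified illustration only, not Duplicati’s actual code; the real implementation also hashes each block for deduplication):

```python
import io

BLOCKSIZE = 100 * 1024  # Duplicati's default fixed block size (100 KB)

def split_into_blocks(stream, blocksize=BLOCKSIZE):
    """Yield fixed-size chunks from a file-like object; the last chunk may be shorter."""
    while True:
        chunk = stream.read(blocksize)
        if not chunk:
            break
        yield chunk

# A 250 KB dummy "file" splits into two full blocks and one 50 KB remainder.
data = io.BytesIO(b"x" * (250 * 1024))
blocks = list(split_into_blocks(data))
print([len(b) for b in blocks])  # [102400, 102400, 51200]
```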
The temporary files containing your raw data chunks have to be archived and compressed into .ZIP archives (remote volumes). So all data is written to a new file where the blocks are stored in a compressed format. I’m not 100% sure if this is a separate operation, or if it is combined with the first step.
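Packing raw blocks into a compressed volume can be sketched with Python’s zipfile module (again just an illustration; Duplicati’s real dblock archives use their own internal entry naming and include a manifest):

```python
import io
import zipfile

# Pretend these are three raw blocks produced by the splitting step.
blocks = [b"a" * 102400, b"b" * 102400, b"c" * 51200]

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as volume:
    for i, block in enumerate(blocks):
        # Each block becomes one entry in the "remote volume" archive.
        volume.writestr(f"block-{i}", block)

# Highly repetitive data compresses to a tiny fraction of its raw size.
print(len(buf.getvalue()) < sum(len(b) for b in blocks))  # True
```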
If encryption is applied (which is the default), the generated .ZIP files (remote volumes) have to be converted to encrypted archives (.ZIP.AES files). This is again a new write operation of the same source data. The encrypted archives will be created at the location specified by the asynchronous-upload-folder setting.
In the meantime, additional writing is performed to local storage: if free RAM is not sufficient, TEMP files are created, writes to the local database are performed, etc.
So the total amount of source data is written to local storage 2 or 3 times before it is uploaded to the backup location.
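Putting the steps above into numbers, as a back-of-the-envelope estimate that assumes each stage rewrites roughly the full source once (the 550 GB figure comes from earlier in this thread):

```python
# Rough write-amplification estimate for an initial backup.
# Assumed stages: raw blocks to .TMP files, the .ZIP volume, the .ZIP.AES volume.
source_gb = 550
stages = 3  # with encryption enabled; 2 if encryption is turned off
local_writes_gb = source_gb * stages

print(local_writes_gb)  # 1650 GB written locally before anything is uploaded
```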
To minimize these write operations, use the tempdir and asynchronous-upload-folder settings to perform these actions on cheap storage.
On my Synology NAS, I have plugged in a 16 GB USB key, created a small folder structure (/temp/duplicati/upload) and pointed the temp and asynchronous-upload folders to /usbshare1/temp/duplicati/ and /usbshare1/temp/duplicati/upload/ respectively (on Windows use something like X:\temp\duplicati and X:\temp\duplicati\upload).
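As advanced options on Windows, that setup might look like this (X: is a placeholder for whatever cheap drive you use, and the folder names are just my convention):

```
--tempdir=X:\temp\duplicati
--asynchronous-upload-folder=X:\temp\duplicati\upload
```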
@kees-z Thank you for your detailed insight! But now I have some more questions:
Good that you can confirm that, because that’s what I also thought yesterday evening after some further thought and research.
Why is this even done on the local disk and not in RAM? Of course there is much more space available on the local disk, but that cannot be the reason; RAM would be way faster, with only the end result written to disk. And for online backups to a cloud, the limiting factor is the internet connection anyway, so you do not need to process much data in advance to keep up with it. Or am I missing something?
Now I will think about mounting a folder/virtual disk in RAM (I am on Windows, but I am sure one can set up something comparable to tmpfs on Linux), because using a USB stick is probably a big bottleneck for backups to an external HDD.
I am not deep into the topic, so what do you think about that? I am just curious and maybe there is also a good reason for all this.
That’s a good question. IIRC there is a pending PR for that, but it’s complicated code and there is some concern about breaking stuff. More immediately, this is mainly a concern for the initial backup of a huge amount of data, which tends to be done on rather powerful systems that can afford a RAM disk, and it’s quite possible for this initial backup (or even all backups) to redirect temp files to such a device. It does indeed make the backup faster.
Unless I am mistaken (things may have changed recently), you will need a server operating system for that. On client systems, you only have paid add-ons, freeware (very limited), and open-source add-ons (not quite as reliable, I think, as tmpfs or the built-in solution in the Windows Server OS).
Not sure how a USB stick would interfere with making backups to an external HDD. As long as drive letters don’t change, you can point the TEMP and upload folders to the USB stick and upload your backup files to the external HDD; USB 3 has enough bandwidth to carry both over the same bus.