Attention SSD users: if your temp directory isn't ram-backed, Duplicati will prematurely wear your SSD

Duplicati writes files to a temporary directory before uploading them to a backend, and similarly, when verifying files, it downloads them to disk first.

This results in an enormous amount of extraneous, unnecessary writes. During an initial backup, if you have 400GB of data on a 500GB SSD, of which 300GB is user data being backed up, it will cause three complete write cycles across the cells in the entire remaining free space (assuming your drive does not employ a multi-tier cache). Every blockfile you verify post-backup is also written to disk. The less free disk space you have, the worse the wear will be.
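To spell out the arithmetic in that example (my numbers, same assumptions as above):

    500GB drive - 400GB already used  =  ~100GB of free cells to absorb the temp writes
    300GB staged through temp / 100GB ≈  3 full write cycles across those free cells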

Edit: just for the sake of completeness: this applies to drives which implement dynamic wear leveling, but not to drives which implement static wear leveling. Static wear leveling relocates data on flash cells which have low write cycles (even those in use), specifically to avoid this problem. As the industry goes to great lengths to hide from consumers what kind of wear leveling a particular model or family of drives uses, and how much over-provisioning is used, it would be beyond unwise to assume your drive has static wear leveling. Further, do not assume that because the controller chip in your drive is capable of static wear leveling, the manufacturer of your drive has it enabled.

On mechanical drives, it’s mostly just going to slow things down.

On my Mac, I created a ramdisk like so:

diskutil erasevolume HFS+ 'ramdisk' `hdiutil attach -nomount ram://8388608`

Note the single and back quotes. This creates a RAM disk mounted at /Volumes/ramdisk that is ~4GB (8388608 sectors × 512 bytes), which is gross overkill for 100MB file blocks. I haven’t seen the directory get much over 512MB on one of my backups with a 100MB blocksize, so 1.5-2GB is probably plenty, but YMMV. Experiment; start big, trim down.
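For example, if you settle on 2GB, the same command with half the sector count should do it (my numbers; adjust to taste):

    diskutil erasevolume HFS+ 'ramdisk' `hdiutil attach -nomount ram://4194304`
    # 4194304 sectors x 512 bytes = 2GB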

Also note that, at present in the 2018-11-28 beta, the temp directory setting in one job seems to carry over to other jobs, which is baffling… and if you run a job using the “command line” GUI, lots of options are ignored or misparsed. Verify that the job is running as intended.

You can use the pre- and post-script options in your backup job to create the ramdisk before the job and unmount it afterward (edit: do so via diskutil eject /Volumes/volumename); it’ll cease to exist and free up the RAM. Note that you MUST destroy the drive afterward, as Duplicati does not always clean up after itself.
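A minimal sketch of what those scripts could look like (the 2GB size, script names, and volume name are my own choices — point --run-script-before and --run-script-after at them in the job’s advanced options):

    # pre-backup.sh -- create the ramdisk if it isn't already mounted
    if [ ! -d /Volumes/ramdisk ]; then
        diskutil erasevolume HFS+ 'ramdisk' `hdiutil attach -nomount ram://4194304`
    fi

    # post-backup.sh -- eject the ramdisk and release the RAM
    diskutil eject /Volumes/ramdisk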

Linux users have the option of using tmpfs if their /tmp directory is not already set up that way. For example, on Debian, check out the man page for /etc/default/tmpfs to enable it; Arch Linux also has a good wiki page on the subject, as usual. Using tmpfs is advantageous because /tmp memory is consumed and released as needed, and it can use swap as well as physical memory. Unfortunately, this is not an option on macOS.
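If your distribution doesn’t already mount /tmp as tmpfs, a line like this in /etc/fstab is the usual approach (a sketch — the 2G cap is my own pick, and on Debian the /etc/default/tmpfs route above may be preferable):

    tmpfs  /tmp  tmpfs  defaults,size=2G,mode=1777  0  0

Alternatively, leave /tmp alone, create a dedicated tmpfs mount, and point Duplicati’s tempdir option at it.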


Windows users can use ImDisk.

  1. Download and install ImDisk (https://sourceforge.net/projects/imdisk-toolkit).
  2. Run RamDisk Configuration.
  3. Set the size.
    For fastest performance, set a size that you want to be constantly allocated in RAM.
    If you want memory to be used only as needed, tick “Allocate Memory Dynamically”
    and set the size to the maximum amount of memory you would like the RamDisk to use.
    This is slightly slower.
  4. Tick “Launch at Windows Startup” and “Create TEMP Folder”.
  5. Go to Advanced.
  6. Enter a volume label (I personally call it RamDisk and use drive letter R: :stuck_out_tongue:).
  7. Tick “Use AWE physical memory”.
    This makes sure that the ramdisk is never written to the page file and always stays in memory.
  8. In Duplicati, go to settings and set “asynchronous-upload-folder” and “tempdir” to your new
    RamDisk temp folder (e.g. R:\Temp). There’s a command-line example below.

Optional. If you want a more useful ramdisk, you can set a folder to sync the data to on shutdown in the Data tab. :wink: This will ignore the temp folder.
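For reference, when running the same job from the command line the relevant options would look roughly like this (a sketch; the destination URL and source path are placeholders):

    Duplicati.CommandLine.exe backup <destination-url> C:\Users\Me\Documents ^
      --tempdir=R:\Temp ^
      --asynchronous-upload-folder=R:\Temp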


Thanks for sharing those tips @NotHere & @John_Doe!

How would you feel about making the initial post a wiki (so others can update it) and combining your two posts into a single one with sections something like:

Use a ramdisk with Duplicati!

Why would I want to do that?

  1. Make backups and verifications faster!
  2. Reduce wear on your SSD! (fancy numbers from @NotHere)

Linux - on-demand

Steps from @NotHere


Windows - persistent w/ImDisk

Steps from @John_Doe

Perhaps the ultimate fix would be for Duplicati just to store the files in RAM instead of on disk. If it stored directly in RAM there would be no need for a RAM disk.

Of course this approach may not work if someone has decided to use very large volume sizes…

Hi there, I’ve been a Duplicati user for years but have never written in the forum before.
Anyway.
I’m currently running 2.0.4.5_beta_2018-11-28 on Windows 10 x64.
I’ve noticed high write usage for the database file and for the c:$ConvertToNonresident file as well.
Do you have any clue about that?
Also, what about the use-block-cache setting? Does it help?
Reference image: Pasteboard - Uploaded Image
Thank you!

This. It does seem a bit odd that Duplicati uses a temp dir for the blocks that are getting ready to be sent over the network.

The default block size is 100KB, so holding 10 blocks ready for transmission in memory will only take 1MB + overhead in RAM.

I have my block size set to 500KB and asynchronous-upload-limit set to 50. So at most that is 25MB of memory usage… That’s not that much.

When you get into big block sizes of 10MB and an asynchronous-upload-limit of, let’s say, 20, that is still at most 200MB of RAM usage, which is more than reasonable for a running backup job.

As far as I know, you do not want blocks to be that big anyway. I read somewhere on the forum that each block will hold the whole of a file or part of one. If you have a block size of 10MB and you had a 1MB file, that 1MB file will end up taking a whole 10MB block to itself. That also seems a bit odd to me, so do correct me if I’m wrong, as I would like to use a bigger block size than 500KB.

Actually, blocks are packed into volumes, which default to 50MB in size. So your numbers are a bit off, but it’s still a valid point that you could fit a couple of volumes in memory if you wish.
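So the back-of-the-envelope numbers look more like this (assuming the 50MB default and a queue of four volumes — check your own asynchronous-upload-limit setting):

    50MB per volume x 4 queued volumes ≈ 200MB of temp space in use at any one time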

Correction: a block will be up to that size, but if the file is smaller, the block will be smaller.


Watching the asynchronous-upload-folder, it looks like it holds the blocks, not the volumes. That’s why I was working with blocks and not volumes. Again, if I’m wrong, do let me know.

As for smaller files producing a smaller block, if that is the case, I may just rebuild my backup to use 10MB blocks.

No, that’s entirely unrelated, although the name and documentation are both vague enough to make it tempting.
Using a RAM disk for the asynchronous-upload-folder is how I think it works (and what the name refers to).
use-block-cache does not work as expected; high SSD wear #3566 speaks of a rename, docs, and a feature (however, features have a long backlog – the good part is that gives some time to start discussing beforehand).

One tradeoff, which I think is especially visible on a database Recreate, is that an SSD adds speed there. Unfortunately, it also takes its toll on the SSD. I don’t think we would want the database kept in a ramdisk…

$ConvertToNonresident is a Windows internal file. Your image refers to “Paging I/O”. So does this article. A search can find other info.

Several (but not most) third-party backend access methods have file-based APIs, so can’t stream transfers. Possibly this would be kind of a sometimes-can-have-it-sometimes-can’t thing by the time the dust settles…

This is probably partly for performance goals about keeping a steady upload, and the queue size is settable. There’s also the previously mentioned inability to stream files to some destinations, and worry about messes. Somewhere there’s a comment from the lead author to that effect, I think. At least initially, simplicity is good…

Of course it should but it doesn’t yet. Keeping messes somewhat confined to temporary files is a good thing.

Some people have wanted huge volumes for large backups. Choosing sizes in Duplicati discusses the tradeoffs. Having a roomy drive is more forgiving of such configurations, and of Duplicati temp dup-xxxx files not being deleted. However, whether or not one thinks SSD endurance is an issue now, the shift from TLC to QLC will make it worse.
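If leftover dup-xxxx files do pile up, something like this clears out the stale ones (a sketch — only run it while no backup or restore is active, and the one-day age cutoff is arbitrary):

    find "${TMPDIR:-/tmp}" -maxdepth 1 -name 'dup-*' -mtime +1 -delete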

It’s hard to tell them apart (certainly not from the random file names). On my system I generally get a mix of 50MB dblock volumes, smaller files that are dindex files (done after the dblock, I think), and a dlist at the end. Look at Job → Show log → Remote for lines doing put to get file names. Click on those lines to get file sizes.
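If your destination happens to be a local folder, the uploaded names themselves give the type away, since Duplicati names remote files by type (the path is a placeholder):

    ls -lhS /path/to/destination/duplicati-*.dblock.*   # the big ~50MB volumes
    ls -lhS /path/to/destination/duplicati-*.dindex.*   # the small index files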


It sounds like you may already know this, but you will have to start a fresh backup to change your block size. While dblock (upload volume) size can be adjusted on an existing backup, block size can’t.

@ts678, yes - I do recall a discussion with @kenkendk about adding a parameter to allow Duplicati to do all its archive generation in memory instead of on disk (for performance reasons). Unfortunately, I’m not finding it in the forum right now, but if I stumble across it I’ll try to add it here.


I finally found the post that I was trying to recall in How can I speed up local backups?

Store temporary files in memory #2816 in 2017 talks about the non-seekable streams.

2014 comments probably predate Duplicati 2, but possibly some of the ideas still hold:

Attempt to avoid temp files if the backend supports streaming #181

The problem with this is handling retries, as the retry has to roll back the entire state and re-produce the volume.

If system disk is SSD and backups are large, Duplicati’s default options will cause write-wear #1032

This refers to unnecessary writes on SSDs where a system can write to a non-SSD location. If all you have is an SSD, you can reduce temp writes by disabling the upload asynchronously option.

--synchronous-upload is now the documented option. Anyone care to test whether it reduces writes, and what it does to speed?
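If anyone wants to try it, it should just be a matter of adding the advanced option to the job, or on the command line something like this (a sketch; destination and source are placeholders):

    Duplicati.CommandLine.exe backup <destination-url> <source-path> --synchronous-upload=true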

I haven’t tested, but I suspect it will not reduce writes, as it sounds to me like it generates the files the same way but allows other tasks to continue during the upload rather than making them pause until the upload is complete.

Thank you for clarifications!

If I have understood correctly, Duplicati writes 300GB for the first backup and then only the size of the changed files to update the backup. Right?

If so, I will do without a ramdisk. My SSD (like most SSDs) can bear that amount of writing.


Likely, but to get a little more precise, How the backup process works explains what happens. The goal is to upload only changed parts of changed files, determined by reading the file as fixed-size blocks. If a block is already on destination storage, there’s no need for the extra work (and writes) to pass it through temp and upload it again. BytesUploaded in BackendStatistics is roughly what went through temp, probably twice, because I think compression and encryption each write a file. The --backup-test-samples downloads also go there.

From a test with --synchronous-upload (confirming @JonMikelV’s prediction for Duplicati 2), temp activity was:

12/30/2018 9:16:51.0602470 AM dup-3ecd30fd-0d55-4062-a659-37c1ff861cee first write
12/30/2018 9:16:53.0879437 AM dup-3ecd30fd-0d55-4062-a659-37c1ff861cee last write 7182846 bytes
12/30/2018 9:16:53.4265630 AM dup-3ecd30fd-0d55-4062-a659-37c1ff861cee first read
12/30/2018 9:16:53.4669133 AM dup-f3c3c50d-9082-40ae-a45d-e19aeddc037d first write
12/30/2018 9:16:53.5853793 AM dup-3ecd30fd-0d55-4062-a659-37c1ff861cee last read 7182846 bytes
12/30/2018 9:16:53.5860775 AM dup-f3c3c50d-9082-40ae-a45d-e19aeddc037d last write 7183149 bytes
12/30/2018 9:16:53.5923439 AM dup-f3c3c50d-9082-40ae-a45d-e19aeddc037d first read
12/30/2018 9:17:06.1254322 AM dup-f3c3c50d-9082-40ae-a45d-e19aeddc037d last read 7183149 bytes

While I’ve passed along some concern about the backend (messes and retries), I wonder if a partial feature could stream the zip file creation directly through the encryption, thereby just doing writes for uploader use? Don’t know the design well enough to comment further, so I’ll just leave that thought around, in case it helps.

Hmm, wear and tear worries me. So I looked into this and noticed that the following

diskutil erasevolume HFS+ 'ramdisk' `hdiutil attach -nomount ram://8388608`
cp -pr some3GBdirectory /Volumes/ramdisk
umount /Volumes/ramdisk

removed /Volumes/ramdisk from the directory tree, but the 3GB stayed allocated by a process called diskimages-helper. I had to kill it with SIGKILL to get the process removed and my memory back. What is the nice way to remove the ramdisk and get my memory freed?

Answering my own question: The first command tells you which disk it is, e.g. /dev/disk2

Then diskutil eject /dev/disk2 will eject the disk and free the memory.

Now I only need to find a way to tell the post script what the disk was that was created in the pre script…
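One way to do that might be to capture the device node in the pre-script and stash it somewhere the post-script can read (an untested sketch; the /tmp/duplicati-ramdisk.dev path and 2GB size are just my picks):

    # pre-script: attach, format, and remember which device we got
    DEV=$(hdiutil attach -nomount ram://4194304 | awk '{print $1}')
    diskutil erasevolume HFS+ 'ramdisk' "$DEV"
    echo "$DEV" > /tmp/duplicati-ramdisk.dev

    # post-script: eject that device and forget it
    diskutil eject "$(cat /tmp/duplicati-ramdisk.dev)"
    rm -f /tmp/duplicati-ramdisk.dev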

Use diskutil eject; sorry for the mistake in directions. I’ll edit shortly.

This is a rather FUD-style thread title!
Unless you bought an extremely cheap and extremely undersized SSD, DUPLICATI WILL NOT SIGNIFICANTLY AFFECT THE LONGEVITY OF YOUR SSD.

In 2015 the website TechReport performed an endurance test of six different SSD drives. The result? The drives were written to non-stop for 18 months straight. For example, the Samsung 840 drive started to encounter flash wear after writing 200TB of data, and the first uncorrectable errors appeared at 300TB, but it continued to work until 900TB of written data! Other drives managed to write as much as 2.2TB of data!

This is way beyond what you will encounter during the lifetime of a normal computer. By the time your SSD encounters wear-leveling issues, your other components are more likely to have failed.

This is a non-issue.


In extreme use cases it could hypothetically be an issue, but I still have a 120GB SSD from 2012 in active use. Granted, I didn’t use Duplicati on it for that long. :wink:

Also, you mean 2.2PB, right?