Attention SSD users: if your temp directory isn't ram-backed, Duplicati will prematurely wear your SSD

#6

This. It does seem a bit odd that Duplicati uses a temp dir for the blocks that are getting ready to be sent over the network.

The default block size is 100KB, so holding 10 blocks ready for transmission in memory would only take 1MB plus overhead in RAM.

I have my block size set to 500KB and asynchronous-upload-limit set to 50, so at most that is 25MB of memory usage… That’s not that much.

When you get into big block sizes of 10MB and an asynchronous-upload-limit of, let’s say, 20, that is still only 200MB of RAM usage at most, which is more than reasonable for a running backup job.

As far as I know, you do not want blocks to be that big anyway. I read somewhere on the forum that each block will hold the whole of a file or part of one. If you have a block size of 10MB and you had a 1MB file, that 1MB file will end up taking a whole 10MB block to itself. That also seems a bit odd to me, so do correct me if I’m wrong, as I would like to use a bigger block size than 500KB.

0 Likes

#7

Actually, blocks are packed into volumes, which default to 50MB in size. So your numbers are a bit off, but it’s still a valid point that you could fit a couple of volumes in memory if you wish.
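For example, with the 50MB default and the asynchronous-upload-limit of 20 mentioned above, a back-of-the-envelope sketch (reading asynchronous-upload-limit as the number of queued volumes) gives:

# rough upper bound on queued upload data: volume size × queue length
VOLUME_MB=50; QUEUE=20
echo $(( VOLUME_MB * QUEUE ))MB   # ≈ 1000MB staged at most with those settings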

Correcting myself: a block will be up to that size, but if the file is smaller, the block will be smaller.

1 Like

#8

Watching the asynchronous-upload-folder, it looks like it holds the blocks, not the volumes. That’s why I was working with blocks and not volumes. Again, if I’m wrong, do let me know.

As for smaller files producing a smaller block, if that is the case, I may just rebuild my backup to use 10MB blocks.

0 Likes

#9

No, that’s entirely unrelated, although the name and documentation are both vague enough to make that tempting.
Using a RamDisk for the asynchronous-upload-folder is how I think it works (and what the name refers to).
use-block-cache does not work as expected; high SSD wear #3566 speaks of a rename, docs, and a feature (however, features have a long backlog; the good part is that this gives some time to start discussing beforehand).
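For instance, something like this is what I have in mind, pointing both temp and the upload staging folder at a ram-backed mount (CLI name, destination, and paths are just examples; on Linux that might be a tmpfs mount, on macOS a ramdisk like the one shown later in this thread):

# sketch: stage temp files and queued volumes on a ram-backed mount instead of the SSD
duplicati-cli backup "file:///mnt/backup-destination" /home/me/data \
  --tempdir=/mnt/ramdisk \
  --asynchronous-upload-folder=/mnt/ramdisk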

One tradeoff, which I think is especially visible on a database Recreate, is that an SSD can add speed. Unfortunately it also takes its toll on the SSD. I don’t think we would want the database kept in a ramdisk…

That is a Windows internal file. Your image refers to “Paging I/O”, and so does this article. A search can find other info.

Several (but not most) third-party backend access methods have file-based APIs, so they can’t stream transfers. Possibly this would end up as a sometimes-can-have-it, sometimes-can’t thing by the time the dust settles…

This is probably partly about the performance goal of keeping a steady upload, and the queue size is settable. There’s also the previously mentioned inability to stream files to some destinations, and worry about messes. Somewhere there’s a comment from the lead author to that effect, I think. At least initially, simplicity is good…

Of course it should, but it doesn’t yet. Keeping messes somewhat confined to temporary files is a good thing.

Some people have wanted huge volumes for large backups. Choosing sizes in Duplicati discusses the tradeoffs. Having a roomy drive is more forgiving of configurations, and of Duplicati temp dup-xxxx files not being deleted. However, whether or not one thinks SSD endurance is an issue now, the shift from TLC to QLC will make it worse.

It’s hard to tell them apart (certainly not by the random file names). On my system I generally get a mix of 50MB dblock volumes and smaller files that are dindex files (done after the dblock, I think), plus a dlist at the end. Look at Job --> Show log --> Remote for lines doing put to get file names. Click on those lines to get file sizes.

1 Like

#10

It sounds like you may already know this, but you will have to start a fresh backup to change your block size. While dblock (upload volume) size can be adjusted on an existing backup, block size can’t.
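A sketch of the distinction (CLI name, destinations, and sizes are just examples):

# dblock (upload volume) size can be raised on an existing backup
duplicati-cli backup "file:///mnt/backups/existing" /data --dblock-size=200MB
# --blocksize is fixed when a backup is first created, so a bigger value means a fresh backup
duplicati-cli backup "file:///mnt/backups/fresh" /data --blocksize=10MB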

@ts678, yes - I do recall a discussion with @kenkendk about adding a parameter to allow Duplicati to do all its archive generation in memory instead of on disk (for performance reasons). Unfortunately, I’m not finding it in the forum right now, but if I stumble across it I’ll try to add it here.

0 Likes

#11

I finally found the post that I was trying to recall in How can I speed up local backups?

Store temporary files in memory #2816 in 2017 talks about the non-seekable streams.

2014 comments probably predate Duplicati 2, but possibly some of the ideas still hold:

Attempt to avoid temp files if the backend supports streaming #181

The problem with this is handling retries, as the retry has to roll back the entire state and re-produce the volume.

If system disk is SSD and backups are large, Duplicati’s default options will cause write-wear #1032

This refers to writes being unnecessary on SSDs when a system could write to a non-SSD location instead. If all you have is an SSD, you can reduce temp writes by disabling the upload asynchronously option.

--synchronous-upload is now the documented option. Anyone care to test whether it reduces writes, and what it does to speed?
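If anyone wants to try, a rough sketch of a comparison run (the CLI name, destinations, and use of time are just examples; throwaway destinations keep both runs doing full uploads):

# run the same job with and without synchronous upload, comparing elapsed time;
# watch the temp folder (e.g. with Process Monitor on Windows) to compare writes
time duplicati-cli backup "file:///mnt/test-destination-a" /data --synchronous-upload=true
time duplicati-cli backup "file:///mnt/test-destination-b" /data --synchronous-upload=false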

0 Likes

#12

I haven’t tested, but I suspect it will not reduce writes; it sounds to me like the files are generated the same way either way, and the setting only controls whether other tasks continue during the upload or pause until the upload is complete.

0 Likes

#13

Thank you for clarifications!

0 Likes

#14

If I have understood correctly, Duplicati writes 300GB for the first backup and then only the size of the changed files to update the backup. Right?

If so, I will do without a ramdisk. My SSD (like most SSDs) can bear that amount of writing.

1 Like

#15

Likely, but to be a little more precise, How the backup process works explains what happens. The goal is to upload only the changed parts of changed files, determined by reading each file as fixed-size blocks. If a block is already on destination storage, there’s no need for the extra work (and writes) to pass it through temp and upload it again. BytesUploaded in BackendStatistics is roughly what went through temp, probably twice, because I think compression and encryption each write a file. The --backup-test-samples files also go there.
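As a very loose illustration of the fixed-size block idea (shell-level concept only, not Duplicati’s actual code; the 100KB size is the default mentioned earlier):

# cut a file into fixed 100KB blocks and hash them; a block whose hash already exists
# at the destination needs no repackaging, temp write, or re-upload
split -b 102400 bigfile /tmp/chunk.
sha256sum /tmp/chunk.* | sort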

From a test with --synchronous-upload (confirming @JonMikelV’s prediction for Duplicati 2), temp activity was:

12/30/2018 9:16:51.0602470 AM dup-3ecd30fd-0d55-4062-a659-37c1ff861cee first write
12/30/2018 9:16:53.0879437 AM dup-3ecd30fd-0d55-4062-a659-37c1ff861cee last write 7182846 bytes
12/30/2018 9:16:53.4265630 AM dup-3ecd30fd-0d55-4062-a659-37c1ff861cee first read
12/30/2018 9:16:53.4669133 AM dup-f3c3c50d-9082-40ae-a45d-e19aeddc037d first write
12/30/2018 9:16:53.5853793 AM dup-3ecd30fd-0d55-4062-a659-37c1ff861cee last read 7182846 bytes
12/30/2018 9:16:53.5860775 AM dup-f3c3c50d-9082-40ae-a45d-e19aeddc037d last write 7183149 bytes
12/30/2018 9:16:53.5923439 AM dup-f3c3c50d-9082-40ae-a45d-e19aeddc037d first read
12/30/2018 9:17:06.1254322 AM dup-f3c3c50d-9082-40ae-a45d-e19aeddc037d last read 7183149 bytes

While I’ve passed along some concerns about the backend (messes and retries), I wonder if a partial feature could stream the zip file creation directly through the encryption, thereby doing writes only for the uploader’s use? I don’t know the design well enough to comment further, so I’ll just leave that thought here, in case it helps.
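To make the thought concrete, the general shape at a shell level would be something like this (just a sketch of “compress and encrypt in one streaming pass”; it is not Duplicati’s zip + AES format, and the names and passphrase handling are placeholders):

# compress and encrypt in one pass so only the final upload-ready file is written to disk
tar -cf - /data/to/backup | gzip | openssl enc -aes-256-cbc -pbkdf2 -pass pass:example > dup-volume.bin.aes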

0 Likes

#16

Hmm, wear and tear worries me. So I looked into this and noticed that

diskutil erasevolume HFS+ 'ramdisk' `hdiutil attach -nomount ram://8388608`  # create and format a ~4GB ram disk (8388608 × 512-byte sectors)
cp -pr some3GBdirectory /Volumes/ramdisk  # copy ~3GB of data onto it
umount /Volumes/ramdisk  # unmount the volume again

The umount removed /Volumes/ramdisk from the directory tree, but the 3GB stays allocated by a process called diskimages-helper. I need to kill it with SIGKILL to get the process removed and my memory back. What is the nice way to remove the ramdisk and get my memory freed?

0 Likes

#17

Answering my own question: the first command tells you which disk it is, e.g. /dev/disk2.

Then diskutil eject /dev/disk2 will eject the disk and free the memory.

Now I only need to find a way to tell the post script which disk was created in the pre script…
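One idea (just a sketch; the scratch-file path and wiring the scripts up via Duplicati’s --run-script-before / --run-script-after options are my assumptions):

# pre script: create the ram disk and remember its device name
DEV=$(hdiutil attach -nomount ram://8388608 | awk '{print $1}')   # 8388608 × 512-byte sectors ≈ 4GB
diskutil erasevolume HFS+ 'ramdisk' "$DEV"
echo "$DEV" > /tmp/duplicati-ramdisk.dev

# post script: eject that device and free the memory
DEV=$(cat /tmp/duplicati-ramdisk.dev)
diskutil eject "$DEV"
rm -f /tmp/duplicati-ramdisk.dev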

0 Likes

#18

Use diskutil eject; sorry for the mistake in directions. I’ll edit shortly.

0 Likes

#19

This is a rather FUD-style thread title!
Unless you bought an extremely cheap, extremely undersized SSD, DUPLICATI WILL NOT SIGNIFICANTLY AFFECT THE LONGEVITY OF YOUR SSD.

In 2015 the website TechReport performed an endurance test of 6 different SSD drives. The result? The drives were written to non-stop for 18 months straight. For example, the Samsung 840 drive started to encounter flash wear after writing 200TB of data, and the first non-correctable errors appeared at 300TB, but it continued to work until 900TB of written data! Other drives managed to write as much as 2.2TB of data!

This is way beyond what you will encounter during the lifetime of a normal computer. By the time your SSD drive encounters wear-leveling issues, your other components are more likely to have failed.

This is a non-issue.

1 Like

#20

In extreme use-cases it could hypothetically be an issue, but I still have a 120GB SSD from 2012 in active use. Granted I didn’t use Duplicati on it for that long :wink:

Also, you mean 2.2PB, right?

0 Likes

#21

This is not a “non-issue.” I’ve read the TechReport test.

" Developed by a this handy little app includes a dedicated endurance test that fills drives with files of varying sizes before deleting them and starting the process anew."

That is not how the vast majority use their laptop, desktop, or server hard drives, and just by total coincidence, it also happens to be the most favorable scenario for an SSD’s wear longevity.

  • The test is compressed over a very short period of time, so caches have maximum chance at absorbing re-writes
  • No already-written blocks are repeatedly re-written except for directory structure information
  • Re-writes are spread across a very large portion of the flash cells, and after every test cycle, the entire drive is freed for the wear-leveling algorithm to use again

Again: this scenario does not represent anyone’s real-world usage. I can think of two use cases that fit: videographers (many digital cinema cameras write directly to SATA or similar drive modules) and people using such devices as backup destinations for traditional backup programs (think Legato and the like), where the volume is written to, read for verification, then recycled by re-writing it start to finish.

For most users, 50% to as much as 90% or more of the drive is filled with data, and much of that does not change. If your drive is 90% full, then unless it implements static wear leveling, its wear capacity is reduced tenfold, because every write must result in a re-write within the 10% of available flash memory.

Note that I am assuming drives are not over-provisioned. This is partly because drive manufacturers do not generally advertise over-provisioning levels. I also don’t know how many SSDs actually implement static wear leveling; the industry similarly hides behind “we do wear leveling!” It’s probably safe to assume that most SATA and NVMe SSDs implement at least dynamic wear leveling, but I would not be surprised if static wear leveling doesn’t exist outside of high-end desktop and enterprise drives.

Anyway, this is why Duplicati’s backup process causes so much wear (on a drive that doesn’t do static wear leveling): if you have a 500GB drive holding 450GB of data, of which 300GB is backed up by Duplicati, that results in 6 wear cycles on that remaining 10% of the drive. And if you have extra verify options turned on, every verification results in more writes as well, because the verification streams each block to disk, not memory.
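Spelling out that arithmetic (a sketch using the numbers above; it assumes no static wear leveling and no over-provisioning):

# 500GB drive, 450GB occupied, 300GB passed through temp by one full backup
DRIVE=500; USED=450; BACKED_UP=300
FREE=$(( DRIVE - USED ))        # 50GB of free cells absorb all the temp writes
echo $(( BACKED_UP / FREE ))    # = 6 extra write cycles on those same cells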

Worse, that 10% of the drive is absorbing the vast majority of re-writes from user and system activity: logs, system updates, filesystem metadata, virtual memory (glares at Chrome), application caches, email clients, and so on.

0 Likes

#22

Please re-read their testing methodology. Your bullet-points are inaccurate.

If you have a 500GB drive with only 10% free space, you need to upgrade your drive or stop hoarding those cat pictures.
Anything that Duplicati writes is no different from what the OS or apps write. So if you perform your backups even twice a day - whoopy-freaking-do! You just created 2 extra writes on each cell involved in the backup, 2 extra writes out of the dozens if not hundreds of thousands of writes each cell can handle.

1 Like

#23

It sounds to me like most of the examples so far COULD be considered edge cases depending on individual system setup / usage, but both are valid.

Regardless of whether or not SSD wear is an issue, I’m pretty sure we can all agree that performance could be improved.

Using a ram-disk (or adding a feature to do more processing in RAM when available) absolutely should improve performance and if it happens to ease wear on drives (both SSD and spinning rust) then that can be considered a bonus. :slight_smile:

1 Like

#24

“should” and “will” are not the same. Has anyone actually done any benchmarks to see how much of a performance difference ramdisk or bigger caching makes?

0 Likes

#25

I think I saw some posts from users who set up ram-disks saying it sped things up, but I don’t know how they measured or what bottlenecks they had before.

I’ve never bothered to check myself as most of my backups run on always-on machines so I care more about reducing system impact than speeding things up.

Agreed - which is why I didn’t say “will”. :smiley:

0 Likes