Throttle disk access?

I see there are options for throttling the network, but would it also be possible to add an option to throttle disk access while it compiles the list of files to back up? When it scans the whole disk on my computer it really hammers it (I can hear it) and doing almost anything else becomes slow - just repositioning the cursor in an editor, for example, can take several seconds to respond. Once it has responded it tends to be OK, but if you pause for a few seconds, the next interaction is awful.


I second any recommendation on how to manage this. When a backup job starts, it begins by building the inventory/indexing the files for the backup job. This hammers the disk I/O for a while. I’ve got an older spinning HD in my machine and this takes close to 20 minutes. So that’s 20 minutes per backup that my machine is unusable. I’m not worried about the time it takes to index; I’m bothered that my machine is nearly unusable during each backup.

I run duplicati (headless) as a systemd service.
I’ve tried playing with ionice / nice to get it to lower the priority of the processes, but they never seem to take.

Any ideas?

I agree this would be nice to have, but how would you imagine a feature like this working?

Something like --folder-scan-interim-delay=50ms, which I would assume pauses 50 ms between each folder scan?
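
Very roughly, in Python-ish pseudocode (the option name and the helper are made up, just to show where such a pause would sit, not how the actual scanner is written):

```python
import os
import time

def scan(root, folder_scan_interim_delay=0.050):   # hypothetical option, in seconds
    """Walk the backup source, sleeping briefly after each folder is scanned."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            process_file(os.path.join(dirpath, name))   # stand-in for the real per-file work
        time.sleep(folder_scan_interim_delay)           # pause between folder scans

def process_file(path):
    os.stat(path)   # placeholder: the real code would read/hash the file here
```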

I’d just be happy if I could get duplicati to honor ionice settings in Linux when it is running as a service.

Do you know if any mono based apps play well with ionice?

Well, I’m on Windows, so that wouldn’t help here.

Yes, frequent pauses during scanning should give enough time for other apps to get a look in. Rather than pausing after every folder scan, I might be inclined to even it out more: after the most often repeated calls that get data from disk, check whether a timer has expired, and if so sleep for a short time and reset the timer. The pauses then become more even, rather than dependent on the disk content.

Yeah, that would be better but I can’t think of how one would do that. Granted I’m no IO expert so there certainly COULD be a way.

To be honest, I haven’t dug through the code yet but I assume it goes VERY ROUGHLY something like this:

  1. Get list of folders
  2. For each folder get list of files
  3. For each file, process it

I was proposing a pause in the step 2 loop, but I suppose one could just as easily pause during the step 3 loop.
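
Combining that with the timer idea above, something like this - purely an illustrative sketch with made-up knobs, not the real scanner:

```python
import os
import time

def scan(root, pause_every=0.25, pause_for=0.05):   # hypothetical settings
    """Walk the source; whenever `pause_every` seconds of scanning have passed
    since the last pause, sleep for `pause_for` and reset the timer."""
    last_pause = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):   # steps 1 and 2
        for name in filenames:                            # step 3
            process_file(os.path.join(dirpath, name))     # stand-in for hashing / change detection
            if time.monotonic() - last_pause >= pause_every:
                time.sleep(pause_for)                     # window for other disk users
                last_pause = time.monotonic()

def process_file(path):
    os.stat(path)   # placeholder for the real per-file work
```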

Of course there are plenty of comments about how slow people find Duplicati to be, so purposefully slowing it down could make that side of things worse. To help avoid the likely “now it’s even slower!” onslaught after implementing a pause like we’re discussing, it might make sense to include stats about how much time was spent “being nice” to other disk processes. :slight_smile:

Yes, that’s what I was thinking, but not always - only if some disk timer has expired, so it just gives the other processes a window of opportunity to access the disk. It probably doesn’t even need to be that long, just enough to give the scheduler a chance to let other processes in disk-wait state run.

Well, (a) I’d assume this would be an option, as it probably wouldn’t be much of an issue on SSDs, and (b) as long as it isn’t so slow that it runs into the next backup, or holds extended locks on files that might be being worked on, it doesn’t matter if the background tasks are a bit slower. It’s the interactive things that are killers - like the extraordinarily long time it takes to open folders one by one when navigating to a directory to restore.

That one has been addressed and is already in testing (so far looks like 5 min → 5 sec)!

Agreed it should be an option. As long as the “pause” is between file accesses it shouldn’t have any effect on file locks.

I know it can be done. I’ve had mono honor nice/ionice in the past. I suspect the issue is how Duplicati is spawning off its sub-processes/threads, so that they are not inheriting the nice/ionice settings.


Slight tangent, but I think part of the issue is that it does a full scan on each backup. CrashPlan (I think) used to just monitor the folders for changes and then process that list, and occasionally do a full scan to make sure it didn’t miss anything. Doing a full scan each time is probably safer (and easier) but much more intensive. I’ve not looked at the code, so this might be horrible to implement at this stage. I assume the file-to-block map for the previous backup is recorded in the local db, so it might be doable to just adjust it for a new version.

Either way, some more niceness would be good. Duplicati does rather hog the IO when it runs. Ideally I shouldn’t be able to tell it’s running.

Yep, that’s how it does it.

There have been some talks of implementing a file watcher (I think there are even hooks to accept “process only these files” parameters already) but so far there hasn’t been anybody I’m aware of with time / experience to “do the deed”.
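
Just to illustrate the watcher idea (a minimal sketch using the third-party Python watchdog package and a made-up source path - nothing to do with Duplicati’s actual code or hooks):

```python
# Watch for changes, then back up only the collected paths; occasionally
# do a full scan to catch anything missed.
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class ChangeCollector(FileSystemEventHandler):
    """Collect paths that changed so a backup run can process only those."""
    def __init__(self):
        self.changed = set()

    def on_any_event(self, event):
        if not event.is_directory:
            self.changed.add(event.src_path)

if __name__ == "__main__":
    collector = ChangeCollector()
    observer = Observer()
    observer.schedule(collector, "/path/to/backup/source", recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(60)
            # At backup time, hand collector.changed to the job instead of rescanning everything.
            print(f"{len(collector.changed)} changed paths queued")
    finally:
        observer.stop()
        observer.join()
```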

Part of the issue is that there is such a wide range of hardware on which Duplicati is used that it’s difficult to prioritize optimizations. For example, if a majority of people had SSD devices but less memory available, then we would be better off optimizing pretty much any other part of the tool.

Similarly, the supported OS and file system variants make it difficult to decide what to code for. While some newer systems might have metadata that could greatly speed up change detection, not everybody with those file systems will have it enabled.

Since Duplicati is open source, if there is a particular feature you’d like to see sooner rather than later, you can always “join the fight” by:

  • submitting Duplicati updates on GitHub (learn a new skill! :slight_smile: )
  • donating to Bounties on GitHub to entice others to implement what you want (the Sia integration and --retention-policy parameter are good examples of that)

Up to a point, I don’t really think it matters if it is a bit slow to work out what needs backing up, so long as it doesn’t interfere with the rest of the computer or hog resources. Personally, I’d rather slow it down so it makes less impact. Where the performance is a much bigger issue is the interactive stuff, which I know you’re aware of, like selecting files for restore.

It all runs in the same process, so it should inherit the priorities automatically.
But I have not worked with ionice so I could be wrong.

I found the fix for the systemd service that makes ionice and nice work. It’s in the official GitHub project. I don’t think it’s part of the current .deb build that I use with Mint 17 and 18.
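
I haven’t compared it line by line with what’s in the repo, but the general idea is to put the scheduling directives on the service itself so everything it spawns inherits them - roughly like this in a drop-in (the unit name and values here are just an example):

```ini
# /etc/systemd/system/duplicati.service.d/override.conf  (example path / unit name)
[Service]
Nice=19
IOSchedulingClass=idle
# Or, to stay best-effort but at the lowest priority:
# IOSchedulingClass=best-effort
# IOSchedulingPriority=7
```

After a systemctl daemon-reload and a restart of the service, the whole process tree should pick up the lower priorities, since nice/ionice settings are inherited by child processes.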

Now I’m having issues with the init.d script for Mint 17 (14.04). It isn’t honoring the ionice and nice settings because of how the duplicati-server script subshells and drops the settings that should be inherited. I’ve fixed it, but it requires changes to the duplicati-server script and not just the default configuration file.

I have another issue with the startup scripts because by default they store user data and configuration in the /usr/lib/ directory! But that’s not for this thread. I’ll fork and push my fixes once I get them both resolved.


Awesome, I look forward to it!

Is this possibly resolved now?

I’d say throttling is distinct from priority, so both are probably valid features.