Duplicati stuck on Compact when Getting/Putting from OneDrive and results in missing files

arberg · August 11, 2023, 7:52pm

Duplicati 2.0.7.1_beta_2023-05-25

I’m running a compact operation where I’m joining my 50 mb files into 250 mb files. 3 times now duplicati has gotten stuck, after between 5min and 2hours. After restart of server-process, there are 5 lost files, which a repair will fix (hopefully without actual lost data).

On the last run, I ran with careful log-selection into separate files and with profiling log level.

I’m pretty sure Duplicati gets stuck on a OneDrive operation. I this week I compacted 700 Gb without a hickup from local storage, now it cannot handle the same operation on a much smaller set when using OneDrive as backend.

I have a bit of experience with OneDrive throttling after downloading/uploading all with rclone, so I know it shouldn’t be throtlled at the tiny amounts Duplicati has sent through, though of cause requests to OneDrive can fail.

I had hoped I could get some logging (like with rclone) that gave be detailed info on what is going on with the requests, but I don’t know if that’s possible.

So there are two (potential) bugs here. One is that it gets stuck. The other is that it registers files it has deleted as missing in subsequent execution.

When I stop it, I just restart the docker container, not bothering to cancel the duplicati job in duplicati, but it never helps to cancel in duplicati.

I’m pretty sure no data is lost here, as in the last case it logged (last few lines before it hung)

2023-08-11 19:06:44 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-ie5b63f57419345e9854799b7137dbd2f.dindex.zip.aes (18.13 KB)

and then the following verify reported this file as missing.

duplicati-logs.zip (214.9 KB)

see log files 03-05 for the simplest case.

Any help or insigthful comments will be appreciated.

Jojo-1000 · August 12, 2023, 11:49am

This is a known bug that will hopefully be fixed in the next release. If compact is interrupted, it can happen that index files are deleted, but the local database is not updated with the change.

This is exactly how that bug happens.

Unfortunately I don’t know why it gets stuck (it looks like on a get, is that always the case?). It looks like it is supposed to follow the http timeout, ~~but that is not correctly set from the options~~. So maybe there is a connection problem?

Edit: It seems the options are set elsewhere. You can try to set --http-operation-timeout and see if retries will fix the problem.

arberg · August 13, 2023, 2:41pm

Thank you for the info jojo.

I just tried the '--http-operation-timeout setting it to 1 minute, and it made no difference. This time it ran for 45 minutes then hung on next dblock-get right after deletes again.

I’ll just stop the avoid compacting for now and change the settings so it won’t compact. However it might be interesting to solve the OneDrive stuckness. If there’s anything I can enable to log it further to help the developers let me know. If the stuckness is part of the bug, then I guess they are already on it.

Best Alex

ts678 · August 14, 2023, 8:57pm

Although I don’t have a good suggestion (and nobody else is reporting this, and devs can’t generally fix things they can’t reproduce), I’d note that you may not be able to restore unless you can download files.

Testing things like Direct restore from backup files to another system might be good practice for disaster recovery. Or at the least test a generous local restore with no-local-blocks to force it to download files…

I suppose you could also restart all your network gear to see if it helps, but before that check for retries.

Job log’s Complete log ideally says "RetryAttempts": 0, but a few retries are normal, and tolerable.

You might have some low-level network issue, but it’d be hard to spot without a lot of packet capturing…

arberg · August 15, 2023, 6:41am

That’s why I asked for stuff like rclones ‘–dump headers’. Its absolutely essential for us devs to get good logs in order to fix issues. So essentially this means the OneDrive backend is broken in the most recent release to the degree of being unusable I would say. However I’ve been using it without any issues at all for at least a year on the old version, so it must be the recent version that’s to blame.

Here I saw the OneDrive issues again, but instead with a regular backup:
2023-08-14 23:57:14 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2023-08-14 23:57:21 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (1.58 KB)
2023-08-15 00:40:33 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-b16a2b14b6895436ab85142b209c4e267.dblock.zip.aes (99.49 MB)
2023-08-15 00:40:45 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-bf3e4d54c83d0457f955905dbfa7e4f47.dblock.zip.aes (91.29 MB)
2023-08-15 00:40:59 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-bf3e4d54c83d0457f955905dbfa7e4f47.dblock.zip.aes (91.29 MB)
2023-08-15 00:41:13 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-i9dddf4bc34f640a682b63f9bf9f71721.dindex.zip.aes (39.47 KB)
2023-08-15 00:41:14 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-i9dddf4bc34f640a682b63f9bf9f71721.dindex.zip.aes (39.47 KB)

I have not experienced any problems with restore, and I have tested a bit due to reencrypting everything. This could also indicate a deadlock (or similar) in other branches of the code. Anyway I’m not worried about restore, I can just rclone it down if disaster happens, and the OneDrive backups are just my 3rd tier. I have 2nd tier local storage at parents for larger backup set. So I’ll be fine on that front.

The logs don’t get saved in the GUI because it hangs, so I end up restarting duplicati. But with that said, I have not seen any of that even in the information log. I even enabled the profiling log and the end of it is VERY clean, no issue what so ever. Actually it makes me think of deadlocks, because its just hanging for no apparent reason.

This is not a network issue, I just downloaded my entire 1.5-2 TB library across several OneDrive accounts using rclone, and rclone just did it. Even if it was a network issue, duplicati should not hang, it should retry at tell us in the logs.

My guess is the devs will easily be able to recreate this. It’s so frequent. It happens on every run of compact so far (out of 3-4 attempts), with the longest running about 2h before the problem arrived.

ts678 · August 15, 2023, 11:59am

I don’t know exactly how that’s done. Given that rclone (like Duplicati) has a variety of storage types, there would seem to have to be some types whose protocols need different output, and maybe some (Duplicati certainly has these) where third-party code is used, so one is subject to what it’s able to do.

https://rclone.org/flags/

--dump-headers Dump HTTP headers - may contain sensitive info

would appear to be the warning label not to just post all the data into the forum. I do that as well, e.g.

which actually gives the pre-encryption view, but the format is pretty awful, and you don’t get all data.

I have to say I like the idea of better debug tools, because as Duplicati matures, the ratio of external interoperability issues to internal processing bugs will probably increase – without good tools to look. Migration off .NET Framework will also mean we lose its tracing, although not many people will do it.

Feel free to find other people seeing this. It also works for me. I don’t know why it’s not working for you. Would you be able to share any (non-private) Advanced options? Can you get this without any of them?

Because these contain a lot of changes, someone who can get the problem (that’s you) could test the Canary versions leading to Beta, possibly identifying at what time things went wrong, for further study.

If you can attach a debugger to Duplicati, that might reveal where it’s stuck. Short of that, problems unreproducible to devs are hard to study. I don’t think just staring at the code would be very helpful. Generally a reproducible test case (ideally filed as a GitHub Issue with exact minimal steps) is best.

It doesn’t always hang, correct? Your stopping the compact avoids the problem (but is not attractive).

You’re in the wrong log. I’m asking for information from a stored (assuming operation finishes) job log.

That’s why I’m asking you for the retry statistics from an operation that completed. It’s certainly not a guarantee that it’s network related of course, and the fact that rclone is OK doesn’t mean Duplicati is.

Rclone storage type might also serve as a live workaround, and let you turn your compacting back on. There shouldn’t be any need for workarounds, of course, so there’s still the question of what’s up here.

… for you. https://usage-reporter.duplicati.com/ shows lots of other backups to OneDrive without issue.
I’ve been running all the Canary versions to OneDrive in production for a long time without such hangs. Certainly my setup is different, which is why I’m asking you for the simplest possible reproducible steps.

My guess is otherwise, based on all the available evidence not just from me, but from all Duplicati users. Certainly I might have missed a complaint, and it’s certainly possible that a dev (there are few) can hit it. Generally non-reproducible issues are unworkable given all the other things that people want from devs.

arberg · August 15, 2023, 2:43pm

My guess is that once it hangs, it hangs indefinitely. Its hung on all my attempt at compacting those remote 200 GB, and it has never progressed far. I have compacted other backups with nothing to compact and it didn’t show any retries. Nor do the successful backups.

Its a good idea to check those job statistics for a job that completes and actually uploads something. This looks fine, its from noon today:

"BackendStatistics": {
      "RemoteCalls": 19,
      "BytesUploaded": 138656881,
      "BytesDownloaded": 95622677,
      "FilesUploaded": 5,
      "FilesDownloaded": 3,
      "FilesDeleted": 9,
      "FoldersCreated": 0,
      "RetryAttempts": 0,
      "UnknownFileSize": 0,
      "UnknownFileCount": 0,
      "KnownFileCount": 2455,
      "KnownFileSize": 140077702985,
      "LastBackupDate": "2023-08-15T12:00:00+02:00",
      "BackupListCount": 15,
      "TotalQuotaSpace": 1158567428096,
      "FreeQuotaSpace": 790097821696,
      "AssignedQuotaSpace": -1,
      "ReportedQuotaError": false,
      "ReportedQuotaWarning": false,
      "MainOperation": "Backup",
      "ParsedResult": "Success",
      "Version": "2.0.7.1 (2.0.7.1_beta_2023-05-25)",
      "EndTime": "0001-01-01T00:00:00",
      "BeginTime": "2023-08-15T10:00:00.102575Z",
      "Duration": "00:00:00",
      "MessagesActualLength": 0,
      "WarningsActualLength": 0,
      "ErrorsActualLength": 0,
      "Messages": null,
      "Warnings": null,
      "Errors": null
    }
  },

Yes I’ve considered just switching to rclone. Now I updated my local copy, so I can also just switch to local for the compression, but it might be interesting for me to see the rclone plugin work in duplicati.

Have you tried compacting 200+ GB from OneDrive, where all files has to be processed? My job is a 450 GB remote backup storage, where I increased dblock-size from 50 → 250 MB. I can see I have let it run 4 times, and it has managed to compact 46 GB. That should give an idea of the sizes that I need to trigger the issue.

It’s true we don’t yet know whether this is a common issue for all or ‘only’ me. But jojo did say the compacting was a know problem wrt. repair. I wonder if there’s more to it than that. Also mostly my normal backup processes seem to work, but I have found some that got stuck.

These are my advanced options

--log-file=/config/log/duplicati-onedrive-storage.log
--blocksize=100KB
--log-file-log-level=Information
--http-operation-timeout=1m
--list-folder-contents=false
--small-file-size=45MB

The --http-operation-timeout=1m was just added for test in response to jojo, so not all backups had that.

But its also fine just to leave it for the devs. They know best where to look, and what logs is possible and helpful to add. I generally always ensure that my software logs network issues and stuff like that. But if its a deadlock, that won’t help them at all.

ts678 · August 15, 2023, 3:08pm

If it takes that much, it’s got to be a paid plan. That’s why I’m asking if you can get it with a simple case.

We do have stats from the usage reporter that I linked to. My own data is looking thinner than I’d hoped. Currently my backup is deleting full files during compact. I’d prefer that it be downloading and repacking. Maybe I’ll try to set up a batch file running some CLI tools that let me make data. It still won’t be 200 GB.

If you click on the PR link provided and then click on the issue, you’ll see that I filed it, after investigation, and there are also good steps to reproduce (which are not always easy, but make the devs’ work easier):

Error during compact forgot a dindex file deletion, getting Missing file error next run. #4129

Compact has been a relatively large source of errors after backup (which runs more often) has improved due to the usual cycle of fixing what’s found broken, and with reproducible enough steps to allow repairs.

I didn’t look at all your logs, but the first one was a very clear case of this problem (which has nothing to do with get hangs except that they abort the compact) where all the “missing” files were the dindex files. The dblock files were cleaned up due to their database state of “Deleting”. PR does the same for dindex.

Before the dindex files are deleted on the remote, update the state in the database to Deleting. Previously, only dblock files were handled this way. Patch by warwickmm

ts678 · August 15, 2023, 3:24pm

There are about 6,100 topics here, with about 62,700 posts. There are about 2 active dev volunteers.
They have already been by here, and I don’t know how much time they want to spend chasing this…

Generally I suggest opening an issue with exact steps that are as simple as you can possibly achieve, thereby increasing the chances that it’s reproduced. GitHub is also an issue tracker. The forum is not.

So it’s known to not happen on smaller backups (which would be bad for repro), or is it not yet tested? Requiring volunteers to buy paid storage has generally not gone over well, so smaller repro may help.

arberg · August 15, 2023, 4:41pm

I have a feeling the communication has gone a bit sideways.

I appreciate the info that there are not many active devs to chase issues and fix bugs. I also appreciate the software and the community, I even supported the software with 75 USD, because its a great project. That’s what I expected buying a software would cost. I don’t expect support or help. I do want to share the issues that I find so that interested parties can be aware of the issues.

It will be difficult for the devs to chase this issue without attempting to reproduce it themselves, and obviously testing it will take time. There are no more logs that I am aware I can enable that would help.

I am not requiring any one to test compact via OneDrive on at least 50GB of data, but I think that is what would be a sensible step to try to reproduce it. Saying that the issue does not exists for others is not helpful, unless we actually know others have executed such size tests via OneDrive. And yeah, I’m using OneDrive family with 5x 1TB, and its a paid plan. Probably it would be just as fine to just run on a small backup set and just run many times in a loop, but then one has to develop such automated test. More work for the devs or someone interested in testing.

I am aware it is difficult to track down issues that are intermittent in nature. I just don’t see what more I can do.

ts678 · August 15, 2023, 5:29pm

Statistically it seems likely that some other OneDrive user runs backups this size, but it’s not known.

That’s what I was on my way to doing, attempting to see if at least one other can reproduce a hang.

Testing different sizes and running intermediate Canary versions was suggested. OneDrive changes:

https://github.com/duplicati/duplicati/commits/master/Duplicati/Library/Backend/OneDrive

shows the bulk of changes have been in effect for over two years of (not highly run) Canary releases.
Running Canary in production is kind of risky, so maybe it’s awkward to isolate problem history there. Running a second backup to play with size (total, Remote volume, etc.) would seem completely safe.
The TEST command can do intensive file downloading up to the entire content of backup destination.
Duplicati.CommandLine.BackendTool.exe can do single downloads using target URL from an Export.
Adjust that URL to an empty folder, and Duplicati.CommandLine.BackendTester.exe can run by itself.
Possibly one of those will find a hang, or possibly (somehow) it only happens during a compact run…

Let me see if I can find an issue with a small scale test. If not, this topic might get lost in the vastness. Additional logging might still be an option, but you might be asked to run a test build with that logging.

arberg · August 15, 2023, 6:30pm

Cool, that’s nice of you to see if you can reproduce it.

I did have the thought that the size might very well be irrelevant, and instead it might very well be the upload/download switching and number of requests to one drive that triggers something bad.

So doing a backup with dblock-size 1 MB (which is minimum supported), and then upping it by 5x on each compact run might trigger it. I’ve just started a test of 250 MB and then we will see if I can also trigger the issue there.

I have not really had the desire to run with a potentially buggy canary, so I have not done that previously.

Thank you for linking the OneDrive changes, I might have a look at those.

I’m hesitant to spend time running other kinds of tests like TEST command. If we have one way of reproducing it that’s all that will be needed to gather more information using for instance more logging in a debug build. Whether or not we reproduce it elsewhere doesn’t really add that much value. I’m saying this being a developer and thinking about how I would go about fixing the issue, if I knew my way around the duplicati source-code and had the environment up and running.

However I just recall I would actually like to do a complete deep test of the newly reencrypted archives. I think its pretty cool restic can determine if their backup is valid. I havn’t yet spent the time to find the switches to create such a deep-test. If you knew those, it would help me out. Then I could run a complete test on one of my onedrive backups and see if that triggers it.

Best Alex

ts678 · August 15, 2023, 7:27pm

--full-remote-verification
Checks the internal structure of each file instead of just verifying the file hash.

but that likely gets some complaints about Extra (better than Missing though). A known issue.
Pull request has been made, but pull request processing has unfortunately been flowing well.

This only goes so far and is not a replacement for an actual restore test or database recreate.
There is a recent forum topic on recreate having trouble that the usual self-tests didn’t predict.

Good practices for well-maintained backups has some other suggestions.

I’m testing with a script with 1 GB of data at default 50 MB dblock-size, changing 10% of the blocks on each run, using --threshold=5 to make it hyper-sensitive to wasted space. This churns quite effectively:

2023-08-15 10:43:21 -04 - [Information-Duplicati.Library.Main.Operation.CompactHandler-CompactResults]: Downloaded 29 file(s) with a total size of 1.12 GB, deleted 60 file(s) with a total size of 1.14 GB, and compacted to 4 file(s) with a size of 62.17 MB, which reduced storage by 56 file(s) and 1.08 GB

I was going to try 250 MB next, but I suppose I could go the other way. You can have the batch file if you’d like to play with it. The data-changer is a separate Python script similar to the one posted here.

Main loop is just 1) change 2) backup 3) pause so I can kill it if desired. No hang yet. I did see retries.

arberg · August 15, 2023, 8:29pm

Thank you for the info. I’ll try the --full-remote-verification on some of the OneDrive backups one day.

I’ve been testing smaller sets of 600 MB backup, with 1 MB blocks compacted to 5MB, then 50 MB. No problems so far. So maybe large tests are needed. I’ll try a 3 GB chunk overnight.

Yeah please send the scripts you have. I’m using docker on linux (unRaid), I can run python. I might have problem getting it to access duplicati in the docker, but maybe i’ll figure something out.

arberg · August 15, 2023, 8:55pm

I suspect I reproduced it now. As mentioned I didn’t see it on my 600 MB backup set, nor an even smaller, but here on compacting 3.3 GB remote backup I got it after 5 minutes. Its been hanging for 30 minutes same place as before, so my guess is I’ve triggered the issue. I’ll let it hang till tomorrow evening or so. It’s possible, but unlikely, it’s just OneDrive throttling for now.

Here’s my exported backup job, in case that’s relevant. I deleted by key but left my super secret password for others to learn how one really chooses a secret password. It’s so good no one will even guess its a secret. Or something…

I’m pretty sure my filters are not important, given those are ignored during compacting, but I just included it all. The same goes with aes encryption and zip compression. Its best to change as little as possible while trying to reproduce in an easier setting.

export.json:

{
  "CreatedByVersion": "2.0.7.1",
  "Schedule": null,
  "Backup": {
    "ID": "14",
    "Name": "Compacting dummy backup test",
    "Description": "",
    "Tags": [],
    "TargetURL": "onedrivev2://backup/duplicati-dummytest/dummyTest1?authid=DELETED",
    "DBPath": "/config/CompactTest.sqlite",
    "Sources": [
      "/mnt/user/dockerhub/minecraft/"
    ],
    "Settings": [
      {
        "Filter": "",
        "Name": "encryption-module",
        "Value": "aes",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "compression-module",
        "Value": "zip",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "dblock-size",
        "Value": "6MB",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "passphrase",
        "Value": "dummy",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "retention-policy",
        "Value": "1D:0s,7D:8h,4W:1W,1Y:1M",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--log-file",
        "Value": "/config/log/duplicati-onedrive-compactDummyTest.log",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--zip-compression-zip64",
        "Value": "true",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--auto-compact-interval",
        "Value": "1W",
        "Argument": null
      },
      {
        "Filter": "",
        "Name": "--blocksize",
        "Value": "100KB",
        "Argument": null
      }
    ],
    "Filters": [
      {
        "Order": 0,
        "Include": true,
        "Expression": "[/mnt/[^/]*/dockerhub/minecraft/bashed-docker/data[^/]*/FeedTheBeast/]"
      },
      {
        "Order": 1,
        "Include": true,
        "Expression": "[/mnt/[^/]*/dockerhub/minecraft/bashed-docker/data[^/]*/(world|world_nether|world_the_end|worlds)/]"
      },
      {
        "Order": 2,
        "Include": false,
        "Expression": "[/mnt/[^/]*/dockerhub/minecraft/bashed-docker/data[^/]*/[^/]*/]"
      },
      {
        "Order": 3,
        "Include": true,
        "Expression": "[/mnt/[^/]*/dockerhub/minecraft/bashed-docker/data[^/]*/FeedTheBeast/(world|world_nether|world_the_end|worlds)/]"
      },
      {
        "Order": 4,
        "Include": false,
        "Expression": "[/mnt/[^/]*/dockerhub/minecraft/bashed-docker/data[^/]*/FeedTheBeast/[^/]*/]"
      },
      {
        "Order": 5,
        "Include": true,
        "Expression": "[/mnt/[^/]*/dockerhub/minecraft/docker-composed/[^/]*/data/FeedTheBeast/]"
      },
      {
        "Order": 6,
        "Include": true,
        "Expression": "[/mnt/[^/]*/dockerhub/minecraft/docker-composed/[^/]*/data/(world|world_nether|world_the_end|worlds)/]"
      },
      {
        "Order": 7,
        "Include": false,
        "Expression": "[/mnt/[^/]*/dockerhub/minecraft/docker-composed/[^/]*/data/[^/]*/]"
      },
      {
        "Order": 8,
        "Include": true,
        "Expression": "[/mnt/[^/]*/dockerhub/minecraft/docker-composed/[^/]*/data/FeedTheBeast/(world|world_nether|world_the_end|worlds)/]"
      },
      {
        "Order": 9,
        "Include": false,
        "Expression": "[/mnt/[^/]*/dockerhub/minecraft/docker-composed/[^/]*/data/FeedTheBeast/[^/]*/]"
      },
      {
        "Order": 10,
        "Include": false,
        "Expression": "[.*.jar]"
      },
      {
        "Order": 11,
        "Include": false,
        "Expression": "[.*.log]"
      }
    ],
    "Metadata": {
      "LastBackupDate": "20230815T195555Z",
      "BackupListCount": "1",
      "TotalQuotaSpace": "1158567428096",
      "FreeQuotaSpace": "786742378496",
      "AssignedQuotaSpace": "-1",
      "TargetFilesSize": "3469634619",
      "TargetFilesCount": "6967",
      "TargetSizeString": "3.23 GB",
      "SourceFilesSize": "5726761811",
      "SourceFilesCount": "8035",
      "SourceSizeString": "5.33 GB",
      "LastBackupStarted": "20230815T195555Z",
      "LastBackupFinished": "20230815T201742Z",
      "LastBackupDuration": "00:21:47.5944080",
      "LastCompactDuration": "00:00:25.6808890",
      "LastCompactStarted": "20230815T193019Z",
      "LastCompactFinished": "20230815T193045Z",
      "LastVacuumDuration": "00:00:00.0343780",
      "LastVacuumStarted": "20230815T181615Z",
      "LastVacuumFinished": "20230815T181615Z"
    },
    "IsTemporary": false
  },
  "DisplayNames": {
    "/mnt/user/dockerhub/minecraft/": "minecraft"
  }
}

ts678 · August 15, 2023, 9:04pm

Here’s the very slightly updated version of the file changer. This one is called random_file3.py:

#!/usr/bin/python
import os
import random
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('path')
parser.add_argument('--block-size', type=int, default=102400, help='size of each block of this file')
parser.add_argument('--change-size', type=int, default=2, help='size of change at start of a block')
parser.add_argument('--percent', type=int, default=10, help='percent of blocks to get changed')
parser.add_argument('--create', type=int, help='create file with this many blocks', dest='blocks')
args = parser.parse_args()

if args.blocks is not None:
    size = args.block_size * args.blocks
    fd = os.open(args.path, os.O_WRONLY | os.O_BINARY | os.O_CREAT | os.O_TRUNC)
    os.ftruncate(fd, size)
    blocks = args.blocks
elif os.path.exists(args.path):
    fd = os.open(args.path, os.O_WRONLY | os.O_BINARY)
    size = os.path.getsize(args.path)
    blocks = size // args.block_size
else:
    print('Please supply path of an existing file, or use --create to set up a new file.', file=sys.stderr)
    sys.exit(1)

changes = blocks * args.percent // 100

for index in random.sample(range(blocks), k=changes):
    os.lseek(fd, index * args.block_size, os.SEEK_SET)
    os.write(fd, random.randbytes(args.change_size))

There are some subtleties to its use. In the test script below, I initialize to fully random data because otherwise the blocks compress too well, and I need size in order to get the compact churning data…

I did a second run at 50 MB to see if behavior stablized, and it had.

Downloaded 20 file(s) with a total size of 978.77 MB, deleted 40 file(s) with a total size of 979.41 MB, and compacted to 36 file(s) with a size of 880.84 MB, which reduced storage by 4 file(s) and 98.57 MB

My first run at 1 MB after initial backup creation (I commented out the whole-file create for later runs) is

Downloaded 646 file(s) with a total size of 632.23 MB, deleted 1292 file(s) with a total size of 633.07 MB, and compacted to 994 file(s) with a size of 534.26 MB, which reduced storage by 298 file(s) and 98.82 MB

and I guess I’ll increase dblock-size as you suggest, to see if anything in this smallish backup will hang.

set DUPLICATI=C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe
set RANDOM=C:\bin\random_file3.py
set FILE=C:\tmp\onedrive_compact\file.bin
set LOG=C:\tmp\onedrive_compact\profiling.log

rem "%RANDOM%" --create 10000 --percent 100 --change-size=102400 "%FILE%"

:loop
"%RANDOM%" --percent 10 --change-size 4 "%FILE%"
"%DUPLICATI%" backup "onedrivev2://Duplicati Backups/onedrive_compact?authid=REDACTED" "%FILE%" --no-encryption --keep-versions=1 --threshold=5 --console-log-level=information --log-file="%LOG%" --log-file-log-level=profiling --profile-all-database-queries --dblock-size=1MB
rem timeout /t 10
rem goto loop

I took out the timeout and loop because I’m running this by hand, increasing dblock-size after each run.
I can increase from 1 GB to 3.3, but maybe I’ll first continue prior plan while pondering what you posted.
Good progress I guess. BTW if you ever go to huge dblock-size, you need larger http-operation-timeout.

ts678 · August 16, 2023, 12:59am

Results of fulll range of dblock-size got no hangs in testing 2.0.7.1_beta_2023-05-25 on Windows 10.

1 MB
Downloaded 646 file(s) with a total size of 632.23 MB, deleted 1292 file(s) with a total size of 633.07 MB, and compacted to 994 file(s) with a size of 534.26 MB, which reduced storage by 298 file(s) and 98.82 MB

2 MB
Downloaded 640 file(s) with a total size of 658.00 MB, deleted 1280 file(s) with a total size of 658.89 MB, and compacted to 546 file(s) with a size of 559.86 MB, which reduced storage by 734 file(s) and 99.02 MB

4 MB
Downloaded 432 file(s) with a total size of 660.39 MB, deleted 864 file(s) with a total size of 661.15 MB, and compacted to 284 file(s) with a size of 568.49 MB, which reduced storage by 580 file(s) and 92.66 MB

8 MB
Downloaded 312 file(s) with a total size of 799.39 MB, deleted 624 file(s) with a total size of 800.14 MB, and compacted to 176 file(s) with a size of 702.10 MB, which reduced storage by 448 file(s) and 98.04 MB

16 MB
Downloaded 148 file(s) with a total size of 880.28 MB, deleted 296 file(s) with a total size of 881.08 MB, and compacted to 98 file(s) with a size of 779.58 MB, which reduced storage by 198 file(s) and 101.50 MB

32 MB
Downloaded 71 file(s) with a total size of 985.63 MB, deleted 142 file(s) with a total size of 986.40 MB, and compacted to 56 file(s) with a size of 882.88 MB, which reduced storage by 86 file(s) and 103.52 MB

64 MB
Downloaded 31 file(s) with a total size of 978.77 MB, deleted 62 file(s) with a total size of 979.52 MB, and compacted to 28 file(s) with a size of 880.80 MB, which reduced storage by 34 file(s) and 98.72 MB

128 MB
Downloaded 16 file(s) with a total size of 978.77 MB, deleted 32 file(s) with a total size of 979.52 MB, and compacted to 14 file(s) with a size of 880.88 MB, which reduced storage by 18 file(s) and 98.65 MB

256 MB
Downloaded 8 file(s) with a total size of 978.77 MB, deleted 16 file(s) with a total size of 979.41 MB, and compacted to 8 file(s) with a size of 880.87 MB, which reduced storage by 8 file(s) and 98.53 MB

512 MB
Downloaded 6 file(s) with a total size of 1.05 GB, deleted 12 file(s) with a total size of 1.05 GB, and compacted to 4 file(s) with a size of 979.09 MB, which reduced storage by 8 file(s) and 98.81 MB

arberg · August 16, 2023, 5:58am

Yeah, thank you. I’ll remove that again, it was just for testing.

I just did a quick scan of your script, and it seems you just use it to change the files, so you have something now to backup. If that is true, then you don’t need that for this. I’m just taking backup of one thing, then compacting it, without any changes.

The problem is intermittent in nature. It’s probably the least work for you (in user-hours) to just pick the largest size you have space for on your OneDrive account and then backup that. Like a couple of movies 10GB+, or even more if possible. Yesterday I did also rclone, updating all my local copies from OneDrive, meaning its more likely I was thottled due to throughput. I still don’t know if actual GB throughput makes a difference. I’ll retest again.

Also you might try with the backup I exported, so you get the same settings.

Btw. its still hanging here from yesterday, but I’ll leave it hanging till tonight or tomorrow night.

ts678 · August 16, 2023, 12:16pm

I don’t know what’s in your Docker container, but just letting it sit isn’t adding much debug data.
Taking resource (e.g. memory) measurements on the container might be good to rule those out.
Possibly netstat -a or similar would show connections to 1drv.ms, and give connection state.
There’s also a mono thread dump ability, but I’m not sure if you can easily catch the output data.
Getting Duplicati to run in a terminal would. I suspect Docker won’t, but I don’t have any Docker.
I also don’t have much Linux (just some small VMs on Windows). Do you see this on Windows?

Attempting to simulate normal backup. You posted that you saw this with a “regular backup” too.
OK, next test is backup 10 GB of data at 50 MB, then will try standalone compact to 250, as OP.

I see 3 GB, but dblock-size is unclear. dblock-size suggests 6 MB, backend values suggest 1 MB.
I can tune next run, but I’m still not particularly confident that this system will show any such hang.

arberg · August 16, 2023, 12:55pm

I’ve seen it once on a regular backup, and concluded hesitantly that its probably the same issue. However we don’t know its the same issue, and we know I can relatively easily and relatively reliably reproduce with compacting, therefore that is the sensible thing to try. It doesn’t hurt for you to also edit you backed up data, and run the backup.

I just started with 1 MB dblock-size because that’s the minimal. If the issue is related to Duplicati threads handling and switching between its handling of files, then it would make sense we just need to make Duplicati handle many files. Therefore minimal initial dblock-size. Run the backup. The edit to 6 MB dblock-size, to ensure Duplicati will compact all the remote dblock-archives with its default setting of 20% small files. I thought 1 → 5 MB would be enough, but it wasn’t in all cases. If it compacts successfully with that, I rerun with 35MB, and then start over, delete remote backup, delete local database, reset dblock-size to 1MB, and redo. On my 3 GB remote backup data set, I got the error within 4 min on first compacting attempt of 1MB → 6MB. But my first 30 minutes compacting on smaller sets didn’t produce it.

Its a great idea to see if I can get a thread dump with stacktraces for each thread. I’ll se if I can manage that.