Various positive + negative feedback

I just did the following using the Canary: Started a new backup job on a Windows PC backing up a NAS SMB share to Sharepoint. The job ran halfway through the upload, then I cancelled it (“after current file”). Then I restarted the job without looking too closely at what had happened, but it did start uploading again at some point.

Next my NAS decided to freeze the SMB share due to some incompatibility with its external USB drive, thus the job was cancelled again due to connection loss with the SMB source.

Duplicati did not ask for a Repair, so I just reconnected the SMB shares and then restarted the job.

Unfortunately, it seems that Duplicati reads (hashes?) through all source files again instead of just checking for date changes. In practice my NAS SMB connection can deliver around 110 MB/s for single large files, but Duplicati only reads at about 80 MB/s while running 16 (+) threads of light CPU load (9900K at 5 GHz). The total CPU load of those threads sums up to what a single CPU core would achieve (around 6.25% total load), but they are still multiple threads running in parallel on different “Ideal Processors”, and for very short periods Duplicati’s total CPU load climbs towards 10%.

Only once it has gone through the content of all the already backed-up files does Duplicati start uploading the other half of the formerly cancelled backup again.

Is this supposed to work like this?

Normally it will only reprocess file contents if it detects metadata changes (such as a timestamp change).

Not sure if the fact that the backup was interrupted and restarted has anything to do with your experience, but I’m guessing that must be what it was. You can test on a smaller backup to confirm more easily - let it finish, then modify or add one file, then run the backup again and it should only process that one new/changed file.
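Conceptually the per-file decision is something like the following; this is a minimal sketch of the general idea with a hypothetical last_run lookup, not Duplicati’s actual code:

    # Sketch only: skip re-reading a file unless its metadata changed since the last backup.
    def needs_rescan(path, size, mtime, last_run):
        previous = last_run.get(path)        # last_run: hypothetical path -> (size, mtime) map
        if previous is None:
            return True                      # new file: must be read and hashed
        return previous != (size, mtime)     # unchanged metadata: content is not re-read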

Once D2 started re-uploading I cancelled it again manually (“Stop after current file”), then restarted. Indeed, it reads through all file content again, despite the files not having changed a bit since the second I last stopped it.

For a manual (=error free) cancellation this seems a bit ungraceful and unnecessarily resource-intensive. Especially considering that this is not a Repair, but only a restarted backup.

Edit: It just finished going through all files and refused to start uploading afterwards (Got 2674 error(s)). So it seems that now a real repair is in order.

Overall I am not entirely happy with how cancelled and/or interrupted backups are handled. More tests are in order before I can trust this.

Yeah it’s a known weak point and there are efforts underway to fix it.

  • Deleting a backup job + data from Sharepoint left the data on the server (over 5000 files).

I will delete those manually now. The fastest way to delete so many files is to just delete the folder from the web interface. Every other mechanism tries to delete one file at a time. Ouch.

Yeah I would manually delete on the back end. I think one reason Duplicati doesn’t delete the folder itself is because it may not be safe to do so. There may be other contents unrelated to this job’s backup data.

I did not mean for Duplicati to delete the folder, but it should have deleted the (5000+) backup files. Taking ages to do so was expected (and what I was testing for), but the files stayed on the server when D2 claimed to be finished.

Ah, interesting. The few times I have used the feature where Duplicati deletes the back end data, it worked properly. Maybe it’s an issue with its SharePoint back end implementation only (which I have never tested or used).

Deleting many files is where various operating systems and other tools struggle. Sending several thousand delete requests over an internet connection is a challenge, and I seem to remember reading that Onedrive (and likely Sharepoint) has more trouble with many small files.

Even the Onedrive client takes a very long time to delete so many files from my end. Deleting the files via the web front-end is faster, but you can only mark about 30 files at once. That is why I usually try to delete the whole folder, and why I keep different backup destinations in different folders with Duplicati, since it does not create sub-folders for its many files by itself.
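To illustrate why the per-file route is so slow: each file costs its own request, so 5000+ files means 5000+ round trips, while removing the folder is a single operation. A minimal sketch, assuming a hypothetical remote-storage client (not any real SharePoint/Onedrive API):

    # Sketch only -- 'client' stands in for a hypothetical remote-storage client.
    def delete_one_by_one(client, folder):
        for name in client.list(folder):          # 5000+ entries -> 5000+ round trips
            client.delete(folder + "/" + name)

    def delete_whole_folder(client, folder):
        client.delete_folder(folder)              # a single request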

  • After over 17 hours of uninterrupted upload the backup on my Windows PC finished stating: “Found 8521 files that are missing from the remote storage, please run repair.”

The last two Remote log entries list:

Oct 10, 2019 6:50 PM: list
[
]
Oct 10, 2019 6:50 PM: list
[
]

At the same time there are 8523 files in the destination Sharepoint folder.

What to make of this?

Did a repair, took 3 hours since my last post here. Result:

“Got 4260 error(s)”.

All the Remote job log is telling me is those two empty list entries again.

The “verbose” log tells me:

" * Oct 10, 2019 10:56 PM: The operation Repair has completed

  • Oct 10, 2019 10:56 PM: Backend event: Put - Completed: duplicati-i664a0393d4494976a666897d8eb492af.dindex.zip.aes (10,47 KB)

  • Oct 10, 2019 10:56 PM: Backend event: Put - Started: duplicati-i664a0393d4494976a666897d8eb492af.dindex.zip.aes (10,47 KB)

  • Oct 10, 2019 10:56 PM: Failed to perform cleanup for missing file: duplicati-b995eb2af964f45d596b7827c212dffef.dblock.zip.aes, message: Repair not possible, missing 89 blocks. If you want to continue working with the database, you can use the “list-broken-files” and “purge-broken-files” commands to purge the missing data from the database and the remote storage.

  • Oct 10, 2019 10:56 PM: This may be fixed by deleting the filesets and running repair again

  • Oct 10, 2019 10:56 PM: duplicati-20191009T233246Z.dlist.zip.aes

  • Oct 10, 2019 10:56 PM: Repair cannot acquire 89 required blocks for volume duplicati-b995eb2af964f45d596b7827c212dffef.dblock.zip.aes, which are required by the following filesets:"

The source data did not change or move, so I wonder what’s with those 89 missing blocks? “rebuild-missing-dblock-files” is set in the global settings.
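(For reference, the “list-broken-files” and “purge-broken-files” commands mentioned in that log message are run from the command line, roughly like this; the storage URL and database path are placeholders for your own setup:)

    Duplicati.CommandLine.exe list-broken-files <storage-url> --dbpath=<path-to-local-database>
    Duplicati.CommandLine.exe purge-broken-files <storage-url> --dbpath=<path-to-local-database>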

My goodness… I think I’d give up on SharePoint for the back end at this point :frowning:

1 TB of space on Sharepoint (aka Onedrive Business) is part of Office 365, so it’s available for “free”. Additionally, Office 365 “Germany” uses German data centers and follows stricter data privacy rules. The only German-based cloud service coming even close to the US-based ones in function and price is Strato “Hidrive” (supported by Synology HyperBackup).

Tried a “Delete and Recreate” on the database. Result:

“No files were found at the remote location, perhaps the target url is incorrect?”

The connection works (Test Connection), the files are there, and the Duplicati log shows an empty list again. Restarting Duplicati didn’t help. Curious case.

This is probably hitting the SharePoint 5000 item list view limit. I’m not that familiar with the internals of the Microsoft product offerings, but if you’re not using Microsoft SharePoint v2, you could try that instead.

‘No files were found at the remote location’ using Onedrive for business has some discussion, and in that case switching to the “v2” version helped. I’m not sure it always will. Logic and usecase for remote subfolders would be the long-term solution, but isn’t done yet. Meanwhile, a large remote volume size is a workaround.

I tried to restore the backup directly from the Sharepoint source. Same result: Duplicati claims to find no files. I tried to access the files from a Docker installation, same result.

Thing is, I was messing around with a Docker installation while the Windows PC was uploading. At one point I pointed the Docker installation to the same Sharepoint folder as the ongoing Windows installation. It may well be possible that I deleted some file that Duplicati needed to identify the backup.

I am now downloading the whole Sharepoint (Onedrive Business) folder via the Onedrive application and will try to access the backup from the local Onedrive folder instead.

I remember reading about that just a few days ago, so you are likely onto something. Still strange, though, that Duplicati’s “List” result is empty and it thus claims to find no files. There are 8523 files present, so well above 5000.

Switching to “v2” is not an option, because the idea here is to test how well Duplicati plays with the “free” (as in part of Office 365) Onedrive Business Germany (even though this service won’t be around for much longer).

A large remote volume size can bring its own problems with Duplicati, but it would get around the 5000-file limit.
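To put rough numbers on it: with the default 50 MB remote volume size, 1 TB of backed-up data works out to roughly 1 TB / 50 MB ≈ 20,000 dblock files, plus about as many dindex files, far beyond a 5000-item listing limit. Raising the remote volume size (“Remote volume size” in the job options, i.e. the --dblock-size setting) to 500 MB, for example, would cut that to around 2,000 dblock files. These are ballpark figures that ignore deduplication and compression.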

Adding to this: other block-based backup software usually spreads its files across lots of sub-folders instead of putting all of them into one large folder. In light of this 5000-file list limit, that seems like a very good idea/option.
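As a rough illustration of that idea (not something Duplicati currently does), a backend could derive a sub-folder from a short prefix of each volume’s random name, so that no single folder ever comes near the listing limit. With a two-character prefix that gives up to 256 sub-folders, which keeps even hundreds of thousands of files below 5000 per folder:

    # Illustrative sketch only -- Duplicati does not shard its files like this today.
    def shard_path(volume_name, prefix_len=2):
        # "duplicati-b995eb2a....dblock.zip.aes" -> "b9/duplicati-b995eb2a....dblock.zip.aes"
        prefix = volume_name.split("-", 1)[1][:prefix_len]
        return prefix + "/" + volume_name

    print(shard_path("duplicati-b995eb2af964f45d596b7827c212dffef.dblock.zip.aes"))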

“v2” is (I believe) just the Microsoft Graph API as seen in Migrating from Live SDK to Microsoft Graph. From a GUI point of view, the main difference is probably what screen login credentials get typed into.

What’s in Microsoft Graph? shows how it connects to roughly everything, including OneDrive Personal and SharePoint (as in Office 365 Business). While I’m unsure it will help, I’m also unclear on why it’s not an option.

If I understand correctly then I just have to get the auth ID link, which I did just a few days ago when I tried to set up WebDAV. I will give it a try tomorrow.

Office 365 Germany is limited in its features (or rather, not updated to new ones), so not everything that works for the normal version is available. That’s why I am experimenting to get some backup solution working via Sharepoint.

Edit: Just noticed that there is an “AuthID” button that brings me to the login screen…

Edit2: “Server error, close window and try again”

My my… This discussion went wild with so many issues.
Timur, the damage is done, but for the future, I recommend editing the original post to add more issues to the list.

On the other hand, I want to draw your attention to the fact that the introduction of multithreading into Duplicati brought along many new bugs. While development continues and many other bugs are getting fixed, some of the multithreading issues prevent me from using the newer versions. That is why my recommendation to you is to use Duplicati - 2.0.3.5_canary_2018-04-13, which is relatively solid but does not have multithreading capability.

Finally, I suppose you already know how much work and time programming takes. Duplicati is a donation-supported project, which makes us lucky to be able to use it for so little cost, or even for free if we want to.
If you can write code, I invite you to get involved with the source and help out the community.