Various positive + negative feedback

Hello everyone!

This is the second time in over two years that I've taken a look at Duplicati, and this time around I intend to make good use of it. Generally I like the idea of deduplication-based backups; in the past I also looked at Arq and similar software. So, back to Duplicati and what I noticed after a day of trying it out.

Positive:

  • The UI is overall clean, simple and lets me do what I want it to do. The advanced options are somewhat awkward to use, but as a user I am more interested in the options being present inside the UI than in them looking pretty (instead of having to edit config files or rebuild my own code base to get there).

  • D2 is able to connect to the Office 365 Germany-based Sharepoint server. This is somewhat special, because those Germany-based servers differ in features compared to normal Onedrive/Office 365 servers. Even better, D2 is able to fill my full upload bandwidth of about 32 Mbit/s (3.8 MB/s).

  • It is possible to set up file extensions for files that are not meant to be compressed, usually because they are already compressed anyway. Unfortunately this is done by editing a config file, but that is better than the option not being present at all. I specifically checked whether this feature is case insensitive and fortunately it is (.JPG = .jpg = .Jpg).

  • Compression level and even compression type (Deflate, LZMA…) are user-configurable, even per backup job.

  • Compression is multi-threaded, making use of multiple CPU cores. While this might seem obvious, many (most?) image-based backup programs still compress their images using only a single thread/core and are thus bottlenecked at higher compression levels.

  • Overall there is lots of user control, including such settings as chunk and block size. This is great for those of us who like to (ab)use such options.

  • User data folders in D2 properly follow folder redirection (i.e. Pictures pointing to a NAS share instead of a local folder). Local drive letters that point to network shares are also properly usable.

  • The progress bar tells me useful stuff, like the current file number, throughput (kind of) and how much data still has to be processed. The backup job gives additional information about which file is currently being processed (including the percentage of said file).

That is, until it does not tell me any useful stuff anymore, which brings us to

Negative:

  • The quality of the progress bar information is inconsistent:

The throughput numbers of the progress bar seem mostly useless, displaying something like 4 KB/s when the real I/O throughput is over 40 MB/s.

The progress bar becomes entirely useless when “Stop after upload” is chosen (see below).

Once all files have finished being created but before the backup is finished, the remaining file size turns negative (into absurd numbers).

When “Stop now” is used, the progress bar stops showing any information about what is still going on in the background. The current batch of volumes is still being created, uploads are still happening and the database is still locked, but the bar shows nothing of it.

  • Graceful cancellation of an ongoing job does not seem possible:

“Stop after upload” seems useless. It keeps finishing the whole backup job, uploading all data, which may be hundreds of gigabytes and take hours or days. I would expect it to only finish uploading those files and volumes it already began working on and then do what I told it to do: stop the backup job.

“Stop now” does not stop now! As mentioned above, it keeps creating some files and keeps uploading some data. It even keeps the database locked, sometimes seemingly forever, so that I have to quit Duplicati to unlock the database again. Even worse, it does not clean up after itself, leaving hundreds of megabytes of data inside the temp folder.

  • The “Pause” button is confusing and useless for ongoing backups. As a new user I kept thinking that it would pause the current backup job, but it does not. Instead it kind of pauses the Duplicati server from starting further backup jobs, while ongoing backup jobs keep running instead of being paused. It also told me over half a dozen times that it cannot do some “HTTP stuff”, which was solved by simply clicking a second time.

  • Compression level cannot be set for compression type LZMA, which is a shame. The default level seems to correspond to 7-Zip at around level 3/fast. This is fine for many cases, but sometimes level 5/normal yields considerably better results, especially when you upload a backup to the cloud and upload bandwidth is the main bottleneck.

  • The list of raw image file extensions in the compression-extension-file is too short. It should include all extensions listed in the raw image Wikipedia article.

On the other hand it should not include “.tif”, because many TIFF files are not compressed, for compatibility as well as performance and efficiency reasons.

Even those that are compressed are often compressed badly when created by software other than Photoshop. LZW-compressed TIFF files can usually be compressed further, at least the 16-bit-per-channel files, which are bigger with LZW than uncompressed. ZIP-compressed TIFF files are often compressed badly by software other than Photoshop, but unfortunately these cannot be compressed further unless they are decompressed first. Overall LZMA excels at compressing TIFF files (RAR is even better), but even Deflate can squeeze about 10% out of them.

  • Compression does not make full use of multi-threading. The default of 8 compression “processes” (threads?) creates around 50% total CPU load on my 9900K with 16 logical cores. Testing on a single 1.9 GB TIF file reveals that the default of 8 runs faster than either 4 (25% load) or 16 processes (up to 80% load). Using 7-Zip on the same source file results in close to 100% CPU load and is considerably faster than Duplicati.

8 processes: 1:55 min
4 processes: >2:20 min
16 processes: >2:10 min
7-Zip: 0:40 min

  • The finishing process is bottlenecked by being mostly single-threaded (1 core only). That is, when the progress bar reaches 100% and the remaining data size turns negative, Duplicati seems to do some finishing touches that can take quite some time because of this CPU bottleneck.

  • During the backup process D2 takes regular breaks of several seconds during which nothing seems to happen. There is no bottlenecking CPU load, no SSD load, no network traffic, nothing. This happens regardless of whether the currently processed file is compressed or not, and it happens regardless of the “use-block-cache” option.

  • Damaged backups cannot be repaired:

When all files of a backup are deleted, D2 only displays an error that tells me to repair the backup. When I click on Repair it does nothing but send me to the home screen. When I try to repair the database via the “Database” option it does the same thing. When I try to “Recreate (Delete + Repair)” it does not work. Only once I specifically hit “Delete” can I start over. I would at least have expected “Recreate” to do the same thing in one click. And some better communication about errors from Duplicati’s side would be appreciated.

When a single file of a backup is deleted, D2 does the same as above! The only difference is that in the “Database” option it specifically tells me to turn on the “rebuild-missing-dblock-files” advanced option. Alas, once I do that I can finally hit the “Repair” button for something to happen. But what happens is that I get an unspecific “Error: 1” message and that’s it. No rebuilding happens despite the backup being local and coming from a single local source file that did not change.

  • The Synology package does not seem to work at all. The server seemingly keeps crashing when I try to access the UI (connection lost). I cannot reconnect unless I close the UI window, but then it happens again within a few seconds.

That’s all off the top of my head. Should be plenty enough to chew on anyway. :wink:

Overall: Well done! But also: Way to go…


“Visit us on” in the lower right corner includes no link, so no visiting possible. :wink:

Great feedback!

I believe a fix I submitted resolves this particular problem, but it hasn’t made it into the beta channel yet. What version are you testing?

Are you using the Synology Mono (beta) package? Newer versions of Duplicati require Mono 5.x or later, and the “official” Synology Mono package is woefully out of date. This may not be your problem if you are using the Duplicati beta version (but it will be for the next beta release). Some have switched to a third-party Synology Mono package, and others (like myself) switched to running Duplicati in a Docker container on Synology.

That’s all I have time to comment on at the moment. I know some of your other observations have been resolved in the latest canary releases but as you noted there still are some things that need to be worked out.


Thanks for the quick reply. I am testing 2.0.4.23 for Windows. I will take a look into the latest canary release.

You are correct that I downloaded the Synology Mono Beta, so that may be a culprit.

Quick feedback for the 2.0.4.30_canary_2019-09-20:

  • The number of temp volumes being created equals the number of concurrent compression processes (threads). So by default 8 temp volumes are created instead of 4. I tried 15 compression threads and 15 temp volumes were created accordingly.

I tried this with a single 750 MB MOV file using a 200 MB volume size and 8 compression threads. The size of the D2 LZMA volumes increased until about 8 x 95 MB = 760 MB (why larger than the source, with MOV not being compressed?), which is the final size of the D2 backup. Once all temp volumes reached 95 MB, D2 merged them into 3 x 200 MB + 1 x 160 MB volumes. This final step happened using only a single CPU core and thus was heavily bottlenecked and took its time.

Using a 50 MB volume size helped to get rid of the bottlenecked merging part, but the very last volume is still created using a single core. This explains my earlier observation about the slow finishing/stopping of backups on a single core.

I would have preferred to see single volumes being compressed by multiple threads instead of multiple volumes being compressed by single threads. Even users without spinning platters as temp drives would likely benefit from that.

  • “Stop Now” does the same as “Stop after current file” now.

  • “Stop after current file” really seems to stop after the current file now, which effectively means that it keeps uploading all current temp volumes. It seems to clean up afterwards, but I saw temp volumes remaining in some cases.

  • Both the Beta and Canary lack a “Create New Folder” option in the destination dialog. So when a folder is created by other means, we have to jump through some hoops to make D2 recognize the new folder.

I tried again using my 1.9 GB compressible TIFF file. The odd behavior happens when LZMA is used instead of Deflate.

With Deflate, 8 x 200 MB volumes are created and moved to the destination, then a single smaller final volume is created on a single core.

With LZMA, 8 x 156 MB volumes are created (= final size), then only the last one is grown to the full 200 MB size and moved. Afterwards new volumes are created starting at 156 MB and then filled up to 200 MB before being moved. This is done one volume at a time, thus only using a single core at a time.

As a consequence LZMA is badly hampered by this odd temp volume behavior.

Hm, I am still not quite sure how temp files and destination files work.

For testing I started a (Deflate) backup from a NAS source (connected via 1 Gbit Ethernet) to my local SSD drive, with a 50 MB volume size and 500 KB chunk size. The files are read from the NAS at less than 400 Mbit/s (around 32 MB/s) according to Duplicati. At the same time the SSD write throughput created by D2 is close to 130 MB/s on the local drive.

How does D2 create so much local write throughput out of so little source/network read throughput?

And why is read performance so low? My network connection can transfer the large media source files at around 110 MB/s, yet Duplicati only reads them at 32 MB/s?!

Average CPU load is accordingly rather low, although most of the files are already compressed anyway and thus included in the “default_compressed_extensions” file.

  • I am currently testing a Restore operation from local drive to NAS.

When Duplicati says: “Downloading files” while creating files in the local temp directory, is it decompressing/deduplicating those files or is it just copying from source to temp?

I ask because in this case source and temp drive are the very same, so the latter would be unnecessary.

  • Is deduplication used on all files, including those that are marked by “default-compression-extensions”?

  • I noticed several Duplicati files in its local database folder that are two days old. These are named “Backup” or “Sicherung” (the German word for backup), while the current database uses a random-character name. Since I hit “Delete” on the database a lot during tests, and at one point deleted all backup jobs and uninstalled Duplicati, I wonder why these are still present?

  • The “Cancel” button does not seem to do anything when I try to cancel a restore. The “X” button to stop the current process does work, though.

  • There are no “Advanced” Restore options. Consequently there is no direct way to tell Duplicati to use “no-local-blocks” for a specific restore operation.

For an existing backup configuration the workaround is to change the corresponding option in the backup job before doing a restore.

For a “Direct restore from backup files” the workaround is to change the “Default options” via “Settings”.

Unsurprisingly I would prefer to get direct access from the Restore dialog to such advanced options. :wink:

  • There does not seem to be a way to manually delete old backup versions, let alone specific backup versions?

A workaround for the former is to change the configured retention plan and then manually start a backup.

I did not find a workaround for the latter, though. Being able to manually delete specific versions has merit when one or more specific versions grew unnecessarily large. It sometimes happens that you accidentally drop files on the source that are not meant to be there; when they are deleted, there should be a way to get them out of the backups too (especially for space/size reasons).

  • D2 does not properly clean up after a Restore. I see 7 temporary files (300 MB total) left in my temp folder after various restore tests. These may be temporary database files that were created when I restored directly from the backup files (instead of from a present configuration).

Wow you have provided a LOT of feedback. I wanted to respond to a few things that I can comment on pretty quickly.

Compressing files manually with 7-Zip isn’t directly comparable to backing up with Duplicati. 7-Zip is just doing raw compression. Duplicati is breaking files into blocks, hashing them, checking each hash against known blocks, packaging blocks into volumes, etc.
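
If it helps to picture the extra work, here is a minimal Python sketch of the blocking/deduplication idea (block and volume sizes are placeholders, and the real pipeline also compresses, encrypts and uploads each volume - this is just the concept, not Duplicati code):

```python
import hashlib

BLOCK_SIZE = 100 * 1024          # placeholder block size for this sketch
VOLUME_LIMIT = 50 * 1024 * 1024  # placeholder volume (dblock) size

def backup_file(path, known_hashes, volumes):
    """Split a file into fixed-size blocks, hash each block and keep
    only blocks that have not been seen before (deduplication)."""
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).hexdigest()
            if digest in known_hashes:
                continue  # identical block already stored somewhere, skip it
            known_hashes.add(digest)
            # start a new volume when the current one would grow past the limit
            if not volumes or volumes[-1]["size"] + len(block) > VOLUME_LIMIT:
                volumes.append({"size": 0, "blocks": []})
            volumes[-1]["blocks"].append(digest)
            volumes[-1]["size"] += len(block)
```

All of that bookkeeping (plus compression and encryption per volume) is overhead that a plain 7-Zip run never pays.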

That being said the developers are looking at ways to improve concurrency, reduce bottlenecks, and otherwise do what is possible to improve performance.

Undoubtedly something is happening. If you do not see any network traffic, I’m guessing it’s some operation that is currently limited to a single core (of which there are many). On a system with 16 logical cores such as yours, you may see the Duplicati process using only ~6% CPU. In the Web UI, go to About, Show Log, Live, and select Verbose to see what’s going on under the hood.

Just to be clear, you deleted all the “back end” files? Yeah, in that case there is nothing for Duplicati to work with - no way to “repair”.

There are many parts of Duplicati currently limited to single cores. The goal is to add multithreading wherever possible, but it can be challenging from a software design point of view.

These commands have some issues that I’m not sure have been fully worked out yet. There are known open issues on Github.

I agree that would be useful.

I don’t have experience with LZMA (as I only use the default zip compression). But if your volume size is set to 200MB then Duplicati will repackage blocks before uploading so that the volumes are 200MB. It is trying to keep as few files on the back end as possible.

“Downloading files” means it is transferring back end files from the backup destination. In cases where the back end is the same computer, then yes, I suppose this is perhaps an unnecessary step. Maybe Duplicati could be made more intelligent so it knows it can read the files directly - it would have to also not delete the files when done (which it normally does after downloading the files from the remote side).
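
Conceptually, the special-casing I have in mind might look something like this sketch (hypothetical helper names, definitely not actual Duplicati code):

```python
import os
from urllib.parse import urlparse

def fetch_volume(backend_url, volume_name, temp_dir):
    """Return (local_path, delete_after_use) for a backup volume.

    A local file:// back end could be read in place (and must not be
    deleted afterwards); remote back ends are downloaded to temp as usual.
    """
    parsed = urlparse(backend_url)
    if parsed.scheme in ("", "file"):
        # back end lives on a local disk: no copy to temp needed
        return os.path.join(parsed.path, volume_name), False
    local_copy = os.path.join(temp_dir, volume_name)
    download_from_backend(backend_url, volume_name, local_copy)  # hypothetical helper
    return local_copy, True
```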

Yes.

“Backup” files are created whenever you switch to a newer Duplicati version and the database schema is upgraded. Duplicati makes a backup “just in case.” Right now there is no mechanism built in to Duplicati to delete these files - they need to be deleted manually.

I agree, that should be there for the “Direct restore from backup files” option.

You can with the Delete command.

Via the web UI - click a backup set, click Commandline, select “delete” from the Command dropdown, scroll to the bottom and select “version” from the Add Advanced Option dropdown. This will add “version” to the list of defined parameters. Now scroll up to find the “version” setting and enter the backup versions to delete. 0 = most recent, 1 = next oldest, etc. You can enter a value like “5,10-15” to delete multiple versions or spans of versions.

You can also use the command line utility.
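
From memory the command looks roughly like this (check Duplicati.CommandLine.exe help delete for the exact syntax; the destination URL and passphrase here are placeholders, and running with --dry-run first is a good idea):

```
Duplicati.CommandLine.exe delete "file://D:\Backups\Photos" --version="5,10-15" --passphrase="..." --dry-run
Duplicati.CommandLine.exe delete "file://D:\Backups\Photos" --version="5,10-15" --passphrase="..."
```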

It should clean up temp files after you complete an operation. I think if you are doing a direct restore from backup files, have it generate a partial database (to show you file selection dialogs), but then don’t complete the restore, some temp files may be left over.

I have seen code in Duplicati where it will go through and delete old temp files. I think it’s configured to delete temp files over 1 day old. I’ll have to see if I can find it…
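
The general idea would be something like this (only a sketch of the concept, not the actual Duplicati code, and I'm assuming the usual dup-* naming of its temp files):

```python
import glob
import os
import tempfile
import time

MAX_AGE = 24 * 60 * 60  # "older than one day", as mentioned above

def clean_old_temp_files(pattern="dup-*"):
    """Delete leftover temp volumes older than one day."""
    now = time.time()
    for path in glob.glob(os.path.join(tempfile.gettempdir(), pattern)):
        try:
            if now - os.path.getmtime(path) > MAX_AGE:
                os.remove(path)
        except OSError:
            pass  # the file may still be in use by a running job
```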

Compressing files manually with 7-Zip isn’t directly comparable to backing up with Duplicati. 7-Zip is just doing raw compression. Duplicati is breaking files into blocks, hashing them, checking each hash against known blocks, packaging blocks into volumes, etc.

Yes, but my point was that 7-Zip can make use of all logical CPU cores to compress a single destination file and be much faster. Duplicati instead seems to use one core per file (1 core per concurrent temp volume) and thus not only cannot make use of Hyper-Threading cores, but is also slower in the process. The performance problems get much worse when large volume sizes are used (see below).

Undoubtedly something is happening. If you do not see any network traffic, I’m guessing it’s some operation that is currently limited to a single core (of which there are many).

Unfortunately, no. The upper HWiNFO bar monitors the highest single-core load; 100% on that bar is 100% load on a single core (= 6% total load). As you can see it never even gets close to loading a single core to full extent. It might be possible, though, that Windows’ thread scheduler keeps switching cores so fast that it doesn’t monitor properly. I will retest this using the Canary and real thread load via Process Explorer and maybe check different CPU affinity settings.

There are many parts of Duplicati currently limited to single cores. The goal is to add multithreading wherever possible, but it can be challenging from a software design point of view.

Again, Duplicati seems to use a single thread per temporary file. The default of 8 concurrent compression threads creates 8 temporary files to begin with. While this could be seen as multi-threading, in reality it’s multi-I/O instead, which creates its own very real set of performance issues.

This gets worse when the destination is on the same drive as the temporary folder, even more so with large volume sizes. I saw disk queue numbers of over 32, which bottlenecks my (temp + destination) SSD to the point where the CPU cannot be fully loaded with compression tasks anymore.

Overall there is too much I/O happening and too little (if any) multi-threading. Duplicati (and other backup software) should compress single temporary files using multi-threaded compression, just like compression software such as Zip or 7-Zip does. This would keep I/O lower and make better use of CPU resources.
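
To make the distinction concrete, here is a minimal Python sketch of what I mean by compressing one volume with many threads (pigz-style chunked compression; a real implementation would still have to stitch the chunks into a valid archive entry, so this is only the concept):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 8 * 1024 * 1024  # arbitrary 8 MB chunks for this sketch

def compress_one_volume(data: bytes, threads: int = 8, level: int = 6):
    """Compress a single volume by deflating independent chunks in
    parallel; zlib releases the GIL, so the threads actually scale."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))
```

One volume at a time compressed this way would load all cores and only ever touch one temp file at a time, instead of 8 temp files each being fed by a single thread.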

But if your volume size is set to 200MB then Duplicati will repackage blocks before uploading so that the volumes are 200MB. It is trying to keep as few files on the back end as possible.

The problem is not so much the destination volume sizes, but that D2 first creates 8 (= concurrent compression threads) temporary files of a size that fits the destination size if possible. I tried this with 500 MB and 4000 MB sizes, which demonstrates it even better and also shows that it also happens with Deflate. Here is an example:

4000 MB volume size, 7.1 GB final backup size: D2 first creates 8 files of 0.8875 GB, causing high disk I/O (queue sizes) that creates bottlenecks. Once these files are finished, D2’s progress bar claims that 0 bytes (and some files) are to go, even though not a single byte/file has been written to the destination folder yet; it’s all still temporary.

At this point D2 begins merging the smaller temporary files into the large 4000 MB volumes and then copies them to the destination folder. This last bit is entirely single-threaded (only 1 CPU core being used). So we are waiting for 7.1 GB of data to be processed and moved while the progress bar claims 0 bytes remaining (but finally displays throughput).

What I take from this is that for the time being small volume sizes have to be used for better backup performance.

Via the web UI - click a backup set, click Commandline, select “delete” from the Command dropdown, scroll to the bottom and select “version” from the Add Advanced Option dropdown. This will add “version” to the list of defined parameters. Now scroll up to find the “version” setting and enter the backup versions to delete. 0 = most recent, 1 = next oldest, etc. You can enter a value like “5,10-15” to delete multiple versions or spans of versions.

Thanks. So this needs its own UI for easier deletion of old versions and display of version sizes/information, similar to Synology HyperBackup’s version list. Entering some backwards version number does not reveal any information about which version is the offending one that may need deletion. In practice it can happen that a user fills a source folder with lots of data by accident; all of this is backed up until it’s recognized and deleted, at which point old backup versions are still holding all the unneeded extra data.

Two more issues I noticed:

  • “compression-extension-file” seems to be ignored for compression method LZMA, where it’s needed even more (LZMA is slower than Deflate).

  • D2 would benefit from being more intelligent/automatic with “no-local-blocks”, even if it just means setting it by default for local backup destinations (no internet transfer necessary). For testing I restored from my local SSD to a NAS destination that was also the original backup source. This led to the situation that D2 was reading local blocks from the NAS just to write them back to the same NAS, all over Ethernet, instead of reading from the faster local SSD backup files. I will set “no-local-blocks” by default now, as I am not using cloud storage that bills by downloaded data volume anyway.

Last but not least: in D2’s help text I read the warning that low I/O priority would lead to slower performance. But I thought that I/O priority worked similarly to CPU priority, in that you get close to full performance when there is no other load. Alas, D2’s throughput dropped to around 25% with low I/O priority, which is quite dramatic and good to know beforehand.

  • When Sharepoint is chosen as the destination, the Sharepoint advanced options do not become available in step 5 “Options” until the new backup job is saved and edited again. They are available in step 2 “Destination”, though. So it’s more of a confusing quirk than a bug.
  • When a duplicate “Default options” entry is created (by accident), a simple message tells me about the duplicate option and no recent changes are saved. This just caused me to choose all defaults again for no apparent reason. D2 could just remove the duplicate entry and save the rest.

That’s exactly what I was wondering. I would use the basic Task Manager and just watch the details tab while Duplicati appears to be “stalled”. See what it says for CPU utilization. I don’t think you need to use Process Explorer or mess with CPU affinity at all to test this.

Yeah, I could see that being a future improvement. I am thinking most people don’t try to delete specific versions, but instead just set a retention policy and be done with it.

I think Duplicati assumes that what is being backed up is local. It’s usually more efficient to back up at the source. Have you had any more success getting Duplicati to run directly on your Synology? It may or may not be more efficient depending on hardware specs of your NAS and other factors.

I didn’t respond to the other parts of your comment simply because I’m not familiar enough with inner workings of Duplicati. I’d love to see a core developer respond!

Thanks for your answers so far. I am now in the process of uploading another 200 GB backup to Sharepoint (Onedrive Business Germany). There are no stalls yet, so maybe those only start after a few hours of upload. Will check tomorrow when I get up.

What I noticed:

  • There is no message when the Sharepoint path is wrong; D2 just keeps uploading into the nether.

I accidentally forgot to include the “/personal/<mail_address>” part in front of the “/documents” part and then also forgot to include the last subfolder after “/documents/Duplicati”. Despite this URL not existing, D2 kept uploading in a constant stream of 3.8 MB/s.

No idea where all that uploaded data went before I noticed, but it’s not to be found on my Sharepoint server. I specifically accessed Sharepoint via PDF-XChange Editor’s Sharepoint plugin, which allows me to browse the Sharepoint structure above “/documents” (though not above /personal/<email_address>).

That’s exactly what I was wondering. I would use the basic Task Manager and just watch the details tab while Duplicati appears to be “stalled”. See what it says for CPU utilization. I don’t think you need to use Process Explorer or mess with CPU affinity at all to test this.

I seem to remember that CPU load in Task Manager was below 6%, but I am not entirely sure. Let’s see how the Canary behaves, since it’s more recent than the Beta I began testing with.

Yeah, I could see that being a future improvement. I am thinking most people don’t try to delete specific versions, but instead just set a retention policy and be done with it.

I had to manually delete Synology HyperBackup versions in the past when a client put large media files into one sub-directory that was part of the backup. I excluded said sub-directories for ongoing backups, but also needed to get rid of hundreds of GB worth of useless data from the already present backup files. This meant deleting versions from within a specific time span only, while keeping older backup data alive.

Have you had any more success getting Duplicati to run directly on your Synology? It may or may not be more efficient depending on hardware specs of your NAS and other factors.

I still have to try the extra Mono Synology installation again and look into Docker. CPU and I/O load can be a problem for local backups, especially with mechanical platter drives being used in the Synology. But my current main usage/test case is to back up my encrypted personal photo library to Sharepoint, which is bottlenecked to 3.8 MB/s by my upload bandwidth anyway. That should pose no problem for the rather tiny NAS CPU (DS218+) and spinning hard drives.

My backup strategy now is to back up all my systems to my Synology (and the Synology backs up to itself), and then use CloudSync to synchronize ALL of this backup data to Backblaze B2 (to satisfy my off-site requirement). It is working great. My NAS is model 1517+, by the way.

I recall the initial backups were slow, both in Duplicati processing time and also synchronizing to B2. But once I was over that initial hump backups are working very quickly and nicely.

Neither CloudSync nor HyperBackup offer access to my Sharepoint drive. I found a website offering a “basic-to-sharepoint-auth-http-proxy Docker image” that should allow HyperBackup to access Sharepoint via WebDAV.

I will give both D2 and that workaround another try on the DS218+.

For local backups I am using an external USB NTFS drive connected to the NAS. But I want my most important files to be off-site (encrypted).