Duplicati 2 vs. Duplicacy 2

Hi!

Yes, there should be no huge difference in performance if the designs are close enough. And again - let me say that this testing is not fully conclusive yet - I need to run similar tests on a larger data source.

This level of compression impact is strange - there was plenty of CPU headroom. Testing was done on an i5-3570 @ 3.40GHz and I haven't seen a single core pegged, and compression should utilize hyper-threading pretty well.
For backups, --zip-compression-level=1 is actually not bad if performance is a concern. And if you are backing up already-compressed files, it may even be recommended.
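
In case anyone wants to reproduce the quick test, a run with reduced compression looks roughly like this (the target URL and paths are placeholders, not the ones used above):

```
Duplicati.CommandLine.exe backup "file://D:\BackupTarget" "C:\TestData" --zip-compression-level=1 --synchronous-upload="false"
```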

The behavior of --synchronous-upload="true" was a surprise for me as well, and I remember looking at the disk I/O chart and seeing better utilization when it was set to true. I am pretty sure I haven't screwed up the testing, but I will re-test.

I have some ideas on what profiling I can do on this, but first I want to try similar tests on a much bigger set of very large files - 15 GB including a Windows Server 2016 ISO, a .NET app memory dump, a VirtualBox VMDK file with CentOS and a couple of RAR files… The largest file is 5.5 GB, and most files are non-compressible.

Will update this thread with the results.

That could explain it perhaps. If the disk is reading two different files (source and compressed file), maybe that reduces the throughput. But it should not matter as much as your results indicate.

I look very much forward to the results of this!

You're doing such a great job that I don't want to ask too much, but if you don't mind, could you post your updated results directly in the actual post? It would make it much easier for people to find and read those results, and they would be preserved as long as this forum exists.

It can be done via copy and paste: Here's a very easy way of creating tables on this forum


What? Then why am I wasting my time hitting spaces all over the place to get my posts to line up??? :smiley:

Let’s see…

| DT | DC | Feature |
|----|----|---------|
| x | * | Free (*DC command line free for individual users, else per user or computer fee) |
| x | x | Client side software |
|   |   | Server side software |

Oh, yeahhh…much better…


@kenkendk - I will try pointing the temporary folder to another local drive and see if it changes the behavior… I also have to point out that the source and temp folders are located on a relatively slow disk… This creates a lot of variables and still does not explain why CY is at least twice as fast. We should definitely try and find the bottleneck.

@tophee - I prefer working with a real spreadsheet, of course - it allows much better formatting, calculations, notes, etc…
However, I will see if I can post brief summaries here and link to the Google Spreadsheet for detailed data.
I do not always record data systematically - quick tests may benefit from embedded tables.

@kenkendk, maybe that could be the default compression level if the majority of files in a block are compressed files such as zip or jpeg or docx?

Are you proposing a per-dblock (archive) compression setting based on the actual archive contents?


Basically, I don’t know what I’m talking about :zipper_mouth_face:

I was going to propose the lower compression level for certain file types, but then realised that Duplicati is not zipping individual files, and so I came up with this majority thing :smirk:

Compressed files are not re-compressed. There is a file (default_compressed_extensions.txt) which contains all extensions that are known to be non-compressible:

Content from files with those extensions is sent to the zip archive without compression, so that should not matter.
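
Not Duplicati's actual code, but a minimal sketch of that idea - deciding per entry whether to deflate, based on an extension list (using System.IO.Compression here just for illustration; the extension set is a made-up subset):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class ExtensionAwareZip
{
    // Illustrative stand-in for default_compressed_extensions.txt
    static readonly HashSet<string> AlreadyCompressed = new HashSet<string>(
        new[] { ".zip", ".jpg", ".jpeg", ".docx" },
        StringComparer.OrdinalIgnoreCase);

    public static void AddFile(ZipArchive archive, string path)
    {
        // Store known-compressed formats as-is; deflate everything else.
        var level = AlreadyCompressed.Contains(Path.GetExtension(path))
            ? CompressionLevel.NoCompression
            : CompressionLevel.Optimal;

        var entry = archive.CreateEntry(Path.GetFileName(path), level);
        using (var source = File.OpenRead(path))
        using (var target = entry.Open())
            source.CopyTo(target);
    }
}
```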


So the performance difference observed with --zip-compression-level=1 is based only on (attempted) compression of file types not on that list?

Yes. I even made a unit test to ensure that the compression excludes non-compressible files.

@dgcom can you confirm that the file extensions are not on the list?


The files in my current data set are not on that list of compressed extensions; moreover, they compress pretty well - to approximately 50%.

Apologies for the delay with testing, but I wanted to make sure that I provide useful data here.

Since we all agree that there is a bottleneck of some sort, I decided that the current data set should be fine for helping to identify it. So, I automated the test a bit more (after all, it takes a lot of time!).
Initially I thought of using Perfmon, but that would take more time (although with nicer charts); I quickly realized that screenshots from Sysinternals Process Explorer should be enough to pinpoint the area needing improvement.

Writing the entire report in a forum post is also not very feasible, so I used a Google Presentation this time.
But to allow better searching and a better forum experience, I'll add some info here as well.

I ran each test at least 3 times (actually more than that) and captured timing and parameters (to be sure) every time.
Yes, using --synchronous-upload="true" does indeed pause execution and takes more time - I believe I swapped the false/true values in my previous testing.
I also checked what would happen if I located the temp folder on another drive (by the way, the Windows version uses the TMP environment variable and ignores TEMP/TMPDIR - see the example right after this paragraph). And, obviously, I also tested the lowest compression level, no compression at all, and no compression combined with no encryption.
The picture from the table below is relatively clear - yes, TI can reach CY performance… if you disable compression… and possibly encryption - it all probably depends on your CPU.
But if you browse through the performance graphs I included in the linked document, you will realize that the actual bottleneck is the less efficient use of the CPU, including hash calculation…
I briefly looked at the results of a google:".net sha256 efficiency" search and found some interesting reading there.
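
For anyone repeating the temp-folder test, it was done roughly like this (D:\Temp is just a folder on my second drive; the target URL and source path are placeholders):

```
set TMP=D:\Temp
Duplicati.CommandLine.exe backup "file://E:\BackupTarget" "C:\TestData" --synchronous-upload="false" --zip-compression-level=1
```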

Anyway, I still want to see how large files behave, but this preliminary result will help me narrow down and better manage the larger data set testing…

| Run 1 | Run 2 | Run 3 | Parameters |
|-------|-------|-------|------------|
| **Duplicati** | | | |
| 0:09:07 | 0:08:48 | 0:08:51 | --synchronous-upload="true" |
| 0:07:11 | 0:07:47 | 0:07:22 | --synchronous-upload="false" |
| 0:06:20 | 0:06:36 | 0:06:21 | --synchronous-upload="false", Set TMP=D:\Temp (separate drive) |
| 0:04:27 | 0:04:13 | 0:04:12 | --synchronous-upload="false" --zip-compression-level=1, Set TMP=D:\Temp |
| 0:03:31 | 0:03:35 | 0:03:37 | --synchronous-upload="false" --zip-compression-level=0, Set TMP=D:\Temp |
| 0:03:01 | 0:03:09 | 0:03:10 | --synchronous-upload="false" --zip-compression-level=0 --no-encryption="true", Set TMP=D:\Temp |
| **Duplicacy** | | | |
| 0:03:09 | 0:03:09 | 0:03:10 | -threads 1 |

Sorry if this sounds obvious to some, but do I read these results correctly when I say: it's the compression, stupid? (Although Duplicacy still has an edge unless you turn off encryption too… )

That is really great work! And I am well aware that such setups take a long time to set up and report on.

That is some comfort to me :slight_smile: That means we need to tweak things to get there, and re-consider the default settings.

There is an issue here that says that the default hashing implementation is inefficient, and that we should use another:

It should be trivial to switch over:

It does nothing on Mono, but has no ill effects:

Yes, it appears that compression is part of the problem. From what I read, CY uses LZ4, which is considered "fast":

I can add LZ4 support to Zip:

But unfortunately, writing LZ4 streams inside a zip archive will make it incompatible with most Zip tools.

I will make a version that uses SHA256Cng, to see if that takes the edge off the CPU usage.

I also have the concurrent_processing branch, which I have been working on to improve error handling. With that branch, it should be possible to stream data as fast as the disk can provide it, without any of the "pauses" you see.

We can also consider storing the zip file in memory, instead of writing it to disk (--write-volumes-to-disk ? ).
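
Not a committed design, just a rough sketch of the in-memory idea (the names here are made up; the real volume builder is obviously more involved):

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class InMemoryVolumeSketch
{
    // Assemble a dblock-style zip entirely in RAM and hand the buffer to the
    // uploader, instead of writing a temp file and reading it back.
    // Memory use grows with the remote volume size, so it is a trade-off.
    public static byte[] BuildVolume(IEnumerable<KeyValuePair<string, byte[]>> blocks)
    {
        using (var buffer = new MemoryStream())
        {
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (var block in blocks)
                {
                    var entry = zip.CreateEntry(block.Key, CompressionLevel.Fastest);
                    using (var stream = entry.Open())
                        stream.Write(block.Value, 0, block.Value.Length);
                }
            }
            return buffer.ToArray();
        }
    }
}
```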


I have put up a new version (2.0.2.8) which now uses SHA256Cng as the hashing implementation: Releases · duplicati/duplicati · GitHub


Never mind, I figured it out - I had the terminology backwards in my head :frowning_face:

original stupid question for posterity

I’m confused - is synchronous-upload disabled by default (i.e. if not using the advanced option setting at all)? Is there any reason?

Thank you, glad I can help here.

Oh, I see, glad it was discovered already. Interestingly, .NET Core 2.0 has a better managed implementation as well - I wonder if this can be backported…
Performance Improvements in .NET Core

I got your latest build and tested it - results are below. It clearly shows a very noticeable performance improvement, however there is still room to grow :slight_smile:
Implementing multithreaded processing and upload/download should help offload those tasks from the cores that are busy with hashing, compression and encryption - overall, a backup program should be I/O bound on a sufficiently fast current CPU. But then you need to take care of keeping blocks in memory instead of thrashing the disk - and that is more complex, because memory use can grow significantly depending on the parameters used.

| 2.0.2.6 | 2.0.2.8 | Improvement | Parameters |
|---------|---------|-------------|------------|
| 0:08:55 | 0:08:19 | 6.66% | --synchronous-upload="true" |
| 0:07:27 | 0:07:03 | 5.39% | --synchronous-upload="false" |
| 0:06:26 | 0:06:02 | 5.98% | --synchronous-upload="false", Set TMP=D:\Temp |
| 0:04:17 | 0:03:47 | 11.78% | --synchronous-upload="false" --zip-compression-level=1, Set TMP=D:\Temp |
| 0:03:34 | 0:02:50 | 20.87% | --synchronous-upload="false" --zip-compression-level=0, Set TMP=D:\Temp |
| 0:03:07 | 0:02:42 | 13.39% | --synchronous-upload="false" --zip-compression-level=0 --no-encryption="true", Set TMP=D:\Temp |
The CPU was still pegged, but it seems to be used more efficiently.

Looking forward to seeing all three major math-heavy stages (hashing, compression and encryption) optimized :slight_smile:
For now, I'll use --zip-compression-level=1 for the test on the larger source set.

The optimization essentially fixes the problem that SHA256.Create() returns the slow implementation in .Net standard profile (which Duplicati uses). The fix I made was to change calls to HashAlgorithm.Create("SHA256") into calling my own method that returns the faster implementation.

The change has an impact on Windows only, but I can see that there is supposed to be a faster OpenSSL-based version for Linux, which I might be able to load as well.
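
Not the exact change, but the approach described above boils down to a small factory along these lines (SHA256Cng is the CNG-backed implementation; the fallback covers platforms where it is not available or not the fast path):

```csharp
using System;
using System.Security.Cryptography;

static class HashFactory
{
    // Prefer the CNG-backed SHA-256 on Windows; fall back to the default
    // factory elsewhere (e.g. Mono), where this change makes no difference.
    public static HashAlgorithm CreateSHA256()
    {
        try
        {
            return new SHA256Cng();
        }
        catch (PlatformNotSupportedException)
        {
            return SHA256.Create();
        }
    }
}
```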

Thank you for the clarification, @kenkendk.
My tests against the 15 GB source are finally running - I hope to have results tomorrow.

In the meantime - since SHA256 is improved, the other two - encryption and compression - may need a look.
Although disabling encryption when no compression is used does not show a lot of improvement anyway…
But what about zip? Have you thought about different implementations? Since it looks like compression is designed as a module, it should be possible to add a different implementation without removing the original. From my quick research, it looks like DotNetZip beats SharpZipLib, and ZipArchive can be even faster… Moreover, DotNetZip can do ParallelDeflateOutputStream… (a rough sketch of such a test is at the end of this post).

I am asking because I put together a small table comparing the resulting backup/compression sizes, and even though you mention CY uses "fast" compression, it is on the same level as the TI default while using much less CPU:

| Test 1 | Size (bytes) |
|--------|--------------|
| InfoZip default | 998,508,177 |
| Default compression | 1,005,213,845 |
| Duplicacy | 1,005,435,292 |
| --zip-compression-level=1 | 1,054,828,415 |
| --zip-compression-level=0 | 2,008,793,923 |
| Source | 2,026,607,926 |
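
As mentioned above, here is a rough sketch of what a DotNetZip comparison run could look like (assuming the Ionic.Zip NuGet package; paths are illustrative, and this is not a proposed Duplicati module):

```csharp
using Ionic.Zip;    // DotNetZip
using Ionic.Zlib;

class DotNetZipProbe
{
    static void Main()
    {
        using (var zip = new ZipFile())
        {
            zip.CompressionLevel = CompressionLevel.BestSpeed; // roughly comparable to --zip-compression-level=1
            zip.ParallelDeflateThreshold = 512 * 1024;         // deflate entries larger than 512 KB on multiple threads
            zip.AddDirectory(@"C:\TestData");                  // illustrative source folder
            zip.Save(@"D:\Temp\dotnetzip-probe.zip");          // illustrative output path
        }
    }
}
```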