Yes, there should be no huge difference in performance if designs are close enough. And again - let me say that this testing is not fully conclusive yet - I need to run similar tests on a larger data source.
This level of compression impact is strange - there was plenty of CPU headroom. Testing was done on an i5-3570 3.40GHz, I haven’t seen a single core pegged, and compression should utilize hyper-threading pretty well.
For backups, --zip-compression-level=1 is actually not bad if performance is a concern. And when backing up already compressed files, it may actually be recommended.
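To make the trade-off concrete, here is a minimal Python sketch using the standard zlib module. It is not Duplicati code; the sample inputs and the `measure` helper are made up for illustration. It compares deflate levels on compressible and incompressible data:

```python
import os
import time
import zlib
from typing import Tuple


def measure(data: bytes, level: int) -> Tuple[float, float]:
    """Return (compressed/original size ratio, seconds) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(compressed) / len(data), elapsed


# Illustrative inputs: repetitive text compresses well, random bytes do not
# (random bytes stand in for already compressed files such as .rar archives).
text_like = b"backup block with repetitive content\n" * 50_000
random_like = os.urandom(len(text_like))

if __name__ == "__main__":
    for name, data in (("text-like", text_like), ("random", random_like)):
        for level in (0, 1, 9):
            ratio, seconds = measure(data, level)
            print(f"{name:9s} level={level} ratio={ratio:.3f} time={seconds * 1000:.1f} ms")
```

On incompressible input the ratio stays at roughly 1.0 whatever the level, so any extra time spent at higher levels buys nothing there.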
The behavior of --synchronous-upload="true" was a surprise for me as well and I remember looking at disk I/O chart and seeing better utilization when it was set to true. I am pretty sure I haven’t screwed up testing, but will re-test again.
I have some ideas on what profiling I can do on this, but first I want to try similar tests on a much bigger set of very large files - 15GB with a Windows Server 2016 iso, a .Net app memory dump, a Virtual Box vmdk file with CentOS and a couple of rar files… The largest file is 5.5GB, and most files are non-compressible.
That could explain it perhaps. If the disk is reading two different files (source and compressed file), maybe that reduces the throughput. But it should not matter as much as your results indicate.
You’re doing such a great job so I don’t want to ask too much, but if you don’t mind, could you post your updated results directly in the actual post? It would make it much easier for people to find and read those results and they would be preserved as long as this forum exists.
@kenkendk - I will try to point the temporary folder to another local drive and see if it changes the behavior… I also have to point out that source and temp are located on a relatively slow disk… This creates a lot of variables and still does not explain why CY is at least twice as fast. We should definitely try to find the bottleneck.
@tophee - I prefer working with a real spreadsheet, of course - it allows much better formatting, calculations, notes, etc…
However, I will see if I can post brief summaries here and re-link to the Google Spreadsheet for detailed data.
I do not always record data systematically - quick tests may benefit from embedded tables.
I was going to propose the lower compression level for certain file types, but then realised that Duplicati is not zipping individual files, and so I came up with this majority thing.
Compressed files are not re-compressed. There is a file (default_compressed_extensions.txt) which contains all extensions that are known to be incompressible:
Content from files in that list is sent to the zip archive without compression, so that should not matter.
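The idea of skipping compression by extension can be sketched in Python with the standard zipfile module. This is an illustration only, not Duplicati's implementation: the extension set is a hypothetical subset, and the decision is keyed on the entry name for simplicity.

```python
import io
import os
import zipfile

# Hypothetical subset; the real list lives in default_compressed_extensions.txt.
COMPRESSED_EXTENSIONS = {".zip", ".rar", ".7z", ".jpg", ".mp4"}


def add_to_archive(archive: zipfile.ZipFile, name: str, data: bytes) -> None:
    """Store known-compressed content as-is; deflate everything else."""
    ext = os.path.splitext(name)[1].lower()
    if ext in COMPRESSED_EXTENSIONS:
        method = zipfile.ZIP_STORED      # no compression, just copy the bytes
    else:
        method = zipfile.ZIP_DEFLATED    # regular deflate compression
    archive.writestr(name, data, compress_type=method)


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    add_to_archive(archive, "notes.txt", b"hello " * 1000)
    add_to_archive(archive, "photos.rar", os.urandom(6000))
    methods = {info.filename: info.compress_type for info in archive.infolist()}
```

Stored entries cost almost no CPU, which is why already compressed sources should not be affected much by the compression level setting.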
Apologies for the delay with testing, but I wanted to make sure that I provide useful data here.
Since we all agree that there is a bottleneck of some sort, I decided that the current data set should be quite fine to help identify it. So, I automated the test a bit more (after all, it takes a lot of time!).
Initially, I thought about using Perfmon, which gives nicer charts but would take more time. I quickly realized that screenshots from Sysinternals Process Explorer should be enough here to pinpoint an area needing improvement.
Writing the entire report in a forum post is also not very feasible, so I used Google Presentation this time.
But to allow better searching and forum experience, I’ll add some info here as well.
I ran each test at least 3 times (actually more than that) and captured timing and parameters (to be sure) every time.
Yes, using --synchronous-upload="true" does indeed pause execution and takes more time - I believe I swapped the false/true values in my previous testing.
I also checked what would happen if I located the temp folder on another drive (BTW, the Windows version uses the TMP environment variable and ignores TEMP/TMPDIR). And, obviously, I tested lowest compression, no compression at all, and no encryption in addition to no compression.
The picture from the table below is relatively clear - yes, TI can reach CY performance… if you disable compression… and possibly encryption - it all probably depends on your CPU.
But if you browse through the performance graphs I included in the linked document, you will realize that the actual bottleneck is the less efficient use of CPU, including hash calculation…
I briefly looked at the results of a google:".net sha256 efficiency" search and found some interesting reading there.
Anyway, I still want to see how large files behave, but these preliminary results should help me scope and better manage testing on the larger data set…
Sorry if this sounds obvious to some, but do I read these results correctly when I say: It’s the compression, stupid? (Although duplicacy still has an edge unless you turn off encryption too… )
That is really great work! And I am well aware that such setups take a long time to set up and report on.
That is some comfort to me. It means we need to tweak things to get there, and re-consider the default settings.
There is an issue here that says that the default hashing implementation is inefficient, and that we should use another:
It should be trivial to switch over:
It does nothing on Mono, but no ill effects:
Yes, it appears that compression is part of the problem. From what I read, CY uses LZ4, which is considered “fast”:
I can add LZ4 support to Zip:
But unfortunately, writing LZ4 streams inside a zip archive will make it incompatible with most Zip tools.
I will make a version that uses SHA256Cng, to see if that takes the edge off the CPU usage.
I also have the concurrent_processing branch, which I have been working on for improving error handling. With that branch, it should be possible to stream data as fast as the disk can provide it, without any of the “pauses” you see.
We can also consider storing the zip file in memory, instead of writing it to disk (--write-volumes-to-disk ? ).
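As a rough illustration of the in-memory idea, here is a Python sketch using io.BytesIO and the standard zipfile module. It is not Duplicati code; the function name and block names are hypothetical.

```python
import io
import zipfile
from typing import Dict


def build_volume_in_memory(blocks: Dict[str, bytes]) -> bytes:
    """Assemble a zip volume in RAM and return its bytes, ready for upload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=1) as volume:
        for name, data in blocks.items():
            volume.writestr(name, data)
    # The archive is finalized when the with-block exits; no temp file is touched.
    return buffer.getvalue()


volume_bytes = build_volume_in_memory({
    "block-0001": b"a" * 4096,
    "block-0002": b"b" * 4096,
})
```

The trade-off is that each volume being built occupies memory equal to its size, so this only pays off when volumes are small relative to available RAM.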
Oh, I see, glad it was discovered already. What is interesting, I see that .Net Core 2.0 has a better managed implementation as well - I wonder if this can be backported… Performance Improvements in .NET Core
I got your latest build and tested it - results are below. It clearly shows a very noticeable performance improvement; however, there is still room to grow.
Implementing multithreaded processing and upload/download should help to offload those tasks from cores which are busy with hashing, compression and encryption - overall, backup programs should be I/O bound on a sufficiently fast modern CPU. But then you need to take care of keeping blocks in memory instead of thrashing the disk - and that is more complex, because memory can grow significantly depending on the parameters used.
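One common way to keep blocks in memory without letting memory grow unbounded is a bounded queue between the reader and the CPU-heavy stage. The following Python sketch shows that pattern under stated assumptions; the block size and queue depth are illustrative values, and none of this is taken from Duplicati's code.

```python
import hashlib
import queue
import threading
import zlib

BLOCK_SIZE = 100 * 1024      # illustrative block size
MAX_BLOCKS_IN_FLIGHT = 8     # caps memory at roughly 8 blocks
_DONE = object()             # sentinel telling the worker to stop


def reader(data: bytes, out: queue.Queue) -> None:
    """Producer: split the source into blocks; put() waits when the queue is full."""
    for offset in range(0, len(data), BLOCK_SIZE):
        out.put(data[offset:offset + BLOCK_SIZE])
    out.put(_DONE)


def worker(inp: queue.Queue, results: list) -> None:
    """Consumer: hash and compress each block (the CPU-heavy stage)."""
    while True:
        block = inp.get()
        if block is _DONE:
            break
        results.append((hashlib.sha256(block).hexdigest(), zlib.compress(block, 1)))


def run_pipeline(data: bytes) -> list:
    """Run reader and worker concurrently with a bounded queue between them."""
    blocks = queue.Queue(maxsize=MAX_BLOCKS_IN_FLIGHT)
    results = []
    consumer = threading.Thread(target=worker, args=(blocks, results))
    consumer.start()
    reader(data, blocks)
    consumer.join()
    return results
```

With a single worker the results keep the source order; the queue depth times the block size gives the upper bound on buffered data.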
CPU was still pegged, but seems to be used more efficiently.
Looking forward to seeing all three major math points (hash, compression and encryption) optimized.
For now, I’ll use --zip-compression-level=1 for the test on the larger source set.
The optimization essentially fixes the problem that SHA256.Create() returns the slow implementation in .Net standard profile (which Duplicati uses). The fix I made was to change calls to HashAlgorithm.Create("SHA256") into calling my own method that returns the faster implementation.
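The pattern described here (routing every hash creation through one method that picks the preferred implementation) can be sketched as follows. This is a Python analogue for illustration, not the actual C# change; `register_preferred` and `create_hasher` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict

# Registry of preferred implementations, keyed by upper-case algorithm name.
_PREFERRED: Dict[str, Callable[[], object]] = {}


def register_preferred(name: str, constructor: Callable[[], object]) -> None:
    """Declare which constructor should be used for an algorithm name."""
    _PREFERRED[name.upper()] = constructor


def create_hasher(name: str = "SHA256"):
    """Single entry point: return the preferred implementation, else the default."""
    constructor = _PREFERRED.get(name.upper())
    if constructor is not None:
        return constructor()
    return hashlib.new(name.lower())   # generic lookup by name as the fallback


# Stand-in for "the faster implementation"; here it is simply hashlib.sha256.
register_preferred("SHA256", hashlib.sha256)

hasher = create_hasher("SHA256")
hasher.update(b"abc")
digest = hasher.hexdigest()
```

Because callers only ever use the one entry point, swapping in a platform-specific implementation later touches a single place.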
The change has impact on Windows only, but I can see that there is supposed to be a faster OpenSSL based version for Linux, which I might be able to load also.
Thank you for the clarification, @kenkendk
My tests against the 15GB source are finally running, hope to have results tomorrow.
In the meantime - since SHA256 is improved, the other two (encryption and compression) may need a look.
Although disabling encryption when no compression is used does not show a lot of improvement anyway…
But what about zip? Have you thought about different implementations? Since it looks like compression is designed as a module, it should be possible to add a different implementation without removing the original. From my quick research it looks like DotNetZip beats SharpZipLib, and ZipArchive can be even faster… Moreover, DotNetZip can do ParallelDeflateOutputStream…
I am asking because I put together a small table comparing the resulting backup/compression sizes, and even though you mention that CY uses “fast” compression, it is on the same level as the TI default and uses much less CPU: