Duplicati 2 vs. Duplicacy 2

I commented on the issue on github.

I will test this new build and provide the results. I’ll use an SMB share for the destination - that should remove extra variables from the testing.

Interesting. I’d like to test this, but how often will you have to back up a 1 TB file? Would this difference really be noticeable on a bunch of, let’s say, 4 GB files?

Here’s where a “scan your files and suggest settings” profiling “wizard” might be useful. :slight_smile:

I tested the Duplicati Canary build and added the test results to the spreadsheet.
This time I used a locally attached USB 2.0 drive and a single up/down thread for the CY comparison.
And CY is still noticeably faster, even with a single thread.

I do not see much of a change between --use-block-cache="true" and not setting this option.
On backup, the time is actually spent reading source files, and there is no CPU bottleneck.

I’ll see if I can dig more into why TI is so much slower in a comparable configuration.

Some more testing did not reveal any real help from --use-block-cache="true".
The only way I was able to speed up backup was to enable --synchronous-upload="true".
A restore with --no-local-db="true" is a bit faster than running two separate operations (repair, then restore).
I ditched VSS, since it may take a variable amount of time, and also measured the impact of compression and encryption on the backup.
Here are some of the results:

```text
00:08:49.995    --synchronous-upload="true" --no-backend-verification="true" - COMMON for below
00:08:42.335    --use-block-cache="true"
00:07:23.040    --no-encryption="true" --use-block-cache="true"
00:06:08.722    --zip-compression-level=1 --use-block-cache="true"
00:05:04.746    --zip-compression-level=1 --no-encryption="true" --use-block-cache="true"

00:01:52.741    --no-local-db="true" --no-local-blocks="true" --skip-restore-verification="true" - COMMON for below
00:01:45.409    --use-block-cache="true"

00:00:18.345    DB repair

00:01:32.149    --dbpath=<repaireddb> --use-block-cache="true" --no-local-blocks="true" --skip-restore-verification="true"
```

These results are still noticeably worse than CY :frowning:

As suggested before, I’ll try backup of very large files instead of many small ones.

Thanks @dgcom for trying that out, much appreciated.

I am a bit surprised that TI is that much slower. I guessed that maybe the in-memory lookup table was the reason, but it should not matter much, since most lookups will fail (the block is new), and the database uses a log(n) lookup anyway. Your results show that the database is indeed fast enough (at least on an SSD).
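The lookup pattern described above can be sketched in a few lines of Python with sqlite3. This is an illustrative sketch only - the table layout and function name are invented here, not Duplicati’s actual schema:

```python
import hashlib
import sqlite3

# Minimal sketch of TI-style dedup lookups (invented schema, not
# Duplicati's actual database layout).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE block (hash TEXT PRIMARY KEY)")  # B-tree index -> log(n) lookup

def backup_block(data):
    """Return True if the block is new and needs uploading, False if deduplicated."""
    h = hashlib.sha256(data).hexdigest()
    if con.execute("SELECT 1 FROM block WHERE hash = ?", (h,)).fetchone():
        return False  # block already stored somewhere - skip upload
    # On a first backup most lookups miss, since every block is new:
    con.execute("INSERT INTO block (hash) VALUES (?)", (h,))
    return True

print(backup_block(b"some block"))  # True - new block
print(backup_block(b"some block"))  # False - deduplicated
```

The primary-key index is what gives the log(n) lookup; an in-memory table would give constant-time lookups at the cost of RAM, which is the trade-off discussed above.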

Compared to CY there are not many differences, so I think TI should be able to reach similar speeds.

CY stores all blocks “as-is” on the remote store (in some cases using folders to reduce the number of files in a single folder).
TI stores files inside archives to reduce the number of remote files and requests.

CY keeps a cache of the remote data locally on-disk.
TI keeps a cache/lookup of the remote data in a database.

CY uses a flexible block width (content defined chunking), TI uses a fixed block width and a block-of-blocks.
They both use SHA256 as the hashing algorithm.
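To make the chunking difference concrete, here is a toy Python sketch of both schemes. The rolling hash and the cut parameters below are invented for illustration; CY’s real content-defined chunking algorithm differs:

```python
import hashlib

def fixed_chunks(data, size=4096):
    """Fixed-width chunking (the TI approach, simplified)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data, mask=0x3FF, min_size=512, max_size=16384):
    """Content-defined chunking: cut wherever a rolling hash of recent
    bytes matches a bit pattern, so boundaries follow the content and
    can survive insertions/deletions elsewhere in the file."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF  # toy rolling hash; old bytes shift out
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def chunk_ids(chunks):
    """Both tools identify blocks by their SHA-256 hash."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]
```

With fixed-width chunks, inserting one byte near the start of a file shifts every later boundary, so every later block hash changes; content-defined boundaries mostly stay put.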

I see a big jump when you lower the compression level, so maybe the bottleneck really is the zip compression.
The speedup from --synchronous-upload is a bit strange, as it just “pauses” the backup while an upload is happening.



Yes, there should be no huge difference in performance if the designs are close enough. And again - let me say that this testing is not fully conclusive yet - I need to run similar tests on a larger data source.

This level of compression impact is strange - there was plenty of CPU headroom. Testing was done on an i5-3570 @ 3.40 GHz, and I haven’t seen a single core pegged; compression should utilize hyper-threading pretty well.
For backups, --zip-compression-level=1 is actually not bad if performance is a concern. And if backing up already-compressed files, it may actually be recommended.
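The compression-level trade-off is easy to reproduce outside either tool. A rough Python sketch using zlib (the same deflate family that zip archives use) - the exact timings will vary by machine, and the random buffer just stands in for already-compressed data:

```python
import os
import time
import zlib

def measure(data, level):
    """Compress `data` at the given deflate level; return (seconds, ratio)."""
    t0 = time.perf_counter()
    out = zlib.compress(data, level)
    return time.perf_counter() - t0, len(out) / len(data)

compressible = b"The quick brown fox jumps over the lazy dog. " * 20_000
incompressible = os.urandom(len(compressible))  # stand-in for zip/jpg content

for name, data in (("text", compressible), ("random", incompressible)):
    for level in (1, 6, 9):
        secs, ratio = measure(data, level)
        print(f"{name:7s} level={level}  {secs:7.4f}s  ratio={ratio:.2f}")
```

On typical hardware, level 1 is several times faster than level 9 on compressible input, while on incompressible input every level burns CPU for a ratio of about 1.0 - which is why skipping compression for such files matters.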

The behavior of --synchronous-upload="true" was a surprise to me as well, and I remember looking at the disk I/O chart and seeing better utilization when it was set to true. I am pretty sure I haven’t screwed up the testing, but I will re-test.

I have some ideas on what profiling I can do on this, but first I want to try similar tests on a much bigger set of very large files - 15 GB with a Windows Server 2016 ISO, some .NET app memory dumps, a VirtualBox VMDK file with CentOS, and a couple of RAR files… The largest file is 5.5 GB; most files are non-compressible.

Will update this thread with the results.

That could perhaps explain it. If the disk is reading two different files (the source and the compressed file), maybe that reduces throughput. But it should not matter as much as your results indicate.

I look very much forward to the results of this!

You’re doing such a great job so I don’t want to ask too much, but if you don’t mind, could you post your updated results directly in the actual post? It would make it much easier for people to find and read those results and they would be preserved as long as this forum exists.

It can be done via copy and paste: Here's a very easy way of creating tables on this forum


What? Then why am I wasting my time hitting spaces all over the place to get my posts to line up??? :smiley:

Let’s see…

| DT | DC | Feature |
| --- | --- | --- |
| x | * | Free (*DC command line free for individual users, else per user or computer fee) |
| x | x | Client side software |
|  |  | Server side software |

Oh, yeahhh…much better…


@kenkendk - I will try pointing the temporary folder to another local drive and see if it changes the behavior… I also have to point out that the source and temp are located on a relatively slow disk… This creates a lot of variables and still does not explain why CY is at least twice as fast. We should definitely try and find the bottleneck.

@tophee - I prefer working with a real spreadsheet, of course - it allows much better formatting, calculations, notes, etc…
However, I will see if I can post brief summaries here and link back to the Google Spreadsheet for detailed data.
I do not always record data systematically - quick tests may benefit from embedded tables.

@kenkendk, maybe that could be the default compression level if the majority of files in a block are compressed files such as zip or jpeg or docx?

Are you proposing a per-dblock (archive) compression setting based on the actual archive contents?


Basically, I don’t know what I’m talking about :zipper_mouth_face:

I was going to propose a lower compression level for certain file types, but then realised that Duplicati does not zip individual files, so I came up with this majority thing :smirk:

Already-compressed files are not re-compressed. There is a file (default_compressed_extensions.txt) which contains all extensions that are known to be incompressible:

Content from files on that list is sent to the zip archive without compression, so that should not matter.
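The store-vs-deflate decision can be sketched roughly like this with Python’s zipfile module. The extension set below is an illustrative subset made up for the example, not the contents of Duplicati’s actual default_compressed_extensions.txt:

```python
import io
import zipfile

# Illustrative subset in the spirit of default_compressed_extensions.txt
# (invented here - not Duplicati's actual list):
COMPRESSED_EXTENSIONS = {".zip", ".7z", ".jpg", ".jpeg", ".mp4", ".docx"}

def add_to_volume(zf, name, data):
    """Store already-compressed file types verbatim; deflate everything else."""
    ext = "." + name.rsplit(".", 1)[-1].lower() if "." in name else ""
    method = (zipfile.ZIP_STORED if ext in COMPRESSED_EXTENSIONS
              else zipfile.ZIP_DEFLATED)
    zf.writestr(zipfile.ZipInfo(name), data, compress_type=method)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_to_volume(zf, "notes.txt", b"hello " * 1000)           # deflated
    add_to_volume(zf, "photo.jpg", b"\xff\xd8" + bytes(1000))  # stored as-is
with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        print(info.filename, info.compress_type)  # 8 = deflated, 0 = stored
```

Because zip records the method per entry, mixing stored and deflated entries in one archive stays readable by ordinary zip tools - unlike the LZ4-in-zip idea discussed later in this thread.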


So the performance difference observed with --zip-compression-level=1 is based only on (attempted) compression of file types not on that list?

Yes. I even made a unit test to ensure that compression excludes non-compressible files.

@dgcom, can you confirm that the file extensions are not on the list?


The files in my current data set are not on the list of compressed files; moreover, they compress pretty well - approx. 50%.

Apologies for the delay with testing, but I wanted to make sure that I provide useful data here.

Since we all agree that there is a bottleneck of some sort, I decided that the current data set should be quite fine to help identify it. So, I automated the test a bit more (after all, it takes a lot of time!).
Initially, I thought to use Perfmon, but that would take more time (although with nicer charts); I quickly realized that screenshots from Sysinternals Process Explorer should be enough to pinpoint the area needing improvement.

Writing the entire report in a forum post is also not very feasible, so I used Google Presentation this time.
But to allow better searching and a better forum experience, I’ll add some info here as well.

I ran each test at least 3 times (actually more than that) and captured the timing and parameters (to be sure) every time.
Yes, using --synchronous-upload="true" does indeed pause execution and take more time - I believe I swapped the false/true values in my previous testing.
I also checked what happens if I locate the temp folder on another drive (BTW, the Windows version uses the TMP environment variable and ignores TEMP/TMPDIR). And, obviously, I tested the lowest compression, no compression at all, and no encryption in addition to no compression.
The picture from the table below is relatively clear - yes, TI can reach CY performance… if you disable compression… and possibly encryption - it all probably depends on your CPU.
But if you browse through the performance graphs I included in the linked document, you will realize that the actual bottleneck is less efficient use of the CPU, including calculating hashes…
I briefly looked at the results of a google:".net sha256 efficiency" search and found some interesting reading there.
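For a rough baseline, raw SHA-256 throughput is easy to estimate with a micro-benchmark. This Python sketch is not the .NET implementation discussed here, but it gives a feel for the ceiling the hashing stage imposes (numbers vary by machine and runtime):

```python
import hashlib
import os
import time

def sha256_throughput(total_mb=64, block_kb=64):
    """Hash `total_mb` MB of random data in `block_kb` KB blocks and
    return MB/s - a rough ceiling for the hashing stage alone."""
    block = os.urandom(block_kb * 1024)
    h = hashlib.sha256()
    t0 = time.perf_counter()
    for _ in range(total_mb * 1024 // block_kb):
        h.update(block)
    return total_mb / (time.perf_counter() - t0)

print(f"SHA-256: {sha256_throughput():.0f} MB/s")
```

If this number is well above the observed backup throughput, hashing alone is not the bottleneck, and the slow part is elsewhere (compression, I/O, or an inefficient hash implementation in the runtime).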

Anyway, I still want to see how large files behave, but this preliminary result will help me restrict and better manage the larger data set testing…

| Run 1 | Run 2 | Run 3 | Parameters (Duplicati unless noted) |
| --- | --- | --- | --- |
| 0:09:07 | 0:08:48 | 0:08:51 | `--synchronous-upload="true"` |
| 0:07:11 | 0:07:47 | 0:07:22 | `--synchronous-upload="false"` |
| 0:06:20 | 0:06:36 | 0:06:21 | `--synchronous-upload="false"`, TMP=D:\Temp (separate drive) |
| 0:04:27 | 0:04:13 | 0:04:12 | `--synchronous-upload="false" --zip-compression-level=1`, TMP=D:\Temp |
| 0:03:31 | 0:03:35 | 0:03:37 | `--synchronous-upload="false" --zip-compression-level=0`, TMP=D:\Temp |
| 0:03:01 | 0:03:09 | 0:03:10 | `--synchronous-upload="false" --zip-compression-level=0 --no-encryption="true"`, TMP=D:\Temp |
| 0:03:09 | 0:03:09 | 0:03:10 | Duplicacy: `-threads 1` |

Sorry if this sounds obvious to some, but do I read these results correctly when I say: It’s the compression, stupid? (Although duplicacy still has an edge unless you turn off encryption too… )

That is really great work! And I am well aware that such setups take a long time to set up and report on.

That is some comfort to me :slight_smile: It means we need to tweak things to get there, and reconsider the default settings.

There is an issue here that says that the default hashing implementation is inefficient, and that we should use another:

It should be trivial to switch over:

It does nothing on Mono, but no ill effects:

Yes, it appears that compression is part of the problem. From what I read, CY uses LZ4, which is considered “fast”:

I can add LZ4 support to Zip:

But unfortunately, writing LZ4 streams inside a zip archive will make it incompatible with most Zip tools.

I will make a version that uses SHA256Cng, to see if that takes the edge off the CPU usage.

I also have the concurrent_processing branch, which I have been working on to improve error handling. With that branch, it should be possible to stream data as fast as the disk can provide it, without any of the “pauses” you see.
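The overlap idea can be sketched with a bounded queue between a producer and an uploader thread. This is a toy Python sketch of the pipelining concept, not the actual concurrent_processing code:

```python
import queue
import threading
import time

def pipeline(blocks, work_seconds=0.005):
    """Overlap the 'read/compress' stage and the 'upload' stage via a
    bounded queue, so neither has to sit idle waiting for the other."""
    q = queue.Queue(maxsize=4)   # bounded: limits temp volumes in flight
    done = object()              # sentinel telling the uploader to stop
    uploaded = []

    def uploader():
        while True:
            item = q.get()
            if item is done:
                break
            time.sleep(work_seconds)  # simulated upload
            uploaded.append(item)

    t = threading.Thread(target=uploader)
    t.start()
    for b in blocks:
        time.sleep(work_seconds)      # simulated read + compress
        q.put(b)                      # only blocks if the uploader falls far behind
    q.put(done)
    t.join()
    return uploaded

print(pipeline(["vol1.zip", "vol2.zip", "vol3.zip"]))
```

With both simulated stages taking equal time, the pipelined version finishes in roughly half the serial time; a synchronous design (like --synchronous-upload="true") pays the full sum of both stages.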

We can also consider storing the zip file in memory, instead of writing it to disk (--write-volumes-to-disk ? ).