Duplicati 2 vs. Duplicacy 2

Basically, I don’t know what I’m talking about :zipper_mouth_face:

I was going to propose a lower compression level for certain file types, but then realised that Duplicati is not zipping individual files, and so I came up with this majority thing :smirk:

Already-compressed files are not re-compressed. There is a file (default_compressed_extensions.txt) which contains all extensions that are known to be non-compressible:

Content from files with extensions on that list is sent to the zip archive without compression, so that should not matter.
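
For illustration, the check is conceptually something like this - a minimal sketch using the stock ZipArchive API rather than Duplicati's actual compression module, and the file-format details (one extension per line, leading dot) are assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class CompressedExtensionFilter
{
    // Extensions loaded from default_compressed_extensions.txt; the exact
    // file format (one extension per line, with leading dot) is assumed here.
    static readonly HashSet<string> NonCompressible = new HashSet<string>(
        File.ReadAllLines("default_compressed_extensions.txt")
            .Select(line => line.Trim())
            .Where(line => line.Length > 0 && !line.StartsWith("#")),
        StringComparer.OrdinalIgnoreCase);

    // Deflate ordinary content, but store already-compressed content verbatim.
    public static void AddToVolume(ZipArchive archive, string path)
    {
        var level = NonCompressible.Contains(Path.GetExtension(path))
            ? CompressionLevel.NoCompression
            : CompressionLevel.Optimal;

        archive.CreateEntryFromFile(path, Path.GetFileName(path), level);
    }
}
```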

2 Likes

So the performance difference observed with --zip-compression-level=1 is based only on the (attempted) compression of file types not on that list?

Yes. I even made a unit test to ensure that the compression excludes non-compressible files.

@dgcom can you confirm that the file extensions are not on the list?

1 Like

The files in my current data set are not on that list of compressed extensions; moreover, they compress pretty well - to approx. 50% of their original size.

Apologies for the delay with testing, but I wanted to make sure that I provide useful data here.

Since we all agree that there is a bottleneck of some sort, I decided that the current data set should be quite fine for identifying it. So, I automated the test a bit more (after all, it takes a lot of time!).
Initially I thought of using Perfmon, which would have produced nicer charts but taken more time; then I realized that screenshots from Sysinternals Process Explorer should be enough to pinpoint the area needing improvement.

Writing the entire report in a forum post is also not very feasible, so I used a Google Presentation this time.
But to allow better searching and a better forum experience, I'll add some info here as well.

I ran each test at least 3 times (actually more than that) and captured the timing and parameters (to be sure) every time.
Yes, using --synchronous-upload="true" does indeed pause execution and takes more time - I believe I swapped the false/true values in my previous testing.
I also checked what happens if I put the temp folder on another drive (BTW, the Windows version uses the TMP environment variable and ignores TEMP/TMPDIR). And, obviously, I tested the lowest compression, no compression at all, and no encryption on top of no compression.
The picture from the table below is relatively clear - yes, TI can reach CY performance… if you disable compression… and possibly encryption - it all probably depends on your CPU.
But if you browse through the performance graphs I included in the linked document, you will realize that the actual bottleneck is less efficient use of the CPU, including hash calculation…
I briefly looked at the results of a google:".net sha256 efficiency" search and found some interesting reading there.

Anyway, I still want to see how large files behave, but this preliminary result will help me narrow down and better manage the larger data set testing…

| Run 1 | Run 2 | Run 3 | Parameters |
| --- | --- | --- | --- |
| Duplicati | | | |
| 0:09:07 | 0:08:48 | 0:08:51 | --synchronous-upload="true" |
| 0:07:11 | 0:07:47 | 0:07:22 | --synchronous-upload="false" |
| 0:06:20 | 0:06:36 | 0:06:21 | --synchronous-upload="false", TMP=D:\Temp (separate drive) |
| 0:04:27 | 0:04:13 | 0:04:12 | --synchronous-upload="false" --zip-compression-level=1, TMP=D:\Temp |
| 0:03:31 | 0:03:35 | 0:03:37 | --synchronous-upload="false" --zip-compression-level=0, TMP=D:\Temp |
| 0:03:01 | 0:03:09 | 0:03:10 | --synchronous-upload="false" --zip-compression-level=0 --no-encryption="true", TMP=D:\Temp |
| Duplicacy | | | |
| 0:03:09 | 0:03:09 | 0:03:10 | -threads 1 |
3 Likes

Sorry if this sounds obvious to some, but do I read these results correctly when I say: It’s the compression, stupid? (Although duplicacy still has an edge unless you turn off encryption too… )

That is really great work! And I am well aware that such setups take a long time to set up and report on.

That is some comfort to me :slight_smile: That means we need to tweak things to get there, and reconsider the default settings.

There is an issue here that says that the default hashing implementation is inefficient, and that we should use another:

It should be trivial to switch over:

It does nothing on Mono, but no ill effects:

Yes, it appears that compression is part of the problem. From what I read, CY uses LZ4, which is considered “fast”:

I can add LZ4 support to Zip:

But unfortunately, writing LZ4 streams inside a zip archive will make it incompatible with most Zip tools.

I will make a version that uses SHA256Cng, to see if that takes the edge off the CPU usage.

I also have the concurrent_processing branch, which I have been working on for improving error handling. With that branch, it should be possible to stream data as fast as the disk can provide it, without any of the “pauses” you see.

We can also consider storing the zip file in memory, instead of writing it to disk (--write-volumes-to-disk ? ).
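
In the simplest case that is just a matter of swapping the backing stream for the volume being built - a rough sketch (the --write-volumes-to-disk name above is only a suggestion, not an existing option):

```csharp
using System.IO;

static class VolumeStreams
{
    // Sketch only: choose the backing store for a volume while it is assembled.
    // writeToDisk would be driven by the proposed option; the temp-file path
    // follows the TMP-based behaviour discussed earlier in the thread.
    public static Stream Create(bool writeToDisk)
    {
        return writeToDisk
            ? (Stream)new FileStream(Path.GetTempFileName(), FileMode.Create,
                                     FileAccess.ReadWrite, FileShare.None,
                                     bufferSize: 1 << 16, FileOptions.DeleteOnClose)
            : new MemoryStream();
    }
}
```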

2 Likes

I have put up a new version (2.0.2.8) which now uses SHA256Cng as the hashing implementation: Releases · duplicati/duplicati · GitHub

1 Like

Never mind, I figured it out - I had the terminology backwards in my head :frowning_face:

original stupid question for posterity

I’m confused - is synchronous-upload disabled by default (i.e. if not using the advanced option setting at all)? Is there any reason?

Thank you, glad I can help here.

Oh, I see - glad it was discovered already. Interestingly, I see that .NET Core 2.0 has a better managed implementation as well - I wonder if this can be backported…
Performance Improvements in .NET Core

I got your latest build and tested it - results are below. It clearly shows a very noticeable performance improvement; however, there is still room to grow :slight_smile:
Implementing multithreaded processing and upload/download should help offload those tasks from the cores that are busy with hashing, compression and encryption - overall, a backup program should be I/O bound on a sufficiently fast modern CPU. But then you need to take care of keeping blocks in memory instead of thrashing the disk - and that is more complex, because memory use can grow significantly depending on the parameters used.

| 2.0.2.6 | 2.0.2.8 | Improvement | Parameters |
| --- | --- | --- | --- |
| 0:08:55 | 0:08:19 | 6.66% | --synchronous-upload="true" |
| 0:07:27 | 0:07:03 | 5.39% | --synchronous-upload="false" |
| 0:06:26 | 0:06:02 | 5.98% | --synchronous-upload="false", TMP=D:\Temp |
| 0:04:17 | 0:03:47 | 11.78% | --synchronous-upload="false" --zip-compression-level=1, TMP=D:\Temp |
| 0:03:34 | 0:02:50 | 20.87% | --synchronous-upload="false" --zip-compression-level=0, TMP=D:\Temp |
| 0:03:07 | 0:02:42 | 13.39% | --synchronous-upload="false" --zip-compression-level=0 --no-encryption="true", TMP=D:\Temp |

CPU was still pegged, but seems to be used more efficiently.

Looking forward to seeing all three major math-heavy stages (hashing, compression and encryption) optimized :slight_smile:
For now, I'll use --zip-compression-level=1 for the test on the larger source set.

The optimization essentially fixes the problem that SHA256.Create() returns the slow implementation in .Net standard profile (which Duplicati uses). The fix I made was to change calls to HashAlgorithm.Create("SHA256") into calling my own method that returns the faster implementation.

The change has an impact on Windows only, but I can see that there is supposed to be a faster OpenSSL-based version for Linux, which I might be able to load as well.
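
A sketch of what such a wrapper can look like - not the actual Duplicati code, just the idea of preferring the CNG-backed SHA256Cng on Windows and keeping the managed default elsewhere:

```csharp
using System;
using System.Security.Cryptography;

static class FastHash
{
    // Drop-in replacement for HashAlgorithm.Create("SHA256"): prefer the
    // CNG-backed implementation where available, keep the managed default otherwise.
    public static HashAlgorithm CreateSHA256()
    {
        if (Environment.OSVersion.Platform == PlatformID.Win32NT)
        {
            try { return new SHA256Cng(); }
            catch (PlatformNotSupportedException) { } // defensive fallback only
        }
        return HashAlgorithm.Create("SHA256");
    }
}
```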

Thank you for the clarification, @kenkendk
My tests against the 15 GB source are finally running; I hope to have results tomorrow.

In the meantime - since SHA256 is improved, the other two, encryption and compression, may need a look.
Although disabling encryption when no compression is used does not show a lot of improvement anyway…
But what about zip? Have you thought about different implementations? Since it looks like compression is designed as a module, it should be possible to add a different implementation without removing the original. From my quick research it looks like DotNetZip beats SharpZipLib, and ZipArchive can be even faster… Moreover, DotNetZip can do ParallelDeflateOutputStream…
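
For reference, a minimal sketch of what the DotNetZip route could look like, assuming the Ionic.Zip package and its ParallelDeflateThreshold setting - this is not something Duplicati does today:

```csharp
using Ionic.Zip;
using Ionic.Zlib;

static class DotNetZipVolume
{
    // Sketch: write a volume with DotNetZip using its parallel deflater.
    // A threshold of 0 asks it to use ParallelDeflateOutputStream for every entry.
    public static void Write(string zipPath, string[] blockFiles)
    {
        using (var zip = new ZipFile(zipPath))
        {
            zip.CompressionLevel = CompressionLevel.BestSpeed; // roughly --zip-compression-level=1
            zip.ParallelDeflateThreshold = 0;                  // compress entries on multiple cores
            foreach (var file in blockFiles)
                zip.AddFile(file, "");                         // store at the archive root
            zip.Save();
        }
    }
}
```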

I am asking because I put together a small table comparing the resulting backup/compression sizes, and even though you mention that CY uses “fast” compression, it is on the same level as the TI default while using much less CPU:

| Compression | Test 1 size (bytes) |
| --- | --- |
| InfoZip default | 998,508,177 |
| Default compression | 1,005,213,845 |
| Duplicacy | 1,005,435,292 |
| --zip-compression-level=1 | 1,054,828,415 |
| --zip-compression-level=0 | 2,008,793,923 |
| Source | 2,026,607,926 |

I finally got the results together for the second test - larger set of big files.

The source included 5 files with a total size of a bit over 15 GB. Some of the files were compressible, so default zip compression would end up with an approx. 10 GB archive.

So far I tested backup timing only. I plan to check restore speed as well (when I have time).
Results are not bad, but still a bit disappointing. Full details are also available in the Google spreadsheet I shared earlier, but you can find all the numbers below as well.

| Run 1 | Run 2 | Run 3 | Destination size | Files | Folders | Parameters |
| --- | --- | --- | --- | --- | --- | --- |
| Duplicati 2.0.2.6 | | | | | | |
| 1:12:40 | 1:17:38 | 1:11:14 | 10,968,733,985 | 421 | 1 | Default compression |
| 0:44:34 | 0:41:00 | 0:41:38 | 11,153,579,823 | 427 | 1 | --zip-compression-level=1 |
| 0:56:30 | 0:51:12 | 0:49:18 | 15,801,175,593 | 605 | 1 | --zip-compression-level=0 |
| 0:31:01 | 0:30:44 | 0:29:58 | 15,800,984,736 | 605 | 1 | --zip-compression-level=0 --no-encryption="true" |
| Duplicacy 2.0.9 | | | | | | |
| 0:27:27 | 0:27:23 | 0:27:24 | 10,931,372,608 | 2732 | 2931 | -threads 1 |

Details on the source:

| Size | File name |
| --- | --- |
| 5,653,628,928 | en_windows_server_2016_x64_dvd_9327751.iso |
| 1,572,864,000 | DVD5.part1.rar |
| 1,572,864,000 | DVD9.part1.rar |
| 5,082,054,656 | disk1.vmdk |
| 1,930,806,927 | dotnetapp.exe.1788.dmp |

Compressibility

| Size | Compression |
| --- | --- |
| 10,788,967,173 | InfoZip default |
| 10,968,733,985 | Default compression |
| 10,931,372,608 | Duplicacy |
| 11,153,579,823 | --zip-compression-level=1 |
| 15,801,175,593 | --zip-compression-level=0 |
| 15,812,218,511 | Source |

The results are not too bad, but still a bit disappointing - CY is able to maintain the same or better compression/deduplication as the TI defaults while showing a noticeable performance advantage.
TI can get closer to those times, but at the expense of space efficiency.
I will also note that CY can perform even better with multithreaded upload.

2 Likes

Yes, compression is a module, so it is fairly simple to just add in another compression library.

But, unlike for CY, TI needs to store multiple files, so it needs a container/archive format in addition to a compression algorithm.

The zip archive format is quite nice in that you can compress the individual files with different algorithms. I am not sure SharpCompress supports injecting a custom stream at this point, but I can probably work something in. If we do this, it is trivial to compress blocks in parallel and only synchronize them for writing to the archive.
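
That last idea could look roughly like this - a sketch only, where the hypothetical AddPrecompressedEntry stands in for the "inject a custom stream" capability SharpCompress would have to expose:

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical writer: accepts an already-deflated payload for an entry
// instead of recompressing it (the capability discussed above).
interface IVolumeWriter
{
    void AddPrecompressedEntry(string name, byte[] deflatedData);
}

static class ParallelBlockCompressor
{
    public static void Write(IVolumeWriter volume, IEnumerable<(string name, byte[] data)> blocks)
    {
        var writeLock = new object();

        Parallel.ForEach(blocks, block =>
        {
            // The CPU-heavy deflate runs concurrently on all cores.
            byte[] deflated;
            using (var buffer = new MemoryStream())
            {
                using (var z = new DeflateStream(buffer, CompressionLevel.Optimal, leaveOpen: true))
                    z.Write(block.data, 0, block.data.Length);
                deflated = buffer.ToArray();
            }

            // Only the append into the single archive stream is synchronized.
            lock (writeLock)
                volume.AddPrecompressedEntry(block.name, deflated);
        });
    }
}
```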

The reason for CY being faster is because it uses LZ4, which is not a container but a compression algorithm. As I wrote earlier, writing LZ4 streams into a zip archive is possible, but it will not be possible to open such an archive with normal zip tools.

There are some limitations to what TI can do with this, as it uploads (larger) volumes. But it is possible to do chunked uploads (to B2, S3 and others), and there the chunks can be uploaded in parallel.

I have looked at the design, and I think I can adapt the concurrent_processing branch to support parallel uploads of volumes. It is tricky to get right, because a random upload can fail, and the algorithm needs to work correctly no matter which uploads fail. But with the new design, it is simpler to track the state provided by each volume, and thus also to roll back such changes.

I took a stab at this, and wrote a new library for it:

The library picks the best hashing implementation for the platform (CNG on Windows, AppleCC on MacOS, OpenSSL on Linux) and provides that instead of the default managed version. Performance measurements show a 10x improvement on MacOS and a 6x improvement on Linux.

Unfortunately, there is something that triggers an error when I use this in Duplicati:
https://travis-ci.org/duplicati/duplicati/builds/278801004

I can reproduce the error on my machine, but I have not yet figured out what causes it.

1 Like

You have given good information about the backup source support comparison…
thanks a lot…

Regards,
AMAAN

1 Like

Hint: If you refer to some other post on this forum, it would be great if you could provide a link to that post.

The link will automatically be visible from both sides (i.e. there will also be a link back from the post you’re linking to). Those links will make it much easier for people to navigate the forum and find relevant information.

So far in this discussion, the focus has been on speed and it looks like we can expect duplicati to eventually catch up with duplicacy in single-threaded uploads and perhaps even in multi-threaded uploads. Good.

Another conclusion I draw from the above discussion is that, compared to duplicacy, duplicati saves a lot of remote storage space (which usually costs money). Also good.

But there is another killer feature on the duplicacy side: cross-source deduplication. Duplicacy can use the same backup archive for backup jobs on different machines. Depending on the scenario, this may well make up for duplicati’s more efficient storage use.

I seem to remember someone (@JonMikelV?) saying that this could become possible in duplicati too. Is that correct and if so, is it on the road map?

I think database use is the weak point of Duplicati. It’s a very “sophisticated” approach, and though elegant, it creates several points of failure.

When everything is OK, there is still the problem of the occupied space (several hundred megabytes).

When a problem occurs (a corrupted database, e.g.), it is a nightmare. It can take several hours (sometimes days) to rebuild the database from the backup - exactly the time you don’t have when you need a restore.

OK, you can just back up the databases after each backup job with another tool, but it’s strange to back up the backup’s databases (?!) :roll_eyes:

2 Likes

If it was me, that was probably in my early days, so I may not have correctly understood how stuff worked. :blush:

While in theory cross-source deduplication could happen, it would require a huge rewrite so that multiple backups could share the same destination.

For that to happen, a destination-side log / database of some sort would need to be created to handle block tracking across the multiple sources.

For example, if sources A and B both have the same file (or at least a matching block) and it’s deleted from source A, something has to stop the source A backup from deleting the still-in-use-at-source-B block.

Similarly, if you set up two jobs sharing the same destination but with different retention schedules, something needs to keep data from being deleted until it’s flagged as deletable in all backup jobs.
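
Conceptually, that destination-side tracking boils down to reference counting blocks per source; a toy sketch of the bookkeeping (purely hypothetical - nothing like this exists in Duplicati today):

```csharp
using System.Collections.Generic;

// Toy model of destination-side block tracking across sources:
// a block may only be purged once no backup job references it any more.
class SharedBlockIndex
{
    // block hash -> set of source/job ids that still reference it
    readonly Dictionary<string, HashSet<string>> _refs = new Dictionary<string, HashSet<string>>();

    public void AddReference(string blockHash, string sourceId)
    {
        if (!_refs.TryGetValue(blockHash, out var owners))
            _refs[blockHash] = owners = new HashSet<string>();
        owners.Add(sourceId);
    }

    // Called when a source no longer needs the block; returns true only
    // if the block is now unreferenced by every source and can be deleted.
    public bool ReleaseReference(string blockHash, string sourceId)
    {
        if (_refs.TryGetValue(blockHash, out var owners))
        {
            owners.Remove(sourceId);
            if (owners.Count == 0)
            {
                _refs.Remove(blockHash);
                return true; // safe to delete from the destination
            }
        }
        return false; // still in use by another source (e.g. source B)
    }
}
```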

1 Like