I saw in another post that using 7z instead of zip is unsafe. Is this still the case, or is 7z now safe to use? I would like to switch to it if it doesn’t have any issues.
7z produces a much smaller archive and, more importantly, can be multi-threaded (which zip cannot).
Additionally, are there any plans to introduce more common archive formats (for unix systems) like gz and bz2? Both can also be multi-threaded, are VERY efficient, and should be very well supported across platforms.
BTW, right now I’m using the beta release channel. Is multi-threading only available on the canary tree? And is there going to be an update to the canary or beta tree? The last release for both was in April, and looking at the git repository, there have been patches that fix issues since then.
It has a number of issues. We need to rewrite the 7z module to use the latest LZMA2 code, but no one has prioritized doing so.
In our tests we found that 7z gives a minor space saving (~10%) but takes much longer to compress. With the latest canary release, Duplicati is able to run multiple zip compressions in parallel, giving many of the benefits of 7z.
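Roughly like this (a Python sketch, not Duplicati's actual C# implementation): because each zip volume is an independent archive, separate volumes can be compressed on separate threads, and zlib releases the GIL while compressing, so the threads genuinely run in parallel.

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor

def build_volume(entries):
    """Compress one volume's worth of (name, data) pairs into an in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries:
            zf.writestr(name, data)
    return buf.getvalue()

# Hypothetical volume contents, just for illustration.
volumes = [
    [("block1.bin", b"first volume payload " * 50_000)],
    [("block2.bin", b"second volume payload " * 50_000)],
]

# Each volume is compressed on its own thread.
with ThreadPoolExecutor() as pool:
    archives = list(pool.map(build_volume, volumes))

print([len(a) for a in archives])
```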
The multi-threading for LZMA2 is actually the main problem with the 7z module. It works with very large buffers, making it hard to know in advance when a volume is filled, so Duplicati has to be cautious and ends up creating many small volumes.
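A small Python sketch of the underlying problem (using the standard library's xz/LZMA bindings as a stand-in for the 7z module's LZMA2): a stream compressor buffers input internally, so the output size lags well behind what has been fed in, and the true size is only known after the final flush.

```python
import lzma
import os

comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=6)

fed = 0
emitted = 0
for _ in range(16):
    chunk = os.urandom(256 * 1024)  # feed 256 KiB per iteration
    fed += len(chunk)
    emitted += len(comp.compress(chunk))  # often returns b'' while buffering
    print(f"fed {fed // 1024} KiB, emitted {emitted // 1024} KiB so far")

# Only after flush() do we know the final compressed size.
emitted += len(comp.flush())
print(f"final size: {emitted // 1024} KiB")
```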
AFAIK, gz and bz2 are just compression algorithms; they do not have an archive (container) format of their own and often rely on tar for that. You can get some of the same effect by choosing another compression algorithm for the zip module (--zip-compression-method=bzip2).
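For illustration, Python's zipfile module exposes the same choice; something along these lines is what selecting the bzip2 method amounts to:

```python
import zipfile

# The zip container supports several compression methods per entry;
# bzip2 is one of them, which is what --zip-compression-method=bzip2 selects.
with zipfile.ZipFile("example.zip", "w", compression=zipfile.ZIP_BZIP2) as zf:
    zf.writestr("data.txt", b"some compressible payload " * 10_000)
```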
Yes, multi-threading is only in canary for now; it is still being tested to make sure nothing trips up.
Yes! I had some pressing work so I had to leave Duplicati behind for a while. I have fixed the performance issues today, and hope to get a new canary out very soon.
The beta will be updated once we are confident that the canary changes do not cause issues.
We can use external tools like that, but for compression it is a bit problematic. Duplicati needs feedback on how large the compressed file currently is, so we can make sure the files do not grow beyond the requested volume size.
Normally, the commandline tools take an input and produce a compressed output file in one shot. We need to keep appending to the output until it reaches the requested volume size.
If something like this is possible with an external binary, we can use it. Otherwise it will give sub-optimal results.
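As a rough Python sketch of that feedback loop (hypothetical names like fill_volume and VOLUME_LIMIT; Duplicati's real code is C# and more involved): entries are appended to a volume and the current archive size is re-checked after each one, which is exactly the mid-stream feedback a one-shot commandline compressor cannot give.

```python
import zipfile

VOLUME_LIMIT = 50 * 1024 * 1024  # hypothetical 50 MiB volume cap

def fill_volume(path, blocks):
    """Append (name, data) blocks to a zip volume until it nears the cap.

    Returns the blocks that did not fit, for the next volume. Note that
    closing the archive writes the central directory, so the final file
    is slightly larger than f.tell() reports; a real implementation
    would keep some headroom.
    """
    with open(path, "wb") as f:
        with zipfile.ZipFile(f, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for i, (name, data) in enumerate(blocks):
                zf.writestr(name, data)
                if f.tell() >= VOLUME_LIMIT:  # size feedback after each entry
                    return blocks[i + 1:]
    return []
```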
I know that xz files can be concatenated together, though that would not be programmatically beautiful. I don’t know, but the xz libraries may very well support the function you’re referencing as well.
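For what it's worth, the concatenation trick does work at the format level; a quick Python check (the standard lzma module decodes concatenated xz streams transparently):

```python
import lzma

# Two independently compressed xz streams...
part1 = lzma.compress(b"first part, ")
part2 = lzma.compress(b"second part")

# ...decode as one logical stream when simply concatenated.
assert lzma.decompress(part1 + part2) == b"first part, second part"
```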
Here is another idea. Would it be the end of the world if the requested volume size was applied pre-compression, with each volume just compressed afterward? It would seem much simpler. Is there any design requirement that every volume be exactly the same size?
It would significantly increase the number of remote files, as the size limit would be hit much sooner than it actually should. For example, with a 50 MB volume size and 2:1 compression, each remote file would end up around 25 MB, roughly doubling the file count. Not the end of the world, but not a great solution either.