Zip-compression-zip64 clarification

kenkendk · April 18, 2018, 7:26pm

Actually, only limitation (1) applies to the way Duplicati uses zip.

Since Duplicati stores blocks, it can handle approximately --dblock-size=4GiB, and as mentioned elsewhere, the size of the source files does not matter, because they are split into blocks.

Limitations (2) and (3) are handled automatically, because the zip library detects when one of the limits is exceeded and writes a zip64 header (at the end of the file). In other words, the zip archive is automatically upgraded to zip64 if required.
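To illustrate the automatic upgrade, here is a minimal sketch using Python's standard zipfile module. It is only a stand-in (it is not the zip library Duplicati uses), and it assumes that the classic cap of 65,535 entries per archive is one of the limits in question. The helper names build_archive and has_zip64_locator are made up for the example.

```python
import io
import zipfile

def build_archive(entry_count):
    """Write `entry_count` empty entries into an in-memory zip and return its bytes."""
    buf = io.BytesIO()
    # allowZip64 defaults to True, so the archive may be upgraded when needed.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for i in range(entry_count):
            zf.writestr(f"f{i}", b"")
    return buf.getvalue()

def has_zip64_locator(data):
    """Check for the 20-byte zip64 locator right before the 22-byte end record.

    Assumes the archive has no trailing comment, which is true for build_archive.
    """
    return data[-42:-38] == b"PK\x06\x07"

small = build_archive(10)
large = build_archive(65536)  # one more entry than a 16-bit counter can hold

print("small archive uses zip64:", has_zip64_locator(small))
print("large archive uses zip64:", has_zip64_locator(large))
```

Nothing in the calling code asks for zip64. The extra records are only appended at the end of the archive once the entry count no longer fits in the old fields, which is why this upgrade costs nothing up front.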

Unfortunately this cannot be done for the individual streams (aka files in the archive) because the header is written before the content, so we do not know in advance whether the stream will need to store larger values. If the stream turns out to be bigger than 4GiB, we would need to go back and expand the stream header, essentially “moving” 4GiB of data (an expensive operation). The zip library now throws an exception if the limit is exceeded (old versions just overflowed the counter, writing broken zip archives!).

We could handle this by catching the exception, enabling the zip64 flag, and retrying the operation. The retry would take some time, but it is not difficult to implement if we want it.
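A rough sketch of that catch-and-retry idea, in Python for brevity. Everything here is hypothetical: write_entry is a toy stand-in for the zip library (it writes a fake header followed by the data), EntryTooLarge stands in for the exception the library throws, and the limit parameter exists only so the behaviour can be tried with tiny inputs. It also assumes the source can be read a second time and the output can be rewound.

```python
import io

class EntryTooLarge(Exception):
    """Hypothetical error: the entry outgrew the size field in its header."""

def write_entry(out, chunks, zip64, limit=0xFFFFFFFF):
    """Toy writer: the header goes out first, then the content is streamed."""
    out.write(b"H64" if zip64 else b"H32")  # header is fixed before the size is known
    written = 0
    for chunk in chunks:
        written += len(chunk)
        if not zip64 and written > limit:
            raise EntryTooLarge(written)
        out.write(chunk)
    return written

def write_entry_with_retry(out, make_chunks, limit=0xFFFFFFFF):
    """Try without zip64; on overflow, rewind and write the entry again with it.

    `make_chunks` is called once per attempt, so it must be able to produce
    the same data twice. Returns True if the retry was needed.
    """
    start = out.tell()
    try:
        write_entry(out, make_chunks(), zip64=False, limit=limit)
        return False
    except EntryTooLarge:
        out.seek(start)   # discard the partial entry
        out.truncate()
        write_entry(out, make_chunks(), zip64=True, limit=limit)
        return True

buf = io.BytesIO()
retried = write_entry_with_retry(buf, lambda: iter([b"abcd"] * 3), limit=8)
print(retried, buf.getvalue())
```

The cost is visible in the sketch: in the overflow case everything up to the limit is written twice, which is the "some time" the retry would take.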

I don’t think so. The list of files is streamed into the archive to avoid storing it all in memory. We would need to build the entire file in memory (I guess 4GiB+ in your case), look at the size, and then activate zip64.