Very large backup to B2 failing with "GetResponse Timed Out"

kenkendk · October 31, 2017, 12:23pm

Good point. IIRC, the limit is 5%, meaning that a file must be 5% of the 300MB (= 15MB) or less to be considered “small”.

No, it would not “fix” the backup, as the timeout still happens. I was thinking more of the “if I start a new backup” scenario.

That is very difficult to do, due to variances in network, server, etc.

Yes, there is some WiP on using pagination which would allow a more exact report on how many files are found, before they are all loaded.

Yes, splitting into subfolders would also mitigate the problem, at the expense of always having to do multiple calls on small backups.

Now that I consider the sub-folder idea (for the millionth time), I am thinking that we could do it more dynamically. Until we have reached some file count limit (e.g. 1000), all files are stored in the folder as normal. Once we go above that limit, new files are put into subfolders.
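
To make the idea above concrete, here is a rough sketch (not actual Duplicati code). The function name `choose_remote_path`, the constant `FLAT_LIMIT`, and the choice of a two-character SHA-256 prefix as the subfolder name are all assumptions made for illustration only.

```python
import hashlib

# Example threshold taken from the "e.g. 1000" above; not a real setting.
FLAT_LIMIT = 1000


def choose_remote_path(filename: str, root_file_count: int,
                       limit: int = FLAT_LIMIT) -> str:
    """Return the relative remote path for a NEW file (hypothetical helper).

    While the root folder holds fewer than `limit` files, the file goes
    directly into the root. After that, new files go into a subfolder.
    """
    if root_file_count < limit:
        return filename
    # Assumption: the subfolder is named after the first two hex characters
    # of a hash of the filename, giving up to 256 subfolders.
    subfolder = hashlib.sha256(filename.encode("utf-8")).hexdigest()[:2]
    return f"{subfolder}/{filename}"
```

Files already stored in the root stay where they are, so a small backup never pays for the extra folder level.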

This makes the listing/query system slightly more complicated, but it caters to both flat lists for small backups and a hierarchy for larger backups.

The algorithm can scale a lot by gradually taking a bit of the prefix and introducing sub-subfolders. We can also make the initial verification faster by only checking a (random) subfolder.
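
A possible way to picture the scaling, again only as a sketch: the depth grows with the total file count, and each level consumes two more characters of a hash prefix. The names `depth_for`, `nested_path`, and `pick_folder_to_verify`, the fan-out of 256, and the use of SHA-256 are assumptions, not anything that exists in the code base.

```python
import hashlib
import random


def depth_for(total_files: int, limit: int = 1000, fanout: int = 256) -> int:
    """Number of subfolder levels so that a leaf holds roughly `limit` files."""
    depth = 0
    capacity = limit
    while total_files > capacity:
        depth += 1
        capacity *= fanout  # each extra level multiplies capacity by the fan-out
    return depth


def nested_path(filename: str, depth: int) -> str:
    """Build a path using `depth` two-character chunks of the hash prefix."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(depth)]
    return "/".join(parts + [filename])


def pick_folder_to_verify(folders, rng=random):
    """Pick one subfolder at random for the quick initial verification."""
    return rng.choice(sorted(folders))
```

With these example numbers, one level covers up to 256,000 files and two levels cover about 65 million, so the depth would rarely need to exceed two.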

If we store the relative path to the remote volume, we do not need to do any guessing as to what folder to look in when we need the file.
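
As a minimal sketch of that last point, assuming a simple in-memory mapping (the class `VolumeIndex` and its methods are hypothetical and do not reflect the real local database schema):

```python
class VolumeIndex:
    """Maps a remote volume name to the relative path it was uploaded to."""

    def __init__(self):
        self._paths = {}

    def record(self, name: str, relative_path: str) -> None:
        # Stored once at upload time, whatever layout was in use then.
        self._paths[name] = relative_path

    def locate(self, name: str) -> str:
        # No guessing: the path is read back exactly as it was recorded.
        return self._paths[name]


index = VolumeIndex()
index.record("volume-1.zip", "volume-1.zip")      # stored flat in the root
index.record("volume-2.zip", "3f/volume-2.zip")   # stored after the limit was hit
```

Because the path is recorded per file, the folder layout can change over time without affecting how older files are found.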