As a non-coder, I guess this shouldn’t be too complicated to do. Maybe it will be a bit easier to use only this part of the sequence:
“1/1/1”, “1/1/2”, “1/1/3”, “1/2/1”, “1/2/2”, “1/2/3”, “1/3/1”, “1/3/2”, “1/3/3”
So folder 1 only contains subfolders “1”, “2” and “3” (no files). I suppose this will make the algorithm a bit easier to code.
However, I see a few issues using this folder structure:
- There is no relation between the filenames and the location where the files are stored. The only way to search for a file is listing the contents of each folder and check if it’s there.
- To avoid extensive folder list operations, an extension of the local db structure is required. Personally I think storing information in the local databse should be avoided where possible: currently most issues in Duplicati are related to the local db, every addition of new fields could make things worse.
- If a new file has to be uploaded, the only way to calculate the location is using the total number of files at the backend. However, when applying a retention policy or a compact operation, an unknown number of files (DBLOCK and DINDEX) will be deleted, decreasing the total number of files. When using the new total filecount for calculating the subfolder for the next upload, probably a subfolder is chosen that already contains the maximum number of uploaded files. A workaround should be implemented to avoid this, I have no idea how to do this, other than store folder information (number of files for every folder) in the local db. I’m not sure if this is recommendable (see previous item).
- DBLOCK and DINDEX both have a random filename. We cannot be sure that a DBLOCK/DINDEX pair are stored in the same folder. So for each file (both DBLOCK and DINDEX), folder location should be stored somewhere. However they belong together, the location of one file cannot be calculated using the filename/location of the other file.
I still prefer a mechanism where the location can be calculated using the filename. No extra information needs to be stored in the database, and theoretically migration back and forth between using subfolders or not should be possible (making it possible to restore using a third party tool to do a restore).
Because filenames are completely random, we have to cheat a bit when generating the filename for a new file to be uploaded. The best way I can think of, is calculating a few characters of the filename to match a particular checksum. That checksum can be used as folder name (see “Approach 3” in the first post).