Hello, I’m trying to reuse remote backup storage (not the versions) when changing OS (Duplicati on Windows to Linux in Docker).
After exporting/importing the backup job to Docker and copying the sqlite DB, I get this message box when I try to run a backup:
“The backup contains files that belong to another operating system. Proceeding with a backup would cause the database to contain paths from two different operation systems, which is not supported. To proceed without losing remote data, delete all filesets and make sure the --no-auto-compact option is set, then run the backup again to re-use the existing data on the remote store.”
I thought to myself: OK, no problem - I can live with losing all the backup versions, the main thing is that I don’t have to upload all the data again. I tried following the instructions:
Run the backup - and I watch as all the data is slowly uploaded to the remote storage again.
Can you please tell me what I am doing wrong?
I have the feeling that, with such an instructive error message, the process should work even without advanced manipulation (like deleting the dlist.zip files), shouldn’t it?
I’ve found a few threads on this topic, but I feel like they’re more about keeping the existing backup versions, and I followed the instructions in the error message exactly. Thank you.
Maybe you’re misreading the activity, but let’s see.
Seen how? Did the remote double in size? If not, roughly what portion of it was new?
One way to get a rough idea is to sort the remote files by date, if you can see them all on one screen.
Another is to open the Complete log in the job log for that backup and compare these:
"BytesUploaded" and "KnownFileSize"
An upload of everything again would mean BytesUploaded came to up to half of KnownFileSize. Was it?
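If it helps, here is a small sketch of that comparison in Python, assuming you save the Complete log JSON of that backup to a file first (the complete_log.json name is only a placeholder, and BackendStatistics is where those counters normally sit, so adjust if your log looks different):

```python
import json

# Paste the "Complete log" from the job log of that backup into this file first.
# The file name is just a placeholder for this sketch.
with open("complete_log.json") as f:
    log = json.load(f)

# The transfer counters usually sit under BackendStatistics; fall back to the
# top level in case the log is laid out differently.
stats = log.get("BackendStatistics", log)
uploaded = stats["BytesUploaded"]
known = stats["KnownFileSize"]

print(f"BytesUploaded: {uploaded:,}")
print(f"KnownFileSize: {known:,}")
print(f"Uploaded roughly {uploaded / known:.0%} of the known remote size")
```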
What should happen is that the Linux metadata will differ from the Windows metadata. That means a stream of very small metadata blocks gets uploaded, but it also means every file gets read through. The file data itself was moved over unchanged, right? Processing the same data should find that all its blocks already exist, so the existing ones get reused instead of uploaded, but Duplicati can’t know that without reading through all the files, which looks a lot like a backup.
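To make that concrete, here is a rough Python sketch of the block-deduplication idea. This is not Duplicati’s actual code; the 100 KiB block size and SHA-256 hashing match Duplicati’s defaults, but everything else is simplified for illustration:

```python
import hashlib

BLOCK_SIZE = 100 * 1024  # Duplicati's default block size (100 KiB)

def backup_file(path, known_hashes, upload):
    """Read a file block by block; only blocks with unknown hashes get uploaded."""
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            digest = hashlib.sha256(block).digest()
            if digest in known_hashes:
                continue          # block already on the remote: reuse it, no upload
            known_hashes.add(digest)
            upload(block)         # only genuinely new data (e.g. new OS metadata)

if __name__ == "__main__":
    known = set()
    sent = []
    backup_file(__file__, known, sent.append)
    print(f"first pass uploaded {len(sent)} block(s)")
    sent.clear()
    backup_file(__file__, known, sent.append)   # unchanged file: all blocks reused
    print(f"second pass uploaded {len(sent)} block(s), but still read the whole file")
```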
Yes, the last backup on Windows was two days ago and the data is now 100% unchanged.
What I’m expecting: Duplicati will rehash (re-read) the source files and not upload them; the I/O speed will be tens to hundreds of MB/s, and one large file will take seconds or minutes instead of the hours it takes when it is actually uploaded.
What I’m seeing:
Duplicati maxes out my upload line at 100%, and the I/O speed is never higher than the upload speed. Within an hour it uploaded approx. 20+ GB of data to the destination, where the data already was before I deleted the versions.
The target storage was 199 GB in size before the versions were deleted.
The target storage is now 229 GB in size after deleting the versions and running the backup. The backup runs for about two to three hours.
So I have a feeling that the procedure I performed is not correct.
I have a copy of the target backups as well as the SQLite databases, so I can repeat the process several times.
Duplicati devs are welcome to join in here. One thing you picked up on that the procedure doesn’t mention is --allow-full-removal; without it you wouldn’t have been able to delete all versions (and so all dlist files).
There are some other missing pieces, such as the imported job getting a new database name (which possibly wasn’t always the case), hence the need to point it at the old database or rename the old database; but if you had fallen into that one, the new empty database would have complained about the existing destination files.
One worry about the message’s idea is whether blocks are actually recyclable from the DeletedBlock table, which I think is where the deletion procedure would put them; there they’re just wasted space waiting to be compacted. (A quick way to look at what’s sitting in that table is sketched below.)
Reuse existing deleted blocks to prevent uploading duplicated copies that can cause problems later
I don’t know if the claim in the message ever worked, and I’m surprised if nobody else noticed the failure.
If current Duplicati doesn’t do it, I don’t know how you’d feel about a test build if one can be provided…
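For what it’s worth, here is the quick check mentioned above, as a small Python/sqlite3 sketch. It assumes the stock local-database layout where Block and DeletedBlock both have a Size column, and it should be run against a copy of the job’s sqlite file (the JOBDB-copy.sqlite name is a placeholder):

```python
import sqlite3

# Point this at a COPY of the job's local database; the path is a placeholder.
db = sqlite3.connect("file:JOBDB-copy.sqlite?mode=ro", uri=True)

for table in ("Block", "DeletedBlock"):
    count, size = db.execute(
        f"SELECT COUNT(*), COALESCE(SUM(Size), 0) FROM {table}"
    ).fetchone()
    print(f"{table}: {count} blocks, {size / 1024**3:.2f} GiB")

db.close()
```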
There’s another reuse method you might have seen, where the general idea is an empty dummy dlist combined with a DB recreate. That should put the blocks in the Block table, where the backup will notice them. With no dlist full of block references, the recreate “should” need no dblock files, just their fairly small dindex files.
The above is just a brief summary, so if it’s appealing I can probably find more precise directions for attempting it.
That was also a nice brief summary, but now I’m wondering whether it can actually work the way the message says…
I tried it, and the deleted blocks are clearly thrown out the window during the backup. For fun, I tried a copy from the DeletedBlock table to the Block table (insert into block select * from deletedblock), emptied DeletedBlock, launched the backup, and (well, rather logically) it reused the blocks. Not sure it is correct of course :-), and logically enough it added another dblock to store the new OS metadata; however, it seems to show that the ‘procedure’ may refer to an ancient state of Duplicati that no longer applies.
Edit: on second thought, it could be interesting to check in the code whether there was ever any attempt to see if a (new) block read from a file exists in the DeletedBlock table (it certainly tests whether it exists in the Block table; that’s the basis of deduplication), and if so, move it from DeletedBlock to Block automatically.
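For anyone wanting to repeat that experiment, this is roughly what it amounts to, written out as a Python/sqlite3 sketch. Run it only on a copy of the job database, and note that it assumes the schema where Block and DeletedBlock both carry Hash, Size and VolumeID; copying explicit columns avoids the ID clashes a plain select * could run into. This is just the experiment described above, not a supported procedure:

```python
import sqlite3

# Work on a COPY of the job database only; this rewrites the block tables.
db = sqlite3.connect("JOBDB-copy.sqlite")

with db:  # single transaction: either both statements apply or neither does
    # Move everything DeletedBlock holds back into Block so the next backup's
    # deduplication check can see (and therefore reuse) those blocks.
    db.execute(
        "INSERT INTO Block (Hash, Size, VolumeID) "
        "SELECT Hash, Size, VolumeID FROM DeletedBlock"
    )
    db.execute("DELETE FROM DeletedBlock")

db.close()
```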