FTP delete not atomic / verified / transactional (?)

I really love this thread about stopnow() (and the related questions and issues) - some of my problems with Duplicati are for sure caused by hard kills. But to be honest, the only case where I see them creating real problems is a hard kill while compacting. I've got so much data that I'm not actually worried about the other hard kill / stopnow problems; those seem to be at least somewhat adequately cleaned up. Unless that's exactly what's creating the junk that causes the compaction to fail later. - Keep up the great work; I'm now seeing a deep dive into topics I've been noticing for years! - And to be honest, I've made no effort whatsoever to schedule Duplicati in a window where it couldn't be hard-killed by something else. That's where journal rollback should happen, with state restored on the next start. And as far as I know, it works if the kill happens during a backup.

In my experience, it's statistically not nearly as bad as it could be. And it doesn't take much to fix the last (?) major issue with data corruption: it could be just two lines in the wrong order, or one line missing. That's my experience with transaction handling - getting only one thing wrong is all it takes. Didn't someone already catch a remote file delete operation that wasn't being written into the journal? That alone makes proper recovery after a stop impossible.
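To make the ordering point concrete, here's a minimal Python sketch of the journal-first pattern I mean - the `remote_journal` table, the `backend` object, and every name in it are my own illustration, not Duplicati's actual schema or API. The intent is recorded durably before the remote call, which is exactly what makes the journal rollback / replay on the next start possible:

```python
import sqlite3

def open_journal(path: str) -> sqlite3.Connection:
    # Hypothetical journal table: one row per remote operation and its state.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS remote_journal"
        " (operation TEXT, filename TEXT, state TEXT)"
    )
    return db

def delete_remote_file(db: sqlite3.Connection, backend, name: str) -> None:
    # 1. Record the intent and flush it to disk FIRST.
    db.execute(
        "INSERT INTO remote_journal (operation, filename, state)"
        " VALUES ('delete', ?, 'pending')",
        (name,),
    )
    db.commit()

    # 2. Only now touch the remote side. A hard kill here is safe: the
    #    pending entry tells the next start to re-check this file.
    backend.delete(name)

    # 3. Confirm completion.
    db.execute(
        "UPDATE remote_journal SET state = 'done'"
        " WHERE filename = ? AND state = 'pending'",
        (name,),
    )
    db.commit()

def recover_pending_deletes(db: sqlite3.Connection, backend) -> None:
    # Startup rollback/replay: for every journaled-but-unconfirmed delete,
    # check whether the remote file is still there. Gone means the delete
    # already succeeded; still present means we retry it. Either way the
    # journal ends up consistent with the remote store.
    remote_names = set(backend.list())
    pending = [row[0] for row in db.execute(
        "SELECT filename FROM remote_journal WHERE state = 'pending'"
    )]
    for name in pending:
        if name in remote_names:
            backend.delete(name)
        db.execute(
            "UPDATE remote_journal SET state = 'done' WHERE filename = ?",
            (name,),
        )
    db.commit()
```

With this ordering, the worst a hard kill can do is leave a `pending` entry behind, and the startup reconciliation can resolve it either way. Do the delete first and the journal write after, and a kill in between leaves a gap nothing can recover from.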

These are just my opinions and experiences. I've got a lot of small to medium Duplicati backup sets which are updated several times a day and restore-tested regularly.

If possible, signals (SIGINT / SIGTERM) should of course make Duplicati stop as soon as it cleanly can. But recovery from a hard kill (SIGKILL), a USB drive yanked out, or the system power cable disconnected abruptly should also work.
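For the SIGINT / SIGTERM half, the usual pattern is a stop flag that gets checked between units of work - here's a rough Python sketch with made-up names (`process`, `work_items`), not Duplicati's actual code. SIGKILL can never be caught by a handler, which is exactly why the journal replay above has to cover the rest:

```python
import signal
import time

stop_requested = False

def _request_stop(signum, frame):
    # Runs for SIGINT/SIGTERM. SIGKILL never reaches the process, so no
    # handler can help there - only journal replay on the next start can.
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGINT, _request_stop)
signal.signal(signal.SIGTERM, _request_stop)

def process(item) -> None:
    time.sleep(1)  # stand-in for uploading or compacting one volume

def run_backup(work_items) -> None:
    for item in work_items:
        if stop_requested:
            # Stop at a consistent boundary: the journal is committed and
            # the current item untouched, so the next run resumes cleanly.
            break
        process(item)

if __name__ == "__main__":
    run_backup(range(60))
```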

Also, about not optimizing the restore: I'm quite happy with restore speeds, as long as the backup isn't corrupted and it doesn't try to "recover blocks" (or whatever it was called), which is insanely slow. I'm not sure all the people complaining about restore speed have recognized the difference between these cases. There are several tiers: plain restore, database rebuild, and finally the block recovery attempt. I'm also happy with the database rebuild speed, as long as the recovery step doesn't happen.

And if we want to get into very minor issues: it seems that the FTP backend leaves the last keep-alive connection idling until it times out. A proper open, use, close cycle isn't being followed with FTP.
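What I'd expect is a strict cycle per batch of operations. A small sketch using Python's ftplib, purely for illustration (Duplicati itself uses a different FTP library): the context manager guarantees QUIT is sent and the socket is closed, so no keep-alive connection is left idling:

```python
from ftplib import FTP

def delete_remote(host: str, user: str, password: str, names: list[str]) -> None:
    with FTP(host) as ftp:           # open: connect...
        ftp.login(user, password)    # ...and authenticate
        for name in names:           # use: do the batch of work
            ftp.delete(name)
    # close: leaving the with-block sends QUIT and closes the socket,
    # instead of leaving the connection to idle until the server drops it
```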

To sum it up, thank you guys! And I'm really happy to see Kenneth @kenkendk commenting on things around here.

Anyway, the situation seems better. And I have to say that the title of this thread is totally misleading, because I initially thought the problems were caused by a bad FTP library that wasn't returning correct state information to the calling process.

Most importantly, if there's the will to do the fixes, we're not that far off from reliable operation.

Having said all that, I'll run full restore tests again, starting right now.