I’m seeking to update my remote server quota dynamically without using the `--quota-size` parameter, as it doesn’t meet my requirements. When Duplicati prepares to send backup files (zip-compressed and encrypted) to the remote server, is there a method to determine the total size of these files before transmission? I understand Duplicati employs parallelism and multi-threading (to some extent?), but I’m curious whether there’s a stage where the zipping and encryption processes conclude, enabling me to retrieve the total data size and simultaneously check the remote server’s quota to ascertain whether there’s adequate space for the files.
First, please see Channel Pipeline. The eventual total size is not known before transmission begins.
I suppose that would be after all the new dblock/dindex files are prepared and the dlist for the version is made; however, is that too late? There would have been uploads the whole time unless you somehow prevent them.
This reminds me of an issue I saw where some users want to back up to a big load of Blu-ray discs. That poses a variety of problems, but the need for frequent quota checks might be one of them, unless files stage somewhere else first (e.g. a local drive) and can then be leisurely put on discs later until they fill, whereupon knowing where files got put is the next problem, and after that comes a lot of disc changing.
If you think there’s any hope for solving your use case, could you please describe more what you need?
While awaiting clarification, I’ll link to possible similar issues that need to be careful about quota.
Thanks for the quick response!
I did a lot of reading about how Duplicati works, given the resources you provided, and I thought of a way to solve my problem.
Having read this:
I understand that only one process is responsible for getting all the files which are ready for upload and queueing them to be uploaded simultaneously. Correct me if I have any misconceptions.
Wouldn’t it be possible to check the quota size before queueing a file to be uploaded, and update the quota size for the next file to check? No synchronization is needed, as the task is performed by one process.
As for the operations that need to be done if the quota is exceeded, that’s a problem for later.
Duplicati does not upload ‘files’, not as in ‘files that are found on your computer’. It uploads blocks found in the files on your computer. The difference: if one new file of 200 MB is created on your computer, and 150 MB of this file is included in (other) files already backed up, Duplicati will not bother with this data; it will only upload 50 MB of data. While searching the files on your computer, it is already (in separate threads) testing for this deduplication of data and building temporary files holding data from possibly several of your files. One of these temporary files, which will be uploaded to the remote backend to become a remote ‘data file’, can hold data from several of your local files. All of this happens while Duplicati continues to search directories for new files.
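The block-level deduplication described above can be sketched roughly like this. This is illustrative only, not Duplicati’s actual code; the function names and structure are assumptions, though 100 KiB is Duplicati’s default block size:

```python
# Rough sketch of block-based deduplication (illustrative, not Duplicati's
# implementation). Files are cut into fixed-size blocks; a block whose hash
# was already seen is skipped, and only new blocks are collected for the
# temporary volume destined for the remote end.
import hashlib

BLOCK_SIZE = 100 * 1024  # Duplicati's default block size is 100 KiB

def split_blocks(data, block_size=BLOCK_SIZE):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def new_blocks_to_upload(file_data, known_hashes):
    """Return only the blocks not already known at the destination."""
    upload = []
    for block in split_blocks(file_data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in known_hashes:
            known_hashes.add(digest)
            upload.append(block)
    return upload
```

So a 200 MB file whose content largely overlaps earlier backups produces far less than 200 MB of new data, and the new blocks of several local files can end up packed into one temporary volume file.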
So what you are trying to do is very complex. In short, do not waste your time if what you want is to get this into Duplicati in the next year. If you want to build your own Duplicati version, feel free to do what you want with it, it’s totally allowed by the license.
So if I understand correctly (given the quote below too), multiple blocks from the files that changed since the last backup are merged into a temporary file, which is uploaded.
Would it be catastrophical for the backup if one temporary file wasn’t sent to the destination? Probably a local database error when I back up again, or something else.
I understand the clarifications you made, but even so, would my modification work, or would I be facing more problems like backup corruption?
I am just experimenting with my own Duplicati version, thanks for the advice.
Yes, it looks similar to my understanding of Duplicati.
If ‘catastrophical’ means completely destroyed, usually not, as Duplicati in its current state* has a pretty good capability for repairing itself. If you mean ‘catastrophical’ as in needing repair, then yes. It is never good to have an upload fail completely (retries are not failures).
(*) ‘current state’ means ‘Duplicati as it works currently’. I am making no assumptions about any custom Duplicati version that you build yourself.
I hope that you understand that with an open source project, you have pretty limited hopes of support in case of data disaster. With specific software that you build yourself, you have even less than that.
I’m still not sure what this means, but I’ll point to one quota enhancement that’s already a pull request.
There’s other quota discussion in the references it linked.
I’m not sure which quota you’re talking about. If you mean TotalQuotaSpace or FreeQuotaSpace, per the report that an IQuotaEnabledBackend makes, that’s not constantly read, but if it’s there, it’s taken as accurate (even if it’s not, or not as one might want). If you mean read once, then updated per upload, the chances of exceeding quota may be reduced compared to the currently infrequently measured quota.
I’m not sure what this means. Its one process has lots of threads that sometimes have to synchronize.
Although there are issues that an SSH (SFTP) backend hangs when full (and also doesn’t do quota), typical backends are supposed to just fail the backup cleanly, and state will be recorded in database. Correct the space problem, backup again, and it ideally should just backup what it couldn’t do before.
Sorry for not clarifying.
I want to dynamically get the remaining size of my remote server (I will do it manually by having a file that records the disk space left on the server; Duplicati will read this file to get this size).
`--quota-size` is static.
I am thinking of an implementation where I will add a new variable (not sure where yet, as I am brainstorming):
file_to_be_uploaded = get_file_from_ready_to_upload_queue()
if size_left is not defined:
    size_left = get_KB_left_from_server()
if size(file_to_be_uploaded) > size_left:
    don't upload the file and stop the backup
else:
    push the file to the upload queue
I thought a process represents one thread (only one instance runs the code block that takes the files ready to be uploaded and inserts them into the queue). That is my understanding of the explanation below.
My thought was to add the pseudocode I provided before to that phase.
A `process` caps the number of active uploads, given that only one process-thread does this job to avoid race conditions.
Are there any misconceptions in my interpretation?
This might have been written for non-experts who will know the English term “process” but be confused by how the English term “thread” would fit. Formally, one may also speak of a computational “process” without getting into the details of whether it’s actually a thread. An algorithm can be expressed as steps that don’t know or care whether they’re the only thread in the process; in other cases, threads coordinate.
Task Class tells me
Task would be the correct word, but both are finer-grained than Process would be.
Regardless of terminology, the pseudo-code seems to assume a very rapidly changing space situation, where it’s worth constantly checking, with a download of a specially produced file before every upload. That would probably not be generally acceptable, but you can certainly roll your own binary if desired.
What might stand a chance of being included for general use is something using either the destination’s quota or the one supplied by the pull request for destinations that don’t do quota the way one might like.
After the initial check, track space taken based on new files, hope nothing else is also filling the destination, and stop nicely, which is sometimes easier said than done. The current stop button still needs uploads to finish to allow a tidy stop. You could try allowing enough margin for that, or maybe rely on a harder stop instead, not even bothering to upload the dlist file. There now seems to be working code to upload the dlist on the next backup, using a synthetic file list which reflects what the interrupted backup managed to do while it had the room.
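That “initial check, then local tracking” idea might look like this minimal sketch. All names (`plan_uploads`, `get_remote_free_bytes`) and the margin value are illustrative assumptions, not Duplicati code:

```python
# Minimal sketch (not Duplicati code): query free space once, then decrement
# a local counter per queued volume, keeping a margin so in-flight uploads
# and the final dlist can still complete before stopping.
RESERVED_MARGIN = 64 * 1024 * 1024  # illustrative headroom for a tidy stop

def plan_uploads(volume_sizes, get_remote_free_bytes, margin=RESERVED_MARGIN):
    """Yield indices of volumes that fit; stop before the first overflow."""
    size_left = get_remote_free_bytes()  # single remote query at backup start
    for i, size in enumerate(volume_sizes):
        if size + margin > size_left:
            break  # stop nicely instead of letting an upload fail mid-flight
        size_left -= size  # assumes nothing else is filling the destination
        yield i
```

The single up-front query plus local bookkeeping avoids a remote round trip per volume, at the cost of drifting if something else writes to the destination during the backup.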
If your space situation is less dynamic than the pseudo-code indicates, and a file download in advance will suffice, then you can consider waiting to see if the pull request gets in, then use a run-script-before, which can be used to tell Duplicati how you want the quota set up after your script figures that out from its file.
If your destination is one that doesn’t support a remote quota query, you don’t even need to wait for the PR.
If you want to do the ultimate job, a lot of ideas were given, but that’s harder to get done, then accepted.
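The run-script-before idea could look roughly like the sketch below. The file path is purely illustrative, and whether printing `--quota-size=...` on stdout actually overrides the option depends on Duplicati’s scripting interface (compare the run-script-example in the Duplicati source before relying on it):

```python
#!/usr/bin/env python3
# Hedged sketch of a run-script-before helper: read a space-report file that
# the server maintains, and emit a quota option for Duplicati to pick up.
# The stdout-override mechanism is an assumption to verify against the
# run-script-example shipped with Duplicati.
import sys
from pathlib import Path

SPACE_FILE = Path("/mnt/remote/space-left.txt")  # hypothetical report file

def quota_option(space_file=SPACE_FILE):
    try:
        bytes_left = int(space_file.read_text().strip())
    except (OSError, ValueError):
        return None  # no usable report: leave the quota unchanged
    return f"--quota-size={bytes_left}"

if __name__ == "__main__":
    option = quota_option()
    if option:
        print(option)
    sys.exit(0)  # a non-zero exit code can abort the backup
```

Your server-side job would just need to keep the report file current; the script then translates it into a quota each time a backup starts.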
I am planning on supporting quota “myself”. I plan on having a file in a certain directory of the remote server which stores the number of bytes available on the remote server. The Duplicati backend will read this file at the start of the backup and store it in a local variable.
I am confused about how this is not a good implementation. Would it be slower? Why? As I see it, the remaining size will be pulled from the server only once, and the local variable updated after that.
I think we are describing the same thing, to be honest; maybe my pseudocode was confusing :)
Gracefully stopping the backup is something that troubled me, but then I thought that we are talking about an edge case. An edge case which, as far as I understand, the repair functionality can recover from if something is corrupted.
Could you provide further details on this?
If you want to write your own backend implementation, you can just include the quota check in the Put functionality and throw an error if it is full. This would be equivalent to a cloud service telling the client that no space is left.
The current quota implementation doesn’t abort backups at all; it only produces warnings and error messages. Since this is not what you want, I think a local change in your specific backend will be easier. You can still provide the quota property for the advance warnings.
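The “fail in Put” suggestion could be sketched like this. The class and error names are invented for illustration and do not match Duplicati’s actual C# backend interface:

```python
# Illustrative wrapper (not Duplicati's real IBackend API): check remaining
# space before each upload and raise, which the caller then handles the same
# way as a cloud service reporting "no space left".
class QuotaExceededError(Exception):
    pass

class QuotaCheckingBackend:
    def __init__(self, inner_put, get_remote_free_bytes):
        self._put = inner_put
        self._size_left = get_remote_free_bytes()  # read once, at start

    def put(self, remote_name, size):
        if size > self._size_left:
            raise QuotaExceededError(remote_name)  # abort like a full server
        self._put(remote_name, size)
        self._size_left -= size  # keep the local estimate current
```

Raising from the upload path means the existing failure handling (retries aside) does the aborting for you, rather than adding a new stop mechanism.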
I want to check if the quota is exceeded while uploading files.
Currently, if the quota is exceeded while uploading files, it gets stuck waiting for the upload to finish and never ends.
You don’t necessarily get that choice until cited PR is in, if your destination supports remote quota:
--quota-size (Size): A reported maximum storage
This value can be used to set a known upper limit on the amount of space a
backend has. If the backend reports the size itself, this value is
but of course, if the last sentence bothers you, you’re creating custom code, so you can code it away.
That will not be needed if destination does not support remote quota, or if your quota gets hit first.
So another remote URL of some sort. I’m not sure if it’s better or worse to use the primary folder.
Duplicati is file-name sensitive and can ignore files that don’t start with the right prefix, if that helps.
Possibly rather than having a full URL, primary could be modified, but beware of varying formats.
I misread the pseudo code, but I think it may have an error. Looking at size_left, I see initialization inside the upload decision loop (why in the loop?), and I see the comparison, but I don’t see the value ever updated.
Because size_left is read inside the loop and never updated, I misread it as a repeated remote query, like
There was no update shown, but your plan takes us to the one I mentioned: tracking a theoretical size left, assuming no other destination changes. I think it’s still more sophisticated than the current checks.
Possibly the same, or similar, and you can see that I got confused (partly from looking at it too fast).
I didn’t know if it was a rare edge case or regular work in your specific use case, and I linked to several.
For a little more, search forum or GitHub issues or do web search.
For a lot, try UploadSyntheticFilelist in the GitHub search in the top bar.
Killing backups is possible, then you can look as much as desired.
That might be easier than studying the code, or you could do both.
Looking at the code for log messages would be another view level.
You can then try various interruptions and see what messages are.
Is this by any chance SFTP? This unfortunate hang sounds familiar.
I tried to repro this and did not (client Linux / server Linux, both Ubuntu with default ssh/sftp server, filled to the brim by hand).
I got a failed transfer. It was not the ssh timeout (I set it to 5 minutes and it failed well before that). Also, I set retries to 0, so no long retries; it failed quite fast, in less than one minute. Thanks to the new code, the job failure is reported, though the detailed reporting is not very informative:
[Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error
SshConnectionException: An established connection was aborted by the server.
I tried to update SSH.NET to the latest version, and sadly the error reporting is no better. It is probably an SSH problem though; trying to do a manual put with Linux sftp gives:
Couldn’t write to remote file “/fichiers/myfile.txt”: Failure
Not informative at all.
The server (SSH) is at fault. It returns status 4 (Failure) and not 14 (NO_SPACE_ON_FILESYSTEM).
SSH.NET can handle the error code, but it’s not reported correctly by the server.
Edit: looking at sftp.h in the openssh repo (for example GitHub - openssh/openssh-portable: Portable OpenSSH, but it’s the same in the openbsd source), the ‘disk full’ error does NOT exist. It’s not an old-Ubuntu-version problem; the original developers did not see any interest in reporting this kind of problem.
Let’s see: FileZilla Server does not support SFTP. ProFTPD does and uses this 14 error code. Too late today to test whether it works correctly with Duplicati, but ProFTPD seems to live up to the ‘Pro’ in its name, at least.
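For reference, the status codes in play, written out as a small table. SFTP protocol version 3 (what OpenSSH implements) defines only codes 0 through 8, so a full disk can only come back as the generic failure; the dedicated code appeared in later protocol drafts:

```python
# SFTP status codes relevant to this thread. Protocol v3 (OpenSSH) stops at
# code 8, so "disk full" can only be reported as the generic SSH_FX_FAILURE;
# SSH_FX_NO_SPACE_ON_FILESYSTEM (14) exists only in later drafts, which
# servers such as ProFTPD implement.
SFTP_STATUS = {
    0: "SSH_FX_OK",
    4: "SSH_FX_FAILURE",                   # generic error; all OpenSSH returns
    14: "SSH_FX_NO_SPACE_ON_FILESYSTEM",   # later drafts only
}
```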
Thanks for the pursuit. I didn’t point to the issues before because there was no value to add to them.
Because there’s now a little more to at least one non-repro case, I’ll point to another you tried:
So how are these people getting their transfer stuck, I wonder? I don’t think we got server info.
@Kanellaman can you say something about your server? Maybe we can investigate that way.
Thanks for the insight; I have read it in many threads already. I am having a different directory for this file. I have already manipulated the paths with a little coding.
My pseudocode was poorly written indeed, and I am sorry about that; it was supposed to give a general idea of my thoughts.
Yeah for sure
Yeah it is:)
My sftp server is Debian GNU/Linux 10 (buster); it uses OpenSSH. I can give you more info about the server if you need anything specific (sshd_config etc.).
The GitHub issue you provided seems to sum up what I have encountered. I should mention that in my case I use quota to set a disk limit for each user on my server, if that matters. It seems to hang indefinitely too, so I guess the “no more space” condition is wrongly reported. Is there a way to check how it is reported? (EDIT) Or even change the way it is reported?
Not my observation. Can you try to repro it by setting number-of-retries=0?