Quota size check to not be exceeded

Kanellaman · January 4, 2024, 1:36pm

I’m seeking to update my remote server quota dynamically without using the ‘--quota-size’ parameter, as it doesn’t meet my requirements. When Duplicati prepares to send backup files (zip-compressed and encrypted) to the remote server, is there a way to determine the total size of these files before transmission? I understand Duplicati uses parallelism and multi-threading (to some extent?), but I’m curious whether there’s a stage where the zipping and encryption have finished, so that I can retrieve the total data size and check the remote server’s quota to see whether there’s enough space for the files.

ts678 · January 4, 2024, 2:04pm

First, please see Channel Pipeline. It’s not like eventual total size is known before transmission begins.

ts678 · January 5, 2024, 5:01pm

I suppose that would be after all the new dblock/dindex files are prepared and the dlist for the version is made, but is that too late? There would have been uploads the whole time unless you somehow prevent them.

This is reminding me of an issue I saw where some users want to backup to a big load of Blu-ray discs, posing a variety of problems, but the need for frequent quota checks might be one of them, unless files stage somewhere else first (e.g. a local drive), and can then be leisurely put on discs later, until they fill, whereupon knowing where files got put is the next problem, and after that comes a lot of disc changing.

If you think there’s any hope for solving your use case, could you please describe more what you need?

ts678 · January 6, 2024, 1:40pm

While awaiting clarification, I’ll link to possible similar issues that need to be careful about quota.

github.com/duplicati/duplicati

The RAID backend

opened 09:17AM - 05 Aug 14 UTC

kenkendk

enhancement imported

_From [kenneth@hexad.dk](https://code.google.com/u/kenneth@hexad.dk/) on September 27, 2011 15:48:04_

The initial motivation for this feature is the fact that you can get free online storage space from multiple providers, but usually only a few gigabytes. Getting the amount of space required for a decent photo collection requires that you pay for it. What if Duplicati had a backend that would use multiple storage providers and thus enable you to pool together all the small free storage options into a single large one?

This can be achieved by creating a meta-backend that has no configuration itself, but has a list of other backends and their options, similar to:

    webdav://user:pass@host1
    ftp://user:pass@host2 --use-ssl
    ...

The meta-backend would then create the instances of the real backends, based on configuration, and relay the requests to those instances. The UI side could also be a collection of instances of UIs for the real backends.

One problem with this is how to choose what backend to use, and here there is no golden solution. Some users would prefer that each backend has a full copy (more resistant to failures), others would prefer that they were spread out as much as possible (optimal space usage), others that each setup is filled first, then "spill" to the next backend, etc.

Another problem is how to handle ambiguity, that is: if a file exists once and was supposed to exist twice, do we consider it deleted or do we assume that the file exists and the copy is erroneously missing? The same dilemma arises if we find the same file in two different versions: which one is the right one?

Stepping back a bit, it is clear that this is very similar to how RAID works, and the different solutions have different RAID names: RAID-0, RAID-1, RAID-N, etc. Apart from prioritizing the targets, RAID algorithms exist for solving these problems. The implementation should be able to handle any configuration of N destinations with R redundant copies (where R <= N).

Even if the motivation is utilization of free online storage, this backend could also be used for enterprise backups that require geographically distributed copies.

_Original issue: http://code.google.com/p/duplicati/issues/detail?id=479_

github.com/duplicati/duplicati

Backup to Blu-ray

opened 07:52AM - 13 Aug 15 UTC

kelna

enhancement

Blu-rays are **very** cheap, and a very secure way of storing data. Due to the synthetic dye (in contrast to the DVD's organic dye) BDs can keep data for ~100 years (M Disk for even a 1000!). These factors make BDs an ideal medium for backup & archive, although very few actually use them so. More interestingly, there isn't any software available for Linux to back up to BD. It would be very much appreciated if a BD option would be added to Duplicati.

Kanellaman · January 8, 2024, 7:56am

Thanks for the quick response!
I did a lot of reading about how Duplicati works, given the resources you provided, and I thought of a way to solve my problem.
Having read this:

I understand that only one process is responsible for collecting all the files which are ready for upload and queueing them to be uploaded simultaneously. Correct me if I have any misconceptions.
Wouldn’t it be possible to check the quota size before queueing a file for upload, and then update the quota size for the next file’s check? There is no need for any synchronization, as the task is performed by one process.
What needs to happen if the quota is exceeded is a problem for later.

gpatel-fr · January 8, 2024, 9:56am

Duplicati does not upload ‘files’, not as in ‘files that are found on your computer’. It uploads blocks found in the files on your computer. The difference: if one new file of 200 MB is created on your computer, and 150 MB of this file are included in (other) files already backed up, Duplicati will not bother with this data; it will only upload 50 MB of data. While searching the files on your computer, it is already (in separate threads) testing for this deduplication of data and building temporary files holding data from possibly several of your files. One of these temporary files, which will be uploaded to the remote backend to become a remote ‘data file’, can hold data from several of your local files. All of this happens while Duplicati continues to search directories for new files.
So what you are trying to do is very complex. In short, do not waste your time if what you want is to get this into Duplicati in the next year. If you want to build your own Duplicati version, feel free to do what you want with it, it’s totally allowed by the license.
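To make the block-versus-file distinction concrete, here is a minimal sketch in Python (not Duplicati’s C# code). It assumes fixed-size blocks identified by a SHA-256 hash; the tiny block size and the names `split_blocks` and `bytes_to_upload` are made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield consecutive fixed-size blocks (the last one may be shorter)."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def bytes_to_upload(data: bytes, known_hashes: set) -> int:
    """Count only the bytes of blocks not seen before, and remember them."""
    new_bytes = 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in known_hashes:
            known_hashes.add(digest)
            new_bytes += len(block)
    return new_bytes


known = set()
first = bytes_to_upload(b"AAAABBBBCCCC", known)       # three new blocks
second = bytes_to_upload(b"AAAABBBBCCCCDDDD", known)  # only DDDD is new
print(first, second)
```

The second ‘file’ is 16 bytes long, but only its last block counts toward the upload, which is why the size of the source files says little about how much space the destination will need.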

Kanellaman · January 8, 2024, 11:40am

So if I understand correctly (given the quote below too), multiple blocks from the files that changed since the last backup are merged into a temporary file, which is then uploaded.

Would it be catastrophical for the backup if one temporary file wasn’t sent to the destination? Would I probably get a local database error when I back up again, or something else?

I understand the clarifications you made, but even so, would my modification work, or would I be facing more problems such as backup corruption?

I am just experimenting with my own Duplicati version, thanks for the advice.

gpatel-fr · January 8, 2024, 12:08pm

yes it looks similar to my understanding of Duplicati.

If ‘catastrophical’ means completely destroyed, usually not, as Duplicati in its current state* has pretty good capability for repairing itself. If by ‘catastrophical’ you mean needing repair, then yes. It is never good to have an upload fail completely (retries are not failures).

(*) current state means ‘Duplicati as it works currently’. I am making no assumptions on any custom Duplicati version that you can build yourself.

I hope that you understand that with an open source project, you have pretty limited hopes of support in case of data disaster. With specific software that you build yourself, you have even less than that.

ts678 · January 8, 2024, 1:00pm

I’m still not sure what this means, but I’ll point to one quota enhancement that’s already a pull request.

github.com/duplicati/duplicati

Add option to disable quota and update quota size option

duplicati:master ← Jojo-1000:quota-disable

opened 06:18PM - 16 Jul 23 UTC

Jojo-1000

+139 -95

Closes #4970 [Forum post](https://forum.duplicati.com/t/backend-quota-exceeded-i-know-its-been-covered-but/16488?u=jojo-1000)

- Add `--quota-disable` option to disable the reported backend quota. Sometimes Mono does not report the quota for the backup path, but rather for a parent directory.
- Ignore the File backend quota if the total size is zero
- Update quota size option to assign a backup size limit, in addition to the reported backend quota (related: #1301)

## Updated help text

```
--quota-disable = false
  Disable the quota reported by the backend. --quota-size can still be used to set a manual quota
--quota-size
  Set a limit to the amount of storage used on the backend (by this backup). This is in addition to the full backend quota, if available. Note: Backups will continue past the quota. This only creates warnings and error messages.
```

## Disabled quota

If the quota is disabled, the backend is treated as if it does not support reporting a quota at all. This means that the quota values in the result remain set to zero. The reported quota sometimes needs to be disabled, because Mono sometimes does not recognize the mount point. Then it will report the quota for a parent directory.

## Assigned quota

The `--quota-size` option was previously intended to provide a total backend size, if it does not support reporting a quota. However, this never actually worked. Instead, it now sets a limit on the backup size, in addition to the reported backend quota. If the backup size is more than the assigned quota size, the backup completes with an error. If the remaining free space is less than the warning threshold (percentage of backup size), the backup completes with a warning. If both the backend quota and the assigned quota is exceeded, the backend quota takes precedence in warning and error messages.

There’s other quota discussion in the references it linked.

I’m not sure which quota you’re talking about. If you mean TotalQuotaSpace or FreeQuotaSpace, as reported by an IQuotaEnabledBackend, that’s not constantly read, but if it’s there, it’s taken as accurate (even if it’s not, or not as one might want). If you mean read once, then update per upload, the chances of exceeding quota may be reduced compared to the current, infrequently measured quota.

https://github.com/duplicati/duplicati/blob/master/Duplicati/Library/Main/Operation/FilelistProcessor.cs

I’m not sure what this means. Its one process has lots of threads that sometimes have to synchronize.

Although there are issues that an SSH (SFTP) backend hangs when full (and also doesn’t do quota), typical backends are supposed to just fail the backup cleanly, and state will be recorded in database. Correct the space problem, backup again, and it ideally should just backup what it couldn’t do before.

Kanellaman · January 8, 2024, 1:56pm

Sorry for not clarifying.
I want to dynamically get the remaining space on my remote server. (I will do this myself by maintaining a file that records the disk space left on the server; Duplicati will read this file to get the size.)
‘--quota-size’ is static.

I am thinking of an implementation in which I add a new variable. I’m not sure where yet, as I am still brainstorming.

....
while(1){
  file_to_be_uploaded = get_file_from_ready_to_upload_queue()
  if(size_left is not defined)
    size_left = get_KB_left_from_server()
  if (size(file_to_be_uploaded) > size_left)
     Dont upload the file and stop the backup
  else
    Push the file to the upload queue
}
......

I thought a process represents one thread (only one instance runs the code block that takes the files ready to be uploaded and inserts them into the queue). That is my understanding of the explanation below.

My thought was to add the pseudocode above to the phase described as “A `process` caps the number of active uploads”, given that only one process/thread does this job, which avoids race conditions.
Are there any misconceptions in my interpretation?

ts678 · January 8, 2024, 2:34pm

This might have been written for non-experts who will know the English term “process” but be confused by how the English term “thread” would fit. Formally, one may also speak of a computational “process” without getting into the details of whether it’s actually a thread. An algorithm can be expressed as steps that don’t know or care if they’re the only thread in the process – but in other cases, threads coordinate.

github.com

duplicati/duplicati/blob/11c6b4b5de6b70d22bedd1f9153b8160fa0cf122/Duplicati/Library/Main/Operation/BackupHandler.cs#L189-L214


      
          all = Task.WhenAll(
              new[]
                  {
                          Backup.DataBlockProcessor.Run(database, options, taskreader),
                          Backup.FileBlockProcessor.Run(snapshot, options, database, stats, taskreader, token),
                          Backup.StreamBlockSplitter.Run(options, database, taskreader),
                          Backup.FileEnumerationProcess.Run(sources, snapshot, journalService,
                              options.FileAttributeFilter, sourcefilter, filter, options.SymlinkPolicy,
                              options.HardlinkPolicy, options.ExcludeEmptyFolders, options.IgnoreFilenames,
                              options.ChangedFilelist, taskreader, token),
                          Backup.FilePreFilterProcess.Run(snapshot, options, stats, database),
                          Backup.MetadataPreProcess.Run(snapshot, options, database, lastfilesetid, token),
                          Backup.SpillCollectorProcess.Run(options, database, taskreader),
                          Backup.ProgressHandler.Run(result)
                  }
                  // Spawn additional block hashers
                  .Union(
                      Enumerable.Range(0, options.ConcurrencyBlockHashers - 1).Select(x =>
                          Backup.StreamBlockSplitter.Run(options, database, taskreader))
                  )

This file has been truncated.

Task Class tells me Task would be the correct word, but both are finer-grained than Process would be.
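As a rough illustration of several tasks cooperating inside one OS process, here is a small Python `asyncio` sketch. It is not Duplicati code: `asyncio.gather` only plays a role loosely analogous to `Task.WhenAll`, and the `producer`/`uploader` names and the queue are invented for the example.

```python
import asyncio


async def producer(queue, volumes):
    """Hand finished volumes to the uploader, then signal the end."""
    for volume in volumes:
        await queue.put(volume)  # waits while the queue is full
    await queue.put(None)        # sentinel: nothing more to upload


async def uploader(queue, uploaded):
    """Take volumes off the queue until the sentinel arrives."""
    while True:
        volume = await queue.get()
        if volume is None:
            break
        uploaded.append(volume)  # stand-in for a real upload


async def run_pipeline(volumes):
    queue = asyncio.Queue(maxsize=2)  # bounded, so the two tasks must coordinate
    uploaded = []
    # Both tasks run concurrently in the same process until both finish.
    await asyncio.gather(producer(queue, volumes), uploader(queue, uploaded))
    return uploaded


result = asyncio.run(run_pipeline(["dblock-1", "dindex-1", "dblock-2"]))
print(result)
```

Even in this toy version the ‘single uploader’ is not alone: it shares a queue with another task, which is the kind of coordination that has to be kept in mind when adding a check in one place.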

Regardless of terminology, the pseudo-code seems to assume a very rapidly changing space situation, where it’s worth constantly checking with a download of a specially produced file before every upload, which would probably not be generally acceptable, but you can certainly roll your own binary, if desired.

What might stand a chance of being included for general use is something using either the destination’s quota or the one supplied by the pull request for destinations that don’t do quota the way one might like.

After the initial check, track the space taken by new files, hope nothing else is also filling the destination, and stop nicely, which is sometimes easier said than done. The current stop button still needs uploads to finish to allow a tidy stop. You could try allowing enough margin for that, or maybe rely on a harder stop instead, not even bothering to upload the dlist file. There now seems to be working code to upload the list on the next backup, using a synthetic file list which reflects what the interrupted backup managed to do while it had the room.
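The ‘check once, then track locally’ idea can be sketched as follows. This is plain Python with hypothetical names (`plan_uploads`, `reserve_bytes`), not Duplicati code, and it assumes nothing else writes to the destination during the backup.

```python
def plan_uploads(volumes, free_bytes_at_start, reserve_bytes=0):
    """Decide which volumes fit, using a locally tracked budget.

    volumes: iterable of (name, size_in_bytes) in the order they become ready.
    free_bytes_at_start: remote free space, read ONCE before the loop.
    reserve_bytes: margin held back, e.g. for a final file list upload.
    """
    size_left = free_bytes_at_start
    accepted = []
    for name, size in volumes:
        if size > size_left - reserve_bytes:
            break  # stop here instead of overfilling the destination
        accepted.append(name)
        size_left -= size  # update the budget after every accepted volume
    return accepted, size_left


ready = [("dblock-1", 50), ("dindex-1", 5), ("dblock-2", 50), ("dindex-2", 5)]
accepted, remaining = plan_uploads(ready, free_bytes_at_start=100, reserve_bytes=10)
print(accepted, remaining)
```

The initial read happens outside the loop and the budget is decremented after each accepted volume. The reserve is one way to leave the margin mentioned above, so that a tidy stop still has room to finish.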

EDIT 1:

If your space situation is less dynamic than pseudo-code indicates, and a file download in advance will suffice, then you can consider waiting to see if the pull request gets in, then use a run-script-before, so

github.com

duplicati/duplicati/blob/11c6b4b5de6b70d22bedd1f9153b8160fa0cf122/Duplicati/Library/Modules/Builtin/run-script-example.sh#L63-L65


      
          # All Duplicati options can be changed by the script by writing options to
          # stdout (with echo or similar). Anything not starting with a double dash (--)
          # will be ignored:

can be used to tell Duplicati how you want the quota set up, after your script figures that out from its file.
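A script along those lines might look like the sketch below. Several things here are assumptions rather than facts from this thread: that the script’s file holds the total number of bytes the backup may occupy, that the file path is passed as the script’s only argument, and that a value such as `5120MB` (with MB meaning 1024 × 1024 bytes) is an acceptable way to write the size. Check the option’s help text before relying on it.

```python
#!/usr/bin/env python3
"""Hypothetical run-script-before helper: print a quota option on stdout."""
import sys

MIB = 1024 * 1024  # assumed meaning of "MB" in the option value


def quota_option(limit_bytes: int) -> str:
    """Format a byte limit as an option line, rounded down to whole MB."""
    return f"--quota-size={limit_bytes // MIB}MB"


def main(limit_file: str) -> None:
    # The file is assumed to contain a single integer: the limit in bytes.
    with open(limit_file, encoding="ascii") as handle:
        limit_bytes = int(handle.read().strip())
    # Lines starting with a double dash are the ones the comment above
    # says are treated as options; everything else is ignored.
    print(quota_option(limit_bytes))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Rounding down errs on the side of reporting slightly less room than the file states.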

EDIT 2:

If your destination is one that doesn’t support a remote quota query, you don’t even need to wait for PR.

EDIT 3:

If you want to do the ultimate job, a lot of ideas were given, but that’s harder to get done, then accepted.

Kanellaman · January 9, 2024, 7:43am

I am planning on supporting quota “myself”. I plan to keep a file in a certain directory of the remote server which stores the number of bytes available on the remote server. The Duplicati backend will read this file at the start of the backup and store the value in a local variable.

I am confused about why this is not a good implementation. Would it be slower? Why? As I see it, the size left is pulled from the server only once, and after that only the local variable is updated.

I think we are describing the same thing to be honest, maybe my pseudocode was confusing:)

Gracefully stopping the backup is something that troubled me, but then I thought that we are talking about an edge case. An edge case which, as I understand it, the repair functionality can recover from if something is corrupted.

Could you provide further details on this?

Jojo-1000 · January 9, 2024, 11:15am

If you want to write your own backend implementation, you can just include the quota check in the Put functionality and throw an error if it is full. This would be equivalent to a cloud service telling the client that no space is left.

The current quota implementation doesn’t abort backups at all, it only produces warnings and error messages. Since this is not what you want, I think a local change in your specific backend will be easier. You can still provide the quota property for the advance warnings.
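In outline, a check inside the backend’s upload call could look like this. It is a Python sketch of the idea only: the class and method names are invented, a local folder stands in for the remote server, and Duplicati’s real backend interface is C# and differs in detail.

```python
import os
import shutil
import tempfile


class DestinationFullError(Exception):
    """Raised instead of starting an upload that cannot fit."""


class QuotaCheckedFolder:
    """Toy backend: a local folder plus a locally tracked byte budget."""

    def __init__(self, folder, free_bytes):
        self.folder = folder
        self.free_bytes = free_bytes

    def put(self, remote_name, local_path):
        size = os.path.getsize(local_path)
        if size > self.free_bytes:
            # Fail before transferring anything, as a full service would.
            raise DestinationFullError(
                f"{remote_name}: {size} bytes needed, {self.free_bytes} left")
        shutil.copyfile(local_path, os.path.join(self.folder, remote_name))
        self.free_bytes -= size


with tempfile.TemporaryDirectory() as work:
    source = os.path.join(work, "volume.bin")
    with open(source, "wb") as handle:
        handle.write(b"x" * 10)
    dest = os.path.join(work, "dest")
    os.mkdir(dest)

    backend = QuotaCheckedFolder(dest, free_bytes=15)
    backend.put("first.bin", source)       # fits: 10 of 15 bytes
    try:
        backend.put("second.bin", source)  # does not fit: only 5 left
        second_failed = False
    except DestinationFullError:
        second_failed = True
    stored = sorted(os.listdir(dest))
```

Because the error is raised before any data is written, the destination never holds a partial file from the rejected upload.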

Kanellaman · January 9, 2024, 12:43pm

I want to check whether the quota is exceeded while uploading files.
Currently, if the quota is exceeded while uploading files, it gets stuck waiting for the upload to finish and never ends.

ts678 · January 9, 2024, 1:09pm

You don’t necessarily get that choice until cited PR is in, if your destination supports remote quota:

  --quota-size (Size): A reported maximum storage
    This value can be used to set a known upper limit on the amount of space a
    backend has. If the backend reports the size itself, this value is
    ignored

but of course if last sentence bothers you but you’re creating custom code, you can code it away.
That will not be needed if destination does not support remote quota, or if your quota gets hit first.

So another remote URL of some sort. I’m not sure if it’s better or worse to use the primary folder.
Duplicati is file name sensitive, and can ignore files that don’t start with the right prefix, if it helps.
Possibly rather than having a full URL, primary could be modified, but beware of varying formats.

I misread the pseudo code but I think it may have an error. Looking at size_left, I see initialization inside the upload decision loop (why in loop?), I see comparison, but I don’t see value ever updated.

Because of size_left read inside the loop, and no update, I misread it as repeated remote query, like

There was no update shown, but this plan takes us to the plan I mentioned of a theoretical size left, assuming no other destination changes. I think it’s still more sophisticated than the current checks.

Possibly the same, or similar, and you can see that I got confused (partly from looking at it too fast).

I didn’t know if it was rare edge case or regular work in the specific use case, and I linked to several.

For a little more, search forum or GitHub issues or do web search.
For a lot, UploadSyntheticFilelist in GitHub search in top bar.

Killing backups is possible, then you can look as much as desired.
That might be easier than studying the code, or you could do both.

Looking at the code for log messages would be another view level.
You can then try various interruptions and see what messages are.

Is this by any chance SFTP? This unfortunate hang sounds familiar.

gpatel-fr · January 9, 2024, 9:16pm

I tried to repro this and did not (client Linux / server Linux, both Ubuntu with default ssh/sftp server, filled to the brim by hand).
I got a failed transfer. It was not the ssh timeout (I set it to 5 minutes and it failed well before that). Also, I have set retries to 0 so no long retries, so it failed quite fast, less than one minute. Thanks to the new code, the job failure is reported, the detailed reporting is not very informative though:

[Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error
SshConnectionException: An established connection was aborted by the server.

I tried to update SSHNET to the last version and sadly the error reporting is no better. It is probably a SSH problem though; trying to do a manual put with Linux sftp gives:

Couldn’t write to remote file “/fichiers/myfile.txt”: Failure

Not informative at all.
The server (SSH) is at fault. It returns status 4 (Failure) and not 14 (NO_SPACE_ON_FILESYSTEM)
SSHNet can handle the error code but it’s not reported by the server correctly.

Edit: looking at sftp.h in the openssh repo (for example GitHub - openssh/openssh-portable: Portable OpenSSH, but in the openbsd source it’s the same), the error ‘disk full’ does NOT exist. It’s not an Ubuntu old version problem, it’s the original developers who did not see any interest in reporting this kind of problem.
Let’s see: filezillaserver does not support sftp. Proftpd does and uses this 14 error code. Too late today to test if it works correctly with Duplicati, but Proftpd seems to live up to its ‘Pro’ in the name at least.
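The difference between the two status codes can be shown with a small Python sketch. The numeric values follow the SFTP protocol drafts as I understand them (4 is the generic failure, 14 and 15 are the later ‘no space’ and ‘quota exceeded’ codes); the function name and the returned strings are made up for illustration.

```python
SSH_FX_FAILURE = 4                  # generic failure, no cause given
SSH_FX_NO_SPACE_ON_FILESYSTEM = 14  # defined only in later protocol drafts
SSH_FX_QUOTA_EXCEEDED = 15          # defined only in later protocol drafts


def classify_put_failure(status_code: int) -> str:
    """Say what a client can conclude from the status of a failed write."""
    if status_code in (SSH_FX_NO_SPACE_ON_FILESYSTEM, SSH_FX_QUOTA_EXCEEDED):
        return "destination full"
    if status_code == SSH_FX_FAILURE:
        return "unspecified failure"
    return "other error"


print(classify_put_failure(4))   # what the OpenSSH server sent in the test above
print(classify_put_failure(14))  # what a server using the newer code would send
```

With only status 4 to go on, a client has no way to tell a full disk from any other write failure, so a clearer message has to come from the server side or from a separate space check.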

ts678 · January 9, 2024, 9:44pm

Thanks for the pursuit. I didn’t point to issues before because there was no value add to them.
Because there’s now a little more to at least one non-repro case, I’ll point to another you tried:

github.com/duplicati/duplicati

Stuck on "Waiting for upload to finish" if remote SSH storage is full

opened 03:19PM - 20 Aug 22 UTC

wonx

- [ x ] I have searched open and closed issues for duplicates.
- [ x ] I have searched the [forum](https://forum.duplicati.com) for related topics.

----------------------------------------

## Environment info

- **Duplicati version**: 2.0.6.3_beta_2021-06-17
- **Operating system**: Debian Buster
- **Backend**: SFTP

## Description

I save my backups to a hard drive in a remote location, using ssh (sftp). Until now I have used rsync, but I wanted to try some alternatives. I tried Duplicati today, created a new backup settings trying to backup a folder with less than 100MB containing a few pdf files and setting my SFTP server as the backend. Tested the connection and seemed to work fine. However, I start the backup process, and soon it gets stuck at "Waiting for upload to finish". I even left it overnight and it didn't go away. I could see that duplicati created the backup folder and the backup files, but with 0 byte size. After much examining, I noticed that the backup drive was almost full, with only the small reserved system space left (less than 5%). I tried with another folder in another drive, and the backup completed successfully. Couldn't be a way to detect that the remote destination is full and warn the user, instead of showing "Waiting for upload to finish" indefinitely?

## Steps to reproduce

1. Set Duplicati to save the backup on a SFTP backend
2. Use a backend without enough free space
3. Start backup

- **Actual result**: Duplicati shows a warning that the backup couldn't complete due to not enough free space available.
- **Expected result**: "Waiting for upload to finish" indefinitely.

## Screenshots

## Debug log

So how are these people getting their transfer stuck, I wonder? I don’t think we got server info.

@Kanellaman can you say something about your server? Maybe we can investigate that way.

Kanellaman · January 10, 2024, 7:33am

Thanks for the insight, I have read it in many threads already. I am using a different directory for this file. I have already manipulated the paths with a little coding.

My pseudocode was poorly written indeed and I am sorry about that; it was supposed to give a general idea of my thoughts.

I agree.

Yeah for sure

Yeah it is:)

My sftp server is Debian GNU/Linux 10 (buster), it uses openssh. I can give you more info about the server if you need anything specific (sshd_config etc).

Kanellaman · January 10, 2024, 7:47am

The GitHub issue you provided seems to sum up what I have encountered. I should mention that in my case I use quota to set a disk limit for each user on my server, if that matters. It seems to hang indefinitely too, so I guess “no more space” is being reported wrongly. Is there a way to check how it is reported? (EDIT) Or even change the way it is reported?

gpatel-fr · January 10, 2024, 7:53am

Not my observation. Can you try to repro it with setting number-of-retries=0 ?