allow-missing-source=False not working when symlink-policy=Follow?

I have a backup job with a source data set that includes a couple of symlinks (junction points) to an external drive.

I have these advanced options configured (among others):

  • symlink-policy=Follow and
  • allow-missing-source=False (that is, the option is not configured).
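For reference, the command-line form of those two settings would look something like this (storage URL and source path are placeholders; the GUI advanced options map to the same flags):

    Duplicati.CommandLine.exe backup <storage-URL> "<source-path>" --symlink-policy=Follow --allow-missing-source=false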

The other day that external drive was missing and Duplicati happily deleted all that drive’s data from my backup, about 200 GB.

The manual says:
allow-missing-source
--allow-missing-source = false
Use this option to continue even if some source entries are missing.

I interpret this as:
if not configured (so False), the backup will be aborted if one or more source entries are missing

Is it possible that allow-missing-source=False is not properly respected when symlink-policy=Follow?

Thanks.

Hello

If you want to handle missing removable media gracefully, you could mount it automatically, for example by making the Duplicati source /media/myuser/MYUSB. When the USB removable media is missing, /media/myuser/MYUSB will not exist and the backup will be aborted.

That sounds like Linux. I should have said that I’m on Windows. The symlinks to the external drive have the drive letter of the external drive, D:, as their target. For example: D:\OLD
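For illustration (the local path here is made up), a junction like that could have been created with something like:

    mklink /J "C:\Data\OLD" "D:\OLD"

(mklink /D would create a directory symbolic link instead of a junction; that distinction comes up later in this thread.)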

The external drive is normally mounted but could also be missing for any number of reasons.

The D: letter doesn’t exist when the drive is missing, so if you make D:\OLD a source it will be missing, and the backup will abort. That may be the reason ‘store’ is the default policy for symlinks.

I think you’re suggesting I should work around a (probable) bug in Duplicati by removing from my backup source all symlinks to the external drive, and instead including the external drive’s directories directly in my backup source?

I guess I could do that. But it would make my backup source description more complex and brittle, and it would likely miss new directories added in the future.

I was wrong: it looks like the workaround is not brittle, and is quite doable. I’d still prefer the non-workaround config, but I could live with the workaround.

Thanks for the suggestion.

I think it would not be so easy to distinguish between directories that have disappeared because the underlying remote drive has disconnected and directories the user has deleted.

Please clarify. How many backup versions do you keep? How do older versions look for that drive? Duplicati won’t delete the drive’s data from the backup until no version of the backup includes that data.
Each backup is a snapshot, though, so don’t expect to see the missing drive’s files when restoring a version taken while the drive was disconnected.

As a side note (subject to testing by someone), I think “source entries” means the source path list in

The BACKUP command

Duplicati.CommandLine.exe backup <storage-URL> "<source-path>" [<options>]
Multiple source paths can be specified if they are separated by a space.
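A hypothetical example with two source paths (destination and paths are invented for illustration):

    Duplicati.CommandLine.exe backup "file://E:\DuplicatiBackup" "C:\Users\me\Documents" "D:\OLD"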

and in GUI it would be the paths listed at the bottom of the Source selection window under the tree.


You can uncheck a path if you change your mind and want to remove it from the source data path entries.
This only affects future backups, and if you later want the path back, its files should reconnect to old blocks, provided those blocks are still in some kept version. That reduces the upload, because the data was still kept.

More evidence of the intention:

	Line  13:         public static string SourceIsMissingError(string foldername) { return LC.L(@"Backup aborted since the source path {0} does not exist.  Please verify that the source path exists, or remove the source path from the backup configuration, or set the allow-missing-source option.", foldername); }
	Line 180:         public static string AllowmissingsourceShort { get { return LC.L(@"Ignore missing source elements"); } }
	Line 181:         public static string AllowmissingsourceLong { get { return LC.L(@"Use this option to continue even if some source entries are missing."); } }

Sorry for the slow reply.

Please clarify. How many backup versions do you keep? How do older versions look for that drive? Duplicati won’t delete the drive’s data from the backup until no version of the backup includes that data.
Each backup is a snapshot, though, so don’t expect to see the missing drive’s files when restoring a version taken while the drive was disconnected.

The folders in question (on the external drive, pointed to by junction points, about 130 GB) were missing in two snapshots taken while the external drive was disconnected. (The 1st snapshot deleted them; the 2nd snapshot did something else that I’m not sure of.)

The next (3rd) snapshot, taken after I had reconnected the external drive, did contain the folders in question, because they were re-uploaded (130 GB).

I admit that I didn’t understand the rest of your post.

My vague idea is that Duplicati should be able, when it sees a failure on a symlink, to check whether (case 1) the entire referenced drive is missing, as opposed to (case 2) the drive is there but some files on the drive have been deleted.
In case 1, it would skip over the entire drive, with no impact on the snapshot (i.e., not do deletions).
In case 2, it would do exactly what it does today, i.e., delete the files from the snapshot.

I proposed they were maybe not re-uploaded, simply reconnected to data still existing in the backup. Depending on your upload speed, not needing to upload prior data again can be a major time saver.

I was asking things like what Retention option you set, or how many versions the Restore menu shows.
Setting this very low might cause re-uploads. Setting it reasonably would fare much better.

You can see exactly how much you uploaded in the Complete log of the job you believe uploaded 150 GB.

Example:

    "BackendStatistics": {
      "RemoteCalls": 12,
      "BytesUploaded": 7415740,
      "BytesDownloaded": 17798535,
      "FilesUploaded": 4,

If it’s not there, how did you measure the 150 GB upload? A similar question is how you measured the 200 GB mentioned at the start of this thread,

which could be seen either from destination size comparisons using your own methods, or from the logs again:

"KnownFileSize": 9240718150,

Subtract the size after the accident from the size before the accident, and see whether it dropped by 200 GB.
Such a drop would suggest you keep too few versions, and genuinely lost data from the older versions.
The job log of the suspected drop would also show a Compact phase with a lot of deletion activity.
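For example (purely hypothetical numbers): if KnownFileSize was about 950 GB in the log before the incident and about 750 GB afterwards, the destination really shrank by roughly 200 GB and the data is gone from every kept version; if it barely moved, the old blocks are still there and only the newest versions lost their view of those files.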

This means look for an older version on the Restore menu to see if one exists, and whether it has the old data. Certainly this would not be as good as if the drive were connected, but it could be the last-seen version.

There’s a suggestion, from your post’s double reference to 130 GB, that you think making the source visible in Restore of the latest backup means exactly (or roughly) 130 GB was uploaded. If so, that’s not correct, for the reasons I’ve been writing about. Also see Processing similar data in How the backup process works.

From a Features point of view, this is Deduplication. That is also what provides Incremental backups, which is what reduces the storage cost of keeping multiple versions. I’m still looking for that information, because per your post’s history the 1st snapshot was the deletion. Is there a snapshot older than that one? What’s in it? All that drive’s data as last seen? If the older version is now gone, consider keeping more…

Vague is sometimes good, because it allows room for a developer to do something similar when/if a developer ever picks up on this. There are a whole lot more “asks” than developer-volunteers now…

Because the forum is not an issue tracker, to increase chances of this staying visible, file an Issue on whatever you are advocating for, along with exact steps that require as little equipment as possible…

Describing a very tiny test case will let you run with log-file=<path> and log-file-log-level=verbose to see what messages emerge that might give a potential volunteer some hope of locating the relevant code.
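For example (storage URL, source path, and log path are all placeholders):

    Duplicati.CommandLine.exe backup <storage-URL> "C:\TinyTestSource" --log-file="C:\temp\duplicati.log" --log-file-log-level=Verbose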

You will also be asked to look for similar requests in the forum and Issues. That can support the “ask”. Personally this does not ring any bells with me as something asked for a lot, thus priority is an issue…
Additionally, there are numerous workarounds, so I’m not at all hopeful anybody will jump immediately.

It would help to describe what paths and drives you are proposing for this handling. The prior clue was your earlier “I could live with the workaround” comment.

So should I think that this is trying to avoid the “I could live with the workaround” path and aim for the original configuration?

When? As explained, there is an early sanity check of the paths you request, but issues can also show up later, when the folders are actually walked. One complication on Windows is usn-policy, which is meant to avoid that walk; whatever is changed must still work with it.
Channel Pipeline attempts to describe how the process works. Your change needs to fit in somewhere.

I would suggest also keeping non-Windows systems in mind. Only Windows has this sort of drive-letter idea, and even there it’s optional: you don’t have to mount on a drive letter, if I recall correctly; I think you can use a mount point instead. There is also perhaps the potential for relative symlinks, and there is a different mklink for files versus folders. There are directory junctions as well as directory symbolic links. What are the current and proposed results for each? Ultimately we don’t want any inadvertent breakage from failing to think things through before changing…

This gets back to the unknown layout. If this is a broken symbolic link in a folder on a permanent drive, there’s going to be nothing visible to back up or to do anything else with. What’s supposed to be there?
This sounds like you want a backup version with a mix of new data and data that isn’t actually present, which is getting even deeper and stranger. A backup version is a point-in-time view of what was found. Should something disappear, by deletion of any sort, go back to the earlier version that had that data…

  1. At the time, my backups were daily and my retention policy was: 15D:1D,9W:1W,10M:1M,48M:3M

  2. I checked the BytesUploaded and the KnownFileSize, and you’re right: that 130 GB was not re-uploaded or lost.

  3. That 130 GB of data is available from backups done before the big delete.

  4. Duplicati is not the only backup service I have that deletes data (in the current snapshot) when an external drive in the backup set disappears. So I implemented a watchdog process on my computer that immediately kills all of my backup services when the external drive suddenly isn’t reachable (a rough sketch of that kind of watchdog follows this list).

  5. Given that this feature request is (a) very likely a low-runner request and (b) perhaps infeasible to implement, I’m going to let this drop.
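A minimal sketch of that kind of watchdog, for illustration only (the drive letter and the service name are assumptions; adjust both to your own setup):

    @echo off
    rem Poll for the external drive every 30 seconds; if it disappears,
    rem stop the (hypothetically named) backup service and quit.
    :watch
    if not exist D:\ (
        net stop "Duplicati"
        exit /b
    )
    timeout /t 30 /nobreak >nul
    goto watch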

Thanks for your great support throughout.


You could possibly use a run-script-before exit code to suppress the backup or take some other action.
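A minimal sketch of that idea, assuming drive letter D: and exit code 1 as the “run nothing, no error” code (please check Duplicati’s run-script-example for the authoritative exit codes before relying on this):

    @echo off
    rem Pass this script via --run-script-before. If the external drive is
    rem missing, return the exit code that asks Duplicati to skip this run.
    if not exist D:\ exit /b 1
    exit /b 0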

Or the Windows Drive Letters options can raise an error unless the specified file on the drive is actually seen.

Either plan can keep you from getting a backup when the drive is missing, if that’s the way you want it to go.

Thanks, great ideas, I’ll look into them.
