Handling of Offline files in long-term backups

Scenario:
Backup retention is set to 1 year
exclude-files-attributes is set so that Offline files are excluded

A file (e.g. in OneDrive) is originally online, so it gets backed up.
The file is then set to be offline (cloud-only) to save space on local disk.
There are numerous subsequent backups, all of which exclude the offline file.

  1. After a year, when the first backup expires, will there then be no backup of the file?
    The file has not been “deleted”. It has just been unavailable for backup, but there seems to be no way to tell Duplicati to retain that first backup of the file, taken while the file was online.
    Is there a way to do this?

  2. During the year, is there a way to tell from the backups which Offline files were skipped or excluded?

I’m not sure you know what offline means in this context. Actually I’m not sure either, but generally it means you can’t get the file because of lack of connection. One can make files available for offline use.

The Windows right-click menu has "Always keep on this device" and "Free up space" (local space).
"Save disk space with OneDrive Files On-Demand for Windows" has a section explaining how they work.

"File Attribute Constants" documents the internal view, which is most easily seen in Explorer by adding the Attributes column.
FILE_ATTRIBUTE_OFFLINE is presumably the one you exclude, but look at how Windows defined that…
Some earlier documentation citing hierarchical storage management software mentions Windows 2000.

OneDrive File Attributes Uncovered attempts to reverse-engineer how Files on Demand attributes work. Notice P (FILE_ATTRIBUTE_PINNED) and U (FILE_ATTRIBUTE_UNPINNED), but no use of offline.
The help text from attrib /? mentions Pinned and Unpinned, but many online pages do not.

So this is kind of a mess on top of the messes that come from trying to back up simulated filesystems… Duplicati works best with local files. Anything else will be less reliable and slower, but some might insist.

Further advice against that comes from an experiment I just tried which suggests that attribute changes done by Files on Demand (start up Explorer to watch) can trigger downloads because Duplicati sees an attribute change as a reason to save attributes and to read the file through to see if its content changed:

Checking file for changes C:\Users\me\OneDrive\backup source\short.txt, new: False, timestamp changed: False, size changed: False, metadatachanged: True, 9/16/2022 6:10:43 PM vs 9/16/2022 6:10:43 PM
File has only metadata changes C:\Users\me\OneDrive\backup source\short.txt

The Status and Attributes columns in Explorer show the following for me when I play with the right-click options:

Available on this device is AL
Always available on this device is ALP
Available when online is ALOUM (OK, it apparently set the offline attribute for that, surprising me)
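
To make those letter strings easier to reason about, here is a minimal Python sketch that turns a raw attribute value into the letters seen above. The bit values come from the Win32 File Attribute Constants; the letter assigned to each bit is my assumption, based only on what the Explorer column showed, and the helper names `attribute_letters` and `is_offline` are made up for illustration.

```python
# Windows file attribute bits (values from the Win32 File Attribute Constants).
# ASSUMPTION: the letter paired with each bit is inferred from the Explorer
# Attributes column observations above, not from official documentation.
ATTRIBUTE_LETTERS = [
    (0x00000020, "A"),  # FILE_ATTRIBUTE_ARCHIVE
    (0x00000400, "L"),  # FILE_ATTRIBUTE_REPARSE_POINT
    (0x00001000, "O"),  # FILE_ATTRIBUTE_OFFLINE
    (0x00080000, "P"),  # FILE_ATTRIBUTE_PINNED
    (0x00100000, "U"),  # FILE_ATTRIBUTE_UNPINNED
    (0x00400000, "M"),  # FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
]

FILE_ATTRIBUTE_OFFLINE = 0x00001000


def attribute_letters(attrs: int) -> str:
    """Return the letters for every known bit that is set in attrs."""
    return "".join(letter for bit, letter in ATTRIBUTE_LETTERS if attrs & bit)


def is_offline(attrs: int) -> bool:
    """True if the offline bit (the one an attribute exclude would match) is set."""
    return bool(attrs & FILE_ATTRIBUTE_OFFLINE)
```

On Windows the raw value can be read with `os.stat(path).st_file_attributes`. I have not checked which of the OneDrive-related bits a Python process is actually allowed to see, so treat any output as a hint rather than proof.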

Your usage of offline as cloud-only is at least consistent with that last finding, although it sounds strange.

Duplicati backs up the files that you let it see. If you start to exclude some files, they are seen as deleted. Older versions will still contain the files until those versions are removed by retention. You can of course always keep older versions. The retention you chose cuts off at a year, but Custom backup retention can thin out versions as age increases, and the usual delete at the end of the last timeframe can be blocked with a U (unlimited) time.
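
To illustrate why an unlimited last timeframe matters for that first backup, here is a toy model of thinning in Python. It is not Duplicati's code: the rule "walk each timeframe oldest first and keep a version only if it is at least one interval after the last kept one" is my simplifying assumption, and the names `thin` and `Policy` are hypothetical. A timeframe of `None` stands in for U.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Each rule is (timeframe, interval). A timeframe of None models "U" (unlimited).
Policy = List[Tuple[Optional[timedelta], timedelta]]


def thin(versions: List[datetime], now: datetime, policy: Policy) -> List[datetime]:
    """Return the versions kept under a simplified thinning model.

    ASSUMPTION: this only approximates custom retention; it is not
    Duplicati's actual algorithm.
    """
    # Process rules from the shortest timeframe to the longest, unlimited last.
    rules = sorted(policy, key=lambda r: (r[0] is None, r[0] or timedelta(0)))
    ordered = sorted(versions)  # oldest first
    kept: List[datetime] = []
    lower_age = timedelta(0)
    for timeframe, interval in rules:
        # Versions whose age falls inside this rule's slice of time.
        bucket = [
            v for v in ordered
            if now - v >= lower_age and (timeframe is None or now - v < timeframe)
        ]
        last = None
        for v in bucket:
            if last is None or v - last >= interval:
                kept.append(v)
                last = v
        if timeframe is None:
            break  # nothing can be older than "unlimited"
        lower_age = timeframe
    # Versions older than the last finite timeframe fall in no bucket: deleted.
    return sorted(kept)
```

With daily backups and a single one-year rule, everything older than a year disappears in this model. Adding a rule with an unlimited timeframe keeps the oldest version, which is the one that would still hold the file from when it was online.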

This still seems like a management headache. What about newer files? You can’t set a retention per file. Finding the attribute-excluded files is probably possible from a log-file at log-file-log-level=verbose:

Cloud drive integration (files on-demand) was a poll on handling cloud-only files.
Your preference is to exclude them, but an excluded file looks like a deleted one, which causes the problem you describe.

Duplicati downloads OneDrive online only files #3411 similarly wanted an ignore.
Disable file attribute masking on Windows 10 #3998 provided that by disabling a
Windows compatibility feature that removed visibility of certain unusual attributes.

Some bits in the file attributes are masked out #8315 is a PowerShell issue, listing attributes including Offline, Pinned, Unpinned.
Expose file attributes of OneDrive placeholders #8745 changed PowerShell to use PHCM_EXPOSE_PLACEHOLDERS which is the same thing that Duplicati did to unmask.

Placeholder files for Windows 10 OneDrive are not documented #1484 asks the docs to document the usage
that RtlSetProcessPlaceholderCompatibilityMode function (ntifs.h) and others cover, although quite vaguely.
Some bits in the file attributes are masked out #27976 complained to dotnet, who said it’s a PowerShell issue.

So there’s more Files on Demand messiness. It’s unclear to me what the solution for this usage may be. Personally, I like to have backups of my cloud files, because they can be lost to malware or by the provider.

Thank you - I had found some of that information but not everything. There is much there for further thought.
Downloading offline files as part of (i.e. during) each backup would be helpful. Are there settings to enable that? I’m not seeing them if there are. I see warnings that those files are skipped, not downloaded.

There should be no warnings anywhere. You mean you got a yellow popup? Please describe more.

gives me (as expected from the code lines above) the below for files that are “Available when online”:

Excluding path due to attribute filter: C:\Users\Me\OneDrive\backup source\short.txt

but I had a live log running at Verbose level in order to verify that the attribute exclude ran as desired.
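
If the same messages go to a log file, a short script can list what was skipped. This is a minimal Python sketch that only assumes the message text shown above; the function name `excluded_paths` and the log file name in the usage note are hypothetical.

```python
from typing import Iterable, List

# Message text taken from the verbose log line quoted above.
PREFIX = "Excluding path due to attribute filter: "


def excluded_paths(lines: Iterable[str]) -> List[str]:
    """Collect the unique paths from attribute-filter exclusion messages,
    in the order they first appear."""
    seen = set()
    result: List[str] = []
    for line in lines:
        idx = line.find(PREFIX)
        if idx == -1:
            continue  # not an exclusion message
        path = line[idx + len(PREFIX):].strip()
        if path and path not in seen:
            seen.add(path)
            result.append(path)
    return result
```

Usage could look like `with open("duplicati.log", encoding="utf-8", errors="replace") as f: print("\n".join(excluded_paths(f)))`. Comparing the output between two backup runs would show which files changed state in between.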

If you mean why was it skipped, it’s because I asked for that. If you want a download, don’t prevent it.

EDIT 1:

I’m confused by the wish. When Windows automatically downloads offline files, they stop being offline.

EDIT 2:

What are the goals anyway? For my goal, which seems different from yours, I’m finding some luck with check-filetime-only which (although it goes too far) avoids constant downloads as file attributes change.
One gets a far simpler check, and the files can probably be in any Files on Demand state that one likes.
This simplified model is probably what Microsoft was trying to achieve with their attribute masking code.
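
As a rough picture of the difference, here is a toy Python model of the decision to read a file through. It uses the four flags from the "Checking file for changes" log line earlier in the thread. The logic is my assumption about the behaviour described here, not Duplicati's source, and `needs_content_scan` is a made-up name.

```python
def needs_content_scan(
    new: bool,
    timestamp_changed: bool,
    size_changed: bool,
    metadata_changed: bool,
    check_filetime_only: bool = False,
) -> bool:
    """Toy model: should the file be opened and read (which, for a
    cloud-only file, means a download)?

    ASSUMPTION: with check-filetime-only, size and metadata changes are
    ignored and only the timestamp counts.
    """
    if new:
        return True  # a file never seen before always has to be read
    if check_filetime_only:
        return timestamp_changed
    return timestamp_changed or size_changed or metadata_changed
```

With the values from the log line (only metadatachanged is True), the default check asks for a read, while the check-filetime-only variant does not. That matches what I saw: attribute flips by Files on Demand stop triggering downloads.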