Exclude offline files broken with OneDrive?

What do you mean by “what cloud providers”?

I’m using the standard FTP with SSL to backup to a local NAS.
The only “Cloud Drive” is the 2018 OneDrive client (currently v18.172.0826.0010) syncing OneDrive, OneDrive for Business and SharePoint Online with “Files On-Demand” active.

There’s no specific reason for not updating Duplicati.
I would of course update if there is good reason for doing so, e.g. if I may expect to solve the problem.
Otherwise I’m following kind of a “never change a running system” policy if possible.

Perfectly valid reason. :slight_smile:

I asked about the cloud provider because I believe Microsoft “recently” made some changes to how they stub their files. Updates 2.0.3.7, 2.0.3.8, and 2.0.3.10 all have cloud-related updates, but I think they’re targeted at destinations, not sources, so they are likely not ones that would address what you’re seeing.

2.0.3.5 & 2.0.3.6 had (somewhat major) updates to filters that might fix your issue.

I don’t currently use OneDrive (in my sources) but maybe somebody else who does can chime in and let us know how “offline” filters work for them.

While I didn’t move to one of the latest canary builds with my production backup I tried the latest beta 2.0.3.3_beta_2018-04-02 in the meantime.
Used exclude-files-attributes = Offline,SparseFile,ReparsePoint,IntegrityStream
and symlink-policy = ignore

That didn’t make any difference.

OK, no surprise.

Thanks for testing that.

I’ve moved this to its own topic so hopefully others using OneDrive can chime in.

Just to make sure I’ve got this correct, you are trying to exclude cloud-hosted files in the OneDrive “folder” by setting the offline (or any other attribute) exclude type, right?

Yes, I want to exclude the offline files - the ones that are visible in explorer but have not been downloaded by OneDrive.

Updated to the most recent beta 2.0.4.5_beta_2018-11-28 and retested.
Parameters have been unchanged:
exclude-files-attributes = Offline,SparseFile,ReparsePoint,IntegrityStream
symlink-policy = ignore

The new version didn’t make any difference.
The errors still appear as before, giving something like this (partially translated from German):

Failed to process path: <my local file path>
System.IO.IOException: Access to cloud file has been denied
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share)
   at Duplicati.Library.Snapshots.SystemIOWindows.FileOpenRead(String path)
   at Duplicati.Library.Main.Operation.BackupHandler.HandleFilesystemEntry(ISnapshotService snapshot, BackendManager backend, String path, FileAttributes attributes)

Cheers
Klaus

Thanks for testing with the new version - sorry to hear it didn’t resolve the issue.

It’s possible OneDrive has changed how it flags not-yet-downloaded files in a way Duplicati doesn’t understand.

I’m going to ping @Pectojin in case he has any thoughts, as I believe he knows more about cloud implementations than I do.

I’m afraid I’m fairly inexperienced with Windows file attributes :frowning:

@JonMikelV did you get any feedback on this? I tried using the flags and it still downloads/tries to sync everything.

I’ve confirmed this behavior. Did some initial debugging and for some reason Duplicati only thinks the file has the ‘Archive’ attribute when it also definitely has ‘Offline’. Will look into this some more…
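To illustrate why the exclude never fires when only ‘Archive’ is visible, here is a minimal Python sketch of an attribute-based exclusion check. It is not Duplicati’s actual code: the helper names `parse_exclude_mask` and `is_excluded` are hypothetical, and the bit values are the FILE_ATTRIBUTE_* constants from the Windows SDK headers.

```python
# FILE_ATTRIBUTE_* bit values from the Windows SDK headers.
ATTRIBUTE_BITS = {
    "Archive": 0x20,
    "SparseFile": 0x200,
    "ReparsePoint": 0x400,
    "Offline": 0x1000,
    "IntegrityStream": 0x8000,
}


def parse_exclude_mask(option_value):
    """Turn a value like 'Offline,SparseFile' into one combined bitmask."""
    mask = 0
    for name in option_value.split(","):
        mask |= ATTRIBUTE_BITS[name.strip()]
    return mask


def is_excluded(file_attributes, exclude_mask):
    """Hypothetical check: skip the file if any excluded bit is set."""
    return (file_attributes & exclude_mask) != 0


mask = parse_exclude_mask("Offline,SparseFile,ReparsePoint,IntegrityStream")

# What the application is told (Archive only) vs. what the file really has.
seen_by_app = ATTRIBUTE_BITS["Archive"]
actual = ATTRIBUTE_BITS["Archive"] | ATTRIBUTE_BITS["Offline"]

print(is_excluded(seen_by_app, mask))  # False: the file is not skipped
print(is_excluded(actual, mask))       # True: it would be skipped
```

With the attributes reported as Archive only, no excluded bit matches, so the file is opened and the read fails or triggers a download.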

Ok, well this is interesting. It seems to be a feature of Windows. It will filter out some attributes so they aren’t visible to certain applications.

See this for more info:

Relevant section:

The good news is it is possible to override this filtering. Here’s a demonstration of the filtering and then overriding it:

Here’s a summary (the line numbers refer to the screenshot of the test session):

  1. First test (line 2) uses PowerShell natively. Value returned is 0x501620

  2. Second test (line 4) uses a custom C# program to call that same GetAttributes method. Value returned is 0x500020. 0x1000 (Offline), 0x400 (Reparse Point), and 0x200 (Sparse File) are filtered out. Duplicati also experiences this same filtering.

  3. Third test (line 7) uses another custom C# program to show current “Placeholder Compatibility Mode” and the file attributes it sees. It then changes the Compatibility mode to PHCM_EXPOSE_PLACEHOLDERS per the MS link above and tests attributes again. You can see the filtering is turned off and the program now sees the Offline, Reparse, and Sparse attributes.
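The two values from the summary above can be decoded with a short Python sketch. The bit values are the FILE_ATTRIBUTE_* constants from the Windows SDK headers. Note the assumption: naming the two remaining high bits as UNPINNED and RECALL_ON_DATA_ACCESS is my reading of those constants and is not stated in the test output itself.

```python
from enum import IntFlag


class FileAttr(IntFlag):
    ARCHIVE = 0x20
    SPARSE_FILE = 0x200
    REPARSE_POINT = 0x400
    OFFLINE = 0x1000
    UNPINNED = 0x100000               # assumption: my reading of bit 0x100000
    RECALL_ON_DATA_ACCESS = 0x400000  # assumption: my reading of bit 0x400000


def names(value):
    """List the names of the known flags set in value."""
    return [flag.name for flag in FileAttr if value & flag]


native = 0x501620    # PowerShell (first test)
filtered = 0x500020  # custom C# program (second test)

# Bits present in the native view but missing from the filtered view.
hidden = native & ~filtered

print(names(native))
print(names(filtered))
print(hex(hidden), names(hidden))
```

The difference is 0x1600, which is exactly Offline (0x1000) plus Reparse Point (0x400) plus Sparse File (0x200).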

All I think we need to do in Duplicati is implement this same adjustment so it can see the true file attributes.
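As a rough illustration of that adjustment (in Python via ctypes rather than the C# used in the tests), the sketch below switches the current process to PHCM_EXPOSE_PLACEHOLDERS. It assumes the ntdll export `RtlSetProcessPlaceholderCompatibilityMode`, which as far as I understand the Windows documentation takes and returns a signed char and exists only on recent Windows 10 builds. The wrapper name `expose_placeholders` is made up for this example, and this is not the code from the pull request.

```python
import ctypes
import sys

# Mode values as I understand them from the Windows SDK; the thread itself
# names only PHCM_EXPOSE_PLACEHOLDERS.
PHCM_APPLICATION_DEFAULT = 0
PHCM_DISGUISE_PLACEHOLDER = 1
PHCM_EXPOSE_PLACEHOLDERS = 2


def expose_placeholders():
    """Ask Windows to stop hiding placeholder attributes from this process.

    Returns the previous mode (a negative value signals an error), or None
    when not on Windows or when the function is not available.
    """
    if sys.platform != "win32":
        return None
    try:
        ntdll = ctypes.WinDLL("ntdll")
        set_mode = ntdll.RtlSetProcessPlaceholderCompatibilityMode
    except (OSError, AttributeError):
        return None  # older Windows without this export
    set_mode.argtypes = [ctypes.c_byte]
    set_mode.restype = ctypes.c_byte
    return set_mode(PHCM_EXPOSE_PLACEHOLDERS)


if __name__ == "__main__":
    print("previous mode:", expose_placeholders())
```

The call would need to happen before any attributes are read, since it changes what later attribute queries in the same process return.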

Pull request submitted that should resolve this problem.

Some bits in the file attributes are masked out #33644 was part of a who-dropped-the-bits chase that wound up in behavior discussions, settings to configure, and consistency issues among various code.

It would be nice if the Offline attribute is enough, because the newer ones from Files On-Demand are undocumented in an official way that I’ve found for program use. Command use is partly documented:

C:\>attrib /?
...
  O   Offline attribute.
...
  P   Pinned attribute.
  U   Unpinned attribute.

Query and set Files On-Demand states reminds me this is also a Mac issue, and it has a command.

OneDrive for Mac Gets Files On-Demand was one article on the announcement. I don’t have a Mac.

File Attribute Constants only shows below.

FILE_ATTRIBUTE_OFFLINE (with rather limited use, not mentioning Files On-Demand)
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
FILE_ATTRIBUTE_RECALL_ON_OPEN
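For reference, here is a small Python sketch of testing those three bits together. The numeric values come from the Windows SDK headers; the helper `may_trigger_download` is hypothetical, and whether these bits alone are a reliable signal for Files On-Demand is exactly the open question here.

```python
# Values from the Windows SDK headers.
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

RECALL_MASK = (
    FILE_ATTRIBUTE_OFFLINE
    | FILE_ATTRIBUTE_RECALL_ON_OPEN
    | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
)


def may_trigger_download(file_attributes):
    """Hypothetical check: True if any offline/recall bit is set."""
    return bool(file_attributes & RECALL_MASK)


print(may_trigger_download(0x501620))  # unfiltered value from the test above
print(may_trigger_download(0x500020))  # filtered value from the test above
print(may_trigger_download(0x20))      # plain local file (Archive only)
```

If the bit reading is right, the filtered value 0x500020 from the earlier test still carries 0x400000, so a check on the recall bits might work even where Offline is hidden. That is an inference from one test, not documented behavior.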

FileAttributes Enum for latest .NET Framework shows none of these newer attributes but has Offline.

How do I get/set OneDrive “Files On Demand” status from PowerShell? tried to understand new bits.
OneDrive File Attributes Uncovered pointed to that and provided a simplified summary of its findings.

Force users Onedrive to free up space with powershell uses these undocumented values + attrib.

I’m not immediately looking at FILE_ATTRIBUTE_REPARSE_POINT because it’s a pretty general thing.
FILE_ATTRIBUTE_SPARSE_FILE also is sort of an older attribute, which may or may not relate to this.

Have they been backed up by Duplicati before? If not, it likely considers them new files and backs them up. Your goal (goals vary) sounds like you want Duplicati to ignore the folders. Using excludes might work.
Getting something like a backup of what’s local currently might sometimes be preferred, however one question there is what happens if the remote file is updated? I think probably the timestamp changes. This probably won’t cause a download ordinarily – until Duplicati sees the new time and downloads it.

--changed-files with PowerShell or something might allow crafting which files you want to be looked at instead of doing a filesystem walk; however, if a file is looked at, and is new or updated, it likely gets backed up. Here was another thought of using --changed-files, but it seems to be somewhat awkward in actual use.
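A rough sketch of that idea in Python rather than PowerShell: collect files modified after a given time and join them into one option value. The helper names are hypothetical, and joining paths with the platform path separator is my assumption about the format --changed-files expects, so check the option’s help text before relying on it.

```python
import os


def files_changed_since(root, since_epoch):
    """Return sorted paths under root whose mtime is newer than since_epoch."""
    changed = []
    for folder, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            if os.path.getmtime(path) > since_epoch:
                changed.append(path)
    return sorted(changed)


def changed_files_argument(paths):
    """Build the option string (the separator is an assumption, see above)."""
    return "--changed-files=" + os.pathsep.join(paths)


if __name__ == "__main__":
    import time

    one_day_ago = time.time() - 24 * 60 * 60
    print(changed_files_argument(files_changed_since(".", one_day_ago)))
```

This only decides which files are looked at; anything in the list that is new or updated would still be read and backed up.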

@drwtsn32 Thanks for the testing and info. Indeed, I think you’re right that this is the case: I checked the attributes directly from cmd on my machine, and they show the offline attribute before download but not after download (although I didn’t have your skills to write a C# surrogate executable). I do believe your fixes could work for the Windows OneDrive syncing functionality.

@ts678 Thanks for all the references, that breadth is impressive. Here is the documentation I found for the attrib function https://ss64.com/nt/attrib.html .

If you wanted to delve a bit more into the Mac OS X side of the problem, I run both machines (so I can do some testing). I ran xattr on an “offline” file and got the following output. Note: I haven’t tried using Duplicati to back up OneDrive files on Mac yet.

com.apple.FinderInfo
com.apple.LaunchServices.OpenWith
com.apple.ResourceFork
com.apple.fileutil.PlaceholderData
com.apple.metadata:com_apple_backup_excludeItem
com.microsoft.OneDrive.RecallOnOpen 
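Based only on that one listing, a placeholder check on macOS could be sketched in Python as below. Treating `com.microsoft.OneDrive.RecallOnOpen` as the marker is an assumption drawn from this single file, not from any documentation, and both helper names are hypothetical.

```python
import subprocess

# Assumption: this attribute marks a not-yet-downloaded OneDrive file.
RECALL_MARKER = "com.microsoft.OneDrive.RecallOnOpen"


def looks_like_placeholder(attribute_names):
    """True if the marker attribute is among the given names."""
    return RECALL_MARKER in attribute_names


def list_xattr_names(path):
    """Run the macOS `xattr` tool, which prints one attribute name per line."""
    result = subprocess.run(
        ["xattr", path], capture_output=True, text=True, check=True
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


sample = [
    "com.apple.FinderInfo",
    "com.apple.metadata:com_apple_backup_excludeItem",
    "com.microsoft.OneDrive.RecallOnOpen",
]
print(looks_like_placeholder(sample))
```

On a Mac, `looks_like_placeholder(list_xattr_names(path))` would combine the two; listing attribute names should not open the file’s contents, though I have not verified that it never triggers a download.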

As for the files, I’ve never backed them up with Duplicati before. It started downloading 700GB when I tried to schedule a backup, so I stopped the process. The desired behavior here is that it backs up a direct copy of the hard drive state (i.e. backing up downloaded files and backing up placeholder OneDrive files where applicable, just how I imagine a hard drive clone would work). I think @ts678 's solution will work, and will report testing results back once the PR is integrated.

As for filesystem walks and scripts for --changed-files: I think this could be an option for me; however, I’m trying to use Duplicati to back up some colleagues’ PCs to a local server, and the low level of technical capability would likely be inhibitory for this route.

It works in my limited testing. Would be nice if at least another person or two could try it out. :slight_smile:

The difference in approach is that the clone is a lower-level thing. Duplicati has to use high-level APIs which mostly prefer to do the usual magic (rather than go below it), such as downloading on file open.

Reparse Points and File Operations shows that there might be hope if direct WIN32 access was used, however Duplicati tries to stay with what .NET Framework provides, and I haven’t found that one yet…
Duplicati also uses AlphaFS which at least can return FileSystemEntryInfo.ReparsePointTag Property, however I’m not sure how you set that tag or get/set the reparse data as mentioned in Reparse Points.

I can see the reason why you’d like to not download the world, and why you’d want to back up changes existing purely locally, but I’m interested in how backing up placeholders, which are mostly useful online, adds value. Even if it adds value, I’m not sure if it can be done, but without value it’s probably not worth much work.
NTFS multiple data streams on alternate data streams was a similar discussion, and has similar issues.

Wishing for a nightly build again. Given Beta goals, Canary may have to get selective on fixes.
Occasional settle-down-and-release are one price we pay for the simpler process that we use.
Perhaps Canary releases should do more to encourage testing and reporting on new features.

but in this case there’s a volunteer, although I don’t know how they feel about Canary releases.
Except for a Stop bug, it’s a good time now, and I guess the next round will see if fix-fixes work.

Possibly this change could go in for feature test by at least one person. If not useful, just revert.
Other Canary users will be along for usual ride of hopefully seeing no regressions in their uses.

Agreed, I don’t even need to sync the stubs (which is why I was trying to use exclude-files-attributes = Offline). I think I misspoke above, although I could understand if someone wanted to see a folder’s state during a particular backup to see if the contents had changed (even without knowing/having the actual contents).

As for testing with a canary build, please just let me know. I have no problem trying it out, as, unfortunately, I can’t use Duplicati at all for my intended purpose until this bug is fixed.

If you don’t mind sending me the source for the C# code you’re using, I can compile and test once I’m home tonight. Or did you mean you did a local build and tested that?

I meant test out Duplicati with the customization. I would recommend a spare machine or VM so that you don’t affect Duplicati running on your production machine. On the spare machine/VM, you could install Visual Studio 2019. Not sure how familiar you are with GitHub - since my PR hasn’t been merged you’ll need to fork the Duplicati project and then merge the PR into your fork.

If you want to go down this path let me know if you need any pointers.