Job suddenly takes 10 times longer than usual

Recently I observed that one of my jobs took more than 25 mins, where it used to take less than 2 mins. There were no major changes/additions/deletions in the source.
Example of a good run (Feb 24, duration 1.75 mins):
"DeletedFiles": 0,
"DeletedFolders": 0,
"ModifiedFiles": 2,
"ExaminedFiles": 5082,
"OpenedFiles": 4,
"AddedFiles": 2,
"SizeOfModifiedFiles": 109420,
"SizeOfAddedFiles": 508534,
"SizeOfExaminedFiles": 38153360714,
"SizeOfOpenedFiles": 617954,
"NotProcessedFiles": 0,
"AddedFolders": 0,
"TooLargeFiles": 0,
"FilesWithError": 0,
"ModifiedFolders": 0,
"ModifiedSymlinks": 0,
"AddedSymlinks": 0,
"DeletedSymlinks": 0,
"PartialBackup": false,
Example of a bad run (Feb 25, duration 26 mins):
"DeletedFiles": 0,
"DeletedFolders": 0,
"ModifiedFiles": 5082,
"ExaminedFiles": 5084,
"OpenedFiles": 5084,
"AddedFiles": 2,
"SizeOfModifiedFiles": 38912,
"SizeOfAddedFiles": 1494286,
"SizeOfExaminedFiles": 38154855000,
"SizeOfOpenedFiles": 38154855000,
"NotProcessedFiles": 0,
"AddedFolders": 0,
"TooLargeFiles": 0,
"FilesWithError": 0,
"ModifiedFolders": 0,
"ModifiedSymlinks": 0,
"AddedSymlinks": 0,
"DeletedSymlinks": 0,
"PartialBackup": false,
Bad run right after the previous one (Feb 26, duration 39 mins):
"DeletedFiles": 0,
"DeletedFolders": 0,
"ModifiedFiles": 5084,
"ExaminedFiles": 5087,
"OpenedFiles": 5087,
"AddedFiles": 3,
"SizeOfModifiedFiles": 227174,
"SizeOfAddedFiles": 1384393,
"SizeOfExaminedFiles": 38156243365,
"SizeOfOpenedFiles": 38156243365,
"NotProcessedFiles": 0,
"AddedFolders": 0,
"TooLargeFiles": 0,
"FilesWithError": 0,
"ModifiedFolders": 0,
"ModifiedSymlinks": 0,
"AddedSymlinks": 0,
"DeletedSymlinks": 0,
"PartialBackup": false,

Why were all these files opened while there was nothing to upload?
Using Duplicati 2.0.5.1_beta_2020-01-18, target location is webdav.magentacloud.de
Appreciate any insights.

You can see details on decisions in About → Show log → Verbose. Here’s an example explanation:

2021-01-20 19:40:35 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FilePreFilterProcess.FileEntry-CheckFileForChanges]: Checking file for changes C:\PortableApps\Notepad++Portable\App\Notepad++\backup\webpages.txt@2021-01-19_155510, new: False, timestamp changed: True, size changed: True, metadatachanged: True, 1/21/2021 12:32:29 AM vs 1/20/2021 10:34:04 PM
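If there are many such lines, a rough way to tally them is to save the verbose output to a text file and count which flag triggered the re-processing. This is only a Python sketch: the log file name is a placeholder, and the line format is assumed to match the example above.

import re
from collections import Counter

LOG_FILE = "duplicati-verbose.log"  # placeholder: a saved copy of the verbose log

# Field names follow the sample line above; the path is matched non-greedily
pattern = re.compile(
    r"Checking file for changes (?P<path>.+?), "
    r"new: (?P<new>\w+), timestamp changed: (?P<ts>\w+), "
    r"size changed: (?P<size>\w+), metadatachanged: (?P<meta>\w+)"
)

reasons = Counter()
with open(LOG_FILE, encoding="utf-8") as fh:
    for line in fh:
        m = pattern.search(line)
        if m:
            for flag in ("new", "ts", "size", "meta"):
                if m.group(flag) == "True":
                    reasons[flag] += 1

print(reasons)  # e.g. a large "meta" count would point at metadata-only changes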

Do you use usn-policy to speed up the changed-file scan? Canary got a fix, but I don’t know if it’s related.

EDIT:

The example was from a log file, but the live log should show the same data after the left-hand data is done…

It’s been a long time since you heard from me, but I have to get back to this:

I never heard of this option and I didn’t consciously specify it.

In About → Show log → Verbose I see a lot of messages like:
12. Aug. 2021 14:09: Checking file for changes T:<path><filename>, new: False, timestamp changed: False, size changed: False, metadatachanged: True, 21.07.2007 18:40:28 vs 21.07.2007 18:40:28

Note that date and time are exactly the same; still, Duplicati seems to assume that the metadata has changed - which is definitely not true.
As a result the job takes more than 10x longer than it usually does.
Also: just today I noticed that today’s backup was taking very long, although the previous run already took 25 mins. And there were definitely no changes in any of the files between the two runs, since I didn’t even browse that drive.
Looking through my backup history of the past 6 months in Duplicati Monitoring, I find an odd pattern: normally most backups take < 3 mins, but whenever there is one with an excessive run time, the very next run also takes extremely long (mostly around 25 mins, but sometimes also in the 1 hr range). So the “long runs” always occur in a pattern of two (once even 4 in a row). Then the run time drops back to normal for a while, with no consistent period.


All in all I have 3 jobs scheduled to the same destination (via WebDAV). Besides this one (“job T”), only one of the two others also shows a pattern like this, but not as clearly (“job H”).
As you can see in the screenshots, the backup size hardly ever changed and data modifications were usually in the range of a few MB for the runs in question.

Would you be up to looking inside a database? It’s sort of like a spreadsheet. What OS is this running on?
If you’re not up for that, then Creating a bug report and posting it may let someone else look; however, the paths are sanitized, so looking for a specific affected path would be hard. Do all the paths seem affected?

Backup not saved if only metadata has changed #4312

describes a problem where it can require an actual file content change to clear up timestamp problems.
You can look at the job logs to see whether you find a pattern of short/long/short relating to the source stats.

image

Size and timestamp are only two of the components that make up the metadata Duplicati tracks. Duplicati thinks one of the other components changed, but the logging doesn’t say exactly which one.

When it does occur, backups will take longer because Duplicati is processing the entire file. But as you noted, not much is uploaded: if the differences are only in the metadata, they don’t take much space at all.

You aren’t changing permissions on the files, or anything like that, right?

We weren’t told the BackendStatistics from the Complete log of the job log, but they might help a little.
Presumably it would show a small upload, but it might be a few new files plus a lot of new metadata.

The database view of new metadata would be later paths in File view showing different MetadataID, however keeping the same BlocksetID for the most part (exceptions being files with content change).
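If you prefer a query over clicking through DB Browser, here is a minimal Python sketch of that check, assuming you work on a copy of the job database and that the File view has the Path, BlocksetID and MetadataID columns described above (the database path and the source file path are placeholders):

import sqlite3

con = sqlite3.connect("copy-of-job-database.sqlite")  # placeholder: copied job DB
rows = con.execute(
    """
    SELECT ID, BlocksetID, MetadataID
    FROM File
    WHERE Path = ?
    ORDER BY ID
    """,
    (r"T:\some\folder\somefile.jpg",),  # placeholder: pick one real source path
).fetchall()
for row in rows:
    print(row)
# Later rows with the same BlocksetID but a new MetadataID mean only the
# metadata was recorded as changed.
con.close()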

Here’s me in DB Browser for SQLite, filtering for a file that changes a lot in both content and metadata:

The original post has equivalent log info; sorry I missed that. The bad run and the one after it both show added files.
This likely refutes the theory about sets of long runs being due to no file content having been modified.

I asked for the OS earlier. I guess I should also ask for the type of filesystem whose metadata seemed to change.

Thanks @ts678 and @drwtsn32 for your quick reply and input.
Duplicati is running on Win 8.1 with the latest patches. The data source itself is on a Synology NAS (DS211j) with DSM 5.2 in the local (wired) network. FWIW: the complete NAS shares also get backed up with Time Backup (a Synology app) to an ext4 USB drive attached to the DS every 2 hours.
Again: most of the time the jobs (a subset of the complete shares) run fine, and I am 100% sure that I never consciously changed any permissions or metadata on the source drives to an extent that would justify such excessive run times - let alone twice in a row.
I am willing to look into the database if you can guide me on what to search for and how.
Source stats of latest backup (Job T, duration: 24 min):
Job T source stats
Excerpt of the latest log file (job T). Almost all of the time is spent between the messages at 2021-08-12 14:03:28 and 2021-08-12 14:27:43:

{
"DeletedFiles": 0,
"DeletedFolders": 0,
"ModifiedFiles": 5131,
"ExaminedFiles": 5131,
"OpenedFiles": 5131,
"AddedFiles": 0,
"SizeOfModifiedFiles": 0,
"SizeOfAddedFiles": 0,
"SizeOfExaminedFiles": 38851294043,
"SizeOfOpenedFiles": 38851294043,
"NotProcessedFiles": 0,
"AddedFolders": 0,
"TooLargeFiles": 0,
"FilesWithError": 0,
"ModifiedFolders": 0,
"ModifiedSymlinks": 0,
"AddedSymlinks": 0,
"DeletedSymlinks": 0,
"PartialBackup": false,
"Dryrun": false,
[…]
"Messages": [
"2021-08-12 14:03:22 +02 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: Die Operation Backup wurde gestartet",
"2021-08-12 14:03:25 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()",
"2021-08-12 14:03:28 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed: (1,53 KB)",
"2021-08-12 14:27:43 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-20210812T120322Z.dlist.zip.aes (589,39 KB)",
[…]

Interesting… I have a Synology but Duplicati is running directly on the NAS. I could do some testing where I run Duplicati on a remote machine but I think we’ll need a way to reliably trigger the issue.

Another part of the metadata is the file attributes themselves. Are you aware of anything that may be altering the attributes (archive flag, read-only flag, etc.) of your files?
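If you want to spot-check a few files, a small Python sketch like this (Windows only; the path is a placeholder) shows the attribute bits Duplicati would see:

import os
import stat

path = r"T:\some\folder\somefile.jpg"  # placeholder: one of the affected files
attrs = os.stat(path).st_file_attributes

for name in ("FILE_ATTRIBUTE_ARCHIVE", "FILE_ATTRIBUTE_READONLY",
             "FILE_ATTRIBUTE_HIDDEN", "FILE_ATTRIBUTE_SYSTEM"):
    print(name, bool(attrs & getattr(stat, name)))
# Run it before and after a "long" backup to see whether any bit flips.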

The database path is on the Database screen. The safest plan is to copy it when the backup isn’t running.
See the prior link for the download location, and see the example for how to get to the File table and look.

Unlike in my screenshot, you would see the BlocksetID stay the same if the file content is unchanged.
If the metadata changes, then a new MetadataID will appear. You can sanity-check that decision by looking at other tables, but it’s more work, and actually inspecting the metadata takes some file handling.

The new stats look like they might fit the opened-everything-for-scan-but-found-nothing-so-no-new-version pattern.
You can check the Restore dropdown to see if you got a version at that time, although seeing the dlist uploading suggests you did. To see the more complete BackendStatistics, view the Complete log at the bottom of the job log. This will have more detailed information, e.g. how many files were uploaded:

image

image

Although I doubt it’s worth it, uploaded file names can be seen at About → Show log → Live → Retry.
Seeing the statistics might be enough to guess whether the backup updated metadata for 5131 files, or fewer.

Do you have upload-unchanged-backups set in Options, by any chance? That would upload a dlist file
whether or not any dblock and dindex files were uploaded. Messages are limited, but your list is short.

You can view past uploads in the job’s Show log → Remote. A small upload example (read bottom-up) is:

image

This is especially worth pursuing if the DB analysis confirms that the metadata changed. It can be done without opening files. An initial sanity check would be on the size and hash of the metadata that was recorded.

Is this just a home system with standard default NTFS file system, e.g. the C: drive with nothing exotic?

Already answered below: these aren’t local files. I assume this is accessed over SMB from Windows:

Any idea how these work on an SMB share? Windows and Linux (is that what Synology is?) differ.

Linux sample metadata from ext4 filesystem:

{"unix:uid-gid-perm":"1000-1000-436","unix:owner-name":"name","unix:group-name":"name","CoreAttributes":"Normal","CoreLastWritetime":"637603542177629573","CoreCreatetime":"637603542177629573"}

Windows sample metadata from NTFS:

{"win-ext:accessrules":"[]","CoreAttributes":"Archive","CoreLastWritetime":"636995086065080123","CoreCreatetime":"637280032989835983"}

Since all your files seem to attract attention at once, maybe you can pick a small file and give it its own backup going to a local folder without encryption. If it falls into the same pattern of long runs, analysis gets much easier. Just unzip the new dblock and look at what the backup did. You should see only metadata unless the file content changed, in which case one of the files in the new dblock should contain the change.
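For the unzip-and-look step, a short Python sketch like this lists what the new dblock contains and dumps the small entries, which is usually where the metadata blocks are (the dblock name is a placeholder):

import zipfile

DBLOCK = "duplicati-xxxx.dblock.zip"  # placeholder: the new dblock from the test backup

with zipfile.ZipFile(DBLOCK) as z:
    for info in z.infolist():
        print(info.filename, info.file_size)
        if info.file_size < 1024:  # metadata blocks are tiny JSON blobs
            print(z.read(info.filename)[:500])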

It depends on how SMB is configured and on the underlying filesystem. SMB will translate Linux permissions as best it can to something that Windows understands. It’s not a perfect translation, due to inherent differences between Windows and Linux: some concepts just don’t exist in Linux (such as a file’s “archive” bit). On modern systems these Windows-only properties are stored as extended attributes on the Linux filesystem.

Here’s an example from one of my Linux systems that has an SMB share. I placed a file on it and tweaked its file attributes:

$ getfattr -d Test.txt 
# file: Test.txt
user.DOSATTRIB=0sAAAEAAQAAABRAAAAIAAAAOavCfVIkNcB5q8J9UiQ1wE=

Unfortunately my Synology doesn’t have the getfattr command so I can’t check there, but I’m almost certain it is making use of extended attributes.
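If you do find the attribute on a share you can reach, the value is just base64; a minimal Python sketch decodes the sample above into raw bytes (the binary layout itself is Samba-internal, so this only confirms that Windows-style attribute data is stored there):

import base64

raw = base64.b64decode("AAAEAAQAAABRAAAAIAAAAOavCfVIkNcB5q8J9UiQ1wE=")
print(raw.hex(" "))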

In any case, Duplicati doesn’t see any of that … the SMB layer just dutifully translates everything as best it can and to Duplicati it looks like it’s seeing a regular DOS/Windows based system. So Duplicati will be storing Windows-style metadata.

Being more of a Windows-person, I never bothered checking if a package was available for Synology when I started with Duplicati. But in fact it sounds reasonable to have it running right on the NAS where all the source data resides.

As I already mentioned: no. More than 50% of the 16684 files are images or videos from years ago, and I am well past the phase of fiddling with tags (metadata). I also mentioned that the files get backed up with Synology Time Backup to a USB drive every 2 hrs, so if that touched any attributes, wouldn’t it mean that each and every Duplicati backup (once a day) would be affected?

I opened the database in the DB Browser but now I need a guiding hand.

When you say “stay the same”: what do you compare it to? Do I need to open 2 versions of the DB and compare the IDs, or is the history buried somewhere in a single DB?

Here is a cross-reference between the latest versions available in Restore and the runs tracked in Monitoring:
Dup xref version vs. runs.xlsx.zip (7.9 KB)
All long runs of course are tracked as versions, but not all versions were long runs.

Nope, never heard about this.

Good idea. I will try to set up some test jobs with a smaller dataset, targeting local drives as well as the original WebDAV destination. The dataset shouldn’t be too small (> 1 GB, I guess, and a significant number of files) so that we can expect a significant difference in runtime between nothing-to-do and touch-all.

Compare it to the number above it, as is maybe better said below. In the default sort, “later” means further down, and filtering Path down to a single file (pick one) gives you a chronological (top-down) view of its data/metadata changes.

You can, of course, set up whatever you like (and probably learn something), but what I had in mind was picking one small file from the original set to practice looking at how data and metadata look in a backup.

When the main backup gets slow, you can run this single-file backup and very easily see the new metadata. Compare that to the metadata from the previous backup. Maybe you’ll spot a change (I have no guess at what).

Doing this with the big backup is harder because so much is happening. You would look up the File table MetadataID as ID in the Metadataset table, then look up that BlocksetID as ID in the Blockset table. That’s where you can find the Length and FullHash to sanity-check whether the metadata changed. The Length might remain constant (e.g. if a timestamp number changes), but the FullHash would catch any metadata change.
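The same chain can be expressed as one query if you’d rather script it. Here is a minimal Python sketch against a copy of the database, using the view/table and column names just described (the database path and source path are placeholders):

import sqlite3

con = sqlite3.connect("copy-of-job-database.sqlite")  # placeholder: copied job DB
rows = con.execute(
    """
    SELECT f.Path, f.MetadataID, b.Length, b.FullHash
    FROM File f
    JOIN Metadataset m ON m.ID = f.MetadataID
    JOIN Blockset b    ON b.ID = m.BlocksetID
    WHERE f.Path = ?
    ORDER BY f.ID
    """,
    (r"T:\some\folder\somefile.jpg",),  # placeholder: one source path
).fetchall()
for row in rows:
    print(row)
# A FullHash that differs between rows confirms the recorded metadata changed,
# even when Length stays the same.
con.close()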

Actually getting the value of the metadata would usually mean looking up the BlocksetID in the BlocksetEntry table, then looking up that BlockID as ID in the Block table, then looking up that VolumeID as ID in the Remotevolume table, then using Name to obtain the file. You then decrypt it with AES Crypt or Duplicati’s SharpAESCrypt.exe CLI tool. You then have a .zip file to look at. With a one-file backup, it’s right there…
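The lookup chain to the remote file can be scripted the same way; another minimal Python sketch under the same assumptions (the BlocksetID value is a placeholder you would take from the Metadataset row):

import sqlite3

con = sqlite3.connect("copy-of-job-database.sqlite")  # placeholder: copied job DB
metadata_blockset_id = 12345  # placeholder: BlocksetID from the Metadataset row

rows = con.execute(
    """
    SELECT rv.Name
    FROM BlocksetEntry be
    JOIN Block bl        ON bl.ID = be.BlockID
    JOIN Remotevolume rv ON rv.ID = bl.VolumeID
    WHERE be.BlocksetID = ?
    """,
    (metadata_blockset_id,),
).fetchall()
print(rows)  # the dblock file(s) that hold that metadata block
con.close()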

As a note, if you want to try tracing through the DB, put an = sign in front of the filter value for an exact match; otherwise small numbers may get an inadvertent match if they appear inside bigger ones, e.g. if the BlockID is 1, use =1.

How the backup process works gets into some of this, but not in much depth (might be good background).

Running Duplicati where the data resides is usually the best way to go. I’m not sure how much RAM your NAS has, what version of DSM you have, or whether you can run Docker on it. The native package may not work for you if you use DSM 7. I personally stopped using the native package and switched to Docker, but that has a bit of a learning curve (IMO it’s worth it, though).

Ah, I see: a much smarter approach! I was already wondering how to compose a fileset that might (hopefully) show the odd behavior.
I can follow your instructions through the DB tables. If I sort the data in the File table by MetadataID, is it correct that the files with the highest MetadataID were changed most recently?
Interestingly, the 6 entries with the highest MetadataID are Windows’ hidden “Thumbs.db” files, in folders which I haven’t visited for a long time (I think).
I set up a single-file backup job and will monitor what the big job does - it may take a while before I get back with my findings.

As I mentioned above, my NAS DS211j runs DSM 5 (DSM 5.2-5967 Update 9, to be precise) on a Marvell Kirkwood 88F6281 processor with 128 MB RAM - 10 years old but still running reliably. I stopped upgrading past 5.x because I didn’t want to lose support for Time Backup, which was discontinued with DSM 6.x. I didn’t bother checking further for an installation package for the DSM because none was available in the Package Center and everything else I read about it sounded too complicated. If you can point me to a suitable package I may look into this again.

I think that’s correct. The documentation notes “Metadata is treated like a normal block of data”, so looking at the bottom of the Blockset table might show either normal file lengths or short metadata-sized blocksets. Although metadata is technically a blockset, there is usually just one block, as the default blocksize is 102400.

So do I get it right that each change in metadata increments the MetadataID to the next available value? This is what I conclude from sorting the File view by path: each <path + leaf> has the same BlockID but different MetadataIDs. Interestingly, I find that almost all files (after casual browsing) have 3 MetadataIDs. This also roughly fits the fact that I currently have 5313 files and 16684 entries in the File table (roughly 3x the file number; the rest could be deleted or more recently added files with fewer than 3 MetadataID variants).
How is an actual change in the file (not in the metadata) reflected in the DB?

Your NAS is almost certainly too underpowered to run either the native Duplicati package or a docker container, unfortunately.

That would be my guess, but it’s very indirect. The MetadataID comes from the ID in the Metadataset table, which autoincrements because it’s the primary key. Duplicati doesn’t do the increment; SQLite does it:

SQLite Autoincrement

On an INSERT, if the ROWID or INTEGER PRIMARY KEY column is not explicitly given a value, then it will be filled automatically with an unused integer, usually one more than the largest ROWID currently in use. This is true regardless of whether or not the AUTOINCREMENT keyword is used.

That seems very weird… Maybe you should just look at one, e.g. the described test of a small-file backup without encryption. Unzip the .dblock file and look at everything. It might be easiest to drag the files into Notepad.

You can take a MetadataID, put an = sign in front, and filter the File view. Do many Paths have that ID? Weird…

In the File view, it’s a new BlocksetID (see my example picture). It also shows up in the Blockset, BlocksetEntry and Block tables, etc.