[Feature Request] Data output changes


#1

I submitted this as an issue on GitHub, but after submitting, the page went to a 404 and the issue immediately disappeared from the issues list, which is a shame because I spent some time on it. The same thing happened when I tried to open a new repository to house the template I'm working on. So, I'm going to post an earlier draft of the message here and hope it doesn't get lost.

Edit: Looks like GitHub is having issues. Apologies if my issue is duplicated.

The JSON serializer and HTTP output modules are great tools to build a remote monitoring platform on. Having spent the last few days building a solution, I have a few recommendations to make the monitoring part a lot easier to set up. The current JSON output, based on the HTTP template, is somewhat messy, with duplicated fields in different formats, incorrect data, and one or two unhelpful key names.

Common output (all templates should have this):
- Generate a GUID field to globally identify the backup instance.
- Provide for setting an instance "friendly name" during initial setup.
- A backup ID name that is the same as the DB name for that backup set.

JSON output standardization and tweaks:
- Disassociate the JSON template from the standard HTTP message template and give it a new template.
- Output all timestamps in epoch format.
- Output all duration times in milliseconds.
- Backup and Verify operations should not be concatenated in the output, but POSTed as separate events, even if run consecutively. On my end, it looks like after every backup, a Verify op is run and included, doubling up a lot of data.
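
To make the epoch and millisecond conventions concrete, here is a small Python sketch; the helper names are my own, not anything that exists in Duplicati:

```python
from datetime import datetime, timezone

def to_epoch(dt: datetime) -> int:
    """Convert an aware datetime to whole seconds since the Unix epoch."""
    return int(dt.timestamp())

def duration_ms(begin: datetime, end: datetime) -> int:
    """Operation duration in milliseconds, as the template expects."""
    return int((end - begin).total_seconds() * 1000)

# Example: a backup that ran for 42.5 seconds (values invented).
begin = datetime(2018, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
end = datetime(2018, 1, 15, 12, 0, 42, 500000, tzinfo=timezone.utc)

print(to_epoch(begin))          # 1516017600
print(duration_ms(begin, end))  # 42500
```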

There should probably be something like an API key field, or at least an option for the JSON output to include extra parameters, but that can wait.

Here is my suggestion for a new JSON Template:

"Data": {
	"Instance": {
		"GUID": "xxxxxx-xxxxxx",
		"Name": "string"
	},
	"Set": { 
		"ID": "string", #Same as internal DB name for set
		"Name": "string"
	},
	"Operation": {
		"Type": "string", #('Backup','Restore','Verify','Compact','Repair')
		"BeginTime": epoch,
		"EndTime": epoch,
		"Duration": millisecond,
		"Success": Boolean,
		"Dryrun": Boolean,
		"LastRun": epoch, #Last time any operation was run
		"LastRunSuccess": Boolean, #Was the last run above successful?
		"NextRun": epoch, #Next scheduled operation
	},
	"Storage": {
		"SourceSize": bytes,
		"RemoteSize": bytes,
		"TotalFiles": number,
		"Provider": "string", #Where is the backup being stored
		"QuotaSize": bytes,
		"QuotaExceeded": Boolean
		"RemoteCalls": number,
		"SizeUp": bytes,
		"SizeDown": bytes
	},
	"Backup": {
		"FilesAdded": number,
		"FilesDeleted": number,
		"SizeAdded": bytes,
		"SizeDeleted": bytes,
		"VersionsStored": number,
		"VersionsDeleted": number,
		"LastVersion": epoch, #should be a timestamp of the last time the backup executed without warnings/errors — should match most recent restore date available
		"PartialBackup": boolean
	},
	"Restore": {
		"FilesRestored": number,
		"FilesFailed": number,
		"SizeRestored": bytes,
		"SizeFailed": bytes
	},
	"Compact": {
		"Change": number #Expressed as percentage diff i.e. 97 =97% 
	},
	"Database": {
		"Size": bytes #Duplicati main DB size
	},
	"Messages": {
		"Message": "string" #Delimited
	}
}
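
To show what one report under this template could look like on the wire, here is a Python sketch that builds and serializes a single Backup event; all values are invented for illustration, and a real HTTP output module would POST the resulting string:

```python
import json

# Hypothetical values; field names follow the proposed template above.
report = {
    "Data": {
        "Instance": {
            "GUID": "3f2b8c1a-0000-4000-8000-000000000000",
            "Name": "office-nas",
        },
        "Set": {"ID": "KQXWCULHRZ", "Name": "Documents"},
        "Operation": {
            "Type": "Backup",
            "BeginTime": 1516017600,
            "EndTime": 1516017642,
            "Duration": 42500,
            "Success": True,
            "Dryrun": False,
        },
        "Backup": {"FilesAdded": 12, "SizeAdded": 1048576},
    }
}

payload = json.dumps(report)
# The HTTP output module would POST `payload` to the configured URL.
print(json.loads(payload)["Data"]["Operation"]["Type"])  # Backup
```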

Written more plainly:

Data.Instance.GUID xxxx-xxxxx
Data.Instance.Name string
Data.Set.ID string same as DB name
Data.Set.Name string
Data.Operation.Type "Backup", "Restore", "Verify", "Compact", "Repair"
Data.Operation.BeginTime epoch
Data.Operation.EndTime epoch
Data.Operation.Duration millisecond
Data.Operation.Success Boolean
Data.Operation.Dryrun Boolean
Data.Operation.LastRun epoch last time any operation was run
Data.Operation.LastRunSuccess Boolean was the last run above successful
Data.Operation.NextRun epoch Next scheduled op
Data.Storage.SourceSize bytes
Data.Storage.RemoteSize bytes
Data.Storage.TotalFiles number
Data.Storage.Provider string Where is the backup being stored
Data.Storage.QuotaSize bytes
Data.Storage.QuotaExceeded Boolean
Data.Backup.FilesAdded number
Data.Backup.FilesDeleted number
Data.Backup.SizeAdded bytes
Data.Backup.SizeDeleted bytes
Data.Backup.VersionsStored number
Data.Backup.VersionsDeleted number
Data.Backup.LastVersion epoch — should be a timestamp of the last time the backup executed without warnings/errors — should match most recent restore date available
Data.Restore.FilesRestored number
Data.Restore.FilesFailed number
Data.Restore.SizeRestored bytes
Data.Restore.SizeFailed bytes
Data.Compact.Change number expressed as percentage diff, i.e. 97 = 97%
Data.Database.Size bytes Duplicati main DB size
Data.Messages.Message delimited string


#2

There is currently no GUID in Duplicati, but adding it would help in a few other places as well.

I have a half-done implementation of this. The idea is that each monitoring service can provide some settings (like a user key, access token, etc.), and the user can then simply supply something like "service-abc:tokenxyz", which will tweak the defaults to submit the report to "service-abc".
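
A minimal sketch of how that shorthand could be parsed, assuming a simple "service:token" convention; the registry, service name, and settings keys here are hypothetical, not the actual half-done implementation:

```python
# Hypothetical registry of per-service defaults.
SERVICE_DEFAULTS = {
    "service-abc": {"url": "https://example.invalid/report", "format": "json"},
}

def parse_destination(value: str) -> dict:
    """Split "service:token" and merge the token into that service's defaults."""
    service, _, token = value.partition(":")
    if service not in SERVICE_DEFAULTS:
        raise ValueError(f"unknown monitoring service: {service}")
    settings = dict(SERVICE_DEFAULTS[service])
    settings["token"] = token
    return settings

print(parse_destination("service-abc:tokenxyz")["token"])  # tokenxyz
```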

Is this not provided in the output? There should be a “backup-name” property somewhere…

The DB name is autogenerated, so the name can change if the database is regenerated. We can include it, but it would be dangerous to use it as a key.

I agree with all these.

Yes… That happens because internally Duplicati runs the backup, and then a verify operation and possibly a delete+compact operation.

I like it, it looks very clean.
To avoid breaking services that depend on the current format, I suggest we add a new format provider here.


#3

The 'instance' friendly name would be how a user quickly tells different instances of Duplicati apart in reporting.
If I have 5 client machines on the same network all backing up 'documents', how do I tell them apart in the reports? The GUID is great for building metrics, particularly at scale, but for display purposes we want something the user will recognize more easily. Polling the machine's hostname is a possibility, but giving the user an option to specify a name for the specific instance seemed like the friendliest way to handle it.

We need a way of uniquely identifying a backup set, even if the friendly name changes. Using an immutable unique ID for metric comparisons ensures that comparisons don't break if the name changes for some reason. Another GUID is possible here, but seemed like overkill. If the filename does change (I wasn't aware), that's not a great solution. Another possibility would be to build the UID from the instance GUID, so that the backup set ID looks like xxxx-xxxx-0001 or something.
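
That derivation could be as simple as the following sketch; the suffix scheme is purely my assumption:

```python
def make_set_id(instance_guid: str, set_number: int) -> str:
    """Build a stable backup-set ID from the instance GUID plus a counter
    ("xxxx-xxxx-0001" style), so renaming the set never changes its ID."""
    return f"{instance_guid}-{set_number:04d}"

instance = "3f2b8c1a-9d4e-4c2f-b1a6-7e5d2c9f0a11"
print(make_set_id(instance, 1))  # 3f2b8c1a-9d4e-4c2f-b1a6-7e5d2c9f0a11-0001
```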

If my template suggestion is liked and combined reports are still desired, it would have to be heavily modified to include branching result keys for multiple operations, doubling up a lot of values and significantly increasing the number of keys per document. It would make building the reporting a lot more complex, because it requires accurately anticipating each query that pulls more than one result from a single document and writing the scripts for that. This really starts to add up if you are asking for a metric from more than one, but not all, possible operations in a document. Aggregating scripts would have to be written for each possible combination, or that kind of query won't be possible.

From an indexing perspective, I think it's much cleaner if each operation is reported atomically. When a single document contains multiple similar metrics that need to be aggregated before being reported, you have to construct a script for each of those metrics and inline it with the search query every time the report is requested. The data is updated in near real time, so this could be on the order of every 5 seconds, for upwards of 9 similar metrics that each need to be queried and summed 3 times per document per report run. That gets much more expensive when the reporting system handles a lot of clients. The query system is optimized for serialization and for collecting single metrics over time: getting a single metric (e.g. total bytes downloaded month to date) is a computationally simple query when you have 100 documents with just a single key to read, but 10 documents with 9 keys each, which must be evaluated and run through a script to add together before the result can be displayed, is a lot more overhead.
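
The difference can be sketched in Python: summing one key across atomic documents is a flat scan, while combined documents force per-document logic to pick out and add the embedded operation results. The document shapes below are invented for illustration:

```python
# Atomic reports: one operation per document, one key to read.
atomic_docs = [{"SizeDown": 100}, {"SizeDown": 250}, {"SizeDown": 50}]
total_atomic = sum(doc["SizeDown"] for doc in atomic_docs)

# Combined reports: every query needs script-like logic that walks
# each embedded operation and decides which values to include.
combined_docs = [
    {"Backup": {"SizeDown": 100}, "Verify": {"SizeDown": 250}},
    {"Backup": {"SizeDown": 50}},
]
total_combined = sum(
    op.get("SizeDown", 0)
    for doc in combined_docs
    for op in doc.values()
)

print(total_atomic, total_combined)  # 400 400
```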

Putting it another way, atomic reports scale really well, because internally each individual report can be stored on a different node in the same cluster, like stripes in a RAID. You'd have multiple nodes responding to many queries, improving performance versus results stored in, and reported back by, fewer nodes or even a single document. Does that make sense the way I'm explaining it?

Makes sense. JSONv2 works fine.