Creating new issue as requested.
The query that computes the reported reduction is this:
RemovedFileSize = await cmd.ExecuteScalarInt64Async($@"
    SELECT SUM(""C"".""Length"")
    FROM
        ""{m_tablename}"" ""A"",
        ""FileLookup"" ""B"",
        ""Blockset"" ""C"",
        ""Metadataset"" ""D""
    WHERE
        ""A"".""FileID"" = ""B"".""ID""
        AND (
            ""B"".""BlocksetID"" = ""C"".""ID""
            OR (
                ""B"".""MetadataID"" = ""D"".""ID""
                AND ""D"".""BlocksetID"" = ""C"".""ID""
            )
        )
", 0, token)
    .ConfigureAwait(false);
So what it counts is the total size of the files being removed from a fileset. It then sums this number over all affected filesets and returns the result as the “reduction”.
This would be correct if there were no deduplication. But it is quite possible that the same file exists unmodified in every version, in which case removing one version does not affect the stored data at all.
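One way to reduce the overcount would be to only sum blocksets that no file outside the purge set still references. Below is a rough sketch, not a proposed fix: it assumes the temporary table holds the “FileID” values being purged, it ignores metadata blocksets, and since deduplication actually happens at the block level it is still only an approximation.
// Sketch only: count a blockset towards the reduction only when no file
// outside the purge set (the temporary table) still references it.
// Metadata blocksets and block-level sharing are ignored for brevity.
RemovedFileSize = await cmd.ExecuteScalarInt64Async($@"
    SELECT SUM(""C"".""Length"")
    FROM ""Blockset"" ""C""
    WHERE ""C"".""ID"" IN (
        SELECT ""B"".""BlocksetID""
        FROM ""{m_tablename}"" ""A"", ""FileLookup"" ""B""
        WHERE ""A"".""FileID"" = ""B"".""ID""
    )
    AND ""C"".""ID"" NOT IN (
        SELECT ""B"".""BlocksetID""
        FROM ""FileLookup"" ""B""
        WHERE ""B"".""ID"" NOT IN (SELECT ""FileID"" FROM ""{m_tablename}"")
    )
", 0, token)
    .ConfigureAwait(false);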
I have created an issue for tracking this.
In my case the files that were getting purged were present in 10+ filesets, hence the overcount.
Seems to me there is an interesting question here as to what/how to report. The number of files shouldn’t be an issue: report the number of files being removed from the fileset (on a per-fileset basis). But space used? For example, if I purge a set of files from version “8” but they still exist in version “9”, there really isn’t any reduction in storage at all (aside from the dlist/dindex files). Maybe just say “n files purged from fileset, file(s) exist in other filesets, no reduction in storage” or similar?
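As an illustration of that kind of message (hypothetical variable names, not actual Duplicati code), the report could distinguish data referenced only by the purged files from data still shared with other filesets:
// Illustrative sketch with hypothetical values: "filesRemoved" is the
// per-fileset count, "uniqueBytes" is data referenced only by the purged
// files, and "sharedBytes" is data still referenced by other filesets.
string message = uniqueBytes == 0
    ? $"{filesRemoved} file(s) purged from fileset; all data exists in other filesets, no reduction in storage (aside from dlist/dindex files)"
    : $"{filesRemoved} file(s) purged from fileset; storage reduced by {uniqueBytes} bytes, {sharedBytes} bytes still referenced by other filesets";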
Does this also address the magnitude difference?