Are the full hashes looked up directly in that table?
I’m not sure that indexes well, and the compare operation might be expensive since these are fairly long strings.
There may be clever ways of indexing hash tables to improve the lookups; see “Bitpacking techniques for indexing genomes: I. Hash tables” (Algorithms for Molecular Biology).
I usually just store something like a row ID pointing to the hash record, plus a SHA1 of the hash itself (primary key, for sorting purposes, allowing duplicates).
That opens up options like:
- a left join through the SHA1 temp table to the actual hash table
- an exists check that prompts the expensive current lookup on the actual hash
- a count > 1 that prompts a less expensive current lookup on row ID (assuming it’s a key or indexed field) and hash
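For what it’s worth, here is a rough sketch of the first option in Python with SQLite. Every name in it (`block`, `full_hash`, `hash_lookup`, `build_lookup`, `find_block`) is made up for illustration and not taken from the real schema or code, so treat it as a sketch of the idea under those assumptions rather than a drop-in change.

```python
import hashlib
import sqlite3


def sha1_hex(full_hash: str) -> str:
    """Short fixed-length digest used as the indexed lookup key."""
    return hashlib.sha1(full_hash.encode("utf-8")).hexdigest()


def build_lookup(conn: sqlite3.Connection) -> None:
    """Build a temp table mapping SHA1(full hash) to the row ID of the hash record.

    Assumes a hypothetical table block(id INTEGER PRIMARY KEY, full_hash TEXT).
    """
    conn.execute(
        "CREATE TEMP TABLE hash_lookup (sha1 TEXT NOT NULL, row_id INTEGER NOT NULL)"
    )
    # Materialize the rows first so the SELECT is finished before inserting.
    rows = [
        (sha1_hex(full_hash), row_id)
        for row_id, full_hash in conn.execute("SELECT id, full_hash FROM block")
    ]
    conn.executemany("INSERT INTO hash_lookup (sha1, row_id) VALUES (?, ?)", rows)
    # Non-unique index: duplicate SHA1 values are allowed.
    conn.execute("CREATE INDEX hash_lookup_sha1 ON hash_lookup (sha1)")


def find_block(conn: sqlite3.Connection, full_hash: str):
    """Return the row ID for full_hash, or None if it is not present.

    The indexed SHA1 narrows the candidates; the long string compare only
    runs on rows reached through the row ID join.
    """
    row = conn.execute(
        """
        SELECT b.id
        FROM hash_lookup AS l
        JOIN block AS b ON b.id = l.row_id
        WHERE l.sha1 = ? AND b.full_hash = ?
        """,
        (sha1_hex(full_hash), full_hash),
    ).fetchone()
    return row[0] if row else None
```

The final compare on `full_hash` stays in the query so that a SHA1 collision or duplicate can’t return the wrong row; it just runs against a handful of candidates instead of the whole table.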
Again, this is all based on my assumptions about the issue; I still haven’t dug into the code yet.
Oh, and if you stuck around this far into the post you get to know that my recreate finished after 4 days, 5 hours, 37 minutes.