can we just use the hash function to flag incompatible signatures, instead of DNA/protein/etc? #751
Comments
I like this idea, especially if we do an actual enum type instead of string-typing new hash functions. Are you planning to do an RFC branch, or should I?
A complication:
[
{
"class": "sourmash_signature",
"email": "",
"hash_function": "0.murmur64",
"license": "CC0",
"signatures": [
{
"ksize": 20,
"max_hash": 0,
"md5sum": "98f13708210194c475687be6106a3b84",
"mins": [],
"molecule": "DNA",
"num": 1,
"seed": 42
}
],
"version": 0.4
}
]
This complicates having
... and be only one field:
Going even further, I would like to see sourmash signatures become an object at the top level (instead of a list), with different sketches being added to the old
Relevant issues:
Relevant PRs:
Also:
Sorry I missed this earlier! I like @luizirber's suggestion of keeping the hash function in each individual signature. Would the k-mer size and scale/num be included in this as well, since those also influence the compatibility of comparing signatures?
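To make the question concrete, here is a minimal sketch of a compatibility check that folds the k-mer size and num/max_hash in alongside the hash function. The field names come from the JSON example earlier in this thread; the helpers `compatibility_key` and `are_compatible` are hypothetical and are not part of the sourmash API.

```python
def compatibility_key(record, sketch):
    """Collect every field that has to match before two sketches are compared.

    `record` is the top-level signature object and `sketch` is one entry of
    its "signatures" list, mirroring the JSON layout shown in this thread.
    """
    return (
        record["hash_function"],  # stored once, at the top level
        sketch["molecule"],
        sketch["ksize"],
        sketch["seed"],
        sketch["num"],
        sketch["max_hash"],
    )


def are_compatible(record_a, sketch_a, record_b, sketch_b):
    """Two sketches are comparable only if all key fields agree."""
    return compatibility_key(record_a, sketch_a) == compatibility_key(record_b, sketch_b)


# Values copied from the JSON example; sketch_k31 differs only in ksize.
record = {"hash_function": "0.murmur64"}
sketch_k20 = {"ksize": 20, "max_hash": 0, "molecule": "DNA", "num": 1, "seed": 42}
sketch_k31 = dict(sketch_k20, ksize=31)

print(are_compatible(record, sketch_k20, record, sketch_k20))
print(are_compatible(record, sketch_k20, record, sketch_k31))
```

Whether all of these belong in one key, or only the hash function does, is exactly the open question; the sketch just shows what "including them" could look like.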
here's a random thought for @olgabot @luizirber in particular --
right now our signatures contain entries for
where signatures can be flagged as incompatible due to hash function OR molecule type (or other things, like ksize). When @olgabot added dayhoff encoding, it ended up adding a whole bunch more possible incompatibilities (I'm not sure this is saved in the signature JSON currently, tho). And we're hoping to add more such things in the future, with skip-mers and other approaches.
So... I was thinking it would be possible to proliferate hash functions instead of molecule types etc.
the idea would be to create hash_functions enum types like
that would encode these features.
Then we could get rid of specific molecule type/other flag checks in the signatures and MinHash objects.
Thoughts?
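A rough sketch of how the enum idea might look in Python, assuming the encoding (DNA, protein, dayhoff) is folded into the hash function name. The member names and the `Sketch` class below are hypothetical illustrations, not the names proposed in this issue and not existing sourmash code.

```python
from enum import Enum


class HashFunction(Enum):
    # Hypothetical members: each one bundles the hash with the encoding,
    # so a separate molecule-type flag is no longer needed.
    MURMUR64_DNA = "murmur64_DNA"
    MURMUR64_PROTEIN = "murmur64_protein"
    MURMUR64_DAYHOFF = "murmur64_dayhoff"


class Sketch:
    """Stand-in for a MinHash-like object; illustration only."""

    def __init__(self, hash_function, ksize):
        # Looking up by value rejects unknown strings with a ValueError,
        # which is the advantage over string-typing hash functions.
        self.hash_function = HashFunction(hash_function)
        self.ksize = ksize

    def is_compatible(self, other):
        # One enum comparison replaces the separate molecule/encoding checks;
        # ksize is still checked on its own in this sketch.
        return self.hash_function is other.hash_function and self.ksize == other.ksize


dna = Sketch("murmur64_DNA", ksize=21)
dayhoff = Sketch("murmur64_dayhoff", ksize=21)
print(dna.is_compatible(dayhoff))
```

The trade-off is that every new encoding (skip-mers, other alphabets) adds an enum member rather than a new flag and a new check.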