WIP: skipmer improvements #3415

Merged: bluegenes merged 16 commits into try-skipmers from parameterize-skips on Dec 12, 2024

Conversation

bluegenes
Contributor

@bluegenes bluegenes commented Dec 3, 2024

Make skipmers robust, but keep #3395 functional in the meantime.

This PR:

  • enables a second skipmer type, so we have m1n3 in addition to m2n3
  • switches to a reading-frame approach for both translation and skipmers: we first build the reading frame, then k-merize it, rather than building k-mers and translating/skipping on the fly
  • avoids the "extended length" that skipping on the fly required
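
A minimal sketch of the "build the frame, then k-merize" idea, not the code in this PR. It assumes a skipmer with parameters m and n keeps the first m bases of every n-base cycle, so m2n3 keeps two of every three bases and m1n3 keeps one. The names `skipmer_frame` and `kmerize` are hypothetical and are not part of the sourmash API.

```rust
/// Build one skipmer reading frame: starting at `start`, keep the first `m`
/// bases of every cycle of `n` bases and drop the rest.
/// (Hypothetical helper for illustration only.)
fn skipmer_frame(seq: &[u8], start: usize, m: usize, n: usize) -> Vec<u8> {
    assert!(m >= 1 && m < n, "need 1 <= m < n");
    seq.iter()
        .skip(start)
        .enumerate()
        // `i` counts positions within the frame, so the cycle begins at `start`.
        .filter(|&(i, _)| i % n < m)
        .map(|(_, &base)| base)
        .collect()
}

/// K-merize an already-built frame with plain sliding windows.
/// No extra lookahead is needed because the skipping has already happened.
fn kmerize(frame: &[u8], k: usize) -> Vec<&[u8]> {
    assert!(k > 0, "k must be positive");
    frame.windows(k).collect()
}

fn main() {
    let seq = b"ACGTACGTAC";
    let (m, n, k) = (2, 3, 4);

    // One frame per starting offset within the cycle.
    for start in 0..n {
        let frame = skipmer_frame(seq, start, m, n);
        println!("frame {}: {}", start, String::from_utf8_lossy(&frame));
        for kmer in kmerize(&frame, k) {
            println!("  {}", String::from_utf8_lossy(kmer));
        }
    }
}
```

Because the skipping is done once per frame, the k-mer step is an ordinary window of length k over the frame, which is why no "extended length" over the raw sequence is needed in this sketch.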

Since this changes the SeqToHashes strategy a bit, there is one Python test where we now see a different error.

Future thoughts:

  • with the new structure, it would be straightforward to add validation that excludes protein k-mers containing invalid amino acids (X). I'm not entirely sure what happens to those at the moment...
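
One way such validation could look once a translated frame exists, as a hedged sketch only. It assumes the frame is a byte slice of amino-acid codes and that `X` marks an invalid residue. `valid_protein_kmers` is a hypothetical name, not an existing sourmash function, and the PR does not say how `X` is handled today.

```rust
/// Return only the k-mers of a translated frame that contain no `X` residue.
/// (Hypothetical helper; the frame is assumed to be one amino-acid code per byte.)
fn valid_protein_kmers(frame: &[u8], k: usize) -> Vec<&[u8]> {
    assert!(k > 0, "k must be positive");
    frame
        .windows(k)
        // Drop any window that overlaps an invalid residue.
        .filter(|kmer| !kmer.contains(&b'X'))
        .collect()
}

fn main() {
    let frame = b"MKXLVA";
    for kmer in valid_protein_kmers(frame, 3) {
        println!("{}", String::from_utf8_lossy(kmer));
    }
}
```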


codspeed-hq bot commented Dec 3, 2024

CodSpeed Performance Report

Merging #3415 will not alter performance

Comparing parameterize-skips (28781bc) with try-skipmers (d7f59cf)

Summary

✅ 21 untouched benchmarks


codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 78.30189% with 23 lines in your changes missing coverage. Please review.

Project coverage is 86.38%. Comparing base (d7f59cf) to head (28781bc).
Report is 18 commits behind head on try-skipmers.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/core/src/signature.rs | 83.14% | 15 Missing ⚠️ |
| src/core/src/encodings.rs | 0.00% | 6 Missing ⚠️ |
| src/core/src/sketch/minhash.rs | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@               Coverage Diff                @@
##           try-skipmers    #3415      +/-   ##
================================================
- Coverage         86.45%   86.38%   -0.07%     
================================================
  Files               137      137              
  Lines             16141    16155      +14     
  Branches           2219     2219              
================================================
+ Hits              13955    13956       +1     
- Misses             1879     1892      +13     
  Partials            307      307              
| Flag | Coverage Δ |
|---|---|
| hypothesis-py | 25.43% <ø> (ø) |
| python | 92.40% <ø> (ø) |
| rust | 62.32% <78.30%> (-0.25%) ⬇️ |


@bluegenes bluegenes merged commit 96aea47 into try-skipmers Dec 12, 2024
42 of 44 checks passed
@bluegenes bluegenes deleted the parameterize-skips branch December 12, 2024 21:57