[WIP] improve identifier & taxonomy parsing for lca index #1542
… identifiers/taxonomy
Codecov Report
@@ Coverage Diff @@
## latest #1542 +/- ##
==========================================
+ Coverage 90.26% 95.21% +4.94%
==========================================
Files 126 99 -27
Lines 21271 17584 -3687
Branches 1595 1600 +5
==========================================
- Hits 19201 16743 -2458
+ Misses 1843 611 -1232
- Partials 227 230 +3
Flags with carried forward coverage won't be shown.
Current command for building LCA indices 😆
sourmash lca index ../../gtdb-rs202.taxonomy.v2.csv gtdb-rs202.genomic.k31.lca.json.gz \
    gtdb-rs202.genomic-reps.k31.sbt.zip --scaled=10000 \
    --require-taxonomy --fail-on --split-identifier
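The thread doesn't spell out exactly what --split-identifier does. As a rough illustration only, here is a minimal Python sketch assuming it reduces a signature name to its first whitespace-delimited token and, by default, drops a trailing accession version such as ".2". The function split_identifier and its keep_version parameter are hypothetical and are not sourmash's code or API.

```python
def split_identifier(name: str, keep_version: bool = False) -> str:
    """Hypothetical: reduce a signature name to a bare accession.

    Takes the first whitespace-delimited token (e.g. 'GCA_000005845.2'
    from 'GCA_000005845.2 Escherichia coli ...') and, unless keep_version
    is set, strips a trailing numeric '.N' version suffix.
    """
    tokens = name.split()
    if not tokens:
        raise ValueError("empty signature name")
    ident = tokens[0]
    if not keep_version:
        # Only strip the suffix if what follows the last '.' is all digits,
        # so names without a version pass through unchanged.
        base, sep, version = ident.rpartition(".")
        if sep and base and version.isdigit():
            ident = base
    return ident


if __name__ == "__main__":
    name = "GCA_000005845.2 Escherichia coli str. K-12 substr. MG1655"
    print(split_identifier(name))                     # GCA_000005845
    print(split_identifier(name, keep_version=True))  # GCA_000005845.2
```

Splitting on the last period and checking for digits (rather than splitting on the first period) keeps the sketch from mangling identifiers that contain periods elsewhere.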
This is working in practice, but it still needs tests. @bluegenes, if y'all end up taking this over for #1515, that'd be just fine by me 😁
Can I merge this into #1543 and add tests over there? Or would it be better to add tests here and then integrate?
Please go ahead!
As our sourmash_databases fu continues to evolve, we are doing a better job of providing versioned accessions on signature names, but this is also changing the requirements for taxonomy spreadsheets. This PR adds a bunch of options to lca index that improve the UX and make the overall situation slightly better, while also highlighting how silly the current code and UX are :).
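To make the spreadsheet problem concrete: if signature names carry versioned accessions but the taxonomy CSV lists unversioned ones (or vice versa), an exact-match lookup silently misses. The sketch below normalizes both sides before joining and collects the signatures left without a lineage. It assumes a CSV with an "ident" column followed by rank columns; the helpers normalize, load_taxonomy and match_signatures and the strict switch are hypothetical, and strict only loosely mirrors the idea behind --require-taxonomy, whose real behavior may differ.

```python
import csv
import io


def normalize(ident: str) -> str:
    """Hypothetical: drop a trailing numeric '.N' accession version."""
    base, sep, version = ident.rpartition(".")
    return base if sep and base and version.isdigit() else ident


def load_taxonomy(handle) -> dict:
    """Read a taxonomy CSV into {normalized ident: lineage tuple}."""
    reader = csv.DictReader(handle)
    ranks = [col for col in (reader.fieldnames or []) if col != "ident"]
    return {normalize(row["ident"]): tuple(row[r] for r in ranks)
            for row in reader}


def match_signatures(names, taxonomy, strict=False):
    """Pair each signature name with a lineage; collect unmatched names."""
    matched, missing = {}, []
    for name in names:
        tokens = name.split()
        key = normalize(tokens[0]) if tokens else ""
        if key in taxonomy:
            matched[name] = taxonomy[key]
        else:
            missing.append(name)
    if strict and missing:
        # Fail loudly instead of building an index with unassigned entries.
        raise ValueError(f"{len(missing)} signature(s) lack taxonomy: {missing}")
    return matched, missing


if __name__ == "__main__":
    tax_csv = io.StringIO(
        "ident,superkingdom,genus,species\n"
        "GCA_000005845,Bacteria,Escherichia,Escherichia coli\n"
    )
    taxonomy = load_taxonomy(tax_csv)
    matched, missing = match_signatures(
        ["GCA_000005845.2 Escherichia coli", "GCA_999999999.1 unknown"],
        taxonomy,
    )
    print(matched)
    print(missing)  # ['GCA_999999999.1 unknown']
```

Normalizing on both sides means the join works whichever side carries versions, at the cost of treating different versions of the same accession as one genome.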