[MRG] update default db to rs207 reps #215

Merged (7 commits) on Jun 23, 2022

taylorreiter
Member

Addresses #212 by updating to the GTDB rs207 reps database by default. Notes of potential importance:

  1. downloads the full GTDB rs207 lineage file, not just the reps. Other than taking up more disk space, this doesn't change how charcoal runs.
  2. changes the scaled value from 1000 -> 2000, which is the scaled value the reps database is built at. This may cause some short contigs that were previously detectable to be missed, but I think that cost is outweighed by the benefit of having the larger database.
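
To make the trade-off in point 2 concrete, here is a rough back-of-the-envelope sketch. It assumes a scaled sketch retains about 1 in `scaled` of a sequence's distinct k-mers; the function name `expected_hashes` and the k-mer size of 31 are illustrative assumptions, not values taken from charcoal.

```python
def expected_hashes(contig_length: int, scaled: int, ksize: int = 31) -> float:
    """Approximate number of hashes a scaled sketch keeps for one contig.

    Assumes every k-mer in the contig is distinct and that roughly
    1 in `scaled` k-mers is retained (an illustrative approximation).
    """
    n_kmers = max(contig_length - ksize + 1, 0)
    return n_kmers / scaled


# Compare the two scaled values for a few contig lengths.
for length in (1_000, 2_000, 5_000, 10_000):
    print(length, expected_hashes(length, 1000), expected_hashes(length, 2000))
```

Under these assumptions, doubling the scaled value halves the expected number of hashes per contig, so the shortest contigs are the ones most likely to end up with too few hashes to match anything.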

@taylorreiter
Member Author

Interesting -- fails with

    raise ValueError(f"empty or improperly formatted pickfile '{pickfile}'")
ValueError: empty or improperly formatted pickfile '/tmp/charcoal_testr532yde4/stage1/TOBG_NAT-167.fna.gz.matches.csv'
Traceback (most recent call last):
  File "/usr/share/miniconda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/share/miniconda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/work/charcoal/charcoal/charcoal/contigs_search_taxonomy.py", line 151, in <module>
    returncode = cmdline(sys.argv[1:])
  File "/home/runner/work/charcoal/charcoal/charcoal/contigs_search_taxonomy.py", line 146, in cmdline
    return main(args)
  File "/home/runner/work/charcoal/charcoal/charcoal/contigs_search_taxonomy.py", line 36, in main
    picklist.load(args.matches_csv, picklist.column_name)
  File "/usr/share/miniconda/lib/python3.9/site-packages/sourmash/picklist.py", line 163, in load
    raise ValueError(f"empty or improperly formatted pickfile '{pickfile}'")
ValueError: empty or improperly formatted pickfile '/tmp/charcoal_testr532yde4/stage1/GCF_000005845-subset.fa.gz.matches.csv'
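
One way to narrow this down might be to check whether the matches CSV has any usable rows before it is handed to the picklist loader. The following is a minimal sketch using only the standard library; `pickfile_has_rows` is a hypothetical helper, not part of charcoal or sourmash, and the column name is whatever the picklist is configured to read.

```python
import csv


def pickfile_has_rows(path: str, column_name: str) -> bool:
    """Return True if the CSV has `column_name` and at least one non-empty value in it."""
    with open(path, newline="") as fp:
        reader = csv.DictReader(fp)
        # fieldnames is None for a completely empty file.
        if reader.fieldnames is None or column_name not in reader.fieldnames:
            return False
        return any(row.get(column_name) for row in reader)
```

If this returns False for the files named in the error, the problem is upstream of the picklist (nothing matched, or the column is missing) rather than in the loader itself.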

@taylorreiter
Member Author

I'm pretty sure these are failing because of the missing dependencies lxml, plotly, and interval. @ctb, I'm not sure how to handle this within the tests. I think the tests take their environment from environment.yml, but snakemake installs these dependencies in their own env when it runs these scripts. Should I make a test_environment.yml and change the GitHub Actions workflow?
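
A quick way to confirm which of these are actually missing from the test environment is sketched below. It assumes the import names match the package names mentioned above, which may not hold for every package; `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Import names are assumed to match the package names; adjust if they differ.
print(missing_modules(["lxml", "plotly", "interval"]))
```

Running this inside the environment the tests use would show whether the failure is an environment problem before changing any workflow files.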

@taylorreiter
Member Author

Hmm, no. They were failing locally because of missing dependencies, but now that I have those installed, I'm getting the same error:

File "/home/tereiter/miniconda3/envs/charcoal/lib/python3.9/site-packages/sourmash/picklist.py", line 163, in load
    raise ValueError(f"empty or improperly formatted pickfile '{pickfile}'")

and not totally sure why updating the databases would do that. wondering if it's on the sourmash side?

@mr-eyes
Member

mr-eyes commented Jun 18, 2022

Is the Python version that created the pickle file the same as the one reading it?

@mr-eyes
Member

mr-eyes commented Jun 18, 2022

Oh sorry, it's pickfile not pickle.

@taylorreiter taylorreiter changed the title [WIP] update default db to rs207 reps [MRG] update default db to rs207 reps Jun 21, 2022
@taylorreiter
Member Author

Ok! So downgrading from sourmash 4.4.0 to 4.2.3 fixed the failing tests. I'll post an issue on the sourmash repo. I'm not sure what's causing the problem exactly, but this PR is ready for review and merge, @ctb!

3 participants