-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sylph tool Wrapper #1518
base: master
Are you sure you want to change the base?
Sylph tool Wrapper #1518
Conversation
@astrovsky01 Connecting you to the PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tcollins2011!
The DB is tiny? Is it the wrong DB?
A few initial comments inline.
tools/sylph/sylph.xml
Outdated
<expand macro="input_database"/> | ||
</inputs> | ||
<outputs> | ||
<data format="tabular" name="output" label="${tool.name} on ${on_string}" from_work_dir="output.tsv"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could add here as metadata the column names if you like. they seem to be static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They query and profile commands both generate an output tabular file, but are slightly different (i.e for output_1.tabular and output_5.tabular are similar but slightly different, and only difference is query vs profile). We could split this into two tools if you think that would be valuable, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tcollins2011! I put some comments.
A big one: it seems that both Python scripts come from https://github.com/bluenote-1577/sylph-utils and there is no LICENSE in the repository. So we are not allowed to take the scripts and add them here.
It might be better to find a way (maybe talking with authors) so they make the scripts available via conda
maybe. That would avoid duplication
tools/sylph/sylph.xml
Outdated
<param name="functions" type="select" label="ANI querying or taxonomic profiling" help="ANI(Average Nucleotide Identity) is a measure of nucleotide-level similarity between the genomes of two microogranisms. Profile querying involves comparing functional or sequence profiles derived from the genomic data."> | ||
<option value="profile">Taxanomic Profile</option> | ||
<option value="ani">ANI Query</option> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if it would not more sense to have 2 Galaxy tools (one for sylph profile
, one for sylph query
).
That would be closer of how people would use it via command-line. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tcollins2011 any opinion here?
tools/sylph/sylph.xml
Outdated
#else: | ||
&& ln -s "$database_select.metadata" "database.tsv.gz" | ||
#end if | ||
&& python "$__tool_directory__/sylph_to_taxprof.py" -s output.tsv -m database.tsv.gz -o metaphlan_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably need to put in a # if "metaphlan" in $function.outputs:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metaphlan tool must be run regardless if krona, metphlan, or both are requested. The krona formatted file is generated from the metaphlan converted file. The #if part is about if we hand it back to the user
The DB we used in this for testing is basically just e. coli, and is as minimal as possible to run tests againstt. Even at that size, though, the metadata file was too big for the repo. That's also why the final test is just checking command text. The next tool we were planning to work on was one to generate sylph db and their associated metada files so users can build their own. In the meantime, I added a readme clarifying how to get the pre-made db and metadata. |
@bgruening @astrovsky01 We could instead wrap a larger database for more thorough testing. We essentially made as small a database as possible when we trying to push this to tools-iuc, but if you think it would provide more thorough testing we could instead swap it out for a larger database. In theory though, the tiny database should allow for fully functional testing within galaxy. |
Adding a profile to the tool Co-authored-by: Björn Grüning <bjoern@gruenings.eu>
Changing to correct repository owner Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
Improving description of tool Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
Improved description of the tool Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
All good, I was just confused because you said the DB is 6MB and I wanted to check if the submitted file was a mistake. |
Updated due to publication Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
@bebatut the krona converter is actually a modified version of the one wrapped in the Metaphlan tool in galaxy. I believe it's from Galaxy, too, since I don't see it in the metaphlan original repo |
Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
It looks like test files are missing and the limiting is not passing. |
@bgruening Alright I think everything should be passing now except the file size check. |
tools/sylph/sylph.xml
Outdated
<param name="functions" value="profile"/> | ||
<param name="min_num_kmers" value="49"/> | ||
</conditional> | ||
<output name="output" value="output_2.tabular" compare="sim_size"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't use sim_size on text outputs, use asserts :)
<when value="cached"> | ||
<param label="Select a sylph database" name="sylph_database" type="select"> | ||
<options from_data_table="sylph_databases"> | ||
<validator message="No Sylph databases are available" type="no_options" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add here a filter for version 1
or something like that ... and then we include version 1 in the test file.
Whenever the tool changes to DB layout we increase this version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping, you are not filtering the DB here according to the version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good to me, just 2 small comments.
….xml file to have more shared logic
@bgruening Sylph has been split into two separate tools now both within this directory. Additionally, I see that we are still failing the file size check, but I'm not sure if it is possible to reduce the database metadata files to a smaller size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tcollins2011! Added a few more smaller comments!
<when value="cached"> | ||
<param label="Select a sylph database" name="sylph_database" type="select"> | ||
<options from_data_table="sylph_databases"> | ||
<validator message="No Sylph databases are available" type="no_options" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping, you are not filtering the DB here according to the version
</param> | ||
<!-- Only permitting fastq as tool input only allows fastq and fastq.gz as file ext --> | ||
<when value="single"> | ||
<param name="input" type="data" format="fastq,fastq.gz,fastqsanger,fastqsanger.gz" label="Single-end input reads"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed, can we not just go with the multiple=true option?
@tcollins2011 bluenote-1577/sylph-utils#2 seems to be resolved. |
@bgruening Tests are updated again. I think everything should be resolved if you want to take a final look. |
Moving the sylph tool wrapper pull request to galaxytools due to the metadeata file associated with the .syldb database having a size of 6mb.