Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sylph tool Wrapper #1518

Open
wants to merge 23 commits into
base: master
Choose a base branch
from
Open

Sylph tool Wrapper #1518

wants to merge 23 commits into from

Conversation

tcollins2011
Copy link
Contributor

Moving the sylph tool wrapper pull request to galaxytools due to the metadeata file associated with the .syldb database having a size of 6mb.

@tcollins2011
Copy link
Contributor Author

@astrovsky01 Connecting you to the PR

Copy link
Owner

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tcollins2011!

The DB is tiny? Is it the wrong DB?
A few initial comments inline.

tools/sylph/sylph.xml Outdated Show resolved Hide resolved
tools/sylph/sylph.xml Outdated Show resolved Hide resolved
tools/sylph/sylph.xml Outdated Show resolved Hide resolved
tools/sylph/sylph.xml Outdated Show resolved Hide resolved
<expand macro="input_database"/>
</inputs>
<outputs>
<data format="tabular" name="output" label="${tool.name} on ${on_string}" from_work_dir="output.tsv"/>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could add here as metadata the column names if you like. they seem to be static.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They query and profile commands both generate an output tabular file, but are slightly different (i.e for output_1.tabular and output_5.tabular are similar but slightly different, and only difference is query vs profile). We could split this into two tools if you think that would be valuable, though?

tools/sylph/tool_data_table_conf.xml.sample Outdated Show resolved Hide resolved
tools/sylph/tool_data_table_conf.xml.test Outdated Show resolved Hide resolved
tools/sylph/test-data/test R1.fq Outdated Show resolved Hide resolved
Copy link
Collaborator

@bebatut bebatut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tcollins2011! I put some comments.

A big one: it seems that both Python scripts come from https://github.com/bluenote-1577/sylph-utils and there is no LICENSE in the repository. So we are not allowed to take the scripts and add them here.
It might be better to find a way (maybe talking with authors) so they make the scripts available via conda maybe. That would avoid duplication

tools/sylph/.shed.yml Outdated Show resolved Hide resolved
tools/sylph/.shed.yml Outdated Show resolved Hide resolved
tools/sylph/.shed.yml Outdated Show resolved Hide resolved
tools/sylph/macros.xml Outdated Show resolved Hide resolved
Comment on lines 178 to 180
<param name="functions" type="select" label="ANI querying or taxonomic profiling" help="ANI(Average Nucleotide Identity) is a measure of nucleotide-level similarity between the genomes of two microogranisms. Profile querying involves comparing functional or sequence profiles derived from the genomic data.">
<option value="profile">Taxanomic Profile</option>
<option value="ani">ANI Query</option>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if it would not more sense to have 2 Galaxy tools (one for sylph profile, one for sylph query).
That would be closer of how people would use it via command-line. What do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tcollins2011 any opinion here?

tools/sylph/sylph.xml Outdated Show resolved Hide resolved
#else:
&& ln -s "$database_select.metadata" "database.tsv.gz"
#end if
&& python "$__tool_directory__/sylph_to_taxprof.py" -s output.tsv -m database.tsv.gz -o metaphlan_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to put in a # if "metaphlan" in $function.outputs:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metaphlan tool must be run regardless if krona, metphlan, or both are requested. The krona formatted file is generated from the metaphlan converted file. The #if part is about if we hand it back to the user

tools/sylph/sylph.xml Outdated Show resolved Hide resolved
tools/sylph/sylph.xml Outdated Show resolved Hide resolved
tools/sylph/sylph.xml Outdated Show resolved Hide resolved
@astrovsky01
Copy link
Contributor

astrovsky01 commented Oct 9, 2024

@bgruening

The DB is tiny? Is it the wrong DB? A few initial comments inline.

The DB we used in this for testing is basically just e. coli, and is as minimal as possible to run tests againstt. Even at that size, though, the metadata file was too big for the repo. That's also why the final test is just checking command text. The next tool we were planning to work on was one to generate sylph db and their associated metada files so users can build their own. In the meantime, I added a readme clarifying how to get the pre-made db and metadata.

tools/sylph/macros.xml Outdated Show resolved Hide resolved
@tcollins2011
Copy link
Contributor Author

@bgruening @astrovsky01 We could instead wrap a larger database for more thorough testing. We essentially made as small a database as possible when we trying to push this to tools-iuc, but if you think it would provide more thorough testing we could instead swap it out for a larger database. In theory though, the tiny database should allow for fully functional testing within galaxy.

tcollins2011 and others added 4 commits October 9, 2024 09:53
Adding a profile to the tool

Co-authored-by: Björn Grüning <bjoern@gruenings.eu>
Changing to correct repository owner

Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
Improving description of tool

Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
Improved description of the tool

Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
@bgruening
Copy link
Owner

@bgruening @astrovsky01 We could instead wrap a larger database for more thorough testing. We essentially made as small a database as possible when we trying to push this to tools-iuc, but if you think it would provide more thorough testing we could instead swap it out for a larger database. In theory though, the tiny database should allow for fully functional testing within galaxy.

All good, I was just confused because you said the DB is 6MB and I wanted to check if the submitted file was a mistake.

Updated due to publication

Co-authored-by: Bérénice Batut <berenice.batut@gmail.com>
@astrovsky01
Copy link
Contributor

it seems that both Python scripts come from https://github.com/bluenote-1577/sylph-utils and there is no LICENSE in the repository.

@bebatut the krona converter is actually a modified version of the one wrapped in the Metaphlan tool in galaxy. I believe it's from Galaxy, too, since I don't see it in the metaphlan original repo

@bgruening
Copy link
Owner

It looks like test files are missing and the limiting is not passing.

@tcollins2011
Copy link
Contributor Author

@bgruening Alright I think everything should be passing now except the file size check.

<param name="functions" value="profile"/>
<param name="min_num_kmers" value="49"/>
</conditional>
<output name="output" value="output_2.tabular" compare="sim_size"/>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't use sim_size on text outputs, use asserts :)

<when value="cached">
<param label="Select a sylph database" name="sylph_database" type="select">
<options from_data_table="sylph_databases">
<validator message="No Sylph databases are available" type="no_options" />
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add here a filter for version 1 or something like that ... and then we include version 1 in the test file.

Whenever the tool changes to DB layout we increase this version.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping, you are not filtering the DB here according to the version

Copy link
Owner

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me, just 2 small comments.

@tcollins2011
Copy link
Contributor Author

@bgruening Sylph has been split into two separate tools now both within this directory. Additionally, I see that we are still failing the file size check, but I'm not sure if it is possible to reduce the database metadata files to a smaller size.

Copy link
Owner

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tcollins2011! Added a few more smaller comments!

tools/sylph/macros.xml Outdated Show resolved Hide resolved
tools/sylph/macros.xml Outdated Show resolved Hide resolved
tools/sylph/macros.xml Outdated Show resolved Hide resolved
tools/sylph/macros.xml Outdated Show resolved Hide resolved
tools/sylph/test-data/sylph_databases.loc Outdated Show resolved Hide resolved
<when value="cached">
<param label="Select a sylph database" name="sylph_database" type="select">
<options from_data_table="sylph_databases">
<validator message="No Sylph databases are available" type="no_options" />
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping, you are not filtering the DB here according to the version

tools/sylph/sylph_query.xml Outdated Show resolved Hide resolved
</param>
<!-- Only permitting fastq as tool input only allows fastq and fastq.gz as file ext -->
<when value="single">
<param name="input" type="data" format="fastq,fastq.gz,fastqsanger,fastqsanger.gz" label="Single-end input reads"/>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed, can we not just go with the multiple=true option?

tools/sylph/sylph_profile.xml Outdated Show resolved Hide resolved
tools/sylph/sylph_profile.xml Outdated Show resolved Hide resolved
@bgruening
Copy link
Owner

@tcollins2011 bluenote-1577/sylph-utils#2 seems to be resolved.

@tcollins2011
Copy link
Contributor Author

@bgruening Tests are updated again. I think everything should be resolved if you want to take a final look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants