add checkm2 #6542
description: Rapid assessment of genome bin quality using machine learning
long_description: Enhanced version of checkm, using machine learning models for greater speed and accuracy
homepage_url: https://github.com/chklovski/CheckM2
remote_repository_url: https://github.com/galaxyproject/tools-iuc/
This could be more precise by pointing to the tool's folder.
tools/checkm2/checkm2.xml
Outdated
<command detect_errors="exit_code"><![CDATA[
mkdir input_dir &&
#for $i, $file in enumerate($input):
cp $file input_dir/${file.element_identifier}.dat &&
`$file` should be single-quoted.
- Can we symlink?
- `element_identifier` needs cleaning (e.g. using `re.sub`).
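A minimal sketch of the kind of cleaning meant here, written as plain Python since Cheetah templates can call `re.sub` directly. The function name `clean_identifier` and the exact character class are assumptions for illustration, not the pattern the tool has to use.

```python
import re


def clean_identifier(identifier: str) -> str:
    """Replace every character that is not a word character, a hyphen,
    or a dot with an underscore (hypothetical allow-list)."""
    return re.sub(r"[^\w\-.]", "_", identifier)


# A dataset name with spaces and parentheses becomes a safe file name.
print(clean_identifier("my bin (v2).fasta"))
```

An allow-list is used rather than a deny-list so that any unexpected shell metacharacter in a dataset name is replaced by default.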
tools/checkm2/checkm2.xml
Outdated
<when value="no"/>
<when value="yes">
<!-- It's not all numbers and there's a check internally if it's in a specific list, so it had to be spelled out -->
<param argument="ttable" type="select" label="Prodigal table">
It would be useful to tell the user what those numbers mean.
Maybe use the genetic code table names for the option text: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
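As a rough illustration of that suggestion, the option text could be generated from a table of names. The sketch below uses only three NCBI entries as an illustrative subset; which values the tool actually accepts is not stated in this thread, and the helper name `render_options` is hypothetical.

```python
from xml.sax.saxutils import escape

# Illustrative subset of NCBI genetic code tables (assumption: the real
# select would list every value the tool accepts).
GENETIC_CODES = {
    1: "The Standard Code",
    4: "The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code",
    11: "The Bacterial, Archaeal and Plant Plastid Code",
}


def render_options(codes):
    """Return one <option> line per table, labelled 'number: name'."""
    return [
        f'<option value="{number}">{number}: {escape(name)}</option>'
        for number, name in sorted(codes.items())
    ]


for line in render_options(GENETIC_CODES):
    print(line)
```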
tools/checkm2/checkm2.xml
Outdated
#end if
-x .dat
--threads "\${GALAXY_SLOTS:-1}"
--database_path "\${CHECKM2_DB_PATH:-$__tool_directory__/tool-data/CheckM2_database/uniref100.KO.1.dmnd}"
Is this database stable, or will it change over time? If the databases are updated over time, we need a location file.
Excellent timing: one of my users just asked for the tool :)
I could contribute a data manager.
tools/checkm2/checkm2.xml
Outdated
<token name="@IDX_DATA_TABLE@">checkm2_db_versioned</token>
</macros>
<xrefs>
<xref type="bio.tools">dada2</xref>
This should be `checkm2`, not `dada2`.
tools/checkm2/checkm2.xml
Outdated
<option value="--specific">Force the use of the specific quality prediction model (neural network)</option>
<option value="--allmodels">Output quality prediction for both models for each genome.</option>
</param>
<conditional name="ttable_manual">
Do not use a conditional; an optional select can be used instead.
tools/checkm2/checkm2.xml
Outdated
#end if
-x .dat
--threads "\${GALAXY_SLOTS:-1}"
--database_path $database.fields.path
This should also be single-quoted.
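To illustrate why the quoting matters, here is a small Python sketch that mimics shell word splitting with `shlex`. The path and the command name `some_tool` are hypothetical; only `--database_path` comes from the snippet above.

```python
import shlex

# Hypothetical database path that contains a space.
path = "/data/my bins/uniref100.KO.1.dmnd"

unquoted = f"some_tool --database_path {path}"
quoted = f"some_tool --database_path {shlex.quote(path)}"

# Without quotes the path is split into two separate arguments.
print(shlex.split(unquoted))
# With quotes the path stays a single argument.
print(shlex.split(quoted))
```

In the tool XML the same effect is achieved by writing the substituted value inside single quotes in the command block.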
<inputs>
<param name="input" type="data" format="fasta" label="Input MAG/SAG datasets" multiple="true"/>

<param name="database" type="select" label="Select reference genome" help="Checkm2 Diamond database">
Would it be of interest to let users upload their own dmnd databases (https://github.com/galaxyproject/tools-iuc/blob/main/tools/diamond/diamond_makedb.xml)?
Based on the wording on their repo, I was under the impression you needed to use their specific diamond db?
<outputs>
<data name="quality" label="${tool.name} on ${on_string}: Quality report" format="tabular" from_work_dir="output/quality_report.tsv"/>
<collection name="protein_files" label="${tool.name} on ${on_string}: protein files" type="list">
<discover_datasets pattern="__name__" format="fasta" directory="output/protein_files"/>
- The extension of the files will be part of the element identifiers. Should we remove them?
Is there a difference between `ext` and `format` (see below)?
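On the question of extensions in element identifiers: `discover_datasets` also accepts a regular expression with a named `designation` group in place of `__name__`, so the extension can be left out of the identifier. The sketch below only demonstrates the regex in plain Python; the `.faa` extension and the file names are assumptions, since the thread does not say what the tool writes.

```python
import re

# Hypothetical pattern: everything before a trailing ".faa" becomes the
# element identifier (the extension is an assumption).
PATTERN = re.compile(r"(?P<designation>.+)\.faa")


def identifier_for(filename):
    """Return the identifier without its extension, or None if the
    file name does not match the pattern."""
    match = PATTERN.fullmatch(filename)
    return match.group("designation") if match else None


print(identifier_for("bin_1.faa"))
print(identifier_for("notes.txt"))
```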
remove dbkey column rename tables
mkdir input_dir &&
#for $i, $file in enumerate($input):
#set $cleaned = re.sub('[^\s\w\-\\.]', '_', str($file.element_identifier))
ln -s $file input_dir/${cleaned}.dat &&
Suggested change:
- ln -s $file input_dir/${cleaned}.dat &&
+ ln -s '$file' input_dir/${cleaned}.dat &&
#The <version> column indicates the checkm2 version that generated the database

#
#diamond_db_1.0.2 Diamond database /mnt/galaxyIndices/Checkm2_database/uniref100.KO.1.dmnd 1.0.2
Maybe put the version before the path? I think that is what we do in other location files.
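A small sketch of what the reordered location file would look like to a reader, assuming tab-separated columns in the order value, name, version, path. The column names and the helper `parse_loc` are hypothetical; the sample entry reuses the values from the commented line above.

```python
# Assumed column order after moving the version before the path.
COLUMNS = ("value", "name", "version", "path")

SAMPLE_LOC = (
    "#value\tname\tversion\tpath\n"
    "diamond_db_1.0.2\tDiamond database\t1.0.2\t"
    "/mnt/galaxyIndices/Checkm2_database/uniref100.KO.1.dmnd\n"
)


def parse_loc(text):
    """Parse tab-separated .loc content, skipping comments, blank lines,
    and rows with the wrong number of columns."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            continue
        entries.append(dict(zip(COLUMNS, fields)))
    return entries


for entry in parse_loc(SAMPLE_LOC):
    print(entry["version"], entry["path"])
```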
<collection name="protein_files" label="${tool.name} on ${on_string}: protein files" type="list">
<discover_datasets pattern="__name__" format="fasta" directory="output/protein_files"/>
</collection>
<collection name="diamond_files" label="${tool.name} on ${on_string}: Diamond files" type="list">
Should we add here that this is of type tabular?
FOR CONTRIBUTOR: