[WIP] Add retip tool suit wrapper(s) #3

Closed
wants to merge 30 commits into from
30 commits
da5b70f
Add retip tool suit wrapper(s) for deployment into TTS.
smartx-usman Sep 7, 2020
73a5b87
add aplcms
xtracko Sep 17, 2020
890ce18
switch to public image
martenson Sep 17, 2020
5a9decb
add xmsannotator
xtracko Sep 18, 2020
93745d1
adjust tool version so it corresponds to the downstream software
martenson Sep 18, 2020
53bd11c
Changed galaxy xml file to use conda biotransformer package
trachtok Sep 21, 2020
0d0df54
update versioning to iuc standard
martenson Sep 22, 2020
459f828
Changed galaxy xml file to use conda biotransformer package
trachtok Sep 21, 2020
0771801
Use proper versioning semantics. Add basic tests and associated files…
smartx-usman Sep 22, 2020
feeab17
Correction of flake8 errors in Python wrapper
trachtok Sep 22, 2020
c6918a6
lint omport order
martenson Sep 22, 2020
8338173
Merge pull request #7 from RECETOX/biot_up
martenson Sep 22, 2020
53a1a67
Remove the unnecesarry data tables
Sep 23, 2020
d5516ef
Populate help section and adapt wrappers to the package API changes
Sep 23, 2020
9c62378
Merge pull request #6 from RECETOX/xmsannotator
xtracko Sep 23, 2020
3ab532b
Migrate towards HDF5 outputs
Sep 23, 2020
5305d1a
Merge pull request #5 from RECETOX/aplcms
xtracko Sep 24, 2020
f4d7aa1
Change error detection policy
Sep 24, 2020
277cb8e
Fix requerement
Sep 24, 2020
b6bca87
Merge pull request #9 from RECETOX/aplcms
xtracko Sep 24, 2020
f92295d
Merge pull request #10 from RECETOX/xmsannotator
xtracko Sep 24, 2020
ef7d82e
Cleaned xml wrapper and added test.
trachtok Sep 24, 2020
c98db22
Repaired broken xml tags
trachtok Sep 24, 2020
4ce18f4
More cleaning of xml wrapper.
trachtok Sep 25, 2020
9465281
Merge pull request #12 from RECETOX/biot_up
martenson Sep 25, 2020
bcd9f6f
Add retip tool suit wrapper(s) for deployment into TTS.
smartx-usman Sep 7, 2020
d30d338
Use proper versioning semantics. Add basic tests and associated files…
smartx-usman Sep 22, 2020
0532b80
Fix tool version.
smartx-usman Sep 29, 2020
11db27e
Remove lines diff for binary files.
smartx-usman Sep 29, 2020
171af41
Merge remote-tracking branch 'origin/master'
smartx-usman Sep 29, 2020
16 changes: 16 additions & 0 deletions tools/aplcms/.shed.yml
@@ -0,0 +1,16 @@
owner: recetox
remote_repository_url: "https://github.com/RECETOX/galaxytools/tree/master/tools/aplcms"
homepage_url: "http://web1.sph.emory.edu/apLCMS/"
categories:
  - Metabolomics
repositories:
  recetox_aplcms_unsupervised:
    description: "Generate a feature table from a batch of LC/MS spectra using apLCMS's Unsupervised method."
    include:
      - aplcms_unsupervised.xml
      - aplcms_macros.xml
  recetox_aplcms_hybrid:
    description: "Generate a feature table from a batch of LC/MS spectra using apLCMS's Hybrid method."
    include:
      - aplcms_hybrid.xml
      - aplcms_macros.xml
75 changes: 75 additions & 0 deletions tools/aplcms/aplcms_hybrid.xml
@@ -0,0 +1,75 @@
<tool id="recetox_aplcms_hybrid" name="apLCMS - Hybrid" version="@TOOL_VERSION@+galaxy0">
<macros>
<import>aplcms_macros.xml</import>
</macros>

<expand macro="requirements" />

<command detect_errors="aggressive"><![CDATA[
#set file_str = str('", "').join([str($f) for $f in $files])

Rscript
-e 'x <- apLCMS::hybrid(
files = c("$file_str"),
known_table = rhdf5::h5read("$known_table", "aplcms_known_table"),
min_exp = $noise_filtering.min_exp,
min_pres = $noise_filtering.min_pres,
min_run = $noise_filtering.min_run,
mz_tol = $noise_filtering.mz_tol,
baseline_correct = $noise_filtering.baseline_correct,
baseline_correct_noise_percentile = $noise_filtering.baseline_correct_noise_percentile,
intensity_weighted = $noise_filtering.intensity_weighted,
shape_model = "$feature_detection.shape_model",
BIC_factor = $feature_detection.BIC_factor,
peak_estim_method = "$feature_detection.peak_estim_method",
min_bandwidth = $feature_detection.min_bandwidth,
max_bandwidth = $feature_detection.max_bandwidth,
sd_cut = c($feature_detection.sd_cut_min, $feature_detection.sd_cut_max),
sigma_ratio_lim = c($feature_detection.sigma_ratio_lim_min, $feature_detection.sigma_ratio_lim_max),
component_eliminate = $feature_detection.component_eliminate,
moment_power = $feature_detection.moment_power,
align_chr_tol = $peak_alignment.align_chr_tol,
align_mz_tol = $peak_alignment.align_mz_tol,
max_align_mz_diff = $peak_alignment.max_align_mz_diff,
match_tol_ppm = $history_db.match_tol_ppm,
new_feature_min_count = $history_db.new_feature_min_count,
recover_mz_range = $weak_signal_recovery.recover_mz_range,
recover_chr_range = $weak_signal_recovery.recover_chr_range,
use_observed_range = $weak_signal_recovery.use_observed_range,
recover_min_count = $weak_signal_recovery.recover_min_count
)'
-e 'rhdf5::h5write(x\$final_peaks, "$peaks", "peaks")'
-e 'rhdf5::h5write(x\$aligned_peaks, "$peaks", "aligned_peaks")'
-e 'rhdf5::h5write(x\$corrected_features, "$peaks", "corrected_features")'
-e 'rhdf5::h5write(x\$extracted_features, "$peaks", "extracted_features")'
-e 'rhdf5::h5write(x\$aligned_mz_tolerance, "$peaks", "aligned_mz_tolerance")'
-e 'rhdf5::h5write(x\$aligned_rt_tolerance, "$peaks", "aligned_rt_tolerance")'
-e 'rhdf5::h5write(x\$updated_known_table, "$updated_known_table", "aplcms_known_table")'
]]></command>

<expand macro="inputs">
<expand macro="history_db" />
<expand macro="noise_filtering" />
<expand macro="feature_detection" />
<expand macro="peak_alignment" />
<expand macro="weak_signal_recovery" />
</expand>

<outputs>
<data name="peaks" format="h5" />
<data name="updated_known_table" format="h5" />
</outputs>

<help>
This is the Hybrid version of apLCMS, which incorporates knowledge of known metabolites and of features
historically detected on the same machinery to help detect and quantify lower-intensity peaks.

CAUTION: To use such knowledge, especially historical data, you must keep using (1) the same chromatography
system (otherwise the retention times will not match), and (2) the same type of samples with a similar extraction
technique, such as human serum.

@GENERAL_HELP@
</help>

<expand macro="citations" />
</tool>
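The Cheetah `#set file_str` directive in the command block above builds the contents of an R character vector from the selected datasets. A minimal Python sketch of the same string handling, assuming hypothetical dataset paths; the helper name `render_r_file_vector` is illustrative and not part of the wrapper:

```python
def render_r_file_vector(paths):
    """Mimic the wrapper's template: join paths with '", "' and wrap in c("...")."""
    # Same expression as the Cheetah directive: str('", "').join([...])
    file_str = '", "'.join(str(p) for p in paths)
    # The template then emits: files = c("$file_str")
    return 'c("' + file_str + '")'


if __name__ == "__main__":
    # Hypothetical Galaxy dataset paths, for illustration only.
    example = ["/data/a.mzml", "/data/b.mzml", "/data/c.mzml"]
    print(render_r_file_vector(example))
```

This sketch does no escaping, so a path containing a double quote would produce a malformed R literal.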
145 changes: 145 additions & 0 deletions tools/aplcms/aplcms_macros.xml
@@ -0,0 +1,145 @@
<macros>
<token name="@TOOL_VERSION@">6.6.6</token>

<xml name="requirements">
<requirements>
<container type="docker">recetox/aplcms:latest</container>
</requirements>
</xml>

<xml name="inputs">
<inputs>
<param name="files" type="data" format="mzdata,mzml,mzxml,netcdf" multiple="true" min="3" label="data"
help="Mass spectrometry files for peak extraction." />
<yield />
</inputs>
</xml>

<xml name="history_db">
<param name="known_table" type="data" format="h5" label="known_table"
help="A data table containing the known metabolite ions and previously found features. The table must contain these 18 columns: chemical_formula (optional), HMDB_ID (optional), KEGG_compound_ID (optional), neutral.mass (optional), ion.type (the ion form - optional), m.z (either theoretical or mean observed m/z value of previously found features), Number_profiles_processed (the total number of processed samples to build this database), Percent_found (the percentage of historically processed samples in which the feature appeared), mz_min (minimum observed m/z value), mz_max (maximum observed m/z value), RT_mean (mean observed retention time), RT_sd (standard deviation of observed retention time), RT_min (minimum observed retention time), RT_max (maximum observed retention time), int_mean.log. (mean observed log intensity), int_sd.log. (standard deviation of observed log intensity), int_min.log. (minimum observed log intensity), int_max.log. (maximum observed log intensity)." />
<section name="history_db" title="Known-Table settings">
<param name="match_tol_ppm" type="integer" optional="true" min="0" label="match_tol_ppm (optional)"
help="The ppm tolerance to match identified features to known metabolites/features." />
<param name="new_feature_min_count" type="integer" value="2" min="1" label="new_feature_min_count"
help="The minimum number of occurrences of a historically unseen (unknown) feature to add this feature into the database of known features." />
</section>
</xml>

<xml name="noise_filtering">
<section name="noise_filtering" title="Noise filtering and peak detection">
<param name="min_exp" type="integer" min="1" value="2"
label="min_exp"
help="If a feature is to be included in the final feature table, it must be present in at least this number of spectra." />
<param name="min_pres" type="float" value="0.5"
label="min_pres"
help="The minimum proportion of presence in the time period for a series of signals grouped by m/z to be considered a peak." />
<param name="min_run" type="float" value="12"
label="min_run"
help="The minimum length of elution time for a series of signals grouped by m/z to be considered a peak." />
<param name="mz_tol" type="float" value="1e-05"
label="mz_tol"
help="The m/z tolerance level for the grouping of data points. This value is expressed as the fraction of the m/z value. This value, multiplied by the m/z value, becomes the cutoff level. The recommended value is the machine's nominal accuracy level. Divide the ppm value by 1e6. For FTMS, 1e-5 is recommended." />
<param name="baseline_correct" type="float" value="0" label="baseline_correct"
help="After grouping the observations, the highest intensity in each group is found. If the highest is lower than this value, the entire group will be deleted. The default value is NA, in which case the program uses a percentile of the height of the noise groups. If given a value, the value will be used as the threshold, and baseline_correct_noise_percentile will be ignored." />
<param name="baseline_correct_noise_percentile" type="float" value="0.05"
label="baseline_correct_noise_percentile"
help="The percentile of signal strength of those EICs that do not pass the run filter, to be used as the baseline threshold of signal strength." />
<param name="intensity_weighted" type="boolean" checked="false" truevalue="TRUE" falsevalue="FALSE"
label="intensity_weighted"
help="Whether to weight the local density by signal intensities in initial peak detection." />
</section>
</xml>

<xml name="feature_detection">
<section name="feature_detection" title="Feature detection">
<param name="shape_model" type="select" display="radio"
label="shape_model"
help="The mathematical model for the shape of a peak. There are two choices - bi-Gaussian and Gaussian. When the peaks are asymmetric, the bi-Gaussian is better.">
<option value="Gaussian">Gaussian</option>
<option value="bi-Gaussian" selected="true">bi-Gaussian</option>
</param>
<param name="BIC_factor" type="float" value="2.0"
label="BIC_factor"
help="The factor by which the number of parameters is multiplied to modify the BIC criterion. If larger than 1, models with more peaks are penalized more." />
<param name="peak_estim_method" type="select" display="radio"
label="peak_estim_method"
help="The estimation method for the bi-Gaussian peak model. Two possible values: moment and EM.">
<option value="moment" selected="true">Moment</option>
<option value="EM">EM</option>
</param>
<param name="min_bandwidth" type="float" optional="true"
label="min_bandwidth (optional)"
help="The minimum bandwidth to use in the kernel smoother." />
<param name="max_bandwidth" type="float" optional="true"
label="max_bandwidth (optional)"
help="The maximum bandwidth to use in the kernel smoother." />
<param name="sd_cut_min" type="float" value="0.01"
label="sd_cut_min"
help="The minimum standard deviation a feature must have in order not to be eliminated." />
<param name="sd_cut_max" type="float" value="500"
label="sd_cut_max"
help="The maximum standard deviation a feature may have in order not to be eliminated." />
<param name="sigma_ratio_lim_min" type="float" value="0.01"
label="sigma_ratio_lim_min"
help="The lower limit of the believed ratio range between the left-standard deviation and the right-standard deviation of the bi-Gaussian function used to fit the data." />
<param name="sigma_ratio_lim_max" type="float" value="100"
label="sigma_ratio_lim_max"
help="The upper limit of the believed ratio range between the left-standard deviation and the right-standard deviation of the bi-Gaussian function used to fit the data." />
<param name="component_eliminate" type="float" value="0.01"
label="component_eliminate"
help="When fitting a mixture of bi-Gaussian (or Gaussian) models to an EIC, a component that accounts for a proportion of intensities less than this value will be ignored." />
<param name="moment_power" type="float" value="1"
label="moment_power"
help="The power parameter for data transformation when fitting the bi-Gaussian or Gaussian mixture model in an EIC." />
</section>
</xml>

<xml name="peak_alignment">
<section name="peak_alignment" title="Peak Alignment">
<param name="align_chr_tol" type="float" optional="true"
label="align_chr_tol (optional)"
help="The retention time tolerance level for peak alignment. The default is NA, which allows the program to search for the tolerance level based on the data." />
<param name="align_mz_tol" type="float" optional="true"
label="align_mz_tol (optional)"
help="The m/z tolerance level for peak alignment. The default is NA, which allows the program to search for the tolerance level based on the data. This value is expressed as a fraction of the m/z value. This value, multiplied by the m/z value, becomes the cutoff level." />
<param name="max_align_mz_diff" type="float" value="0.01"
label="max_align_mz_diff"
help="As the m/z tolerance is expressed in relative terms (ppm), it may not be suitable when the m/z range is wide. This parameter limits the tolerance in absolute terms. It mostly influences feature matching in the higher m/z range." />
</section>
</xml>

<xml name="weak_signal_recovery">
<section name="weak_signal_recovery" title="Weak Signal Recovery">
<param name="recover_mz_range" type="float" optional="true"
label="recover_mz_range (optional)"
help="The m/z range around the feature m/z in which to search for observations. The default value is NA, in which case 1.5 times the m/z tolerance in the aligned object will be used." />
<param name="recover_chr_range" type="float" optional="true"
label="recover_chr_range (optional)"
help="The retention time around the feature retention time to search for observations. The default value is NA, in which case 0.5 times the retention time tolerance in the aligned object will be used." />
<param name="use_observed_range" type="boolean" checked="true" truevalue="TRUE" falsevalue="FALSE"
label="use_observed_range"
help="If the value is true, the actual range of the observed locations of the feature in all the spectra will be used." />
<param name="recover_min_count" type="integer" value="3"
label="recover_min_count"
help="The minimum number of raw data points to be considered as a true feature." />
</section>
</xml>

<token name="@GENERAL_HELP@">
apLCMS is software that generates a feature table from a batch of LC/MS spectra. The m/z and retention time
tolerance levels are estimated from the data. A run-filter is used to detect peaks and remove noise.
Non-parametric statistical methods are used to fine-tune peak selection and grouping. After retention time
correction, a feature table is generated by aligning peaks across spectra. For further information on apLCMS,
please refer to http://web1.sph.emory.edu/apLCMS.
</token>

<xml name="citations">
<citations>
<citation type="doi">10.1093/bioinformatics/btp291</citation>
<citation type="doi">10.1186/1471-2105-11-559</citation>
<citation type="doi">10.1021/pr301053d</citation>
<citation type="doi">10.1093/bioinformatics/btu430</citation>
</citations>
</xml>
</macros>
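The `mz_tol` help text above describes a ppm-to-fraction conversion and a cutoff that scales with the m/z value. A small worked sketch of that arithmetic in Python; the function names are hypothetical, and how apLCMS applies the cutoff internally is not shown in this diff:

```python
def ppm_to_fraction(ppm):
    """Convert an accuracy given in ppm into the fractional mz_tol value (ppm / 1e6)."""
    return ppm / 1e6


def mz_cutoff(mz, mz_tol):
    """Absolute m/z cutoff: the fractional tolerance multiplied by the m/z value."""
    return mz * mz_tol


if __name__ == "__main__":
    # A 10 ppm instrument corresponds to the wrapper's default mz_tol of 1e-05.
    tol = ppm_to_fraction(10)
    print(tol)
    # Cutoff in m/z units for a signal at m/z 500 (illustrative value).
    print(mz_cutoff(500.0, tol))
```

Because the cutoff grows linearly with m/z, a relative tolerance becomes wide at high m/z, which is the situation the absolute limit `max_align_mz_diff` is described as guarding against during alignment.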
65 changes: 65 additions & 0 deletions tools/aplcms/aplcms_unsupervised.xml
@@ -0,0 +1,65 @@
<tool id="recetox_aplcms_unsupervised" name="apLCMS - Unsupervised" version="@TOOL_VERSION@+galaxy0">
<macros>
<import>aplcms_macros.xml</import>
</macros>

<expand macro="requirements" />

<command detect_errors="aggressive"><![CDATA[
#set file_str = str('", "').join([str($f) for $f in $files])

Rscript
-e 'x <- apLCMS::unsupervised(
files = c("$file_str"),
min_exp = $noise_filtering.min_exp,
min_pres = $noise_filtering.min_pres,
min_run = $noise_filtering.min_run,
mz_tol = $noise_filtering.mz_tol,
baseline_correct = $noise_filtering.baseline_correct,
baseline_correct_noise_percentile = $noise_filtering.baseline_correct_noise_percentile,
intensity_weighted = $noise_filtering.intensity_weighted,
shape_model = "$feature_detection.shape_model",
BIC_factor = $feature_detection.BIC_factor,
peak_estim_method = "$feature_detection.peak_estim_method",
min_bandwidth = $feature_detection.min_bandwidth,
max_bandwidth = $feature_detection.max_bandwidth,
sd_cut = c($feature_detection.sd_cut_min, $feature_detection.sd_cut_max),
sigma_ratio_lim = c($feature_detection.sigma_ratio_lim_min, $feature_detection.sigma_ratio_lim_max),
component_eliminate = $feature_detection.component_eliminate,
moment_power = $feature_detection.moment_power,
align_chr_tol = $peak_alignment.align_chr_tol,
align_mz_tol = $peak_alignment.align_mz_tol,
max_align_mz_diff = $peak_alignment.max_align_mz_diff,
recover_mz_range = $weak_signal_recovery.recover_mz_range,
recover_chr_range = $weak_signal_recovery.recover_chr_range,
use_observed_range = $weak_signal_recovery.use_observed_range,
recover_min_count = $weak_signal_recovery.recover_min_count
)'
-e 'rhdf5::h5write(x\$final_peaks, "$peaks", "peaks")'
-e 'rhdf5::h5write(x\$aligned_peaks, "$peaks", "aligned_peaks")'
-e 'rhdf5::h5write(x\$corrected_features, "$peaks", "corrected_features")'
-e 'rhdf5::h5write(x\$extracted_features, "$peaks", "extracted_features")'
-e 'rhdf5::h5write(x\$aligned_mz_tolerance, "$peaks", "aligned_mz_tolerance")'
-e 'rhdf5::h5write(x\$aligned_rt_tolerance, "$peaks", "aligned_rt_tolerance")'
]]></command>

<expand macro="inputs">
<expand macro="noise_filtering" />
<expand macro="feature_detection" />
<expand macro="peak_alignment" />
<expand macro="weak_signal_recovery" />
</expand>

<outputs>
<data name="peaks" format="h5" />
</outputs>

<help>
This is the Unsupervised version of apLCMS, which does not rely on any existing knowledge about metabolites or
any historically detected features. To use such knowledge, please use the Hybrid version of apLCMS.

@GENERAL_HELP@
</help>

<expand macro="citations" />
</tool>