Add retip tool suite wrappers #13

Merged
merged 8 commits into from
Sep 30, 2020
16 changes: 16 additions & 0 deletions tools/retip/.shed.yml
@@ -0,0 +1,16 @@
categories:
  - Metabolomics
owner: "recetox"
remote_repository_url: "https://github.com/RECETOX/galaxytools/tree/master/tools/retip"
homepage_url: "https://github.com/PaoloBnn/Retip"
type: unrestricted
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "The tool {{ tool_name }} from the Retip tool suite."
suite:
  name: "suite_retip"
  description: "A suite of Retip (Retention Time Prediction for metabolomics) tools."
  long_description: |
    Retip is an R package for predicting Retention Time (RT) for small molecules in a
    high pressure liquid chromatography (HPLC) Mass Spectrometry analysis. Retention
    time calculation can be useful in identifying unknowns and removing false positive annotations.
37 changes: 37 additions & 0 deletions tools/retip/macros.xml
@@ -0,0 +1,37 @@
<macros>
<token name="@TOOL_VERSION@">0.5.4</token>
<xml name="requirements">
<requirements>
<container type="docker">recetox/retip:@TOOL_VERSION@-recetox0</container>
</requirements>
</xml>
<xml name="citations">
<citations>
<citation type="doi">10.1021/acs.analchem.9b05765</citation>
</citations>
</xml>
<token name="@HELP@"><![CDATA[
Retip is an R package for predicting Retention Time (RT) for small molecules in a high pressure liquid
chromatography (HPLC) Mass Spectrometry analysis. Retention time calculation can be useful in identifying
unknowns and removing false positive annotations. It uses five different machine learning algorithms to build a
stable, accurate and fast RT prediction model:

- Random Forest: a decision tree algorithm
- BRNN: Bayesian Regularized Neural Network
- XGBoost: extreme gradient boosting for tree algorithms
- LightGBM: a gradient boosting framework that uses tree-based learning algorithms
- Keras: a high-level neural networks API for TensorFlow

Retip also includes useful biochemical databases such as BMDB, ChEBI, DrugBank, ECMDB, FooDB, HMDB, KNApSAcK,
PlantCyc, SMPDB, T3DB, UNPD, YMDB and STOFF.

**Get started**

To use Retip, prepare a compound retention time library. The input file must contain the compound
Name, InChIKey, SMILES code and experimental retention time (RT) for each compound.
The input must be a tabular (tab-separated) file. Retip uses this file to build the model and then predicts
retention times for other biochemical databases or for an input query list of compounds. The file should
contain at least 300 compounds to build a good retention time prediction model.
]]>
</token>
</macros>
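The input requirements in the help text above (Name, InChIKey, SMILES and RT columns, ideally 300 or more compounds) can be checked before a library is uploaded. The following is a minimal sketch, not part of this PR: `check_library` and the constants are hypothetical names, and it assumes a tab-separated file with a header row, as in `test-data/compounds-small.tsv`. The second sample row carries a deliberately malformed RT to show what gets flagged.

```python
import csv
import io

# Column names taken from the header of test-data/compounds-small.tsv.
REQUIRED_COLUMNS = ["Name", "InChIKey", "SMILES", "RT"]
# Suggestion from the help text; treated here as a warning, not a hard limit.
RECOMMENDED_MIN_COMPOUNDS = 300


def check_library(text, delimiter="\t"):
    """Return a list of human-readable problems found in a compound library."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        try:
            float(row["RT"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: RT {row['RT']!r} is not a number")
    if len(rows) < RECOMMENDED_MIN_COMPOUNDS:
        problems.append(
            f"only {len(rows)} compounds; at least "
            f"{RECOMMENDED_MIN_COMPOUNDS} are suggested"
        )
    return problems


# Two-row sample; the decimal comma in the second row is intentional.
sample = (
    "Name\tInChIKey\tSMILES\tRT\n"
    "Gallic acid\tLNTHITQWFMADLM-UHFFFAOYSA-N\tOC(=O)C1=CC(O)=C(O)C(O)=C1\t2.04\n"
    "Gramine\tOCDGBSUVYYVKQZ-UHFFFAOYSA-N\tCN(C)CC1=CNC2=CC=CC=C12\t3,06\n"
)
for problem in check_library(sample):
    print(problem)
```

A check like this catches decimal commas in the RT column, which would otherwise make the column non-numeric when it is read for training.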
37 changes: 37 additions & 0 deletions tools/retip/retip_apply.xml
@@ -0,0 +1,37 @@
<tool id="retip_apply" name="Retip prediction" version="@TOOL_VERSION@+galaxy0">
<description>of retention times for metabolomics</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements"/>
<command detect_errors="exit_code"><![CDATA[
/run.sh spell.R '$descr_train' '$model_hdf5' '$input_smiles' 'output.tsv'
]]>
</command>
<inputs>
<param name="descr_train" label="Select Descriptors.Feather Dataset" type="data" format="h5"
optional="false"/>
<param name="model_hdf5" label="Select Model.hdf5 Dataset" type="data" format="h5" optional="false"/>
<param name="input_smiles" label="Select Input Dataset" type="data" format="tabular" optional="false"/>
</inputs>
<outputs>
<data format="tabular" name="output1" label="Predicted RT" from_work_dir="output.tsv"/>
</outputs>
<tests>
<test expect_num_outputs="1">
<param name="descr_train" value="descriptors.feather"/>
<param name="model_hdf5" value="model.hdf5"/>
<param name="input_smiles" value="input.tsv"/>
<output name="output1" file="output.tsv" ftype="tabular"/>
</test>
</tests>
<help><![CDATA[
.. class:: infomark

This tool is used for **Retention Time Prediction** on a whole database.

@HELP@
]]>
</help>
<expand macro="citations"/>
</tool>
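The `@TOOL_VERSION@` token defined in `macros.xml` is substituted textually wherever it appears, so the tool version and the container tag stay in sync. The snippet below is a simplified illustration of that substitution, not Galaxy's actual macro implementation; `expand_tokens` is a hypothetical helper.

```python
def expand_tokens(text, tokens):
    """Replace every @TOKEN@ placeholder in text with its value."""
    for name, value in tokens.items():
        text = text.replace(name, value)
    return text


# Token value as declared in macros.xml.
tokens = {"@TOOL_VERSION@": "0.5.4"}

tool_version = expand_tokens("@TOOL_VERSION@+galaxy0", tokens)
container = expand_tokens("recetox/retip:@TOOL_VERSION@-recetox0", tokens)

print(tool_version)  # 0.5.4+galaxy0
print(container)     # recetox/retip:0.5.4-recetox0
```

Bumping the token in one place therefore updates all three wrappers and the Docker image reference together.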
33 changes: 33 additions & 0 deletions tools/retip/retip_descriptors.xml
@@ -0,0 +1,33 @@
<tool id="retip_descriptors" name="Retip chemical descriptors" version="@TOOL_VERSION@+galaxy0">
<description>for retention time prediction</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements"/>
<command detect_errors="exit_code"><![CDATA[
/run.sh chemdesc.R '$compounds' 'descriptors.feather'
]]>
</command>
<inputs>
<param name="compounds" label="Select Compounds Dataset" type="data" format="tabular" optional="false"/>
</inputs>
<outputs>
<data format="h5" name="output1" label="Descriptors.Feather Dataset"
from_work_dir="descriptors.feather"/>
</outputs>
<tests>
<test expect_num_outputs="1">
<param name="compounds" value="compounds-small.tsv"/>
<output name="output1" file="descriptors.feather" ftype="h5"/>
</test>
</tests>
<help><![CDATA[
.. class:: infomark

This tool **computes chemical descriptors** with the Chemistry Development Kit (CDK), a Java-based open-source project for cheminformatics.

@HELP@
]]>
</help>
<expand macro="citations"/>
</tool>
33 changes: 33 additions & 0 deletions tools/retip/retip_train.xml
@@ -0,0 +1,33 @@
<tool id="retip_train" name="Retip training" version="@TOOL_VERSION@+galaxy0">
<description>of the Keras model to predict retention times</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements"/>
<command detect_errors="exit_code"><![CDATA[
/run.sh trainKeras.R '$descr_train' 'model.hdf5'
]]>
</command>
<inputs>
<param name="descr_train" label="Select Descriptors.Feather Dataset" type="data" format="h5"
optional="false"/>
</inputs>
<outputs>
<data format="h5" name="output1" label="Model.hdf5 Dataset" from_work_dir="model.hdf5"/>
</outputs>
<tests>
<test expect_num_outputs="1">
<param name="descr_train" value="descriptors.feather"/>
<output name="output1" file="model.hdf5" ftype="h5" lines_diff="2"/>
</test>
</tests>
<help><![CDATA[
.. class:: infomark

This tool uses ALMA mater: Advanced Learning Machine Algorithms to **train models**.

@HELP@
]]>
</help>
<expand macro="citations"/>
</tool>
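Taken together, the three wrappers form a chain: descriptors are computed from the compound library, a model is trained on those descriptors, and the model is then applied to a query list. The sketch below only assembles the command lines in that order, mirroring the `<command>` blocks in this PR. `retip_pipeline` is a hypothetical helper, `/run.sh` exists only inside the container, and nothing is executed here.

```python
import shlex


def retip_pipeline(compounds, query, prediction="output.tsv"):
    """Return the three command lines in the order the tools must run."""
    descriptors = "descriptors.feather"  # output of retip_descriptors
    model = "model.hdf5"                 # output of retip_train
    steps = [
        ["/run.sh", "chemdesc.R", compounds, descriptors],
        ["/run.sh", "trainKeras.R", descriptors, model],
        ["/run.sh", "spell.R", descriptors, model, query, prediction],
    ]
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return [shlex.join(step) for step in steps]


for command in retip_pipeline("compounds-small.tsv", "input.tsv"):
    print(command)
```

Note that the prediction step takes the training descriptors as well as the model, which is why `descriptors.feather` is an input to both `retip_train` and `retip_apply`.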
80 changes: 80 additions & 0 deletions tools/retip/test-data/compounds-small.tsv
@@ -0,0 +1,80 @@
Name InChIKey SMILES RT
Withanone FAZIYUIDUNHZRG-UHFFFAOYNA-N CC(C1CC(C)=C(C)C(=O)O1)C1(O)CCC2C3C4OC4C4(O)CC=CC(=O)C4(C)C3CCC12C 6.82
Corosolic acid HFGSQOYIOKBQOW-UHFFFAOYNA-N CC1CCC2(CCC3(C)C(=CCC4C5(C)CC(O)C(O)C(C)(C)C5CCC34C)C2C1C)C(O)=O 9.89
Maslinic acid MDZKJHQSJHYOHJ-UHFFFAOYNA-N CC1(C)CCC2(CCC3(C)C(=CCC4C5(C)CC(O)C(O)C(C)(C)C5CCC34C)C2C1)C(O)=O 9.77
Soyasapogenol A CDDWAYFUFNQLRZ-UHFFFAOYNA-N CC1(C)CC2C3=CCC4C5(C)CCC(O)C(C)(CO)C5CCC4(C)C3(C)CCC2(C)C(O)C1O 8.94
Ginsenoside Rh3 PHLXREOMFNVWOH-UHFFFAOYNA-N CC(C)=CCC=C(C)C1CCC2(C)C1C(O)CC1C3(C)CCC(OC4OC(CO)C(O)C(O)C4O)C(C)(C)C3CCC21C 7.63
Ginsenoside compound K FVIZARNDLVOMSU-UHFFFAOYNA-N CC(C)=CCCC(C)(OC1OC(CO)C(O)C(O)C1O)C1CCC2(C)C1C(O)CC1C3(C)CCC(O)C(C)(C)C3CCC21C 9.51
Ginsenoside F1 XNGXWSFSJIQMNC-UHFFFAOYNA-N CC(C)=CCCC(C)(OC1OC(CO)C(O)C(O)C1O)C1CCC2(C)C1C(O)CC1C3(C)CCC(O)C(C)(C)C3C(O)CC21C 6.59
alpha-Hederin KEOITPILCOILGM-UHFFFAOYNA-N CC1OC(OC2C(O)C(O)COC2OC2CCC3(C)C(CCC4(C)C3CC=C3C5CC(C)(C)CCC5(CCC43C)C(O)=O)C2(C)CO)C(O)C(O)C1O 7.96
Ginsenoside Rg5 NJUXRKMKOFXMRX-UHFFFAOYNA-N CC(C)=CCC=C(C)C1CCC2(C)C1C(O)CC1C3(C)CCC(OC4OC(CO)C(O)C(O)C4OC4OC(CO)C(O)C(O)C4O)C(C)(C)C3CCC21C 9.63
Ginsenoside F3 HJRVLGWTJSLQIG-UHFFFAOYNA-N CC(C)=CCCC(C)(OC1OC(COC2OCC(O)C(O)C2O)C(O)C(O)C1O)C1CCC2(C)C1C(O)CC1C3(C)CCC(O)C(C)(C)C3C(O)CC21C 6.11
Ginsenoside Rb2 NODILNFGTFIURN-UHFFFAOYNA-N CC(C)=CCCC(C)(OC1OC(COC2OCC(O)C(O)C2O)C(O)C(O)C1O)C1CCC2(C)C1C(O)CC1C3(C)CCC(OC4OC(CO)C(O)C(O)C4OC4OC(CO)C(O)C(O)C4O)C(C)(C)C3CCC21C 6.14
Ginsenoside Rb1 GZYPWOGIYAIIPV-UHFFFAOYNA-N CC(C)=CCCC(C)(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)C1CCC2(C)C1C(O)CC1C3(C)CCC(OC4OC(CO)C(O)C(O)C4OC4OC(CO)C(O)C(O)C4O)C(C)(C)C3CCC21C 5.88
Saikosaponin D KYWSCMDFVARMPN-UHFFFAOYNA-N CC1OC(OC2CCC3(C)C(CCC4(C)C3C=CC35OCC6(CCC(C)(C)CC36)C(O)CC45C)C2(C)CO)C(O)C(OC2OC(CO)C(O)C(O)C2O)C1O 8.25
Licoricesaponin H2 LPLVUJXQOOQHMX-UHFFFAOYNA-N CC1(C)C(CCC2(C)C1CCC1(C)C2C(=O)C=C2C3CC(C)(CCC3(C)CCC12C)C(O)=O)OC1OC(C(O)C(O)C1OC1OC(C(O)C(O)C1O)C(O)=O)C(O)=O 7.03
Saikosaponin C PYJMYPPFWASOJX-UHFFFAOYNA-N CC1OC(OC2C(COC3OC(CO)C(O)C(O)C3O)OC(OC3CCC4(C)C(CCC5(C)C4C=CC4=C6CC(C)(C)CCC6(CO)C(O)CC54C)C3(C)C)C(O)C2O)C(O)C(O)C1O 6.29
Ginsenoside Ro NFZYDZXHKFHPGA-UHFFFAOYNA-N CC1(C)CCC2(CCC3(C)C(=CCC4C5(C)CCC(OC6OC(C(O)C(O)C6OC6OC(CO)C(O)C(O)C6O)C(O)=O)C(C)(C)C5CCC34C)C2C1)C(=O)OC1OC(CO)C(O)C(O)C1O 6.18
Asiaticoside WYQVAPGDARQUBT-UHFFFAOYNA-N CC1CCC2(CCC3(C)C(=CCC4C5(C)CC(O)C(O)C(C)(CO)C5CCC34C)C2C1C)C(=O)OC1OC(COC2OC(CO)C(OC3OC(C)C(O)C(O)C3O)C(O)C2O)C(O)C(O)C1O 5.21
Madecassoside BNMGUJRJUUDLHW-UHFFFAOYNA-N CC1CCC2(CCC3(C)C(=CCC4C5(C)CC(O)C(O)C(C)(CO)C5C(O)CC34C)C2C1C)C(=O)OC1OC(COC2OC(CO)C(OC3OC(C)C(O)C(O)C3O)C(O)C2O)C(O)C(O)C1O 4.88
Chrysanthellin B WNGIVKPPGCCJNP-UHFFFAOYNA-N CC1OC(OC2C(O)COC(OC3C(C)OC(OC4C(O)C(O)COC4OC(=O)C45CCC(C)(C)CC4C4=CCC6C7(C)CCC(OC8OC(CO)C(O)C(O)C8O)C(C)(CO)C7CCC6(C)C4(C)CC5O)C(O)C3O)C2O)C(O)C(O)C1O 5.79
Hederacoside C RYHDIBJJJRNDSX-UHFFFAOYNA-N CC1OC(OC2C(O)C(O)C(OCC3OC(OC(=O)C45CCC(C)(C)CC4C4=CCC6C7(C)CCC(OC8OCC(O)C(O)C8OC8OC(C)C(O)C(O)C8O)C(C)(CO)C7CCC6(C)C4(C)CC5)C(O)C(O)C3O)OC2CO)C(O)C(O)C1O 5.38
Lyalosidic acid UZLBTLIRYSYTRG-UHFFFAOYNA-N OCC1OC(OC2OC=C(C(CC3=NC=CC4=C3NC3=C4C=CC=C3)C2C=C)C(O)=O)C(O)C(O)C1O 3.72
5(S)-5-carboxystrictosidine LHKZIVMTXZLOTP-UHFFFAOYNA-N COC(=O)C1=COC(OC2OC(CO)C(O)C(O)C2O)C(C=C)C1CC1NC(CC2=C1NC1=C2C=CC=C1)C(O)=O 4.13
Thalsimine YWNUNVSMOKMJMG-UHFFFAOYNA-N COC1=CC=C2CC3N(C)CCC4=C3C(OC3=CC5=C(CCN=C5CC5=CC=C(OC1=C2)C=C5)C=C3OC)=C(OC)C(OC)=C4OC 4.25
Isohernandezine FUZMQNZACIFDBL-UHFFFAOYNA-N COC1=CC=C2CC3N(C)CCC4=C3C(OC3=CC5=C(CCN(C)C5CC5=CC=C(OC1=C2)C=C5)C=C3OC)=C(OC)C(OC)=C4OC 4.41
3.4,5-Trihydroxystilbene LUKBXSAWLPMMSZ-UHFFFAOYSA-N OC1=CC=C(C=CC2=CC(O)=CC(O)=C2)C=C1 5
trans-pterostilbene VLEUZFDZJKSGMX-UHFFFAOYSA-N COC1=CC(C=CC2=CC=C(O)C=C2)=CC(OC)=C1 8.02
E-Resveratrol trimethyl ether GDHNBPHYVRHYCC-UHFFFAOYSA-N COC1=CC=C(C=CC2=CC(OC)=CC(OC)=C2)C=C1 9.6
Triacetyl resveratrol PDAYUJSOJIMKIS-UHFFFAOYSA-N CC(=O)OC1=CC=C(C=CC2=CC(OC(C)=O)=CC(OC(C)=O)=C2)C=C1 8.46
trans-piceid HSTZMXCBWJGKHG-UHFFFAOYNA-N OCC1OC(OC2=CC(C=CC3=CC=C(O)C=C3)=CC(O)=C2)C(O)C(O)C1O 4.05
Pseudojervine HYDDDNUKNMMWBD-UHFFFAOYNA-N CC1C2NCC(C)CC2OC11CCC2C3CC=C4CC(CCC4(C)C3C(=O)C2=C1C)OC1OC(CO)C(O)C(O)C1O 4.01
Digitonin UVYVLBIGDKGWPX-UHFFFAOYNA-N CC1C2C(OC11CCC(C)CO1)C(O)C1C3CCC4CC(OC5OC(CO)C(OC6OC(CO)C(O)C(OC7OCC(O)C(O)C7O)C6OC6OC(CO)C(O)C(OC7OC(CO)C(O)C(O)C7O)C6O)C(O)C5O)C(O)CC4(C)C3CCC21C 6.99
N-Acetylsolasodine JXAZKNVJWYDQJY-UHFFFAOYNA-N CC1C2C(CC3C4CC=C5CC(O)CCC5(C)C4CCC23C)OC11CCC(C)CN1C(C)=O 7.36
O-Acetylsolasodine MCQNPWNREVNWDQ-UHFFFAOYNA-N CC1C2C(CC3C4CC=C5CC(CCC5(C)C4CCC23C)OC(C)=O)OC11CCC(C)CN1 8.16
Bergenin YWJXCIXBAKGUKZ-UHFFFAOYNA-N COC1=C(O)C2=C(C=C1O)C(=O)OC1C(O)C(O)C(CO)OC21 2.86
Hydrocotarnine XXANNZJIZQTCBP-UHFFFAOYSA-N COC1=C2CN(C)CCC2=CC2=C1OCO2 3.33
(-)-B-Hydrastine JZUTXVTYJDCMDU-UHFFFAOYNA-N COC1=C(OC)C2=C(C=C1)C(OC2=O)C1N(C)CCC2=C1C=C1OCOC1=C2 4.21
4-Aminoantipyrin RLFWWDJHLFCNIJ-UHFFFAOYSA-N CN1N(C(=O)C(N)=C1C)C1=CC=CC=C1 2.76
Vanillin acetate PZSJOBKRSVRODF-UHFFFAOYSA-N COC1=C(OC(C)=O)C=CC(C=O)=C1 5.59
4-Hydroxyquinoline PMZDQRJGMBOQBF-UHFFFAOYSA-N OC1=CC=NC2=C1C=CC=C2 2.93
Schizandrin YEFOAORQXAOVJQ-UHFFFAOYNA-N COC1=CC2=C(C(OC)=C1OC)C1=C(OC)C(OC)=C(OC)C=C1CC(C)(O)C(C)C2 7.59
Iso-gamma-fagarine VNBUMBNLPGLBML-UHFFFAOYSA-N COC1=CC=CC2=C1N(C)C1=C(C=CO1)C2=O 5.59
Atractylenolide III FBMORZZOJSDNRQ-UHFFFAOYNA-N CC1=C2CC3C(=C)CCCC3(C)CC2(O)OC1=O 7.91
Amygdalin XUCIJNAGGSZNQT-UHFFFAOYNA-N OCC1OC(OCC2OC(OC(C#N)C3=CC=CC=C3)C(O)C(O)C2O)C(O)C(O)C1O 3.32
Procaine MFDFERRIHVXMIY-UHFFFAOYSA-N CCN(CC)CCOC(=O)C1=CC=C(N)C=C1 2.75
Aristolochic acid B MEEXETVZNQYRSP-UHFFFAOYSA-N OC(=O)C1=CC2=C(OCO2)C2=C1C(=CC1=CC=CC=C21)[N+]([O-])=O 7.55
Aristolochic acid C NBFGYDJKTHENDP-UHFFFAOYSA-N OC(=O)C1=CC2=C(OCO2)C2=C1C(=CC1=CC=C(O)C=C21)[N+]([O-])=O 5.96
Gallic acid LNTHITQWFMADLM-UHFFFAOYSA-N OC(=O)C1=CC(O)=C(O)C(O)=C1 2.04
Paeonol UILPJVPSNHJFIK-UHFFFAOYSA-N COC1=CC=C(C(C)=O)C(O)=C1 6.57
Oxyacanthine HGNHIFJNOKGSKI-UHFFFAOYNA-N COC1=CC2=C3C(CC4=CC=C(OC5=CC(CC6N(C)CCC7=CC(OC)=C(OC3=C1OC)C=C67)=CC=C5O)C=C4)N(C)CC2 3.76
Thalsimidine CLDCTFPNFRITPI-UHFFFAOYNA-N COC1=CC=C2CC3N(C)CCC4=C(O)C(OC)=C(OC)C(OC5=C(OC)C=C6CCN=C(CC7=CC=C(OC1=C2)C=C7)C6=C5)=C34 3.96
(S.S)-(+)-Tetrandrine WVTKBKWTSCPRNU-UHFFFAOYNA-N COC1=CC=C2CC3N(C)CCC4=C3C(OC3=CC5=C(CCN(C)C5CC5=CC=C(OC1=C2)C=C5)C=C3OC)=C(OC)C(OC)=C4 4.19
Seco-isolariciresinol diglucoside SBVBJPHMDABKJV-UHFFFAOYNA-N COC1=C(O)C=CC(CC(COC2OC(CO)C(O)C(O)C2O)C(COC2OC(CO)C(O)C(O)C2O)CC2=CC(OC)=C(O)C=C2)=C1 3.67
Enterolactone HVDGDHBAMCBBLR-UHFFFAOYNA-N OC1=CC(CC2COC(=O)C2CC2=CC(O)=CC=C2)=CC=C1 6.18
Matairesinol MATGKVZWFZHCLI-UHFFFAOYNA-N COC1=CC(CC2COC(=O)C2CC2=CC=C(O)C(OC)=C2)=CC=C1O 5.96
Arctigenin NQWVSMVXKMHKTF-UHFFFAOYNA-N COC1=CC=C(CC2COC(=O)C2CC2=CC=C(O)C(OC)=C2)C=C1OC 6.63
Enterodiol DWONJCNDULPHLV-UHFFFAOYNA-N OCC(CC1=CC(O)=CC=C1)C(CO)CC1=CC(O)=CC=C1 5.26
Secoisolariciresinol PUETUDUXMCLALY-UHFFFAOYNA-N COC1=CC(CC(CO)C(CO)CC2=CC=C(O)C(OC)=C2)=CC=C1O 4.76
Justicidin G VINGQMQXGDIELG-UHFFFAOYSA-N COC1=C2OCOC2=CC2=C(C3=C(C=C12)C(=O)OC3)C1=CC2=C(OCO2)C=C1 8.93
Jusmicranthin ethyl ether JJXCEOLNFSCNNE-UHFFFAOYNA-N CCOC1OC(=O)C2=C1C(C1=CC3=C(OCO3)C=C1)=C1C3=C(OCO3)C=CC1=C2 8.82
Carbazochrome sulfonate OZCACMPSTYQSMM-UHFFFAOYNA-N CN1C(CC2=CC(=NNC(O)=N)C(=O)C=C12)S(O)(=O)=O 2.69
Gelsenicine BIGABVPVCRHEES-UHFFFAOYNA-N CCC1=NC2CC3(C4CC1C2CO4)C(=O)N(OC)C1=C3C=CC=C1 3.99
Gramine OCDGBSUVYYVKQZ-UHFFFAOYSA-N CN(C)CC1=CNC2=CC=CC=C12 3.06
Koumine VTLYEMHGPMGUOT-UHFFFAOYNA-N CN1CC2(C=C)C3CC4OCC3C1CC21C4=NC2=CC=CC=C12 3.58
Gardneramine RIMDDIPKIZTBHU-UHFFFAOYNA-N COCC=C1CN2C3CC45C2CC1C3COC4=NC1=C5C(OC)=C(OC)C=C1OC 3.82
Gentiopicroside DUAGQYUORDTXOR-UHFFFAOYNA-N OCC1OC(OC2OC=C3C(=O)OCC=C3C2C=C)C(O)C(O)C1O 3.44
Swertiamarin HEYZWPRKKUGDCR-UHFFFAOYNA-N OCC1OC(OC2OC=C3C(=O)OCCC3(O)C2C=C)C(O)C(O)C1O 3.2
1-Isothiocyanato-4-(methylsulfinyl)-butane SUVMJBTUFCVSAD-UHFFFAOYNA-N CS(=O)CCCCN=C=S 3.69
1-Methylsulfinylbutenyl isothiocyante QKGJFQMGPDVOQE-UHFFFAOYNA-N CS(=O)C=CCCN=C=S 3.77
7-Methylsulfenylheptyl isothiocyanate LDIRGNDMTOGVRB-UHFFFAOYSA-N CSCCCCCCCN=C=S 9.19
Ginkgolide B SQOJOAFXDQDRGF-UHFFFAOYNA-N CC1C(=O)OC2C(O)C34C5CC(C(C)(C)C)C33C(O)C(=O)OC3OC4(C(=O)O5)C12O 5.36
Ginkgolide C AMOGMTLMADGEOQ-UHFFFAOYNA-N CC1C(=O)OC2C(O)C34C5OC(=O)C3(OC3OC(=O)C(O)C43C(C5O)C(C)(C)C)C12O 4.33
Gossypetin-8-C-glucoside SJRXVLUZMMDCNG-UHFFFAOYNA-N OCC1OC(OC2=C(O)C=C(O)C3=C2OC(=C(O)C3=O)C2=CC=C(O)C(O)=C2)C(O)C(O)C1O 4.64
isosakuranetin-7-O-neohesperidoside NLAWPKPYBMEWIR-UHFFFAOYNA-N COC1=CC=C(C=C1)C1CC(=O)C2=C(O1)C=C(OC1OC(CO)C(O)C(O)C1OC1OC(C)C(O)C(O)C1O)C=C2O 5.42
isosakuranetin-7-O-rutinoside RMCRQBAILCLJGU-UHFFFAOYNA-N COC1=CC=C(C=C1)C1CC(=O)C2=C(O)C=C(OC3OC(COC4OC(C)C(O)C(O)C4O)C(O)C(O)C3O)C=C2O1 5.33
Icariin TZJALUIVHRYQQB-UHFFFAOYNA-N COC1=CC=C(C=C1)C1=C(OC2OC(C)C(O)C(O)C2O)C(=O)C2=C(O1)C(CC=C(C)C)=C(OC1OC(CO)C(O)C(O)C1O)C=C2O 5.43
Kaempferol-3-O-robinoside-7-O-rhamnoside PEFASEPMJYRQBW-UHFFFAOYNA-N CC1OC(OCC2OC(OC3=C(OC4=C(C(O)=CC(OC5OC(C)C(O)C(O)C5O)=C4)C3=O)C3=CC=C(O)C=C3)C(O)C(O)C2O)C(O)C(O)C1O 3.55
Myricetin-3-O-xyloside SBEOEJNITMVWLK-UHFFFAOYNA-N OC1COC(OC2=C(OC3=CC(O)=CC(O)=C3C2=O)C2=CC(O)=C(O)C(O)=C2)C(O)C1O 3.9
Quercetin-3-O-vicianoside YNMFDPCLPIMRFD-UHFFFAOYNA-N OC1COC(OCC2OC(OC3=C(OC4=CC(O)=CC(O)=C4C3=O)C3=CC(O)=C(O)C=C3)C(O)C(O)C2O)C(O)C1O 3.7
Kaempferol-3-O-galactoside-6''-rhamnoside-3'''-rha UYVBMGULWGRDQT-UHFFFAOYNA-N CC1OC(OC2C(O)C(C)OC(OCC3OC(OC4=C(OC5=CC(O)=CC(O)=C5C4=O)C4=CC=C(O)C=C4)C(O)C(O)C3O)C2O)C(O)C(O)C1O 3.98
Binary file added tools/retip/test-data/descriptors.feather
Binary file not shown.
6 changes: 6 additions & 0 deletions tools/retip/test-data/input.tsv
@@ -0,0 +1,6 @@
Name InChIKey SMILES
Withanone FAZIYUIDUNHZRG-UHFFFAOYNA-N CC(C1CC(C)=C(C)C(=O)O1)C1(O)CCC2C3C4OC4C4(O)CC=CC(=O)C4(C)C3CCC12C
Corosolic acid HFGSQOYIOKBQOW-UHFFFAOYNA-N CC1CCC2(CCC3(C)C(=CCC4C5(C)CC(O)C(O)C(C)(C)C5CCC34C)C2C1C)C(O)=O
Maslinic acid MDZKJHQSJHYOHJ-UHFFFAOYNA-N CC1(C)CCC2(CCC3(C)C(=CCC4C5(C)CC(O)C(O)C(C)(C)C5CCC34C)C2C1)C(O)=O
Soyasapogenol A CDDWAYFUFNQLRZ-UHFFFAOYNA-N CC1(C)CC2C3=CCC4C5(C)CCC(O)C(C)(CO)C5CCC4(C)C3(C)CCC2(C)C(O)C1O
Ginsenoside Rh3 PHLXREOMFNVWOH-UHFFFAOYNA-N CC(C)=CCC=C(C)C1CCC2(C)C1C(O)CC1C3(C)CCC(OC4OC(CO)C(O)C(O)C4O)C(C)(C)C3CCC21C
Binary file added tools/retip/test-data/model.hdf5
Binary file not shown.
6 changes: 6 additions & 0 deletions tools/retip/test-data/output.tsv
@@ -0,0 +1,6 @@
Name InChIKey SMILES RTP
Withanone FAZIYUIDUNHZRG-UHFFFAOYNA-N CC(C1CC(C)=C(C)C(=O)O1)C2(O)CCC3C4C5OC5C6(O)CC=CC(=O)C6(C)C4CCC23C 5.61
Corosolic acid HFGSQOYIOKBQOW-UHFFFAOYNA-N CC1CCC2(CCC3(C)C(=CCC4C5(C)CC(O)C(O)C(C)(C)C5CCC34C)C2C1C)C(O)=O 8.24
Maslinic acid MDZKJHQSJHYOHJ-UHFFFAOYNA-N CC1(C)CCC2(CCC3(C)C(=CCC4C5(C)CC(O)C(O)C(C)(C)C5CCC34C)C2C1)C(O)=O 8.31
Soyasapogenol A CDDWAYFUFNQLRZ-UHFFFAOYNA-N CC1(C)CC2C3=CCC4C5(C)CCC(O)C(C)(CO)C5CCC4(C)C3(C)CCC2(C)C(O)C1O 7.23
Ginsenoside Rh3 PHLXREOMFNVWOH-UHFFFAOYNA-N CC(C)=CCC=C(C)C1CCC2(C)C1C(O)CC3C4(C)CCC(OC5OC(CO)C(O)C(O)C5O)C(C)(C)C4CCC23C 7.57
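As a rough sanity check on the expected output, the predicted values (RTP) can be compared with the experimental RT values that `compounds-small.tsv` lists for the same five compounds. This is a sketch with the values copied from the two test files; `mean_absolute_error` is a hypothetical helper, and because these compounds are also in the training file, the result is not a held-out accuracy estimate.

```python
# Experimental RT from test-data/compounds-small.tsv.
experimental = {
    "Withanone": 6.82,
    "Corosolic acid": 9.89,
    "Maslinic acid": 9.77,
    "Soyasapogenol A": 8.94,
    "Ginsenoside Rh3": 7.63,
}
# Predicted RT (RTP column) from test-data/output.tsv.
predicted = {
    "Withanone": 5.61,
    "Corosolic acid": 8.24,
    "Maslinic acid": 8.31,
    "Soyasapogenol A": 7.23,
    "Ginsenoside Rh3": 7.57,
}


def mean_absolute_error(expected, observed):
    """Average absolute difference over the compounds present in both tables."""
    shared = sorted(set(expected) & set(observed))
    return sum(abs(expected[name] - observed[name]) for name in shared) / len(shared)


mae = mean_absolute_error(experimental, predicted)
print(f"MAE over {len(experimental)} compounds: {mae:.2f}")  # 1.22
```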