- ClinicalTrials.gov
- AACT-CTTI-db
- NextMove LeadMine
- JensenLab Tagger
- BioClients
- IDG Pharos/TCRD
- PubChem REST API
- ChEMBL REST API
- ChEMBL webresource client (Python client library).
- nextmove-tools
- AACT-CTTI database from Duke.
- CTTI = Clinical Trials Transformation Initiative
- AACT = Aggregate Analysis of ClinicalTrials.gov
- According to the AACT website (accessed June 2022), data is refreshed daily.
- AACT structure changed in November 2021, reflecting newer ClinicalTrials.gov API.
- Identify drugs by intervention ID, since there may be multiple drugs per trial (NCT_ID).
- AACT Data Dictionary, which references https://prsinfo.clinicaltrials.gov/definitions.html and https://prsinfo.clinicaltrials.gov/results_definitions.html.
- The Database for Aggregate Analysis of ClinicalTrials.gov (AACT) and Subsequent Regrouping by Clinical Specialty, Tasneem et al., https://doi.org/10.1371/journal.pone.0033677 (2012).
- The Clinical Trials Transformation Initiative. One Decade of Impact. One Vision Ahead, Clinical Trials (2018).
- How to avoid common problems when using ClinicalTrials.gov in research: 10 issues to consider, Tse et al., BMJ 2018; 361, https://doi.org/10.1136/bmj.k1452 (2018).
- See also: https://www.ctti-clinicaltrials.org/briefing-room/publications
- Chemical NER by NextMove LeadMine.
- Disease NER by JensenLab Tagger.
- Associate drugs with diseases/phenotypes.
- Associate drugs with protein targets.
- Associate protein targets with diseases/phenotypes (via drugs).
- Predict and score disease-target associations.
Drugs may be experimental candidates.
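
The association steps above amount to a join on the shared drug: drug-disease pairs and drug-target pairs combine into disease-target pairs, with the supporting drugs and trials kept as evidence. The following is a minimal sketch under stated assumptions, not the project's scoring method: the function names, the tuple layouts, and the count-based ranking are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def disease_target_evidence(drug_disease, drug_target):
    """Join drug-disease and drug-target pairs on the drug.

    drug_disease: iterable of (nct_id, drug_id, disease_id) tuples (assumed layout).
    drug_target:  iterable of (drug_id, target_id) tuples (assumed layout).
    Returns {(disease_id, target_id): {"drugs": set, "trials": set}}.
    """
    targets_by_drug = defaultdict(set)
    for drug_id, target_id in drug_target:
        targets_by_drug[drug_id].add(target_id)

    evidence = defaultdict(lambda: {"drugs": set(), "trials": set()})
    for nct_id, drug_id, disease_id in drug_disease:
        # A drug with no known target contributes no disease-target pair.
        for target_id in targets_by_drug.get(drug_id, ()):
            ev = evidence[(disease_id, target_id)]
            ev["drugs"].add(drug_id)
            ev["trials"].add(nct_id)
    return dict(evidence)


def rank_by_support(evidence):
    """Order pairs by distinct trials, then distinct drugs (illustrative score only)."""
    return sorted(
        evidence,
        key=lambda pair: (len(evidence[pair]["trials"]), len(evidence[pair]["drugs"])),
        reverse=True,
    )
```

Keeping the drug and trial sets, rather than only a number, means any score can be traced back to the trials that support it.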
Table | Notes |
---|---|
studies | titles |
keywords | Reported; multiple vocabularies. |
brief_summaries | (max 5000 chars) |
detailed_descriptions | (max 32000 chars) |
conditions | diseases/phenotypes |
browse_conditions | MeSH links |
interventions | Several intervention types; our focus is drugs only. |
browse_interventions | MeSH links |
intervention_other_names | synonyms |
study_references | PubMed links |
reported_events | including adverse events |
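
As a sketch of how the interventions and intervention_other_names tables relate, the example below selects drug interventions keyed by intervention ID and attaches their synonyms. It runs against an in-memory SQLite stand-in with toy rows and obviously fake trial IDs; the real AACT database is PostgreSQL and has many more columns. The case-insensitive match on intervention_type is an assumption, made because the stored capitalization may differ between AACT releases.

```python
import sqlite3

# Toy stand-in for two AACT tables; only the columns used here are created.
SCHEMA = """
CREATE TABLE interventions (
    id INTEGER PRIMARY KEY, nct_id TEXT, intervention_type TEXT, name TEXT);
CREATE TABLE intervention_other_names (
    id INTEGER PRIMARY KEY, nct_id TEXT, intervention_id INTEGER, name TEXT);
"""

DRUG_QUERY = """
SELECT i.id, i.nct_id, i.name, o.name AS other_name
FROM interventions i
LEFT JOIN intervention_other_names o ON o.intervention_id = i.id
WHERE UPPER(i.intervention_type) = 'DRUG'
ORDER BY i.id, o.id
"""


def make_toy_db():
    """Build an in-memory database with one trial and three interventions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO interventions VALUES (?, ?, ?, ?)",
        [
            (1, "NCT-TOY-1", "Drug", "aspirin"),
            (2, "NCT-TOY-1", "Drug", "clopidogrel"),
            (3, "NCT-TOY-1", "Device", "stent"),
        ],
    )
    conn.execute(
        "INSERT INTO intervention_other_names VALUES (?, ?, ?, ?)",
        (1, "NCT-TOY-1", 1, "acetylsalicylic acid"),
    )
    return conn


def drug_interventions(conn):
    """Return (intervention_id, nct_id, name, other_name) rows for drugs only."""
    return conn.execute(DRUG_QUERY).fetchall()


if __name__ == "__main__":
    for row in drug_interventions(make_toy_db()):
        print(row)
```

Because one trial can list several drugs, the intervention ID rather than the NCT_ID is the natural key for each drug record.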
See the top-level script `Go_tictac_Workflow.sh`.
- Data:
  - `Go_aact_GetData.sh`: Fetch data from AACT db.
  - `Go_jensenlab_GetData.sh`: Fetch dictionary data from JensenLab.
  - `Go_pubmed-aact_GetData.sh`: Fetch referenced records from PubMed API.
- Cross-references:
  - `Go_pubchem_GetXrefs.sh`: PubChem IDs via APIs.
  - `Go_chembl_GetXrefs.sh`: ChEMBL IDs via APIs.
- LeadMine (chemical NER):
  - `Go_aact_NER_leadmine_chem.sh`: LeadMine NER, CT descriptions.
  - `Go_pubmed-aact_NER_leadmine_chem.sh`: LeadMine NER, referenced PubMed abstracts.
- Tagger (disease NER):
  - `Go_aact_NER_tagger_disease.sh`: Tagger NER, CT descriptions.
  - `Go_pubmed-aact_NER_tagger_disease.sh`: Tagger NER, referenced PubMed abstracts.
- Results, analysis:
  - `tictac.Rmd`: Results described and analyzed.
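
To illustrate the cross-reference step, the sketch below builds a PubChem PUG REST request that maps a drug name to compound IDs (CIDs) and parses the JSON reply. It is not the code in `Go_pubchem_GetXrefs.sh`, whose internals are not shown here; the helper names are hypothetical, and no network request is made, so the caller is assumed to fetch the URL separately.

```python
import json
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name2cid_url(name):
    """Return the PUG REST URL that looks up CIDs for a compound name."""
    # Percent-encode the whole name, including spaces and slashes.
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def parse_cids(payload):
    """Extract the CID list from a PUG REST JSON reply.

    Returns an empty list when the reply carries no IdentifierList,
    for example when the name was not found.
    """
    data = json.loads(payload)
    return data.get("IdentifierList", {}).get("CID", [])


if __name__ == "__main__":
    print(name2cid_url("acetylsalicylic acid"))
    print(parse_cids('{"IdentifierList": {"CID": [2244]}}'))
```

Names recognized by NER may be ambiguous or unknown to PubChem, so an empty result is treated as a normal outcome rather than an error.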
- keywords, conditions, studies and summaries: reported terms and free text which may be text mined for intended associations.
- descriptions: may be text mined for both the intended and other conditions, symptoms and phenotypic traits, which may be non-obvious from the study design.
- study_references: via PubMed, text mining of titles and abstracts can associate diseases/phenotypes, protein targets, chemical entities and more. The "results_reference" type may include findings not anticipated in the design/protocol.
- interventions include drug names which can be recognized and mapped to standard IDs, a task for which NextMove LeadMine is particularly suited.
- LeadMine chemical NER also resolves entities to structures via SMILES, enabling downstream cheminformatics such as aggregation by chemical substructure and similarity.
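
One simple consequence of resolving names to structures is that different names for the same compound can be pooled. The sketch below groups NER mentions by their resolved SMILES string. It assumes each mention has already been reduced to a (document ID, entity text, SMILES or None) tuple, which is a hypothetical layout and not LeadMine's actual output format, and it assumes identical structures carry identical SMILES strings. Substructure or similarity grouping would need a cheminformatics toolkit and is not attempted here.

```python
from collections import defaultdict


def group_by_structure(mentions):
    """Pool entity mentions that resolve to the same SMILES string.

    mentions: iterable of (doc_id, entity_text, smiles_or_None) tuples.
    Returns (groups, unresolved) where groups maps each SMILES to the
    lowercased names and document IDs seen for it, and unresolved lists
    the (doc_id, entity_text) mentions that had no structure.
    """
    groups = defaultdict(lambda: {"names": set(), "docs": set()})
    unresolved = []
    for doc_id, text, smiles in mentions:
        if not smiles:
            unresolved.append((doc_id, text))
            continue
        groups[smiles]["names"].add(text.lower())
        groups[smiles]["docs"].add(doc_id)
    return dict(groups), unresolved


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    demo = [
        ("doc1", "Aspirin", aspirin),
        ("doc2", "acetylsalicylic acid", aspirin),
        ("doc2", "study drug", None),
    ]
    print(group_by_structure(demo))
```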
Running NextMove LeadMine NER via nextmove-tools.
$ java -jar ${LIBDIR}/unm_biocomp_nextmove-0.0.1-SNAPSHOT-jar-with-dependencies.jar
usage: LeadMine_Utils [-config <CFILE>] [-h] -i <IFILE> [-idcol <IDCOL>]
[-lbd <LBD>] [-max_corr_dist <MAX_CORR_DIST>] [-min_corr_entity_len
<MIN_CE_LEN>] [-min_entity_len <MIN_E_LEN>] [-o <OFILE>]
[-spellcorrect] [-textcol <TEXTCOL>] [-unquote] [-v]
LeadMine_Utils: NextMove LeadMine chemical entity recognition
-config <CFILE> Input configuration file
-h,--help Show this help.
-i <IFILE> Input file
-idcol <IDCOL> # of ID input column
-lbd <LBD> LeadMine look-behind depth
-max_corr_dist <MAX_CORR_DIST> LeadMine Max correction (Levenshtein)
distance
-min_corr_entity_len <MIN_CE_LEN> LeadMine Min corrected entity length
-min_entity_len <MIN_E_LEN> LeadMine Min entity length
-o <OFILE> Output file
-spellcorrect LeadMine spelling correction
-textcol <TEXTCOL> # of text/document input column
-unquote unquote quoted column
-v,--verbose Verbose.
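
The options in the usage message above can be assembled into a command line programmatically. The sketch below only builds the argument list and does not run Java. The helper name and the example file names are hypothetical, and whether `-idcol` and `-textcol` count columns from 0 or 1 is not stated in the usage text, so the values shown are placeholders.

```python
import shlex

JAR_NAME = "unm_biocomp_nextmove-0.0.1-SNAPSHOT-jar-with-dependencies.jar"


def leadmine_argv(libdir, ifile, ofile, idcol, textcol,
                  config=None, unquote=False, verbose=False):
    """Build the LeadMine_Utils command as an argument list.

    Only flags that appear in the usage message are emitted.
    """
    argv = [
        "java", "-jar", f"{libdir}/{JAR_NAME}",
        "-i", ifile,
        "-o", ofile,
        "-idcol", str(idcol),
        "-textcol", str(textcol),
    ]
    if config:
        argv += ["-config", config]
    if unquote:
        argv.append("-unquote")
    if verbose:
        argv.append("-v")
    return argv


if __name__ == "__main__":
    # File names and column numbers are placeholders, not project settings.
    cmd = leadmine_argv("/opt/lib", "descriptions.tsv", "leadmine_out.tsv",
                        idcol=1, textcol=2, unquote=True, verbose=True)
    print(shlex.join(cmd))
```

Passing a list such as this to `subprocess.run` avoids shell quoting problems with file names.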
$ tagcorpus
Usage: tagcorpus [OPTIONS]
Required Arguments
--types=filename
--entities=filename
--names=filename
Optional Arguments
--documents=filename Read input from file instead of from STDIN
--groups=filename
--type-pairs=filename Types of pairs that are allowed
--stopwords=filename
--local-stopwords=filename
--autodetect Turn autodetect on
--tokenize-characters Turn single-character tokenization on
--document-weight=1.00
--paragraph-weight=2.00
--sentence-weight=0.20
--normalization-factor=0.60
--threads=1
--out-matches=filename
--out-pairs=filename
--out-segments=filename
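
Unlike LeadMine_Utils, tagcorpus takes `--option=value` arguments, and three dictionary files are required. The sketch below checks the required arguments and builds the argument list without running the tagger. The helper name and the example file names are hypothetical; only options listed in the usage message are used.

```python
REQUIRED = ("types", "entities", "names")
OPTIONAL = ("documents", "stopwords", "threads", "out-matches")


def tagcorpus_argv(**options):
    """Build a tagcorpus argument list from keyword options.

    Underscores in keyword names map to hyphens, so out_matches
    becomes --out-matches. Raises ValueError on a missing required
    option or an option this sketch does not handle.
    """
    opts = {key.replace("_", "-"): value for key, value in options.items()}
    missing = [name for name in REQUIRED if name not in opts]
    if missing:
        raise ValueError(f"missing required options: {', '.join(missing)}")
    unknown = sorted(set(opts) - set(REQUIRED) - set(OPTIONAL))
    if unknown:
        raise ValueError(f"unsupported options: {', '.join(unknown)}")
    # Emit required options first, then optional ones, in a fixed order.
    ordered = [name for name in REQUIRED + OPTIONAL if name in opts]
    return ["tagcorpus"] + [f"--{name}={opts[name]}" for name in ordered]


if __name__ == "__main__":
    # File names are placeholders, not the project's actual dictionary files.
    print(tagcorpus_argv(types="types.tsv", entities="entities.tsv",
                         names="names.tsv", documents="docs.tsv",
                         out_matches="matches.tsv", threads=4))
```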