MEDOBO

Automatically tagging MEDLINE abstracts with OBO ontologies

MEDOBO Schema

Step 1: Processing UMLS

After obtaining a UMLS licence, download the full UMLS release (e.g., umls-2022AA-full.zip) and place it in the 'umls' folder.

Then run the following commands one at a time from inside the 'umls' folder.

unzip umls-2022AA-full.zip

mkdir META
mkdir NET
unzip 2022AA-full/2022aa-1-meta.nlm
unzip 2022AA-full/2022aa-2-meta.nlm
unzip 2022AA-full/2022aa-otherks.nlm

gunzip 2022AA/META/MRCONSO.RRF.aa.gz
gunzip 2022AA/META/MRCONSO.RRF.ab.gz
gunzip 2022AA/META/MRCONSO.RRF.ac.gz
cat 2022AA/META/MRCONSO.RRF.aa 2022AA/META/MRCONSO.RRF.ab 2022AA/META/MRCONSO.RRF.ac > META/MRCONSO.RRF

gunzip 2022AA/META/MRDEF.RRF.gz
mv 2022AA/META/MRDEF.RRF META/

gunzip 2022AA/META/MRSTY.RRF.gz
mv 2022AA/META/MRSTY.RRF META/

mv 2022AA/NET/SRDEF NET/
mv 2022AA/NET/SRSTRE1 NET/

gunzip 2022AA/META/MRXNS_ENG.RRF.aa.gz
gunzip 2022AA/META/MRXNS_ENG.RRF.ab.gz
cat 2022AA/META/MRXNS_ENG.RRF.aa 2022AA/META/MRXNS_ENG.RRF.ab > META/MRXNS_ENG.RRF

gunzip 2022AA/META/MRXNW_ENG.RRF.aa.gz
gunzip 2022AA/META/MRXNW_ENG.RRF.ab.gz
gunzip 2022AA/META/MRXNW_ENG.RRF.ac.gz
cat 2022AA/META/MRXNW_ENG.RRF.aa 2022AA/META/MRXNW_ENG.RRF.ab 2022AA/META/MRXNW_ENG.RRF.ac > META/MRXNW_ENG.RRF
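For readers who prefer to script the split-file handling, the decompress-and-concatenate commands above can be sketched in Python. This is a minimal sketch, not part of the repository: the function names `find_parts`, `merge_gz_parts`, and `merge_umls_tables` are hypothetical, and the directory layout is assumed to match the commands above. Unlike `gunzip`, it leaves the original `.gz` parts in place.

```python
import gzip
import shutil
from pathlib import Path
from typing import Iterable, List


def find_parts(src_dir: Path, name: str) -> List[Path]:
    """Return the split parts of `name` (NAME.aa.gz, NAME.ab.gz, ...) in order."""
    return sorted(src_dir.glob(name + ".??.gz"))


def merge_gz_parts(parts: Iterable[Path], dest: Path) -> None:
    """Decompress each .gz part in the given order and append it to `dest`."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "wb") as out:
        for part in parts:
            with gzip.open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_umls_tables(src_dir: Path, dest_dir: Path) -> None:
    """Rebuild the three split tables named in the commands above."""
    for name in ("MRCONSO.RRF", "MRXNS_ENG.RRF", "MRXNW_ENG.RRF"):
        parts = find_parts(src_dir, name)
        if not parts:
            raise FileNotFoundError("no parts found for " + name)
        merge_gz_parts(parts, dest_dir / name)
```

Under these assumptions, `merge_umls_tables(Path("2022AA/META"), Path("META"))` would produce the same three merged files as the `cat` commands.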

Step 2: Create an environment

$ conda create -n medobo python=3.6
$ conda activate medobo
(medobo)$ pip install -r requirements.txt

Step 3: Get OBO ontologies

Download the OBO ontologies as a folder and place it in the root of the project.
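To check that a downloaded ontology is readable, the `[Term]` stanzas of an OBO flat file can be listed with a few lines of Python. This is a minimal sketch that assumes the standard OBO flat-file layout (`[Term]` headers followed by `tag: value` lines). The function name `parse_obo_terms` is hypothetical, and the repository's own loader may work differently.

```python
from typing import Dict, Iterable, List


def parse_obo_terms(lines: Iterable[str]) -> List[Dict[str, List[str]]]:
    """Collect the tag/value pairs of every [Term] stanza in an OBO file."""
    terms = []  # one dict per [Term] stanza
    current = None  # stanza being read, or None outside a [Term] stanza
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; keep only [Term] stanzas.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Drop a trailing "! comment", as used after is_a identifiers.
            value = value.split(" ! ", 1)[0].strip()
            current.setdefault(tag, []).append(value)
    return terms
```

Each returned dictionary maps a tag such as `id`, `name`, or `is_a` to the list of values found in that stanza. Opening a file with `open(path, encoding="utf-8")` and passing the file object to `parse_obo_terms` is enough for a quick inspection.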

Step 4: Processing MEDLINE

(medobo)$ python dataset.py 

Alternatively, download the preprocessed data from the Switch drive. For replication purposes, do not generate a new dataset; download the official splits and contents from the Switch drive instead.

Step 5: Pre-processing OBO

(medobo)$ python chi_sqaure.py 
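The script name suggests chi-square scoring, although its internals are not shown here. As a generic illustration rather than the repository's implementation, the chi-square statistic for the association between one term and one label can be computed from a 2x2 contingency table of document counts. The function names and the `(tokens, labels)` document representation below are assumptions made for this sketch.

```python
from typing import Iterable, Set, Tuple


def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic of a 2x2 term/label contingency table.

    n11: documents with the term and the label
    n10: documents with the term but not the label
    n01: documents with the label but not the term
    n00: documents with neither
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # an empty row or column carries no association
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def term_label_chi_square(
    docs: Iterable[Tuple[Set[str], Set[str]]], term: str, label: str
) -> float:
    """Count co-occurrences of `term` and `label` over docs, then score them."""
    counts = [[0, 0], [0, 0]]  # counts[has_term][has_label]
    for tokens, labels in docs:
        counts[int(term in tokens)][int(label in labels)] += 1
    return chi_square(counts[1][1], counts[1][0], counts[0][1], counts[0][0])
```

A higher score means the term and the label co-occur more (or less) often than independence would predict, which is why the statistic is commonly used to rank candidate features.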

Step 6: Download embeddings

Download the BioASQ embeddings, unzip them, and place them in the 'Resources' folder.

Naive Bayes baseline

(medobo)$ python main_NB.py <num_of_training_data>
(medobo)$ python main_NB.py 100000
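For orientation, the idea behind a Naive Bayes text baseline can be sketched in plain Python. This is a generic multinomial Naive Bayes with add-alpha smoothing and a single label per document, written for illustration only. It is not the code in `main_NB.py`, and the class name `MultinomialNB` and its methods are assumptions of this sketch.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Iterable, Optional, Sequence


class MultinomialNB:
    """Minimal multinomial Naive Bayes over tokenised documents."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # additive smoothing constant
        self.vocab = set()  # all tokens seen during training
        self.class_docs = Counter()  # number of documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label

    def fit(self, docs: Iterable[Sequence[str]], labels: Iterable[Hashable]) -> "MultinomialNB":
        for tokens, label in zip(docs, labels):
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens: Sequence[str]) -> Optional[Hashable]:
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In this sketch, limiting the training set in the spirit of `<num_of_training_data>` would amount to slicing the document and label lists to the first N items before calling `fit`.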

Deep learning baseline

(medobo)$ python main_DL.py <num_of_training_data> <num_of_features>
(medobo)$ python main_DL.py 100000 50000