(c) 2016 Chris Hodapp, chodapp3@gatech.edu
This is the source code for a project I did at the end of 2016 which applied some machine learning techniques (mostly unsupervised learning) to the MIMIC-III Critical Care Database. This project was for a course I took as part of my CS master's: CSE 8803 - Big Data Analytics for Healthcare.
The paper describing this work in more detail is: https://arxiv.org/abs/1612.08425
- The MIMIC-III dataset
- SBT (Scala Build Tool) >= 0.13; other versions may work, but I have not tried them.
- Apache Spark
- Python 2.7 or 3.x, and the following packages (`pip` versions should be fine):
  - Keras and ideally a GPU-enabled backend (Theano or TensorFlow)
  - h5py (if you want to save and load trained networks from Keras)
  - scikit-learn
  - pydot-ng (optional)
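To sanity-check the toolchain before building, the standard version commands for each tool should work (output formats vary by version; the package list below simply mirrors the prerequisites above):

```
sbt sbtVersion          # should report 0.13.x or later
spark-submit --version  # prints the Spark and Scala versions
python --version
pip show keras h5py scikit-learn pydot-ng
```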
`sbt compile` should handle pulling dependencies and building everything, and `sbt package` should produce a JAR that `spark-submit` can handle.
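Concretely, from the repository root:

```
sbt compile   # fetches dependencies and builds everything
sbt package   # produces target/scala-2.11/mimic3_phenotyping_2.11-1.0.jar
```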
`pip install keras h5py scikit-learn pydot-ng` should handle the Python prerequisites, but note that you may need to configure Keras or its backend further in order to have GPU acceleration.
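For example (the `KERAS_BACKEND` environment variable is one way to select a backend; whether your Keras version honors it, and how the backend finds the GPU, depends on your installation):

```
pip install keras h5py scikit-learn pydot-ng

# Assumption: the installed Keras reads KERAS_BACKEND to pick its backend;
# ~/.keras/keras.json is the other usual place to set this.
export KERAS_BACKEND=tensorflow
```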
To produce what was in the paper, run the commands below from the same directory as the code. For the first command, you will need to supply two paths: the path containing the `.csv.gz` files from MIMIC-III (for the `-i` option), and the full path to the `data` directory in this archive (for the `-o` option).
```
spark-submit --master "local[*]" \
    --repositories https://oss.sonatype.org/content/groups/public/ \
    --packages "com.github.scopt:scopt_2.11:3.5.0" \
    target/scala-2.11/mimic3_phenotyping_2.11-1.0.jar \
    -i "file:////mnt/dev/mimic3/" \
    -o "file:///home/hodapp/source/bd4h-project-code/data/" \
    -m -c -r -b --icd9a 428 --icd9b 571 -l "1742-6"
```
```
python timeseries_plots.py -d ./data -o ./data \
    --icd9a 428 --icd9b 571 --loinc 1742-6

python feature_learning.py -d ./data -o ./data \
    --icd9a 428 --icd9b 571 --loinc 1742-6 \
    --activity_l1 0.0001 --weight_l2 0.001 \
    --load_model 428_571_1742-6.h5 --tsne --logistic_regression
```
The `spark-submit` command still sometimes exhibits an issue in which it completes the job but fails to return to the prompt. Check Spark's web UI (i.e. http://localhost:4040) to verify that all jobs have actually finished.
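If watching the web UI is inconvenient, the same information is exposed over Spark's monitoring REST API (assuming a Spark version of 1.4 or later, which serves `/api/v1` on the UI port):

```
# Lists applications known to the driver; per-job status is under
# /api/v1/applications/[app-id]/jobs
curl http://localhost:4040/api/v1/applications
```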
For expediency, this will skip hyperparameter optimization (which can take 20-30 minutes depending on the machine) and use hyperparameters already estimated, and it will use weights from a pre-trained neural network instead of training one. To run through the full process, add `-h` to the first command, and remove the `--load_model` option from the `feature_learning.py` invocation.
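Spelled out, the full-process variants are the two commands above with `-h` added and `--load_model` removed:

```
spark-submit --master "local[*]" \
    --repositories https://oss.sonatype.org/content/groups/public/ \
    --packages "com.github.scopt:scopt_2.11:3.5.0" \
    target/scala-2.11/mimic3_phenotyping_2.11-1.0.jar \
    -i "file:////mnt/dev/mimic3/" \
    -o "file:///home/hodapp/source/bd4h-project-code/data/" \
    -m -c -r -b -h --icd9a 428 --icd9b 571 -l "1742-6"

python feature_learning.py -d ./data -o ./data \
    --icd9a 428 --icd9b 571 --loinc 1742-6 \
    --activity_l1 0.0001 --weight_l2 0.001 \
    --tsne --logistic_regression
```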
All output will be in the `data` directory: CSV and Parquet files from the Spark code, and PNG and EPS plots from the Python code.