KrisprKreme

An efficient command-line tool for CRISPR-Cas9 experiment design.

Disclaimer: Building the indices for the human genome takes approximately 2 days. If you want the prebuilt indices, contact us:

ssures11@jhu.edu, pvsriv@cs.jhu.edu

Usage

The following steps were tested successfully on a 64-bit Ubuntu 16.04 (Xenial) machine.

Step 1: Install the tool dependencies using

$ pip install -r requirements.txt

Step 2: Download the Gemtools aligner binaries from http://gemtools.github.io and add the Gemtools bin folder to your $PATH.

Step 3: Download and install the ViennaRNA package from https://www.tbi.univie.ac.at/RNA

Step 4: The Machine Learning scoring model is trained on the unfeaturized data in src/model/processed.json. This data has to be converted to a one-hot encoded, featurized CSV. To do that, run

$ python src/model/featurize.py

Note: This will take some time
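
To illustrate what one-hot encoding a guide sequence means, here is a minimal sketch. It is not the code in src/model/featurize.py: the function name `one_hot_encode`, the A/C/G/T column order, and the flat output layout are all assumptions made for illustration, and the real featurizer may produce additional features.

```python
# Illustrative sketch only; not taken from src/model/featurize.py.
# The column order (A, C, G, T) is an assumption.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return a flat list with four 0/1 entries per nucleotide."""
    encoded = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        # Exactly one of the four entries is 1 for each position.
        encoded.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return encoded


features = one_hot_encode("GATGCTGTTCAACCACTAAT")
print(len(features))  # four columns per position of the 20 nt guide
```

Under these assumptions, a 20 nt sgRNA becomes a row of 80 binary columns, which is the kind of fixed-width input a Random Forest can consume.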

Step 5: Now that the data is featurized, create the Random Forest model by running

$ python src/model/scoring_model.py create_model

Step 6: To make it easy to test on a smaller genome, we have included the phi-X virus genome in the Phi-Genome folder. To build the Bloom filter and GEM indices for it, run

$ python src/krispr_kreme.py build -p Phi-Genome

Step 7: Querying happens in two stages:

1. The 20 nt sgRNA sequence is looked up in the Bloom filter index of every chunk. (The phi-X genome is a single chunk; for the human genome, each chromosome is a chunk.)
2. The sequence is then aligned using the GEM indices of only the chunks returned by the first stage, since the sequence can only occur in those chunks.

To do this, run
```
$ python src/krispr_kreme.py query -s "GATGCTGTTCAACCACTAAT" -p Phi-Genome -o alignments-phi.json
```
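
The two-stage idea from Step 7 can be sketched in a few lines. This is a simplified stand-in, not the tool's implementation: KrisprKreme uses pybloom, whereas `TinyBloomFilter`, `build_chunk_filter`, and `candidate_chunks` below are hypothetical names, and indexing every 20-mer of a chunk is an assumption about how the filters are populated. The alignment stage is left out.

```python
import hashlib


class TinyBloomFilter:
    """Minimal Bloom filter; a stand-in for the pybloom filter the tool uses."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def build_chunk_filter(sequence, k=20):
    """Index every k-mer of one chunk (assumed indexing scheme)."""
    bloom = TinyBloomFilter()
    for i in range(len(sequence) - k + 1):
        bloom.add(sequence[i:i + k])
    return bloom


def candidate_chunks(sgrna, chunk_filters):
    """Stage 1: keep only the chunks whose filter may contain the sgRNA."""
    return [name for name, bloom in chunk_filters.items() if sgrna in bloom]


# Toy chunks; only chunk_a actually contains the query sequence.
chunks = {
    "chunk_a": "TTTT" + "GATGCTGTTCAACCACTAAT" + "CCCC",
    "chunk_b": "ACGT" * 10,
}
filters = {name: build_chunk_filter(seq) for name, seq in chunks.items()}
hits = candidate_chunks("GATGCTGTTCAACCACTAAT", filters)
print(hits)  # stage 2 would run the aligner on these chunks only
```

A Bloom filter never misses a sequence that was indexed, but it can occasionally report one that was not, so the alignment stage is still needed to confirm each candidate chunk.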

Step 8: To compute on-target and off-target scores for the output of the query and alignment steps, run

$ python src/krispr_kreme.py score -s "GATGCTGTTCAACCACTAAT" -p alignments-phi.json -o output-phi.json
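
As a rough illustration of how a CFD-style off-target score is formed, the sketch below multiplies one penalty for each mismatched position between the guide and a candidate site. It is a simplification and not the Doench et al. code the tool uses: the function name `cfd_style_score` is hypothetical, the penalty values are made-up placeholders rather than the published table, and the PAM term of the real score is omitted.

```python
def cfd_style_score(guide, target, penalties):
    """Multiply one penalty per mismatched position; 1.0 means a perfect match.

    `penalties` maps (1-based position, guide base, target base) to a factor
    in [0, 1]. The caller must supply an entry for every mismatch it expects.
    """
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    score = 1.0
    for position, (g, t) in enumerate(zip(guide, target), start=1):
        if g != t:
            score *= penalties[(position, g, t)]
    return score


# Placeholder penalties for illustration only; NOT values from Doench et al.
toy_penalties = {(2, "A", "C"): 0.5, (4, "G", "T"): 0.25}

print(cfd_style_score("GATG", "GATG", toy_penalties))  # perfect match
print(cfd_style_score("GATG", "GCTT", toy_penalties))  # two mismatches
```

Because the factors multiply, each extra mismatch can only lower the score, which matches the intuition that sites with more mismatches are less likely to be cut.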

Machine Learning Model Evaluation

To plot the ROC and Precision-Recall curves for the SVM model and the Random Forest model, run

$ python src/model/scoring_model.py plot_curves
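
For readers unfamiliar with ROC curves, the following standard-library sketch shows how the points of such a curve and the area under it can be computed from labels and predicted scores. It is independent of scoring_model.py, whose internals are not shown here; `roc_points` and `auc` are hypothetical helper names, and the labels and scores are toy values.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    Assumes binary labels (1 = positive, 0 = negative) with at least one
    example of each class. Tied scores are handled as a single threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    pairs = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Lower the threshold past every example that shares this score.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(auc(roc_points(toy_labels, toy_scores)))  # → 0.75
```

In the toy example, three of the four positive/negative pairs are ranked correctly, which is why the area comes out to 0.75.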

Deep Learning Model Evaluation

To run the training/validation process of the Deep Convolutional Neural Network, follow these steps:

Install Caffe by following the instructions at https://github.com/BVLC/caffe/wiki/Ubuntu-16.04-or-15.10-Installation-Guide. Use the CPU mode of Caffe and make sure you run export PATH=<caffe-root-path>/build/tools/:$PATH. To train the network, run

$ cd src/model/cnn
$ caffe train --solver=sgrna-dcnn-solver.prototxt

Acknowledgements

For our relative Cutting Frequency Determination (CFD) score, we used the code provided by Doench et al., "Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4744125/).
To generate Hamming neighbors, we used Prof. Benjamin Langmead's code from his homework solutions.
For indexing and querying the genome with Bloom filters, we used https://pypi.python.org/pypi/pybloom.
