

An efficient command-line tool for CRISPR-Cas9 experiment design.

Disclaimer: The indices for the human genome take approximately 2 days to build. If you want the prebuilt indices, contact us.


The following steps were successfully executed on a 64-bit Ubuntu 16.04 (Xenial) machine.

Step 1: Install the tool dependencies using

$ pip install -r requirements.txt

Step 2: Download the Gemtools aligner binaries from and add the Gemtools' bin folder to $PATH

Step 3: Download and install the ViennaRNA package from

Step 4: For the Machine Learning scoring model, the unfeaturized data is in src/model/processed.json. It has to be converted to a one-hot encoded, featurized CSV. To do that, run

$ python src/model/

Note: This will take some time
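
As a rough illustration of what one-hot featurization means for a 20 nt guide, the sketch below encodes each position as four binary columns (A, C, G, T). The function name and column layout are assumptions made for illustration. This is not the repository's featurization script, which may produce additional features.

```python
# Hypothetical helper for illustration; not the repository's featurizer.
NUCLEOTIDES = "ACGT"

def one_hot_encode(sequence):
    """Return a flat list with four binary values per nucleotide."""
    features = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        # One column per possible nucleotide; exactly one of them is 1.
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features

row = one_hot_encode("GATGCTGTTCAACCACTAAT")
print(len(row))  # 80 columns for a 20 nt guide
print(row[:8])   # G then A -> [0, 0, 1, 0, 1, 0, 0, 0]
```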

Step 5: Now that the data is featurized, create the Random Forest model by running

$ python src/model/ create_model

Step 6: To make it easy to test with a smaller genome, we have added the phi-X virus genome to the source code in the Phi-Genome folder. To build the Bloom and GEM indices for it, run

$ python build -p Phi-Genome

Step 7: Querying happens in two stages:
  1. The 20 nt sgRNA sequence is looked up in the Bloom Filter indices of all chunks (the phi-X virus genome is a single chunk, whereas for the human genome each chunk is a chromosome).
  2. The sequence is then aligned using the GEM indices of only the chunks returned by the first stage, since those are the only chunks in which the sequence can occur.

To do this, run
$ python src/ query -s "GATGCTGTTCAACCACTAAT" -p Phi-Genome -o alignments-phi.json
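
The idea behind the first stage can be sketched as follows: build one Bloom filter per chunk over all of its 20-mers, then keep only the chunks whose filter reports the guide. Everything here (the class, the hash scheme, the filter size, the toy chunk sequences) is an assumption for illustration and not the tool's actual index format. A Bloom filter never misses a stored item but can report false positives, which is why the alignment stage is still needed afterwards.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter; parameters are illustrative, not tuned."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def index_chunk(sequence, k=20):
    """Insert every k-mer of a chunk into a fresh filter."""
    bloom = BloomFilter()
    for i in range(len(sequence) - k + 1):
        bloom.add(sequence[i:i + k])
    return bloom

def candidate_chunks(guide, chunk_filters):
    """Names of chunks that may contain the guide (stage 1)."""
    return [name for name, bloom in chunk_filters.items() if guide in bloom]

# Toy chunks, invented for this example.
chunks = {
    "chunk1": "TTTT" + "GATGCTGTTCAACCACTAAT" + "CCCC",
    "chunk2": "A" * 40,
}
filters = {name: index_chunk(seq) for name, seq in chunks.items()}
hits = candidate_chunks("GATGCTGTTCAACCACTAAT", filters)
print(hits)  # only these chunks would be passed on to the aligner
```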

Step 8: To compute on-target and off-target scores for the output of the query and alignment steps, run

$ python src/ score -s "GATGCTGTTCAACCACTAAT" -p alignments-phi.json -o output-phi.json

Machine Learning Model Evaluation

To plot the ROC and Precision-Recall curves for the SVM model and the Random Forest model run

$ python src/model/ plot_curves
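
For readers unfamiliar with how a ROC curve is derived from classifier scores, here is a minimal standard-library sketch. It sweeps the decision threshold over the sorted scores and integrates the resulting curve with the trapezoidal rule. The function names and the toy labels and scores are assumptions for illustration. The plotting command above presumably relies on its own implementation, and no plotting is done here.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they yield a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as sorted (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Toy data: 1 = active guide, 0 = inactive guide (invented values).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
curve = roc_points(labels, scores)
print(curve)
print(area_under_curve(curve))  # 0.75
```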

Deep Learning Model Evaluation

To run the training/validation process of the deep convolutional neural network, follow these steps:

Install Caffe by following the instructions. Use the CPU mode of Caffe and make sure you run export PATH=<caffe-root-path>/build/tools/:$PATH. To train the network, run

$ cd src/model/cnn
$ caffe train --solver=sgrna-dcnn-solver.prototxt


  1. For our relative Cutting Frequency Determination (CFD) score we have used the code provided by Doench et al., "Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9".
  2. To generate Hamming neighbors we have used Prof. Benjamin Langmead's code from his homework solutions
  3. For indexing and querying the genome using Bloom Filter we have used
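
To make item 2 concrete, the sketch below enumerates every sequence within a given Hamming distance of a guide, i.e. every sequence of the same length that differs in at most that many positions. It is an independent illustration written for this README, not the referenced code, and the function name is hypothetical.

```python
from itertools import combinations, product

def hamming_neighbors(seq, max_dist, alphabet="ACGT"):
    """Return all strings within max_dist substitutions of seq (seq included)."""
    neighbors = set()
    for dist in range(max_dist + 1):
        # Choose which positions to mutate.
        for positions in combinations(range(len(seq)), dist):
            # At each chosen position, try every base except the original.
            options = [[b for b in alphabet if b != seq[p]] for p in positions]
            for substitutes in product(*options):
                chars = list(seq)
                for p, base in zip(positions, substitutes):
                    chars[p] = base
                neighbors.add("".join(chars))
    return neighbors

guide = "GATGCTGTTCAACCACTAAT"
print(len(hamming_neighbors(guide, 1)))  # 61: the guide plus 20 * 3 single mismatches
```

The number of neighbors grows quickly with the allowed distance, so in practice the distance is kept small.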

