A deep learning framework for predicting efficient Cas12a guides. This software accompanies the paper:
Genetic interaction mapping and exon-resolution functional genomics with a hybrid Cas9-Cas12a platform
Gonatopoulos-Pournatzis T, Aregger M, Brown KR, Farhangmehr S, Braunschweig U,
Ward HN, Ha KCH, Weiss A, Bilmann M, Durbic T, Myers Cl, Blencowe BJ, Moffat J.
Nature Biotechnology (2020).
DOI: 10.1038/s41587-020-0437-z
CHyMErA-Net is a Python software package that scores the efficacy of 39-nt Cas12a guide sequences (6 nt flanking upstream + 4 nt PAM + 23 nt guide + 6 nt flanking downstream). The model is a convolutional neural network trained using Keras (2.2.4) and TensorFlow (1.13.1), which takes as input the guide sequence as well as computed auxiliary features: secondary structure (minimum free energy) and melting temperatures.
The recommended installation method is via the conda package manager. CHyMErA-Net requires Python 3.5 or higher.
-
Install Miniconda3
-
Clone or download the CHyMErA-Net repository:
git clone https://github.com/BlencoweLab/CHyMErA-Net.git cd CHyMErA-Net
-
Create a virtual environment using the provided
environment.yml
file, which contains the pinned dependencies used.conda env create -n cas12a -f environment.yml
-
Activate the virtual environment
conda activate cas12a
-
Install the package
python setup.py install
If the environment.yml
is not used (step 2 above), then the
ViennaRNA package must be installed
manually:
conda install -c bioconda viennarna
Then, install CHyMErA-Net as described in steps 4-5 above.
A FASTA file with 39 nt guide sequences. Each guide sequence must consist of:
- 6 nt upstream flanking sequence
- 4 nt PAM sequence
- 23 nt guide sequence
- 6 nt downstream flanking sequence
Sequences cannot contain N characters.
chymeranet guides.fasta > scores.txt
The default output is a two column table:
1. ID of the guide sequence (taken from the FASTA file)
2. Prediction score between 0 and 1, where 0 is not effective and 1 is highly effective.
For additional options, open the help message by calling chymeranet -h
.
The trained models are saved in the chymeranet/data
directory:
- The CNN Keras model
- Scikit-learn scaling model: contains the scaling factors used in the training data