Amir Mohseni edited this page Nov 12, 2024 · 20 revisions

Prelude

# Downloading ALLEGRO
# This may take a while. We provide all data used in the paper.
# You do not need to use FUGUE to download our data.
git clone https://github.com/ucrbioinfo/allegro.git
cd allegro

# Installing Dependencies
conda create -n allegro python=3.10 numpy=1.26.0 pandas=2.1.1 pyyaml=6.0.1 conda-forge::biopython=1.78 bioconda::bowtie=1.0.0
conda activate allegro
pip install scikit-learn==1.3.2 Cython==3.0.5 setproctitle==1.3.3

# Checking ALLEGRO
python src/main.py --soundcheck

If you run into an install conflict or version issue, keep python=3.10 and bowtie=1.0.0 pinned and remove the version requirements for the other packages.
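If you relaxed any of the version pins, it can help to see which installed versions differ from the ones listed above. The following is a minimal sketch, not part of ALLEGRO: the helper names `installed_version` and `compare_with_pins` are hypothetical, and the pins are copied from the install commands above. It only covers the Python packages; bowtie is a separate binary and is not checked here.

```python
from importlib import metadata

# Pins copied from the conda and pip commands above (Python packages only).
PINNED = {
    "numpy": "1.26.0",
    "pandas": "2.1.1",
    "PyYAML": "6.0.1",
    "biopython": "1.78",
    "scikit-learn": "1.3.2",
    "Cython": "3.0.5",
    "setproctitle": "1.3.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_with_pins(pins=PINNED, lookup=installed_version):
    """Map each package name to a (pinned, installed, matches) tuple."""
    report = {}
    for name, pinned in pins.items():
        found = lookup(name)
        report[name] = (pinned, found, found == pinned)
    return report


if __name__ == "__main__":
    for name, (pinned, found, ok) in compare_with_pins().items():
        status = "ok" if ok else "differs"
        print(f"{name}: pinned {pinned}, installed {found} ({status})")
```

Run it inside the activated allegro environment. A "differs" line is not necessarily a problem; it only shows where your environment departs from the pinned setup.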

Conducting a Simple Experiment

ALLEGRO comes with 50 of the 2263 species used in its paper as example input. The manifest file is at data/input/fifty_example_input_species.csv and the fasta files are under data/input/example_input. These fasta files contain the genes orthologous to LYS2, MET17, TRP1, URA3, FCY1, GAP1, and CAN1 in S. cerevisiae S288C, as determined by DIAMOND. They have also been modified to delimit intron/exon boundaries using the respective GFF files.

To conduct an experiment using the default settings in your config.yaml file, simply execute the following:

python src/main.py

ALLEGRO will output the smallest gRNA library that targets every record/gene in the 50 input files and write it to data/output/ALLEGRO_EXAMPLE_RUN/ALLEGRO_EXAMPLE_RUN_library.txt. You may change the name of the folder and of the experiment in the config file.

Additionally, we provide all 2263 species and their CDS orthologous to LYS2, TRP1, URA3, FCY1, GAP1, and CAN1 (no MET17) under data/input/cds/compressed_ortho_from_gff.zip and their manifest file at fourdbs_input_species.csv. To replicate the results in the paper, unzip the CDS files and configure ALLEGRO to use that directory and manifest accordingly. In config.yaml, set input_species_path_column to ortho_file_name. If you would like to know how this data was acquired and processed, see the documentation at FUGUE.
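The unzip step can be done with any archive tool. As a minimal sketch using only the Python standard library, the hypothetical helper `extract_cds` below extracts the archive and reports how many files it contained. The destination directory is an assumption chosen for illustration; use whichever directory you then point config.yaml at.

```python
import zipfile
from pathlib import Path


def extract_cds(archive, dest):
    """Extract every member of the zip archive into dest; return the file count."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        # Count regular files only, skipping directory entries.
        return sum(1 for info in zf.infolist() if not info.is_dir())


if __name__ == "__main__":
    archive = Path("data/input/cds/compressed_ortho_from_gff.zip")
    # Hypothetical destination; any directory works as long as the config matches.
    count = extract_cds(archive, "data/input/cds/ortho_from_gff")
    print(f"extracted {count} files")
```

Run it from the repository root so the relative paths resolve.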

Interpreting the Output

Using the default configuration file, you will find five files under the output directory of your experiment. Four of these files begin with the name of your experiment, and one is the solver's log. Let's go over each of these:

  1. ALLEGRO_EXAMPLE_RUN_config_used.txt
    • A copy of the config.yaml file used to conduct this experiment
  2. ALLEGRO_EXAMPLE_RUN_library.txt
    • Contains your Cas9 guide RNA library without the PAM
  3. ALLEGRO_EXAMPLE_RUN.csv
    • Contains a detailed report about the guide RNAs in the library including their sequences with PAM, target files, target reference names, strands, positions, efficiency scores, and the target file paths
  4. ALLEGRO_EXAMPLE_RUN.txt
    • Reports how many targets each guide in the library cuts
  5. solver_log.txt
    • Reports the total number of Cas9 guides discovered, the total number of genes/references (if track E) or the total number of input files/species (if track A), cut multiplicity, beta, and the LP and ILP non-zero values for each guide
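To inspect these files programmatically, a short script along the following lines may be a useful starting point. It is a sketch under stated assumptions: the exact layout of the library file and the column names of the CSV report are not documented here, so the code assumes the library file is plain text with one entry per non-empty line and simply prints whatever column names the CSV header contains. The helper names `load_library` and `load_report` are hypothetical.

```python
import csv
from pathlib import Path


def load_library(path):
    """Return the non-empty lines of the library file (assumes one entry per line)."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def load_report(path):
    """Return (column_names, rows) from the CSV report; each row is a dict."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    out_dir = Path("data/output/ALLEGRO_EXAMPLE_RUN")
    guides = load_library(out_dir / "ALLEGRO_EXAMPLE_RUN_library.txt")
    columns, rows = load_report(out_dir / "ALLEGRO_EXAMPLE_RUN.csv")
    print(f"{len(guides)} non-empty lines in the library file")
    print(f"{len(rows)} report rows with columns: {columns}")
```

Printing the header first avoids hard-coding column names that may differ between ALLEGRO versions.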