brakeMake

A repository for the BRAKER Snakemake pipeline, designed to be run on finished genome assemblies once assembly and scaffolding are complete.

Edit the config.yaml as follows:

Update 'input_dir' to be the path where all assemblies you want to annotate are stored. Make sure that all assemblies end in '.fasta'.
Update 'results_dir' to the path where all results will be stored.
Change 'rna_dir' to the directory containing the bulk RNA-seq files used for transcriptome generation. This must be an existing directory even if no RNA-seq data is used.
Change 'rna_list' to the list of IDs for the files within 'rna_dir'. It can also include SRA IDs that are not in 'rna_dir'; these will be downloaded and used for the annotation. If no RNA-seq data is used, leave this section empty.
'protein_file' points to the UniProt protein database, and leaving it unchanged is recommended. If you do not wish to use UniProt, update 'protein_file' to the path of a FASTA file of amino acid sequences and generate a blastp database from that file; the pipeline will then use this database instead.
To use other protein FASTA files alongside UniProt for genome annotation, place them in 'input_dir' with the assemblies to be annotated. Make sure that all protein datasets end in '.faa'.
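
The directory and file-naming requirements above can be checked before launching the pipeline. The following is a minimal sketch, not part of this repository: the function name `check_inputs` is hypothetical, and the list of alternative extensions it flags ('.fa', '.fna', '.fas') is an assumption about what might commonly be misnamed. It only inspects the file system and does not read config.yaml.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not shipped with this repository.
# Usage: check_inputs INPUT_DIR RNA_DIR
check_inputs() {
    input_dir=$1
    rna_dir=$2
    status=0

    # Both directories must exist, even when no RNA-seq data is used.
    for dir in "$input_dir" "$rna_dir"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            status=1
        fi
    done

    # At least one assembly ending in '.fasta' is expected in input_dir.
    found=0
    for f in "$input_dir"/*.fasta; do
        if [ -e "$f" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no .fasta assemblies found in: $input_dir" >&2
        status=1
    fi

    # Flag files whose extension does not meet the '.fasta' requirement
    # (the extensions listed here are an assumption, extend as needed).
    for f in "$input_dir"/*.fa "$input_dir"/*.fna "$input_dir"/*.fas; do
        if [ -e "$f" ]; then
            echo "rename to .fasta: $f" >&2
            status=1
        fi
    done

    return "$status"
}
```

After sourcing the function, it could be called as `check_inputs /path/to/assemblies /path/to/rna`, where both paths are placeholders for the values set in config.yaml. A non-zero exit status means at least one problem was reported on standard error.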

The pipeline can be test run with the following command in an interactive session:

sh pipeline_ctrl.sh npr $PWD

This assumes that your current directory is the root of this repository.

The pipeline can be fully run with the following command:

sbatch --time=10-00:00:00 pipeline_ctrl.sh process $PWD

As above, this assumes that your current directory is the root of this repository.
