Takes a VCF and infers annotations and variant effects from a GFF using SnpEff.
SnpEff is a tool that annotates and predicts the effects of variants on genes. SnpEffWrapper takes a VCF and, using SnpEff, infers annotations and variation effects from a GFF. If you use SnpEffWrapper, please consider citing SnpEff. This software is not endorsed in any respect by the original authors.
SnpEffWrapper has the following dependencies:
- SnpEff (>= 4.1)
- Java (>= 1.7)
- Jinja2
- PyVCF
- PyYAML
Details for the installation are provided below. If you encounter an issue when installing SnpEffWrapper please contact your local system administrator. If you encounter a bug please log it here or email us at path-help@sanger.ac.uk
Install snpEff and Java 1.7 then
pip install git+https://github.com/sanger-pathogens/SnpEffWrapper.git
The test can be run from the top level directory:
./snpEffWrapper/tests/test_wrapper.py
$ snpEffBuildAndRun --help
usage: snpEffBuildAndRun [-h] [--snpeff-exec SNPEFF_EXEC]
[--java-exec JAVA_EXEC] [--coding-table CODING_TABLE]
[-o OUTPUT_VCF] [--debug] [--keep]
gff_file vcf_file
Takes a VCF and applies annotations from a GFF using SnpEff
positional arguments:
gff_file GFF with annotations including a reference genome
sequence
vcf_file VCF input to annotate (NB must be aligned to the
reference in your GFF
optional arguments:
-h, --help show this help message and exit
--snpeff-exec SNPEFF_EXEC
Path to your prefered SnpEff executable (default:
snpEff.jar)
--java-exec JAVA_EXEC
Path to Java 1.7 (default: java)
--coding-table CODING_TABLE
A mapping of contig name to coding table formatted in
YAML
-o OUTPUT_VCF, --output_vcf OUTPUT_VCF
Output for the annotated VCF (default: stdout)
--debug Show lots of SnpEff and other debug output
--keep Keep temporary files and databases (useful for
debugging)
- snpEffBuildAndRun will look for SnpEFF.jar in the following locations:
- the file specified by
--snpeff-exec
snpEff.jar
in your local directorysnpEff.jar
in yourPATH
- the file specified by
- SnpEff needs Java 1.7 to run; snpEffBuildAndRun will look in the following locations:
- the file specified by
--java-exec
java
in yourPATH
- the file specified by
$ snpEffBuildAndRun snpEffWrapper/tests/data/minimal.gff snpEffWrapper/tests/data/minimal.vcf -o minimal.annotated.vcf
You can provide a coding table for each VCF contig otherwise it'll default to SnpEff's 'Bacterial_and_Plant_Plastid'. You can do this by providing a mapping for each contig in your VCF to the relevant table in snpEffWrapper/data/config.template in YAML format.
For example:
snpEffBuildAndRun minimal.gff minimal.vcf \
--coding-table 'default: Standard'
snpEffBuildAndRun minimal.gff minimal.vcf \
--coding-table '{CHROM1: Standard, MITO1: Mitochondrial}'
snpEffBuildAndRun minimal.gff minimal.vcf \
--coding-table '{default: Standard, MITO1: Mitochondrial}'
NB you don't need curly brackets if you're only mapping one contig (or setting a default); you do need them if you're setting different coding tables.
- The GFF must contain the reference sequence in Fasta format
- The VCF must be aligned against the reference in the GFF
- At least one of the contigs in the VCF must have annotation data in the GFF (you'll get warnings for each VCF config not in the GFF)
- You cannot provide unknown coding tables (i.e. that can't be found in config.template)
SnpEffWrapper is free software, licensed under GPLv3.
Please report any issues to the issues page or email path-help@sanger.ac.uk.
If you use this, please consider citing SnpEff. This software is not endorsed in any respect by the original authors.