Ensembl's Variant Effect Predictor (VEP) provides functional annotations of genomic variants. However, VEP output, like Variant Call Format (VCF) files, can be hard to parse, mainly because of the nested key-value pair fields common to both file types. This script, implemented in Haskell, converts standard VEP output and VCF files into a fully tab-delimited format that is easy to filter and parse downstream.
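To illustrate the kind of flattening described above, here is a minimal sketch that turns a VCF-style INFO field (semicolon-separated key=value pairs) into tab-delimited columns. It is not taken from bvp.hs: the function names parseInfo and flattenInfo, the choice of "TRUE" for valueless flags, and "." for missing keys are all assumptions made for this example.

```haskell
module Main where

import Data.List (intercalate)
import Data.Maybe (fromMaybe)

-- | Split a string on a single-character delimiter (base-only helper,
--   hypothetical; bvp.hs itself imports Data.List.Split).
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- | Parse "KEY=VALUE;FLAG;..." into (key, value) pairs.
--   Assumption: a flag without a value is recorded as "TRUE".
parseInfo :: String -> [(String, String)]
parseInfo = map toPair . filter (not . null) . splitOn ';'
  where
    toPair field = case break (== '=') field of
      (k, '=' : v) -> (k, v)
      (k, _)       -> (k, "TRUE")

-- | Emit one tab-delimited cell per requested key, in the given order.
--   Assumption: keys absent from the INFO field are written as ".".
flattenInfo :: [String] -> String -> String
flattenInfo keys info =
  intercalate "\t" [fromMaybe "." (lookup k pairs) | k <- keys]
  where
    pairs = parseInfo info

main :: IO ()
main = putStrLn (flattenInfo ["DP", "AF", "DB"] "DP=14;AF=0.5;DB")
```

Fixing the column order up front (the keys argument) is what makes every output row line up, even when individual records omit some keys.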
bvp.hs assumes you have the GHC compiler and the packages it imports installed. The easiest way to get these is to download the Haskell Platform.
To install the additional packages bvp.hs requires, run the following command for each one. This assumes cabal, a package manager and build system for Haskell, is installed on your system (it comes with the Haskell Platform).
$ cabal install [packagename]
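Required modules (cabal installs packages rather than modules: Data.List.Extra is provided by the extra package, Data.List.Split by split, System.Directory by directory, System.IO.Temp by temporary, and System.Process by process; the remaining modules ship with base)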
- Data.List
- Data.List.Extra
- Data.List.Split
- Data.Ord
- Data.Tuple
- System.Console.GetOpt
- System.Directory
- System.Environment
- System.Exit
- System.IO
- System.IO.Temp
- System.Process
To get useful output from this script, the input file must have the correct structure: it must be either a file produced by VEP or a VCF file.
bvp.hs is easy to use.
You can call it using the runghc command provided by the GHC compiler as follows:
$ runghc bvp.hs -I vcf -O tvcf -g -G -o output.vcf example.vcf.gz
For maximum performance, please compile and run the source code as follows:
$ ghc -O2 -o BVP bvp.hs
$ ./BVP -I vcf -O tvcf -g -G -o output.vcf example.vcf.gz
bvp.hs has a few different command line arguments:
Basic Variant Parser, Copyright (c) 2019 Matthew Mosior.
Usage: bvp [-vV?IoOgG] [file]
  -v          --verbose             Output on stderr.
  -V, -?      --version             Show version number.
  -I IN       --InFormat=IN         The format of the input file.
  -O OUT      --OutFormat=OUT       The format of the output file.
  -o OUTFILE  --OutputFile=OUTFILE  The output file.
  -g          --GzipIn              Gzipped input file?
  -G          --GzipOut             Gzipped output file?
              --help                Print this help message.
The -v option, the verbose option, will provide a full error message.
The -V option, the version option, will show the version of bvp in use.
The -I option, the InFormat option, specifies the format of the input file, and is required to run bvp.hs.
The -O option, the OutFormat option, specifies the format of the output file, and is required to run bvp.hs.
The -o option, the OutputFile option, writes the result of processing the input file to an output file whose name is specified by the user.
The -g option, the GzipIn option, specifies that the input file is gzip compressed.
The -G option, the GzipOut option, specifies that the output file is to be compressed using gzip.
Finally, the --help option outputs the help message seen above.
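For readers curious how options like these are typically declared, the following is a minimal sketch using System.Console.GetOpt, one of the modules listed above. It mirrors the help message but is not the actual bvp.hs source: the Flag type, the options table, and the parseArgs function are names assumed for this example.

```haskell
module Main where

import System.Console.GetOpt

-- | One constructor per command line option (hypothetical type).
data Flag
  = Verbose
  | Version
  | InFormat String
  | OutFormat String
  | OutputFile String
  | GzipIn
  | GzipOut
  | Help
  deriving (Eq, Show)

-- | Option table mirroring the help message shown above.
options :: [OptDescr Flag]
options =
  [ Option ['v']      ["verbose"]    (NoArg Verbose)                "Output on stderr."
  , Option ['V', '?'] ["version"]    (NoArg Version)                "Show version number."
  , Option ['I']      ["InFormat"]   (ReqArg InFormat "IN")         "The format of the input file."
  , Option ['O']      ["OutFormat"]  (ReqArg OutFormat "OUT")       "The format of the output file."
  , Option ['o']      ["OutputFile"] (ReqArg OutputFile "OUTFILE")  "The output file."
  , Option ['g']      ["GzipIn"]     (NoArg GzipIn)                 "Gzipped input file?"
  , Option ['G']      ["GzipOut"]    (NoArg GzipOut)                "Gzipped output file?"
  , Option []         ["help"]       (NoArg Help)                   "Print this help message."
  ]

-- | Parse raw arguments into flags plus remaining positional arguments,
--   or return the error messages followed by the usage text.
parseArgs :: [String] -> Either String ([Flag], [String])
parseArgs argv =
  case getOpt Permute options argv of
    (flags, files, []) -> Right (flags, files)
    (_, _, errs)       -> Left (concat errs ++ usageInfo header options)
  where
    header = "Usage: bvp [-vV?IoOgG] [file]"

main :: IO ()
main =
  print (parseArgs ["-I", "vcf", "-O", "tvcf", "-g", "-G", "-o", "output.vcf", "example.vcf.gz"])
```

Using Permute lets options and the input file name appear in any order, which matches the invocation examples earlier in this document where the input file comes last.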
A Docker-based solution (Dockerfile) is available in the corresponding repository. Currently, this Dockerfile assumes that you run Docker interactively.
Documentation was added March 2019.
Author : Matthew Mosior