
Usage: get_gens_dfs.py

Michael Olvera edited this page May 3, 2018 · 6 revisions

get_gens_dfs.py generates a table (TSV file) that lists all variants in a defined interval for a specified individual, based on an input VCF file. It reformats the genotypes from the VCF so they are easier to process downstream when designing sgRNAs. Written in Python 3.6.1. Kathleen Keough et al., 2017-2018.
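As a rough illustration of this kind of reformatting, the sketch below extracts one sample's genotype from a VCF data line and emits a tab-separated row. It is not the script's own code: the function name vcf_line_to_row, the CHROM/POS/REF/ALT/GT column layout, and the toy record are all hypothetical, and it uses plain string handling rather than whatever VCF library get_gens_dfs.py relies on.

```python
"""Hypothetical sketch: turn one VCF data line into a TSV genotype row.

The output columns (CHROM, POS, REF, ALT, GT) are an assumption for
illustration, not the actual output format of get_gens_dfs.py.
"""


def vcf_line_to_row(line, sample_index=0):
    """Return a tab-separated row for one sample from a VCF data line.

    sample_index counts from the first sample column (column 10 of the VCF).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    # FORMAT (column 9) names the colon-separated keys of each sample column.
    fmt_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    sample = dict(zip(fmt_keys, sample_values))
    gt = sample.get("GT", "./.")  # fall back to a missing call if GT is absent
    return "\t".join([chrom, pos, ref, alt, gt])


if __name__ == "__main__":
    # Made-up record for illustration only.
    record = "1\t11980500\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0|1:30"
    print(vcf_line_to_row(record))  # 1, 11980500, A, G, 0|1 separated by tabs
```

A real VCF also has meta-information and header lines starting with `#`, which a caller would skip before passing data lines to a function like this.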

Usage:

get_gens_dfs.py <vcf_file> <locus> <out> [-f] [--bed] [--chrom]
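To make the usage line concrete, here is a minimal sketch of the same interface built with the standard-library argparse module: three positional arguments followed by three boolean flags. It is an assumption-laden reconstruction from the usage line only; the actual script may parse its arguments with a different library, and the function name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical argparse setup mirroring the documented usage line."""
    parser = argparse.ArgumentParser(prog="get_gens_dfs.py")
    # Positional arguments, in the order shown in the usage line.
    parser.add_argument("vcf_file", help="bgzipped and indexed BCF/VCF file")
    parser.add_argument("locus",
                        help="chromosome:start-stop, or a BED file with --bed")
    parser.add_argument("out", help="name for the output file(s)")
    # Optional flags; none of them takes a value in the usage line.
    parser.add_argument("-f", action="store_true",
                        help="keep homozygous variants in the output")
    parser.add_argument("--bed", action="store_true",
                        help="treat locus as a BED file")
    parser.add_argument("--chrom", action="store_true",
                        help="run on an entire chromosome")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["INPUT.vcf.gz", "loci.bed", "OUT_multi_loci_gens", "--bed"])
    print(args.bed, args.f)  # True False
```

Treating `--chrom` as a plain flag follows the usage line, which shows no value after it; how the script decides which chromosome to use in that mode is not covered here.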

Expanded Examples:
Producing a gens file for one locus:
python3 get_gens_dfs.py INPUT.vcf.gz\
 1:11980181-12013515\
 OUT_GENS
Producing a gens file from several loci:
python3 get_gens_dfs.py INPUT.vcf.gz\
 loci.bed\
 OUT_multi_loci_gens\
 --bed

where loci.bed is a tab-separated file formatted like so:

1	11976269	12018380	MFN2
7	76298036	76308038	HSPB1
11	61940001	61963675	BEST1
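The two ways of specifying regions, a single chromosome:start-stop string or a BED file, can be read with a few lines of plain Python. This is a hedged sketch, not the script's parser: parse_locus, parse_bed_line, and read_bed are hypothetical names. Note that by the BED convention the start coordinate is 0-based and the stop is exclusive; whether get_gens_dfs.py converts BED coordinates before querying the VCF is not documented here, so the sketch returns them unchanged.

```python
def parse_locus(locus):
    """Split 'chromosome:start-stop' into (chrom, start, stop)."""
    chrom, span = locus.rsplit(":", 1)
    start, stop = span.split("-")
    return chrom, int(start), int(stop)


def parse_bed_line(line):
    """Split a tab-separated BED line into (chrom, start, stop, name).

    Coordinates are returned as written (BED: 0-based start, exclusive stop).
    """
    chrom, start, stop, name = line.rstrip("\n").split("\t")[:4]
    return chrom, int(start), int(stop), name


def read_bed(lines):
    """Parse BED lines, skipping blank, comment, track and browser lines."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        regions.append(parse_bed_line(line))
    return regions


if __name__ == "__main__":
    print(parse_locus("1:11980181-12013515"))  # ('1', 11980181, 12013515)
    print(read_bed(["7\t76298036\t76308038\tHSPB1\n"]))
```

In practice read_bed would be called with an open file handle, e.g. `with open("loci.bed") as fh: regions = read_bed(fh)`.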
Arguments and Options:
Arguments: Details
vcf_file BCF/VCF file containing genotypes. The file should be bgzip-compressed (using bcftools or bgzip) and indexed (using bcftools or tabix).
locus Locus from which to pull variants, in the format chromosome:start-stop, or the path to a BED file if --bed is given.
out The name for the output file and directory in which to save the output files.
Options: Details
-f If specified, homozygous variants are kept in the output file.
--bed Indicates that a BED file is given in place of a locus. BED files are expected to include the CHROM, START, STOP, and ID columns.
--chrom Run on an entire chromosome.
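The effect of -f can be pictured as a filter on genotype calls. The sketch below is a hypothetical illustration, not the script's logic: is_homozygous and filter_rows are invented names, rows are assumed to be (chrom, pos, ref, alt, gt) tuples, and a call counts as homozygous when all of its called alleles are identical. Whether the script treats homozygous-reference calls such as 0/0 the same way as homozygous-alternate calls such as 1/1 is not stated, so the sketch simply compares alleles.

```python
def is_homozygous(gt):
    """True if a GT string such as '1/1' or '0|0' has identical called alleles.

    Calls with a missing allele ('./.', '.') are treated as not homozygous.
    """
    alleles = gt.replace("|", "/").split("/")  # handle phased and unphased
    if "." in alleles:
        return False
    return len(set(alleles)) == 1


def filter_rows(rows, keep_homozygous=False):
    """Drop homozygous calls unless keep_homozygous (analogous to -f) is set.

    rows are (chrom, pos, ref, alt, gt) tuples; this layout is hypothetical.
    """
    return [r for r in rows if keep_homozygous or not is_homozygous(r[4])]


if __name__ == "__main__":
    rows = [("1", "100", "A", "G", "0/1"), ("1", "200", "C", "T", "1|1")]
    print(len(filter_rows(rows)))                        # 1
    print(len(filter_rows(rows, keep_homozygous=True)))  # 2
```

Keeping the check in a separate function makes it easy to swap in a stricter rule, for example one that only drops homozygous-alternate calls.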