`transcriptionary` takes user-defined parameters to create a static .html document displaying gene transcripts with introns compressed.
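To make the idea of intron compression concrete, here is a minimal sketch of how genomic coordinates could be mapped onto a plot axis in which every intron takes up a fixed width. It is an illustration only, not the tool's actual implementation: the function names, the half-open `(start, end)` exon convention, and the example coordinates are all assumptions. The value 10 mirrors the documented default of `intron_size`.

```python
def compress_introns(exons, intron_size=10):
    """Lay out sorted, non-overlapping exons so each intron spans `intron_size` units.

    `exons` is a list of half-open (start, end) genomic intervals (an assumption
    made for this sketch). Returns the matching list of plot intervals.
    """
    compressed = []
    cursor = 0
    for start, end in exons:
        length = end - start
        compressed.append((cursor, cursor + length))
        # Skip a fixed gap for the intron that follows this exon.
        cursor += length + intron_size
    return compressed


def to_plot_coordinate(pos, exons, compressed):
    """Translate an exonic genomic position to plot space; None if intronic."""
    for (start, end), (plot_start, _) in zip(exons, compressed):
        if start <= pos < end:
            return plot_start + (pos - start)
    return None


# Hypothetical exon coordinates, used only for illustration.
exons = [(100, 200), (1000, 1150), (5000, 5050)]
compressed = compress_introns(exons)
variant_x = to_plot_coordinate(1010, exons, compressed)
```

With this layout, exon lengths are preserved while the 800 bp and 3850 bp introns in the example each shrink to 10 plot units.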
Exons are annotated with coordinates and length.
Variants are specified as VCF or BED and are annotated with interactive hover boxes. For VCF, lollipops are automatically annotated with REF, ALT, allele count, and allele frequency, and are colored by severity if this information is provided. Radio buttons in the HTML plot give the option to view lollipop heights as allele count or allele frequency, on a linear or log scale. If a variant file with VEP annotations is provided, lollipops are categorized as LOW, MED, or HIGH impact as per geneimpacts and colored accordingly. If no VEP annotation is present for a given transcript, lollipops on that transcript are annotated as NONE. For a variant file with VEP annotations, lollipops can be turned on and off by severity with a checkbox. If more than one variant file is provided, each lollipop is annotated with the variant set it comes from, and each set of variants can be turned on and off with a checkbox.
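The difference between the linear and log height options can be illustrated with a small sketch. This is not the tool's actual scaling code: the function name, the `max_height` parameter, and the min-max rescaling are assumptions chosen for illustration. Only the minimum height of 15 pixels mirrors the documented default of `min_lollipop_height`.

```python
import math


def lollipop_heights(values, scale="linear", min_height=15, max_height=100):
    """Rescale positive allele counts or frequencies to pixel heights.

    Hypothetical helper: the smallest value maps to `min_height`, the largest
    to `max_height`, and the rest are interpolated on the chosen scale.
    """
    if scale not in ("linear", "log"):
        raise ValueError("scale must be 'linear' or 'log'")
    if scale == "log":
        transformed = [math.log10(v) for v in values]
    else:
        transformed = [float(v) for v in values]
    low, high = min(transformed), max(transformed)
    if high == low:
        # All variants share one value, so draw them all at the minimum height.
        return [float(min_height)] * len(transformed)
    span = high - low
    return [min_height + (t - low) / span * (max_height - min_height) for t in transformed]


# Hypothetical allele counts for three variants.
allele_counts = [1, 10, 100]
linear_heights = lollipop_heights(allele_counts, scale="linear")
log_heights = lollipop_heights(allele_counts, scale="log")
```

On the linear scale the middle variant sits close to the rarest one, while on the log scale it lands halfway between the two extremes, which is why a log scale tends to be easier to read when allele counts span several orders of magnitude.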
Tracks are specified as GTF or BED and are annotated with name, coordinates, and length, along with other specified fields from the GTF/BED. Colors can be specified by the user or chosen randomly from a color palette.
Coordinate-based information can be provided as CSV/TSV (point-based) or BEDGRAPH (interval-based). The user can customize the y axis with tick precision and scientific notation. The user can specify the line color, alpha value, and choose whether to fill in the area under the curve.
If a BED file contains a header, the header line must begin with `#`. BED files must have at least three columns, which should be `chrom`, `start`, and `end`.
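The BED requirements above can be checked before running the tool. The sketch below is a standalone helper and not part of `transcriptionary`; the function name and the assumption of tab-separated columns are mine.

```python
def check_bed_lines(lines):
    """Validate BED lines: optional '#' header, then >= 3 columns per record.

    Hypothetical helper. Returns (header, records), where header is a list of
    column names or None, and each record is (chrom, start, end, extra_fields).
    """
    header = None
    records = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            # Only the first '#' line before any record is treated as the header.
            if header is None and not records:
                header = line.lstrip("#").strip().split("\t")
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected at least 3 columns")
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            raise ValueError(
                f"line {line_number}: start and end must be integers "
                "(does a header line lack its leading '#'?)"
            ) from None
        records.append((fields[0], start, end, fields[3:]))
    return header, records


# Hypothetical BED content used only for illustration.
example = ["#chrom\tstart\tend\tname\n", "chr1\t100\t200\tvarA\n"]
header, records = check_bed_lines(example)
```

A header line that is missing its leading `#` is parsed as a record and fails the integer check, which surfaces the problem early.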
Plots can be output as HTML, PNG, or SVG.
For HTML plots: UTRs, direction arrows, lollipops, tracks, and coordinate-based information can be turned on and off in the plot with checkboxes. A smoothing slider can be added to coordinate-based information. Click on any exon to expand it. Click on any white space to revert the plot.
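As an illustration of what smoothing a coordinate-based line can look like, here is a sketch of a centered moving average. The smoothing method used by the slider is not documented here, so treat the technique, the function name, and the `window` parameter as assumptions.

```python
def moving_average(values, window):
    """Smooth `values` with a centered moving average.

    Hypothetical helper: each point is replaced by the mean of its neighbors
    within `window // 2` positions on either side; edges use a shorter window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        low = max(0, i - half)
        high = min(len(values), i + half + 1)
        smoothed.append(sum(values[low:high]) / (high - low))
    return smoothed


# Hypothetical per-position signal with a single spike.
signal = [0, 0, 9, 0, 0]
smoothed = moving_average(signal, window=3)
```

A larger window spreads the spike over more positions; a window of 1 returns the original values unchanged.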
https://home.chpc.utah.edu/~u6038618/transcriptionary/plot.html
- `config_file` (required): user-defined parameters (sample config file at `test/test.yaml`)
- `output_format`: can be HTML (interactive), or PNG or SVG (non-interactive)
- `output_filepath`: path to desired output file
- `transcripts`: list transcript names or specify transcripts to plot. Can be:
  - `transcript-names`: list all possible transcript names for the given configuration file (does not create a plot).
  - `flattened-exons`: overlay all transcripts to create a transcript with the largest possible exons (used to view all possible exonic variants).
  - `all`: plot all transcripts, including `flattened-exons`.
  - `'[<transcript_name_1>, <transcript_name_2>, ...]'`: specify transcripts
- `gff_path`: path to GFF or GTF (for feature coordinates). When running the first time, a `gff.db` file will be created for you. When rerunning, you can change this parameter to the `.gff.db` file to avoid recreating it.
- `title`: title of HTML; used for HTML output only
- `gene_name`: gene name
- `chrom`: chromosome (from features GFF/GTF)
- `plot_height`: height of plot in pixels; default 200
- `plot_width`: width of plot in pixels; default 1500
- `track_height`: height of tracks in pixels; default 10
- `exon_height`: height of exons in pixels; default 16
- `intron_size`: size to which introns are compressed; default 10
- `plot_UTRs`: show UTRs (boolean)
- `plot_direction`: show arrows with direction (boolean)
- `min_lollipop_height`: minimum height of lollipop in pixels; default 15
- `lollipop_radius`: radius of lollipop in pixels; default 5
- `lollipop_line_width`: line width of lollipop in pixels; default 2
- `default_y_axis`: set lollipop heights according to allele count (`AC`) or allele frequency (`AF`) by default (can be toggled with HTML output)
- `default_y_axis_scale`: scale lollipop heights on a linear (`linear`) or log (`log`) scale by default (can be toggled with HTML output)
- `glyph_colors`: specify feature colors (can be hex code or any name from `default_colors/named_colors.yaml`)
  - `intron`: default 'gray 11'
  - `exon`: default 'davys gray'
  - `arrow`: default '#252525'
  - `UTR`: default '#969696'
- `palettes_filepath`: filepath to config file with color palettes; default `default_colors/palettes.yaml`
- `named_colors_filepath`: filepath to config file mapping hex color codes to named colors; default `default_colors/named_colors.yaml`
- `track_palette`: palette to draw random track colors from; can be any palette in `default_colors/palettes.yaml`
#VCF
- `<variant_set>`: variant set label
  - `format`: file format of variant file; 'vcf'
  - `filepath`: path to VCF
  - `chrom`: chromosome
  - `info_annotations`: for VCF, INFO fields to add to hover boxes
    - `<info_field_1>`
  - `vep`: VCF only; leave empty if not VEP annotated
    - `field_name`: name of INFO field with VEP string (e.g. vep, ann, csq)
    - `vep_fields`: VEP fields to add to hover boxes
      - `<vep_field_1>`
  - `annotate_severity_by`: VCF only; possible arguments are `transcript_severity` (use VEP annotations from given transcript) and `max_severity` (use VEP annotation from the transcript with the most severe consequence among those specified in the `transcripts` argument); these apply to `vep_fields` also
  - `color`: default lollipop color; use hex codes or predefined colors from `default_colors/named_colors.yaml`
  - `variant_severity_colors`: specify lollipop colors by variant severity; use hex codes or predefined colors from `default_colors/named_colors.yaml`
    - `LOW`:
    - `MED`:
    - `HIGH`:
#BED (if header in file, it must start with #)
- `<variant_set>`: variant set label
  - `format`: file format of variant file; 'bed'
  - `filepath`: path to BED
  - `chrom`: chromosome
  - `header`: BED only; list of column names in BED file
  - `info_annotations`: for BED, column names to add to hover boxes
    - `<info_field_1>`
  - `color`: default lollipop color; use hex codes or predefined colors from `default_colors/named_colors.yaml`
  - `variant_severity_colors`: specify lollipop colors by variant severity; use hex codes or predefined colors from `default_colors/named_colors.yaml`
    - `LOW`:
    - `MED`:
    - `HIGH`:
  - `consequence_idx`: BED only; 0-based index of the column containing the 'Consequence' field from VEP (leave empty or set to False if no such column)
- `<track_name>`: track label
  - `format`: file format of track file; can be GTF or BED
  - `filepath`: path to file with track coordinate information
  - `seqid`: chromosome
  - `header`: list of column names (BED only)
  - `color_by`: field to color boxes by
  - `annotate_with`: fields to annotate with
    - `<field_name>`: (for BED, must be from header)
- `<axis_name>`:
  - `y_axis_label`: y axis label
  - `num_ticks`: number of y ticks; default 3
  - `tick_precision`: number of decimal places
  - `tick_scientific_notation`: (boolean)
  - `smoothing slider`: include slider widget to smooth lines (boolean)
  - `lines`: lines to plot on this axis
    - `<line_name>`:
      - `filepath`: (CSV or BEDGRAPH)
      - `color`:
      - `alpha`:
      - `fill_area`: fill area under the curve (boolean)
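
The `flattened-exons` option of `transcripts` overlays all transcripts into one transcript with the largest possible exons. One way to picture this is as an interval union across transcripts, sketched below. This is an illustration under assumptions, not the tool's code: the function name, the `(start, end)` tuples, and the transcript names are hypothetical, and exons that overlap or touch are merged.

```python
def flatten_exons(transcripts):
    """Merge exons from all transcripts into a sorted list of disjoint intervals.

    `transcripts` maps a transcript name to a list of (start, end) exons.
    Hypothetical helper illustrating an interval union.
    """
    intervals = sorted(exon for exons in transcripts.values() for exon in exons)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical transcripts used only for illustration.
transcripts = {
    "tx1": [(100, 200), (400, 500)],
    "tx2": [(150, 250), (600, 700)],
    "tx3": [(450, 480)],
}
flattened = flatten_exons(transcripts)
```

Any variant that falls in an exon of at least one transcript then falls inside one of the flattened intervals.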
git clone https://github.com/quinlan-lab/transcriptionary.git
cd transcriptionary
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
conda create -n transcriptionary --file requirements.txt python=3.10
conda activate transcriptionary
python setup.py install
or
git clone https://github.com/quinlan-lab/transcriptionary.git
cd transcriptionary
pip install -r requirements.txt
python setup.py install
`transcriptionary test/test.yaml` will create `transcriptionary-example.html`. Open `transcriptionary-example.html` in a browser to view the plots.
Define or adjust named colors by modifying `default_colors/named_colors.yaml`. Create a custom color palette to pull random track colors from by adding to `default_colors/palettes.yaml` (use hex codes or the named colors defined in `default_colors/named_colors.yaml`). Alternatively, create your own named_colors and palettes config files and change the file paths in the main config file.
Color-blind friendly color palettes are included in `default_colors/palettes.yaml`:
- bang_wong_palette: https://www.nature.com/articles/nmeth.1618
- paul_tol_palette: https://personal.sron.nl/~pault/