Subcommand: heat tree
Make a tree with edges colored according to the placement mass of the samples.
Usage: gappa examine heat-tree [options]
Input | |
---|---|
--jplace-path |
Required. TEXT:PATH(existing)=[] ... List of jplace files or directories to process. For directories, only files with the extension .jplace[.gz] are processed. |
Settings | |
--mass-norm |
TEXT:{absolute,relative}=absolute Set the per-sample normalization method. With absolute, the total mass is not changed, so that input jplace samples with more pqueries (more placed sequences) have a higher influence on the result. With relative, the total mass of each sample is normalized to 1.0, so that each sample has the same influence on the result, independent of its number of sequences and their abundances. |
--point-mass |
FLAG Treat every pquery as a point mass concentrated on the highest-weight placement. In other words, ignore all but the most likely placement location (the one with the highest LWR), and set its LWR to 1.0. |
--ignore-multiplicities |
FLAG Set the multiplicity of each pquery to 1.0. The multiplicity is the equivalent of abundance for placements, so abundances are ignored when this flag is set. |
Color | |
--color-list |
TEXT=BuPuBk List of colors to use for the palette. Can either be the name of a color list, a file containing one color per line, or an actual comma-separated list of colors. Colors can be specified in the format #rrggbb using hex values, or by web color names. |
--reverse-color-list |
FLAG If set, the order of colors of the --color-list is reversed. |
--under-color |
TEXT=#ff00ff Color used to indicate values below the min value. Color can be specified in the format #rrggbb using hex values, or by web color names. |
--clip-under |
FLAG Clip (i.e., clamp) values less than min to be inside [min, max], by setting values that are too low to the specified min value. If set, --under-color is not used to indicate values out of range. |
--over-color |
TEXT=#00ffff Color used to indicate values above the max value. Color can be specified in the format #rrggbb using hex values, or by web color names. |
--clip-over |
FLAG Clip (i.e., clamp) values greater than max to be inside [min, max], by setting values that are too high to the specified max value. If set, --over-color is not used to indicate values out of range. |
--clip |
FLAG Clip (i.e., clamp) values to be inside [min, max], by setting values outside of that interval to the nearest boundary of it. This option is a shortcut to set --clip-under and --clip-over at once. |
--mask-color |
TEXT=#ffff00 Color used to indicate masked or invalid values, such as infinities or NaNs. Color can be specified in the format #rrggbb using hex values, or by web color names. |
--log-scaling |
FLAG If set, the sequential color list is scaled logarithmically instead of linearly. |
--min-value |
FLOAT=0 Minimum value that is represented by the color scale. If not set, the minimum value of the data is used. |
--max-value |
FLOAT=1 Maximum value that is represented by the color scale. If not set, the maximum value of the data is used. |
--mask-value |
FLOAT=nan Mask value that identifies invalid values (in addition to infinities and NaN values, which are always considered invalid, and hence always masked). Values of the data that compare equal to the mask value are colored using --mask-color. This is meant as a simple means of filtering and visualizing invalid values. If not set, no masking value is applied. |
Output | |
--out-dir |
TEXT=. Directory to write output files to. |
--file-prefix |
TEXT File prefix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data. |
--file-suffix |
TEXT File suffix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data. |
Tree Output | |
--write-newick-tree |
FLAG If set, the tree is written to a Newick file. This format cannot store color information. |
--write-nexus-tree |
FLAG If set, the tree is written to a Nexus file. This can for example be opened in FigTree. |
--write-phyloxml-tree |
FLAG If set, the tree is written to a Phyloxml file. This can for example be used in Archaeopteryx. |
--write-svg-tree |
FLAG If set, the tree is written to an SVG file. This gives a file for vector graphics editors. |
Newick Tree Output | |
--newick-tree-branch-length-precision |
INT=6 Needs: --write-newick-tree Number of digits to print for branch lengths in Newick format. |
--newick-tree-quote-invalid-chars |
FLAG Needs: --write-newick-tree If set, node labels that contain characters that are invalid in the Newick format (i.e., spaces and :;()[],{} ) are put into quotation marks. If not set (default), these characters are instead replaced by underscores, which changes the names, but works better with most downstream tools. |
Svg Tree Output | |
--svg-tree-shape |
TEXT:{circular,rectangular}=circular Needs: --write-svg-tree Shape of the tree. |
--svg-tree-type |
TEXT:{cladogram,phylogram}=cladogram Needs: --write-svg-tree Type of the tree, either using branch lengths (phylogram), or not (cladogram). |
--svg-tree-stroke-width |
FLOAT=5 Needs: --write-svg-tree Svg stroke width for the branches of the tree. |
--svg-tree-ladderize |
FLAG Needs: --write-svg-tree If set, the tree is ladderized. |
Global Options | |
--allow-file-overwriting |
FLAG Allow overwriting existing output files instead of aborting the command. |
--verbose |
FLAG Produce more verbose output. |
--threads |
UINT Number of threads to use for calculations. |
--log-file |
TEXT Write all output to a log file, in addition to standard output to the terminal. |
The command takes one or more jplace files as input and visualizes the distribution of placements on the branches of the tree. It uses color coding to show how much placement mass there is per branch.
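As a rough illustration of how the color options listed above could interact, the following Python sketch sorts a per-branch value into a masked, under-range, over-range, or in-scale category. It is not gappa's implementation: the function name `classify`, the order of the checks, and the linear mapping are assumptions made for this example, and logarithmic scaling is left out. It also assumes that the max value is larger than the min value.

```python
import math

def classify(value, min_value=0.0, max_value=1.0, mask_value=math.nan,
             clip_under=False, clip_over=False):
    """Hypothetical helper: return (category, position on the color scale).

    The position is a number in [0, 1] for in-scale values, else None.
    """
    # Infinities and NaN are always masked; a NaN mask_value never compares
    # equal to anything, so the default adds no extra masking.
    if math.isnan(value) or math.isinf(value) or value == mask_value:
        return ("mask", None)
    if value < min_value:
        if not clip_under:
            return ("under", None)   # would be drawn with the under color
        value = min_value            # clamped to the lower boundary
    elif value > max_value:
        if not clip_over:
            return ("over", None)    # would be drawn with the over color
        value = max_value            # clamped to the upper boundary
    # Linear position within [min_value, max_value].
    return ("scale", (value - min_value) / (max_value - min_value))

print(classify(0.5))
print(classify(2.0))
print(classify(2.0, clip_over=True))
```

In this sketch, setting both `clip_under` and `clip_over` plays the role of the --clip shortcut.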
The tree shows the distribution of placements across the tree. That is, it sums the LWR per branch for all query sequences that have placement mass on that branch (theoretically, all of them have, but usually, jplace files only store the top n most likely locations). Note that this does not correspond to an actual number of sequences per branch; it is a distribution!
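The per-branch sum can be sketched in a few lines of Python. This is only an illustration of the idea, not gappa's code: the function name `edge_masses` and the toy data are made up for this example, the input is assumed to be an already parsed jplace document with the standard `fields` and `placements` entries, and multiplicities are left out for brevity.

```python
from collections import defaultdict

def edge_masses(jplace):
    """Hypothetical helper: sum the LWR per edge over all pqueries."""
    fields = jplace["fields"]
    edge_idx = fields.index("edge_num")
    lwr_idx = fields.index("like_weight_ratio")
    masses = defaultdict(float)
    for pquery in jplace["placements"]:
        # Every stored placement contributes its LWR to its edge.
        for placement in pquery["p"]:
            masses[placement[edge_idx]] += placement[lwr_idx]
    return dict(masses)

# Toy data: two query sequences with two stored placements each.
sample = {
    "fields": ["edge_num", "likelihood", "like_weight_ratio",
               "distal_length", "pendant_length"],
    "placements": [
        {"p": [[0, -100.0, 0.75, 0.1, 0.2], [1, -101.0, 0.25, 0.1, 0.2]],
         "n": ["seq_a"]},
        {"p": [[1, -90.0, 0.5, 0.1, 0.2], [2, -90.0, 0.5, 0.1, 0.2]],
         "n": ["seq_b"]},
    ],
}

print(edge_masses(sample))  # {0: 0.75, 1: 0.75, 2: 0.5}
```

Edge 1 ends up with a mass of 0.75 although no single sequence is placed there with certainty, which is why the values are read as a distribution and not as counts.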
However, when using --point-mass, each query sequence is reduced to only its most likely placement location, and the LWR of that location is set to 1.0. Hence, when this flag is set, the heat-tree visualization produced here also corresponds to individual query sequences, with the downside of losing the uncertainty information contained in the placement distribution.
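A minimal Python sketch of this reduction, assuming each pquery is given as a list of `(edge_num, lwr)` pairs. The function name `to_point_mass` and the tie handling (the first maximum wins) are assumptions for this example, not taken from gappa.

```python
def to_point_mass(placements):
    """Hypothetical helper: keep only the highest-LWR placement of one pquery."""
    # max() returns the first maximum, so ties go to the earlier placement.
    best_edge, _ = max(placements, key=lambda p: p[1])
    # All mass of the pquery is concentrated on that single edge.
    return [(best_edge, 1.0)]

pquery = [(3, 0.6), (4, 0.3), (7, 0.1)]
print(to_point_mass(pquery))  # [(3, 1.0)]
```

After this step, summing the mass per branch yields whole numbers (before any multiplicities or normalization), so each branch value can be read as a count of query sequences.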
Important remark:
If multiple jplace files are provided as input, their combined placements are visualized. It is then critical to correctly set the --mass-norm option. If set to absolute, no normalization is performed per jplace file; thus, absolute abundances are shown. However, if set to relative, the placement mass in each input file is normalized to unit mass 1.0 first, thus showing relative abundances.
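The difference between the two settings can be illustrated with a small Python sketch. It assumes that each input file has already been reduced to a dictionary of per-edge masses; the function name `combine_samples` and the toy numbers are hypothetical and not part of gappa.

```python
def combine_samples(samples, mass_norm="absolute"):
    """Hypothetical helper: merge per-edge masses of several samples."""
    if mass_norm not in ("absolute", "relative"):
        raise ValueError("mass_norm must be 'absolute' or 'relative'")
    combined = {}
    for masses in samples:
        total = sum(masses.values())
        # With 'relative', each sample is scaled to a total mass of 1.0 first.
        scale = 1.0 / total if mass_norm == "relative" and total > 0 else 1.0
        for edge, mass in masses.items():
            combined[edge] = combined.get(edge, 0.0) + mass * scale
    return combined

small = {0: 1.0, 1: 1.0}   # sample with total mass 2
large = {1: 6.0, 2: 2.0}   # sample with total mass 8

print(combine_samples([small, large], "absolute"))  # {0: 1.0, 1: 7.0, 2: 2.0}
print(combine_samples([small, large], "relative"))  # {0: 0.5, 1: 1.25, 2: 0.25}
```

With absolute, the larger sample dominates the result; with relative, both samples contribute a total mass of 1.0 each, regardless of how many sequences they contain.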
When using this method, please do not forget to cite
Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070