Subcommand: accumulate
Accumulate the masses of each query in jplace files into basal branches so that they exceed a given mass threshold.
Usage: gappa edit accumulate [options]
Input | |
---|---|
--jplace-path | Required. TEXT:PATH(existing)=[] ... List of jplace files or directories to process. For directories, only files with the extension .jplace[.gz] are processed. |
Settings | |
--threshold | FLOAT:FLOAT in [0.5 - 1]=0.95 Threshold of how much mass needs to be accumulated into a basal branch. |
Output | |
--out-dir | TEXT=. Directory to write output files to. |
--file-prefix | TEXT File prefix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data. |
--file-suffix | TEXT File suffix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data. |
Global Options | |
--allow-file-overwriting | FLAG Allow to overwrite existing output files instead of aborting the command. |
--verbose | FLAG Produce more verbose output. |
--threads | UINT Number of threads to use for calculations. |
--log-file | TEXT Write all output to a log file, in addition to standard output to the terminal. |
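As a rough illustration of how the options above fit together, the following Python sketch assembles an argument list for the command. The helper name build_accumulate_command and its parameters are hypothetical and not part of gappa; only the flags listed in the table are used. It assumes that several paths can follow a single --jplace-path flag, as the "..." in the option description suggests.

```python
def build_accumulate_command(jplace_paths, threshold=0.95, out_dir=".",
                             file_prefix=None, file_suffix=None,
                             allow_overwrite=False, threads=None):
    """Hypothetical helper: build an argv list for `gappa edit accumulate`."""
    if not jplace_paths:
        raise ValueError("--jplace-path is required")
    # The documented range for --threshold is [0.5 - 1].
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("--threshold must lie in [0.5, 1]")

    # Assumption: multiple paths are passed after one --jplace-path flag.
    cmd = ["gappa", "edit", "accumulate", "--jplace-path", *jplace_paths]
    cmd += ["--threshold", str(threshold), "--out-dir", out_dir]
    if file_prefix:
        cmd += ["--file-prefix", file_prefix]
    if file_suffix:
        cmd += ["--file-suffix", file_suffix]
    if allow_overwrite:
        cmd.append("--allow-file-overwriting")
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


# Example with made-up file names.
example_cmd = build_accumulate_command(
    ["a.jplace", "samples"], threshold=0.9, out_dir="out",
    file_prefix="run1_", allow_overwrite=True, threads=4,
)
```

The resulting list could be handed to subprocess.run(example_cmd, check=True) on a system where gappa is installed.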
The command is useful to assess placements that are distributed across (nearby) branches of the reference tree, for example if the reference tree contains multiple representatives for the same species. It accumulates the placement mass (likelihood weight ratio) of the placements of each pquery up the tree (towards the root), until the accumulated mass at a basal branch reaches the given --threshold.
That is, each pquery is treated separately. Its mass is first normalized to a total of 1.0. Then, the command looks for the basal branch whose underlying clade accumulates more than the threshold mass. This can be understood as finding the clade that contains most of the placement mass. All placements of the pquery are then removed, and only one placement at the basal branch is added, with a mass of 1.0, which hence represents the accumulated original masses. The pendant length of the resulting pquery is set to the weighted average of the pendant lengths that have been accumulated in the clade, using the masses (likelihood weight ratios) as weights.
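The procedure described above can be sketched in Python as follows. This is a minimal illustration, not gappa's actual implementation: the tree is assumed to be given as a hypothetical parent map from each branch to the branch above it (None for branches attached to the root), a placement is a (branch, likelihood weight ratio, pendant length) tuple, a branch's own placements are assumed to count towards its clade, and "more than the threshold" is read as a strict comparison. Other placement fields such as the proximal length are ignored.

```python
from collections import defaultdict


def accumulate_pquery(parent, placements, threshold=0.95):
    """Return (basal_branch, 1.0, pendant_length), or None if no clade qualifies."""
    total = sum(lwr for _, lwr, _ in placements)
    if total <= 0:
        return None

    clade_mass = defaultdict(float)     # normalized mass on and below each branch
    clade_pendant = defaultdict(float)  # mass-weighted sum of pendant lengths
    for branch, lwr, pendant in placements:
        mass = lwr / total              # normalize the pquery to a total of 1.0
        node = branch
        while node is not None:         # push the mass up towards the root
            clade_mass[node] += mass
            clade_pendant[node] += mass * pendant
            node = parent[node]

    def depth(branch):
        d = 0
        while parent[branch] is not None:
            branch = parent[branch]
            d += 1
        return d

    # With a threshold of at least 0.5, the qualifying branches lie on one path
    # to the root; the deepest of them spans the smallest qualifying clade.
    candidates = [b for b, m in clade_mass.items() if m > threshold]
    if not candidates:
        return None
    basal = max(candidates, key=depth)
    pendant = clade_pendant[basal] / clade_mass[basal]  # weighted average
    return basal, 1.0, pendant


# Toy tree: branches "A" and "B" attach to the root; "a1" and "a2" lie below "A".
parent = {"A": None, "a1": "A", "a2": "A", "B": None}
placements = [("a1", 0.5, 0.1), ("a2", 0.46, 0.2), ("B", 0.04, 0.9)]
result = accumulate_pquery(parent, placements)  # basal branch is "A"
```

In this toy example neither a1 (0.5) nor a2 (0.46) exceeds the threshold alone, but the clade below A holds 0.96 of the mass, so the single resulting placement sits on A with a pendant length of roughly 0.148.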
It can happen that a pquery contains placement mass across different sides of the root. If no side contains more than the given --threshold mass, there is no basal branch or clade that satisfies the above description. In that case, the whole pquery is removed from the output, and its name(s) are printed to report this. This can happen, for example, with chimeric sequences that fit in multiple places of the tree, and hence should be treated as a warning sign. Another reason can be that the root of the reference tree is not chosen properly. In that case, it can help to reroot the tree first.
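The root-spanning case can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than gappa's code: the parent map, the (branch, likelihood weight ratio) tuples, and the function names are hypothetical, and a "side" of the root is taken to be the subtree below one of the branches attached to the root.

```python
def root_side_masses(parent, placements):
    """Sum the normalized mass of a pquery per side of the root."""
    total = sum(lwr for _, lwr in placements)
    sides = {}
    for branch, lwr in placements:
        node = branch
        while parent[node] is not None:  # climb to the branch attached to the root
            node = parent[node]
        sides[node] = sides.get(node, 0.0) + lwr / total
    return sides


def split_pqueries(parent, pqueries, threshold=0.95):
    """Separate pqueries that can be accumulated from those that must be dropped."""
    kept, dropped = [], []
    for name, placements in pqueries.items():
        sides = root_side_masses(parent, placements)
        if max(sides.values()) > threshold:
            kept.append(name)
        else:
            dropped.append(name)  # no side of the root holds enough mass
    return kept, dropped


# Toy tree: branches "A" and "B" attach to the root; "a1" and "a2" lie below "A".
parent = {"A": None, "a1": "A", "a2": "A", "B": None}
pqueries = {
    "normal": [("a1", 0.7), ("a2", 0.3)],   # all mass on one side of the root
    "chimera": [("a1", 0.5), ("B", 0.5)],   # mass split across the root
}
kept, dropped = split_pqueries(parent, pqueries)
```

Here the pquery named "chimera" ends up in the dropped list, because each side of the root holds only half of its mass.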
The output of the command is a file called accumulated.jplace, with the name potentially amended by the --file-prefix and --file-suffix options.
The file can then be visualized, for example via the heat-tree command, or examined in even greater detail with the graft command.
When using this method, please do not forget to cite
Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070