Subcommand: random alignment

Lucas Czech edited this page Aug 9, 2020 · 10 revisions

Create a random alignment with a given number of sequences of a given length.

Usage: gappa random random-alignment [options]

Options

Input
--sequence-count Required. UINT=0
Number of sequences to create.
--sequence-length Required. UINT=0
Length of the sequences to create.
--characters TEXT=-ACGT
Set of characters to use for the sequences.
Output
--out-dir TEXT=.
Directory to write files to.
--file-prefix TEXT
File prefix for output files.
--write-fasta Write sequences to a fasta file.
--write-strict-phylip Excludes: --write-relaxed-phylip
Write sequences to a strict phylip file.
--write-relaxed-phylip Excludes: --write-strict-phylip
Write sequences to a relaxed phylip file.
Global Options
--allow-file-overwriting Allow to overwrite existing output files instead of aborting the command.
--verbose Produce more verbose output.
--threads UINT
Number of threads to use for calculations.
--log-file TEXT
Write all output to a log file, in addition to standard output to the terminal.

Description

The command creates a random alignment with a given number of sequences of a given length. The sequences are named with simple letter combinations, going a, ..., z, aa, ..., az, ba, .... The characters in the alignment sequences are randomly chosen from the provided character set.
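The naming scheme and the random character choice can be illustrated with a short Python sketch. This is not gappa's implementation: the function names `sequence_name` and `random_alignment` are hypothetical, the continuation to three-letter names after zz is an assumption, and the uniform draw from the character set is likewise assumed, since the page does not state the distribution.

```python
import random
import string


def sequence_name(index: int) -> str:
    """Return the name for a 0-based index: a, ..., z, aa, ..., az, ba, ..."""
    letters = string.ascii_lowercase
    name = ""
    n = index + 1  # bijective base-26 counting works on 1-based numbers
    while n > 0:
        n, rem = divmod(n - 1, 26)
        name = letters[rem] + name
    return name


def random_alignment(sequence_count, sequence_length, characters="-ACGT", seed=None):
    """Build a dict mapping sequence names to random sequences.

    Assumption: each character is drawn uniformly from `characters`.
    The `seed` parameter is only for reproducibility of this sketch.
    """
    rng = random.Random(seed)
    return {
        sequence_name(i): "".join(
            rng.choice(characters) for _ in range(sequence_length)
        )
        for i in range(sequence_count)
    }


if __name__ == "__main__":
    for name, seq in random_alignment(3, 12, seed=1).items():
        print(name, seq)
```

The default character set in the sketch mirrors the documented default of --characters, which includes the gap character.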

At least one of the output format flags --write-fasta, --write-strict-phylip, and --write-relaxed-phylip has to be provided; the two phylip flags are mutually exclusive. The output files are named random-alignment.fasta for fasta and random-alignment.phylip for either phylip variant, with the --file-prefix applied if provided.
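The flag rules above can be written out as a small check. The following Python sketch is illustrative only: `output_files` is a hypothetical helper, not part of gappa, and it assumes that the file prefix is prepended directly to the base file name, which the page does not spell out.

```python
import os


def output_files(out_dir=".", file_prefix="", write_fasta=False,
                 write_strict_phylip=False, write_relaxed_phylip=False):
    """Return the output paths implied by the chosen format flags."""
    # The two phylip flags exclude each other.
    if write_strict_phylip and write_relaxed_phylip:
        raise ValueError("choose only one of the two phylip formats")
    # At least one output format is required.
    if not (write_fasta or write_strict_phylip or write_relaxed_phylip):
        raise ValueError("at least one output format flag is required")

    files = []
    if write_fasta:
        # Assumption: the prefix goes directly in front of the base name.
        files.append(os.path.join(out_dir, file_prefix + "random-alignment.fasta"))
    if write_strict_phylip or write_relaxed_phylip:
        files.append(os.path.join(out_dir, file_prefix + "random-alignment.phylip"))
    return files
```

Fasta can be combined with either phylip variant, in which case both files are written.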

The differences between strict and relaxed phylip are as follows: Strict phylip is the original specification, which uses exactly the first 10 characters of a line to denote the name (filled with spaces if shorter), and requires the whole sequence to be in the rest of the (potentially very long) line. Relaxed phylip allows arbitrarily long names, separated by at least one white space from the actual sequence, and the sequence can be broken down into multiple lines.
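To make the difference concrete, the two layouts can be sketched in Python. The helper names `strict_phylip` and `relaxed_phylip` are hypothetical and do not reflect gappa's code. The header line with the sequence count and length is part of the phylip format in general. Rejecting names longer than 10 characters in the strict variant and the optional `width` parameter for line wrapping are choices of this sketch, not documented behavior of the command.

```python
def strict_phylip(alignment):
    """Format a non-empty {name: sequence} dict as strict phylip."""
    length = len(next(iter(alignment.values())))
    lines = [f"{len(alignment)} {length}"]
    for name, seq in alignment.items():
        if len(name) > 10:
            # Sketch choice: refuse names that do not fit the 10-character field.
            raise ValueError(f"name too long for strict phylip: {name}")
        # Name padded with spaces to exactly 10 characters,
        # followed by the whole sequence on the same line.
        lines.append(name.ljust(10) + seq)
    return "\n".join(lines) + "\n"


def relaxed_phylip(alignment, width=0):
    """Format a non-empty {name: sequence} dict as relaxed phylip.

    If `width` is positive, each sequence is broken into lines of at most
    that many characters; otherwise it stays on one line.
    """
    length = len(next(iter(alignment.values())))
    lines = [f"{len(alignment)} {length}"]
    for name, seq in alignment.items():
        if width > 0:
            chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
        else:
            chunks = [seq]
        chunks = chunks or [seq]  # keep one (empty) chunk for empty sequences
        # Arbitrarily long name, one space, then the first part of the sequence.
        lines.append(f"{name} {chunks[0]}")
        lines.extend(chunks[1:])
    return "\n".join(lines) + "\n"
```

With the short letter names produced by this command, the 10-character limit of strict phylip is only reached for extremely large sequence counts.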

Citation

When using this method, please do not forget to cite

Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070
