wLogDate is a method for dating phylogenetic trees. Given a phylogeny and either sampling times for leaves or calibration points for internal nodes, wLogDate outputs a "dated" tree that conforms to the sampling times or calibration points. It also works without sampling times or calibration points, in which case it simply makes the tree ultrametric and fixes its height to a given value. Its optimization criterion is to minimize the variance of the mutation rates in log scale (hence the term logDate).
The algorithm was developed by Uyen Mai and Siavash Mirarab. The code was developed by Uyen Mai.
Mai, Uyen, and Siavash Mirarab. “Log Transformation Improves Dating of Phylogenies.” Molecular Biology and Evolution, in press 2020
- Abstract appeared at RECOMB 2020
Please submit questions and bug reports as issues.
A Galaxy-based web server with a graphical user interface is available at http://wlogdate.com/
The latest release is v1.0.2.
We release executables for Windows users. The current stable release is v1.0.2, which can be downloaded here. Double-click the exe file and follow the install wizard to set up wLogDate.
After installation, restart your computer. Then open cmd and run the following to see the command-line help of wLogDate:
launch_wLogDate -h
You can install wLogDate using Anaconda, Pip, or from source code.
You need to have:
- Python >= 3.6
- Anaconda or pip. Anaconda makes the installation slightly easier, but pip (already installed with Python 3 >= 3.4) also works.
wLogDate is available on the Python Package Index (PyPI). To install, use pip as follows:
python3 -m pip install wlogdate
To install with Anaconda, first add bioconda and conda-forge to your active channels.
conda config --add channels conda-forge
conda config --add channels bioconda
Now use conda install to install wLogDate:
conda install wlogdate
- Download the source code.
  - Either clone the repository to your machine:
    git clone https://github.com/uym2/wLogDate.git
  - or simply download this zip file to your machine and unzip it in your preferred destination.
- To install, go to the wLogDate folder.
  - If you have pip, use:
    python3 -m pip install .
  - Otherwise, type:
    python3 setup.py install
After installation, run the following to see the command-line help of wLogDate:
launch_wLogDate.py -h
wLogDate accepts calibration points (hard constraints on divergence times) for internal nodes, sampling times at leaf nodes, and a mixture of the two. Below we give examples for the three most common use-cases.
- All examples are given in use_cases.zip of this repository.
- If you cloned or downloaded the repository, go to the wLogDate folder and unzip use_cases.zip.
- If you installed wLogDate using Anaconda or PyPI, download use_cases.zip to your machine and unzip it before trying the examples.
Note: the examples below assume you are using Linux or macOS. For Windows users who installed wLogDate using the installation wizard, change launch_wLogDate.py to launch_wLogDate in all commands in the examples below.
If no calibration is given, wLogDate infers the unit ultrametric tree (depth 1).
launch_wLogDate.py -i <INPUT_TREE> -o <OUTPUT_TREE>
We give an example in the folder use_cases/unit_time_tree, inside which you can find the sample input tree input.nwk.
cd use_cases/unit_time_tree
launch_wLogDate.py -i input.nwk -o output.nwk
The output tree is output.nwk.
- It is an ultrametric tree and has depth (root-to-tip distance) 1.
- The relative divergence times of the internal nodes are annotated on the tree inside square brackets with the attribute t, as in [t=0.095].
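To inspect the annotated times without a tree library, a regular expression over the Newick text is often enough. The sketch below assumes the annotations look exactly like [t=0.095] as described above; the helper name extract_divergence_times and the miniature tree are hypothetical, not part of wLogDate.

```python
import re

# Matches annotations of the form [t=0.095]. Assumes plain decimal or
# scientific-notation numbers, as in the example annotation above.
T_ANNOTATION = re.compile(r"\[t=([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\]")


def extract_divergence_times(newick_text):
    """Return every t value found in a Newick string, in order of appearance."""
    return [float(value) for value in T_ANNOTATION.findall(newick_text)]


# Hypothetical miniature tree for illustration, not actual wLogDate output.
example = "((A:0.2,B:0.2)n1[t=0.8]:0.8,C:1.0)n0[t=0.0];"
print(extract_divergence_times(example))
```

For a real run, read output.nwk into a string first and pass it to the function.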
A typical use-case in virus phylogeny is to infer the time tree from a phylogram inferred from sequences and their sampling times (i.e. calibration points given at leaf nodes). wLogDate reads the calibration points or sampling times from an input file via the -t option.
launch_wLogDate.py -i <INPUT_TREE> -o <OUTPUT_TREE> -t <SAMPLING_TIMES>
An example is given in the folder use_cases/virus_all_samplingTime. Starting from the base directory,
cd use_cases/virus_all_samplingTime
Inside this folder you will find an input tree (input.nwk) and the input sampling times (input.txt).
In this example, we give wLogDate all the sampling times for all leaves (i.e. complete sampling times).
- The sampling time file (input.txt) is a tab-delimited file with one species-time pair per line.
- It must have two columns: the species names and the corresponding sampling times.
For example, lines
000009 9.36668
000010 9.36668
000011 11.3667
000012 11.3667
show that leaves 000009 and 000010 are sampled at time 9.36668, while leaves 000011 and 000012 are sampled at time 11.3667.
Note: These times are assumed to be forward; i.e., smaller values are closer to the root of the tree. The top of the branch above the root is assumed to be at time 0.
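If your sampling times live in a script rather than a file, the two-column layout described above is easy to generate. This is a minimal sketch; format_sampling_times is a hypothetical helper name, and the values are the ones from the example lines.

```python
def format_sampling_times(times):
    """Render {leaf_name: sampling_time} as tab-delimited text, one pair per line."""
    lines = [f"{name}\t{time}" for name, time in times.items()]
    return "\n".join(lines) + "\n"


# Values taken from the example lines above (forward time).
times = {
    "000009": 9.36668,
    "000010": 9.36668,
    "000011": 11.3667,
    "000012": 11.3667,
}
print(format_sampling_times(times), end="")
```

Write the returned text to a file and pass that file to wLogDate with the -t option.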
Now, from inside this folder, run:
launch_wLogDate.py -i input.nwk -o output.nwk -t input.txt
The output tree output.nwk
- has branch lengths in time units and
- has divergence times annotated on every internal node using the [t=9.55] notation.
wLogDate allows missing sampling times for the leaves, as long as there exists at least one pair of leaves with different sampling times. The usage of wLogDate is the same as in the case of complete sampling times. An example is given in the folder use_cases/virus_some_samplingTime. Here we give the sampling times for 52 of the 110 species.
cd use_cases/virus_some_samplingTime/
launch_wLogDate.py -i input.nwk -o output.nwk -t input.txt
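Before running wLogDate on a partial sampling-time file, it can help to confirm that the condition stated above holds, namely that at least two leaves have different sampling times. The sketch below is an illustrative pre-check, not part of wLogDate; both function names are hypothetical, and it assumes each line holds exactly one name and one time separated by whitespace.

```python
def read_sampling_times(text):
    """Parse whitespace-delimited 'name time' lines into a dict."""
    times = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, time = line.split()
        times[name] = float(time)
    return times


def has_distinct_times(times):
    """True if at least one pair of entries has different sampling times."""
    return len(set(times.values())) >= 2


sample = "000009\t9.36668\n000011\t11.3667\n"
print(has_distinct_times(read_sampling_times(sample)))
```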
wLogDate allows the sampling times to be given at both internal nodes and leaves. An example is given in the folder use_cases/virus_internal_smplTime. In the example tree, each node (leaf or internal) has a unique label. If the internal nodes are uniquely labeled, wLogDate allows the internal calibrations to be specified by their labels, in the same way as for the leaves.
cd use_cases/virus_internal_smplTime
launch_wLogDate.py -i input.nwk -o output.nwk -t input.txt -k
The -k flag (or --keep) informs wLogDate that the tree already has unique internal node labels and that wLogDate should suppress the auto-labeling of internal nodes.
Calibration points obtained from fossils are usually specified in backward time, such as "million years ago" ("mya"). For these cases, wLogDate allows specification of backward time via the -b flag.
launch_wLogDate.py -i <INPUT_TREE> -o <OUTPUT_TREE> -t <CALIBRATIONS> -b
Calibration points can be given in the same way as sampling times.
- If the tree nodes are uniquely labeled, we can use the node labels to specify the internal nodes associated with the calibration points.
- Alternatively, wLogDate can identify a node as the Least Common Ancestor (LCA) of a set of species, and optionally assign a label to that calibration point. The LCA is also known as the Most Recent Common Ancestor (MRCA).
We give an example of the LCA specification in use_cases/fossil_backward_time. From the base directory, go to this example.
cd use_cases/fossil_backward_time
Because the input tree input.nwk does not have labels for internal nodes, we need to use LCA to specify calibration points. Here we use 4 calibration points in input.txt:
calib1=t1+t30+t40+t26 0.551
t27+t3 0.057
calib2=t24+t37 0.152
t46+t31+t48 2.699
An internal node is identified as the LCA of 2 or more species separated by +. Moreover, a name for this internal node can be optionally specified using =. In our example, the 4 calibration points are the LCAs of (t1, t30, t40, and t26), (t27 and t3), (t24 and t37), and (t46, t31, and t48), with the two node labels calib1 and calib2 assigned to two of these nodes. Note that label assignments in input.txt override both the input tree's labels and the automatic labeling of wLogDate.
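The calibration syntax can be illustrated with a small parser. This is only a sketch of how the + and = notation described above decomposes; it is not wLogDate's own parsing code, and parse_calibration_line is a hypothetical name. It assumes names contain no whitespace, + or = characters.

```python
def parse_calibration_line(line):
    """Split one calibration line into (label, species, time).

    label is None when the line has no 'name=' prefix.
    """
    node_spec, time_text = line.split()
    label = None
    if "=" in node_spec:
        label, node_spec = node_spec.split("=", 1)
    species = node_spec.split("+")
    return label, species, float(time_text)


# The four calibration lines from input.txt shown above.
lines = [
    "calib1=t1+t30+t40+t26 0.551",
    "t27+t3 0.057",
    "calib2=t24+t37 0.152",
    "t46+t31+t48 2.699",
]
for line in lines:
    print(parse_calibration_line(line))
```

A line that names a single labeled node yields a one-element species list.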
launch_wLogDate.py -i input.nwk -t input.txt -o output.nwk -b
With the -b flag, wLogDate understands the time as backward and enforces each parent node's divergence time to be larger (i.e. "older") than those of its children.
The output tree output.nwk is ultrametric, has branch lengths in time units, and has divergence times annotated onto the internal nodes in backward time.
- By default, the leaf nodes are set to present time (t = 0). You can adjust the leaf time using the -f option.
- The output tree has internal node labels assigned arbitrarily by wLogDate, except for the two calibration points "calib1" and "calib2" assigned by the user via input.txt.
The following options are useful to explore:
- -p 10 (or some other number) can be used to run the optimization problem 10 times instead of the default once, each starting from a different initial point.
- -s can be used to set the seed number, to enable reproducible results.
- -l can be used to set the length of the sequences from which the tree is inferred. This impacts the pseudocount used internally by wLogDate for super short branches.
- -m can be used to adjust the maximum number of iterations of the internal optimizer.
- -z can be used to assign a value to zero-length branches.
- -r and -f can be used to set the time at the root and the leaves, respectively.
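When calling wLogDate from a script, it can be convenient to assemble the command as an argument list. The sketch below uses only flags mentioned in this document (-i, -o, -t, -p, -s, -b); build_command is a hypothetical helper and the seed value is arbitrary.

```python
def build_command(tree, output, times=None, replicates=None, seed=None, backward=False):
    """Assemble a launch_wLogDate.py argument list from the options given."""
    cmd = ["launch_wLogDate.py", "-i", tree, "-o", output]
    if times is not None:
        cmd += ["-t", times]
    if replicates is not None:
        cmd += ["-p", str(replicates)]  # number of optimization runs
    if seed is not None:
        cmd += ["-s", str(seed)]  # seed for reproducible results
    if backward:
        cmd.append("-b")  # interpret times as backward
    return cmd


cmd = build_command("input.nwk", "output.nwk", times="input.txt", replicates=10, seed=1105)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run avoids shell quoting issues with unusual file names.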