A new fast method for building multiple consensus trees using k-medoids

TahiriNadia/CKMedoidsTreeClustering

CKMedoidsTreeClustering

Documentation

See https://bmcevolbiol.biomedcentral.com/articles/10.1186/s12862-018-1163-8
Citation: Tahiri, N., Willems, M., and Makarenkov, V. (2018). A new fast method for building multiple consensus trees using k-medoids. BMC Evolutionary Biology, 18, 48.

About

=> =============================================================================================================
=> Program   : KMedoidsTreeClustering - 2017
=> Authors   : Nadia Tahiri and Vladimir Makarenkov (Universite du Quebec a Montreal)
=> This program computes a clustering of phylogenetic trees based on the k-medoids partitioning algorithm.
=> The set of trees in the Newick format should be provided as input.
=> The optimal partitioning into K classes is returned as output. The number of classes can be determined by the
=> Silhouette or Calinski-Harabasz cluster validity index, each adapted for tree clustering. Either the non-squared
=> or the squared Robinson and Foulds topological distance can be used.
=> Recommended option: Silhouette + non-squared Robinson and Foulds distance.
=> =============================================================================================================
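
To illustrate the idea described above, here is a minimal Python sketch of k-medoids partitioning on a precomputed distance matrix, followed by the average silhouette width of the resulting partition. It is not the program's implementation: the function names (`k_medoids`, `silhouette`), the naive initialisation, and the plain textbook silhouette formula are assumptions made for illustration, and the validity indices used by KMTC are adapted for tree clustering as described in the paper.

```python
def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(medoids, key=lambda m: dist[i][m]) for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a symmetric distance matrix (list of lists)."""
    n = len(dist)
    medoids = list(range(k))  # naive initialisation (assumption): first k items
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for m in medoids:
            members = [i for i in range(n) if labels[i] == m]
            if not members:  # empty cluster: keep the old medoid
                new_medoids.append(m)
                continue
            # The new medoid minimises the total distance to its cluster members.
            new_medoids.append(
                min(members, key=lambda c: sum(dist[c][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


def silhouette(dist, labels):
    """Average silhouette width; expects at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist[i][j] for j in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Toy distance matrix: items 0 and 1 are close, items 2 and 3 are close.
toy = [[0, 2, 8, 8], [2, 0, 8, 8], [8, 8, 0, 2], [8, 8, 2, 0]]
medoids, labels = k_medoids(toy, 2)
score = silhouette(toy, labels)
```

In this sketch the number of classes would be chosen by running `k_medoids` for several values of K and keeping the partition with the highest `silhouette` score.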

Installation

$ git clone https://github.com/TahiriNadia/CKMedoidsTreeClustering.git
$ make 
or
$ make install
    
To clean the project:
$ make clean

Help

$ make help

Examples

Run the program from the command line as follows:
=> For trees: ./KMTC -tree input_file criterion

input_file - the input file containing the trees in the Newick format
criterion - the criterion for the k-medoids algorithm (1, 2, 3 or 4, see below)

List of criteria for the k-medoids algorithm:
=> criterion 1 - Calinski-Harabasz with RF (Robinson and Foulds distance)
=> criterion 2 - Calinski-Harabasz with RF-squared
=> criterion 3 - Silhouette with RF
=> criterion 4 - Silhouette with RF-squared

Command line execution:
./KMTC -tree ../input/input.tre 3
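
For scripted runs, the command above can be assembled programmatically. The following Python sketch maps the four criterion numbers listed in this README to their meaning and builds the argument list. The helper names `build_command` and `run_kmtc` are hypothetical, and the sketch assumes the compiled `KMTC` binary is in the current directory.

```python
import subprocess

# Criterion numbers as listed in this README: (validity index, distance).
CRITERIA = {
    1: ("Calinski-Harabasz", "RF"),
    2: ("Calinski-Harabasz", "RF-squared"),
    3: ("Silhouette", "RF"),
    4: ("Silhouette", "RF-squared"),
}


def build_command(input_file, criterion, binary="./KMTC"):
    """Return the argument list for: ./KMTC -tree input_file criterion."""
    if criterion not in CRITERIA:
        raise ValueError(f"criterion must be one of {sorted(CRITERIA)}")
    return [binary, "-tree", input_file, str(criterion)]


def run_kmtc(input_file, criterion, binary="./KMTC"):
    """Run the program; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(input_file, criterion, binary), check=True)
```

For example, `run_kmtc("../input/input.tre", 3)` would launch the recommended option (Silhouette with the non-squared RF distance), provided the binary has been built.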

Input

=> See the folder "data"
Phylogenetic trees in the Newick format (see the example in: input/input.tre)
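
As a rough illustration of how Newick trees relate to the Robinson and Foulds distance, the Python sketch below parses simple Newick strings and counts the non-trivial bipartitions present in one tree but not in the other. It is not the program's code: the function names are hypothetical, the parser handles only unquoted labels with optional branch lengths, and both trees are assumed to be unrooted and defined on the same set of leaves.

```python
import re

PUNCT = {"(", ")", ",", ";"}


def parse_newick(text):
    """Parse a Newick string into nested lists; leaves are label strings."""
    tokens = re.findall(r"[(),;]|[^(),;]+", text.strip())
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            children = [parse_node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(parse_node())
            pos += 1  # skip the closing ")"
            # Skip an optional internal label or branch length.
            if pos < len(tokens) and tokens[pos] not in PUNCT:
                pos += 1
            return children
        label = tokens[pos].split(":")[0].strip()  # drop the branch length
        pos += 1
        return label

    return parse_node()


def bipartitions(tree):
    """Non-trivial splits, each stored as the side without a reference leaf."""
    clades = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        clades.add(below)
        return below

    all_leaves = leaves(tree)
    ref = min(all_leaves)  # fixed reference leaf used to normalise each split
    splits = set()
    for clade in clades:
        side = all_leaves - clade if ref in clade else clade
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits


def rf_distance(newick_a, newick_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(bipartitions(parse_newick(newick_a))
               ^ bipartitions(parse_newick(newick_b)))


d_diff = rf_distance("((A,B),(C,D));", "((A,C),(B,D));")
d_same = rf_distance("((A:0.1,B:0.2):0.3,(C:0.1,D:0.1):0.2);", "(A,B,(C,D));")
```

Squaring the value returned by `rf_distance` would give the RF-squared variant referred to by criteria 2 and 4.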

Output

=> See the folder "output"
The output is in the following files:
1) stat.csv - for clustering statistics;
2) output.txt - for cluster content.