A new fast method for building multiple consensus trees using k-medoids
See https://bmcevolbiol.biomedcentral.com/articles/10.1186/s12862-018-1163-8
Citation Tahiri,N., Willems, M., and Makarenkov, V. (2018). A new fast method for building multiple consensus trees using k-medoids. BMC Evolutionary Biology 18(48)
=> =============================================================================================================
=> Program : KMedoidsTreeClustering - 2017
=> Authors : Nadia Tahiri and Vladimir Makarenkov (Universite du Quebec a Montreal)
=> This program computes a clustering of phylogenetic trees based on the k-medoids partitioning algorithm.
=> The set of trees in the Newick format should be provided as input.
=> The optimal partitioning in K classes is returned as output. The number of classes can be determined by the
=> Silhouette and Calinski-Harabasz cluster validity indices adapted for tree clustering. The non-squared and
=> squared Robinson and Foulds topological distance can be used.
=> The recommended option: Silhouette + non-squared Robinson and Foulds distance.
=> =============================================================================================================
$ git clone https://github.com/TahiriNadia/CKMedoidsTreeClustering.git
$ make
or
$ make install
clean project
$ make clean
$ make help
Please execute the following command line:
=> For trees: ./KMTC -tree input_file criterion
Input_file - the input file for the program
criterion - the criterion for the k-medoids algorithm (1, 2, 3 or 4, see below)
List of criteria for the k-medoids algorithm:
=> criterion 1 - Calinski-Harabasz with RF (Robinson and Foulds distance)
=> criterion 2 - Calinski-Harabasz with RF-squared
=> criterion 3 - Silhouette with RF
=> criterion 4 - Silhouette with RF-squared
Command line execution:
./KMTC -tree ../input/input.tre 3
=> See the folder "data"
Phylogenetic trees in the Netwick format (see the example in: input/input.tre)
=> See the folder "output"
The output is in the following files:
1) stat.csv - for clustering statistics;
2) output.txt - for cluster content.