Likelihood calculation for subst models and pip (#5)
* Added TN93 to DNA models, fixed GTR

Fixed DNA model row/column placement; the matrices were previously transposed.
Also fixed the summation on the diagonal. This doesn't seem necessary given the other
fix, but it makes the written array of matrix values easier to read.
Added TN93 model.
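
As an illustration of the row/column convention and the diagonal summation, here is a minimal sketch of a TN93 rate matrix in plain Rust. The function name `tn93_q`, the nucleotide order T, C, A, G and the parameter names are assumptions made for this example and are not taken from the crate. Entry `q[i][j]` is the rate from state `i` (row) to state `j` (column), and each diagonal entry is the negated sum of the rest of its row.

```rust
/// Builds a TN93 rate matrix (sketch; assumed nucleotide order: T, C, A, G).
/// `alpha1` is the pyrimidine transition rate (T <-> C), `alpha2` the purine
/// transition rate (A <-> G) and `beta` the transversion rate.
pub fn tn93_q(pi: [f64; 4], alpha1: f64, alpha2: f64, beta: f64) -> [[f64; 4]; 4] {
    let mut q = [[0.0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            if i == j {
                continue;
            }
            let rate = match (i, j) {
                (0, 1) | (1, 0) => alpha1, // T <-> C
                (2, 3) | (3, 2) => alpha2, // A <-> G
                _ => beta,                 // all transversions
            };
            // Row i, column j: rate of change from state i to state j.
            q[i][j] = rate * pi[j];
        }
        // The diagonal is still zero here, so this is the off-diagonal row sum.
        let off_diagonal: f64 = q[i].iter().sum();
        q[i][i] = -off_diagonal;
    }
    q
}
```

With this layout every row sums to zero and `pi[i] * q[i][j] == pi[j] * q[j][i]`, which is a quick way to detect an accidentally transposed matrix.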

* Initial implementation of substitution likelihood

Initial implementation of the phylogenetic likelihood for models of DNA
substitution.
Tests for the computations from the CB ETH lecture and the Molecular
Evolution book.

* Minor changes to make clippy happy

* Running coverage on all branches

Running coverage on all branches on a push, and additionally on main on a pull request

* trying to add manual trigger to code coverage

* Fix to make tn93 tests work

* Fixed stationarity tests for proteins and HIVB pi

Fixed the tests for protein model stationarity.
Fixed HIVB AA substitution model stationary frequencies.
Added tests for HIVB model.
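
A stationarity test of this kind can be sketched as follows. The helper `is_stationary` is hypothetical and works on plain slices rather than on the crate's matrix types; it checks that the frequencies sum to one and that pi * Q vanishes within a tolerance.

```rust
/// Returns true if `pi` is a stationary distribution of the rate matrix `q`,
/// i.e. the frequencies sum to one and pi * Q = 0 within `tol`.
/// `q[i][j]` is the rate from state i to state j.
pub fn is_stationary(pi: &[f64], q: &[Vec<f64>], tol: f64) -> bool {
    let n = pi.len();
    if q.len() != n || q.iter().any(|row| row.len() != n) {
        return false;
    }
    let sums_to_one = (pi.iter().sum::<f64>() - 1.0).abs() < tol;
    // (pi * Q)_j = sum_i pi_i * q_ij must vanish for every column j.
    let balanced = (0..n).all(|j| {
        let flow: f64 = (0..n).map(|i| pi[i] * q[i][j]).sum();
        flow.abs() < tol
    });
    sums_to_one && balanced
}
```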

* Fix missing empty fun for Alignment

Added back a missing function for creating an empty alignment that is
needed by the parsimony aligner.

* Parsimony alignment fix (#2)

* Minor fixes to make linter happy

Minor fixes to make clippy happy, removed unnecessary vec!, into_iter()
and the like.

* Fixed sequence order to match leaves

Fixed sequence order in the vector to match leaf indices. The code relied
on that property to begin with, but it was neither tested nor
enforced in any way, which led to badly malformed alignments
in IndelMaP.
This contains the fix, a test that checks that the order matches, and
the data for the test.
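
One way to enforce the ordering is sketched below. The `Record` type and the `order_by_leaves` function are hypothetical stand-ins written for this example, not the crate's own API; the idea is only that position `i` of the result holds the sequence of leaf `i`, and that a mismatch between tips and sequences is reported instead of silently ignored.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a sequence record (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub id: String,
    pub seq: String,
}

/// Reorders `records` so that position `i` holds the sequence of leaf `i`.
/// Fails if the counts differ or a leaf has no matching sequence.
pub fn order_by_leaves(leaf_ids: &[&str], records: &[Record]) -> Result<Vec<Record>, String> {
    if leaf_ids.len() != records.len() {
        return Err(format!(
            "{} leaves but {} sequences",
            leaf_ids.len(),
            records.len()
        ));
    }
    let by_id: HashMap<&str, &Record> = records.iter().map(|r| (r.id.as_str(), r)).collect();
    leaf_ids
        .iter()
        .map(|id| {
            by_id
                .get(id)
                .copied()
                .cloned()
                .ok_or_else(|| format!("no sequence for leaf {id}"))
        })
        .collect()
}
```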

* Change LikelihoodCostFunction to trait

Refactored the LikelihoodCostFunction into a trait for future different
implementations.
Removed duplicated EvoModel code from SubstitutionModel module.
Added notes for things that need to be fixed:
    The character probabilities only work correctly for ACGT-, not
for ambiguous chars at the moment, and not for proteins;
    Evolutionary model should be a trait to then work with PIP and
anything else, will also fix the char probabilities when implemented.

* Refactoring likelihood computation for subst models

Refactoring likelihood computation, moved all relevant functions to
the substitution model module.

* Test data for subst likelihoods

Test data files for substitution likelihood computation

* EvolutionaryModel trait impl for DNA substitution

Added the EvolutionaryModel trait which should be used for all models
of sequence evolution.
Implemented the trait for DNA substitution models.
All generic implementations remain for all substitution models, but both
DNA and protein have to independently implement EvoModel trait to avoid
implementation conflicts for the template.
Also made sure that the basic character probabilities are computed through
the trait -- there was an error where ambiguous chars were ignored.
EvolutionaryModelInfo trait now requires a model with the EvoModel
trait rather than just a substitution model.

Additionally, fixed likelihood computation for multiple sites: the code previously
expected only one value in the final array, but it should contain len(MSA) column
likelihoods.
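
The multi-site point can be sketched like this, assuming the root partial likelihoods are available as one vector per alignment column (the layout and the function names are assumptions for this example). The result has one likelihood per column, and the alignment log-likelihood is the sum of their logarithms.

```rust
/// `root_partials[s][c]` is the partial likelihood of character `c` at the
/// root for alignment column `s`; `pi` holds the stationary frequencies.
/// Returns one likelihood per column, so the result has len(MSA) entries.
pub fn site_likelihoods(root_partials: &[Vec<f64>], pi: &[f64]) -> Vec<f64> {
    root_partials
        .iter()
        .map(|column| column.iter().zip(pi).map(|(l, p)| l * p).sum::<f64>())
        .collect()
}

/// Log-likelihood of the whole alignment: the sum of the per-column logs.
pub fn log_likelihood(root_partials: &[Vec<f64>], pi: &[f64]) -> f64 {
    site_likelihoods(root_partials, pi)
        .iter()
        .map(|l| l.ln())
        .sum()
}
```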

* Removed new() from PhyloInfo

Removed the new() method from PhyloInfo because it was misleading:
it didn't check for data validity (that tree tips/sequences correspond to one
another, that the sequences are stored in the right order).

* Tests for char probabilities at tips

Added tests for computing nucleotide probabilities on sequences.

* Code cleanup: removed unnecessary vec!'s

Code cleanup for clippy: removed unnecessary vector creation in favour of passing
slices.

* Alignment likelihood tests

Added alignment likelihood tests for sequences that are longer than 1:
    An example from Huelsenbeck with 50 sites;
    A computed example from the CB lectures with X or N characters to
    check how ambiguous chars are processed.

* Implemented EvolutionaryModel for protein subst

Implemented the EvolutionaryModel trait for protein substitution models.
Added tests for getting correct character probabilities.

* Added likelihood calculation to ProteinSubst

Added likelihood calculation to the protein substitution models.
Added a test example with the likelihoods being close to what phyml estimates
but not quite.
The phyml_protein_nogap_example files are for that example, phyml wants .phy
sequences and an unrooted tree.

* Added tests for subst likelihood reversibility

Added 2 test cases for substitution likelihood reversibility:
    1. simple fabricated example, tn93 likelihoods are the same on two trees
    2. rerooted the huelsenbeck example tree, GTR and k80 likelihoods are
    the same.

* Refactor: EvoModel to own module

Refactored to move the EvolutionaryModel trait to its own module so that
evolutionary models are not just substitution models.

* Added msa field to PhyloInfo

Added the MSA field to the PhyloInfo struct. When reading data from a file, the
MSA is now set only if all the read sequences have the same length.
Tests still need to be added to make sure that sequences are aligned where they need to be.
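
The length check can be sketched as below. `msa_if_aligned` is a hypothetical helper on plain strings, not the PhyloInfo code itself; it returns the sequences as an MSA only when they all have the same length, and `None` otherwise (including for empty input).

```rust
/// Returns the sequences as an MSA only if they all have the same length.
pub fn msa_if_aligned(sequences: &[String]) -> Option<Vec<String>> {
    let first_len = sequences.first()?.len();
    if sequences.iter().all(|s| s.len() == first_len) {
        Some(sequences.to_vec())
    } else {
        None
    }
}
```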

* Added tests for aligned/unaligned read sequences

Added some tests that check that the sequences are not used as the MSA if
they have different lengths; otherwise they are copied to the MSA field.

* Changed FreqVector and SubstMatrix to dynamic size

Changed the FreqVector and the SubstMatrix type to dynamic sizing to make them usable
when the parametrisation on N (number of chars) becomes different for a model with gaps.
It also makes sure that all the data types we use are the same; there is no real point in using
statically sized matrices when most will still be dynamically sized (e.g. the partial
likelihood matrices).

* Protein substitution matrix transpose fix (#3)

* Added struct for defining rounding

Added a struct that contains 2 values -- whether to round some of the numbers
and if yes, to how many decimal digits. Not necessary in general, but used in
testing against the values produced by the python scripts, so now rounding is
optional for parsimony scores derived from the substitution models and for the
branch percentiles.
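
A struct of this shape could look like the sketch below. The field names and the constructors `none` and `with_digits` are assumptions for illustration; only the idea of "a flag plus a number of decimal digits" comes from the description above.

```rust
/// Optional rounding: whether to round, and to how many decimal digits.
/// (Field and method names are hypothetical.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rounding {
    pub round: bool,
    pub digits: usize,
}

impl Rounding {
    /// No rounding: values pass through unchanged.
    pub fn none() -> Self {
        Rounding { round: false, digits: 0 }
    }

    /// Round to the given number of decimal digits.
    pub fn with_digits(digits: usize) -> Self {
        Rounding { round: true, digits }
    }

    /// Applies the rounding rule to a single value.
    pub fn apply(&self, value: f64) -> f64 {
        if !self.round {
            return value;
        }
        let factor = 10f64.powi(self.digits as i32);
        (value * factor).round() / factor
    }
}
```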

* Added NodeIdx display and node id printing

Implemented Display for NodeIdx so that it always prints what kind of node
it is.
Added a function that helps with logging -- generates string "with ID xxx"
if there is an ID attached to the current node, or gives back an empty string.
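
A sketch of the two helpers, assuming NodeIdx is an enum with `Internal` and `Leaf` variants holding an index. The exact output wording and the helper name `id_suffix` are made up for this example.

```rust
use std::fmt;

/// Index of a node in the tree (assumed shape of the type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeIdx {
    Internal(usize),
    Leaf(usize),
}

impl fmt::Display for NodeIdx {
    /// Always states which kind of node the index refers to.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            NodeIdx::Internal(idx) => write!(f, "internal node {idx}"),
            NodeIdx::Leaf(idx) => write!(f, "leaf node {idx}"),
        }
    }
}

/// Logging helper: " with ID xxx" if the node has an ID, otherwise "".
pub fn id_suffix(id: &str) -> String {
    if id.is_empty() {
        String::new()
    } else {
        format!(" with ID {id}")
    }
}
```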

* Fixed protein matrices from by row to by cols

The provided protein substitution matrices were actually given by rows
whereas the Matrix struct reads them by columns.
Transposed the matrices to match proper order of cols/rows.
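
The mismatch can be illustrated on a flat buffer. In the sketch below (a hypothetical helper, independent of the Matrix struct used in the crate), values written row by row are rearranged so that a reader consuming them column by column sees the same matrix.

```rust
/// Takes an `n x n` matrix stored row by row and returns the same matrix
/// stored column by column.
pub fn row_major_to_col_major(values: &[f64], n: usize) -> Vec<f64> {
    assert_eq!(values.len(), n * n, "expected an n x n matrix");
    let mut out = vec![0.0; n * n];
    for row in 0..n {
        for col in 0..n {
            // Entry (row, col): row-major position -> column-major position.
            out[col * n + row] = values[row * n + col];
        }
    }
    out
}
```

Without this step, a column-major reader given row-major data ends up with the transpose of the intended matrix, which for a non-symmetric Q matrix swaps the "from" and "to" states.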

* Added tests for output to appease codecov

Added tests for the helper functions to appease codecov so it lets me merge.

* Commit to fix rebase merge issues

Decided to rebase to main to use the Rounding and GapMultiplier classes,
probably was a bad idea to do it right now. Adding small fixes to correct
my rebasing blunders.

* Added pip model definition

Added the PIP model definition.
Added HKY for DNA to use in the PIP tests.

* Fixed HIVB Q matrix

Finally fixed the HIVB Q matrix to match the one in the python version
and the PhyML one.

* Refactor to make likelihood creation uniform

Changed signatures of the setup_dna_likelihood and setup_protein_likelihood
methods to match each other and the substitution model creation signatures:
the model name is now &str and both methods need a list of parameters.

* Tests for sanity of substitution likelihood

Added more tests to check substitution likelihood computation sanity.
Checking that protein likelihood is also reversible.

* Tests for protein char probabilities + fix

Added tests for getting protein character probabilities at the leaves
and found a bug in the values, now fixed.

* Added generic impls for PIPModel

Added generic implementations of normalise, get_rate, get_p and get_stationary_distribution
for any size PIP model.
Added a unified method that makes a PIP matrix from a generic Substitution model.
Added a specific implementation of the EvolutionaryModel trait for PIP with protein models.

* More tests for DNA PIP model

More test scenarios for the DNA PIP model, checking that it is created correctly
when the parameters are provided properly and is not created when too few
parameters are given.

* PIP protein model tests

Added PIP protein model tests, checking that the stationary frequencies are correct and that
the rates correspond to what the underlying substitution model would define.

* get_idx_by_id function added to tree

Added a get_idx_by_id function to the tree struct to make node lookup easier
in tests.

* Added flag for normalisation in evo models

Added a flag to normalise the model matrices to all evolutionary models.

* Initial impl of PIP likelihood

Initial implementation of the PIP likelihood with tests based on the python
implementation.

* Fixed strange protein example tree

Fixed the phyml primate example tree to be rooted and removed the confusing duplicate
tree for the nogap alignment.

* Added test for PIP reversibility on rerooted tree

Added a test to verify PIP reversibility for a rerooted tree and DNA sequences.

* Added a test for protein PIP likelihood.

* Fixed surv probability becoming NaN, improved phi

Fixed survival probabilities becoming NaN when a branch length is set to 0.0;
in that case the survival probability now becomes 1.0.
Rearranged the phi computation to avoid huge numbers by working with ln instead.
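
Both fixes can be sketched as follows, assuming the usual PIP definitions: the survival probability on a branch of length b is (1 - exp(-mu*b)) / (mu*b), and phi = nu^m * exp(nu * (p_empty - 1)) / m!, where m is the number of alignment columns and p_empty is the likelihood of an all-gap column. Function and parameter names are hypothetical. The first expression is 0/0 at b = 0, so the limit value 1.0 is returned explicitly; the second is evaluated in log space so that nu^m and m! never appear as huge intermediate numbers.

```rust
/// PIP survival probability on a branch; the limit for length 0 is 1.0.
pub fn survival_probability(mu: f64, branch_length: f64) -> f64 {
    let x = mu * branch_length;
    if x == 0.0 {
        // (1 - exp(-x)) / x would be 0/0 = NaN here.
        1.0
    } else {
        // -exp_m1(-x) equals 1 - exp(-x) but stays accurate for small x.
        -(-x).exp_m1() / x
    }
}

/// ln(phi) = m * ln(nu) + nu * (p_empty - 1) - ln(m!)
pub fn ln_phi(nu: f64, p_empty: f64, m: usize) -> f64 {
    // ln(m!) as a sum of logs, which avoids computing m! itself.
    let ln_factorial: f64 = (1..=m).map(|k| (k as f64).ln()).sum();
    m as f64 * nu.ln() + nu * (p_empty - 1.0) - ln_factorial
}
```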

* Protein models get normalised now

Protein models now get normalised instead of ignoring the flag.

* Added tests to PIP methods missing from coverage

Added tests for methods in the PIP models that were not covered before.

* Making sure the EvoModel trait is used in tests

Making sure the method called for PIP is the method from the EvolutionaryModel
trait rather than the implementation of the model so that the trait's methods
are tested properly and all are being covered.

* SubstModels treating unknown chars as X

Made sure that Substitution models treat potential unknown characters (including gaps)
as ambiguous chars (X).

* More tests for subst models

Made sure that substitution model methods are called through EvolutionaryModel trait
methods rather than directly.
Added tests for too many parameters for different DNA models to improve coverage.

* Fixing rebase merge blunders

* Removed duplicate test

Removed a duplicate test coming from merge mistake during rebase
junniest authored Nov 15, 2023
1 parent 6b37db3 commit b7b49c2
Showing 29 changed files with 5,188 additions and 1,241 deletions.
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,3 @@
{
"git.ignoreLimitWarning": true
}
1 change: 1 addition & 0 deletions phylo/data/Huelsenbeck_example.newick
@@ -0,0 +1 @@
((Species1:0.1,Species2:0.1):0.2,((Species3:0.1,Species4:0.1):0.1,Species5:0.2):0.1):0.0;
13 changes: 13 additions & 0 deletions phylo/data/Huelsenbeck_example_long_DNA.fasta
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
>Species2
TGACTTTAAAGGACGACCCTACCAGGGCGGACACAAACGGACAGCGCAGC
>Species4
CGAGTTCAGAAGACGGCACCAACACAGCGGACGTATGCAGACGACGCACC
>Species5
TGCCCTTAGGAGGCGGCACTAACACCGCGGACGAGTGCGGACAACGTACC
>Species1
TAACTGTAAAGGACAACACTAGCAGGCCAGACGCACACGCACAGCGCACC
>Species3
CAAGTTTAGAAAACGGCACCAACACAACAGACGTATGCAACTGACGCACC



1 change: 1 addition & 0 deletions phylo/data/Huelsenbeck_example_reroot.newick
@@ -0,0 +1 @@
((Species5:0.20000000000000004,(Species1:0.10000000000000003,Species2:0.10000000000000003):0.30000000000000004):0.05,(Species3:0.10000000000000003,Species4:0.10000000000000003):0.05);
10 changes: 10 additions & 0 deletions phylo/data/ambiguous_example.fasta
@@ -0,0 +1,10 @@
>orangutan
XCCCCTCCCCTCATGTGTAC
>chimp
ACCCCTCCCCTCATGTGTAC
>human
ACCCCTCCCCTCATGTGTAC
>gorilla
ACCCCTCCCCTCATGTGTAC
>unicorn
TGCCCTCCCCTCATGTGTAC
1 change: 1 addition & 0 deletions phylo/data/ambiguous_example.newick
@@ -0,0 +1 @@
(unicorn:15,(orangutan:13,(gorilla:10.25,(human:5.5,chimp:5.5):4.75):2.75):2);
10 changes: 10 additions & 0 deletions phylo/data/ambiguous_example_N.fasta
@@ -0,0 +1,10 @@
>orangutan
NCCCCTCCCCTCATGTGTAC
>chimp
ACCCCTCCCCTCATGTGTAC
>human
ACCCCTCCCCTCATGTGTAC
>gorilla
ACCCCTCCCCTCATGTGTAC
>unicorn
TGCCCTCCCCTCATGTGTAC
40 changes: 40 additions & 0 deletions phylo/data/phyml_protein_example.fasta
@@ -0,0 +1,40 @@
>Patas
MASGILLNVKEEVTCPICLELLTEPLSLPCGHSFCQACITANHKKSMLYKEEERSCPVCRISYQPENIQPNRHVANIVEKLREVKLSPEEGQKVDHCARHGEKLLLFCQEDRKVICWLCERSQEHRGHHTFLMEEVAQEYHVKLQTALEMLRQKQQEAEKLEADIREEKASWKIQIDYDKTNVLADFEQLREILDWEESNELQYLEKEEEDILKSLTKSETKMVRQTQYVRELISDLEHRLQGSMMELLQGVDGIIKRIENMTLKKPETFHKNQRRVFRAPALKGMLDMFRELTDVRRYWVDVTLAPNNISHVVIAEDKRQVSSRNPQIMYWAQGKLF--------------------QSLKNFNYCTGILGSQSITSGKHYWEVDVSKKSAWILGVCAGFQPDAMYDVEQNENYQPKYGYWVI-------------------------------------------------------------GLQEGVKYSVFQD-----------GSSHTPFAPFIAPLSVIFCPDRVGVFVDYEACTVSFFNITNHGFLIYKFSQCSFSKPVFPYLNPRKCTVPMTLCSPSS
>Colobus
MASGILVNIKEEVTCPICLELLTEPLSLHCGHSFCQACITANHKKSMLYKEGERSCPVCRISYQPENIRPNRHVANIVEKLREVKLSPEEGQKVDHCARHGEKLLLFCQEDRKVICWLCERSQEHRGHHTFLMEEVAQEYHVKLQTALEMLRQKQQEAEKLEADIREEKASWKIQIDYDKTNVLADFEQLREILDWEESNELQNLEKEEEDILKSLTKSETEMVQQTQYMRELVSDLEHRLQGSVMELLQGVDGIIKRIEDMTLKKPKTFPKNQRRVFRAPDLKGMLDMFRELTDVRRYWVDVTLAPNNISHAVIAEDKRRVSSPNPQIMYRAQGTLF--------------------QSLKNFIYCTGVLGSQSITSGKHYWEVDVSKKSAWILGVCAGFQPDAMYNIEQNENYQPKYGYWVI-------------------------------------------------------------GLQEGVKYSVFQD-----------GSSHTPFAPFIVPLSVIICPDRVGVFVDYEACTVSFFNITNHGFLIYKFSQCSFSKPVFPYLNPRKCTVPMTLCSPSS
>DLangur
MASGILVNIKEEVTCPICLELLTEPLSLHCGHSFCQACITANHKKSMLYKEGERSCPVCRISYQPENIRPNRHVANIVEKLREVKLSPEEGQKVDHCARHGEKLLLFCQEDRKVICWLCERSQEHRGHHTFLMEEVAQEYHVKLQTALEMLRQKQQEAEKLEADIREEKASWKIQIDCDKTNVLADFEQLREILDWEESNELQNLEKEEEDILKSLTKSETEMVQQTQYMRELISDLEHRLQGSMMELLQGVDGIIKRIENMTLKKPKTFPKNQRRVFRAPDLKGILDMFRELTDVRRYWVDVTLAPNNISHAVIAEDKRQVSSPNPQIMCRARGTLF--------------------QSLKNFIYCTGVLGSQSITSGKHYWEVDVSKKSAWILGVCAGFQPDAMYNIEQNENYQPKYGYWVI-------------------------------------------------------------GLQEGVKYNVFQD-----------GSSHTPFAPFIVPLSVIICPDRVGVFVDYEACTVSFFNITNHGFLIYKFSQCSFSKPVFPYLNPRKCTVPMTLCSPSS
>AGM_cDNA
MASGILVNVKEEVTCPICLELLTEPLSLPCGHSFCQACITANHKESMLYKEEERSCPVCRISYQPENIQPNRHVANIVEKLREVKLSPEEGQKVDHCARHGEKLLLFCQEDSKVICWLCERSQEHRGHHTFLMEEVAQEYHVKLQTALEMLRQKQQEAEKLEADIREEKASWKIQIDYDKTNVSADFEQLREILDWEESNELQNLEKEEEDILKSLTKSETEMVQQTQYMRELISDLEHRLQGSMMELLQGVDGIIKRVENMTLKKPKTFHKNQRRVFRAPDLKGMLDMFRELTDVRRYWVDVTLAPNNISHAVIAEDKRQVSYRNPQIMYQSPGSLFGSLTNFSYCTGVPGSQSITSGKLTNFNYCTGVLGSQSITSGKHYWEVDVSKKSAWILGVCAGFQPDATYNIEQNENYQPKYGYWVI-------------------------------------------------------------GLQEGDKYSVFQD-----------GSSHTPFAPFIVPLSVIICPDRVGVFVDYEACTVSFFNITNHGFLIYKFSQCSFSKPVFPYLNPRKCTVPMTLCSPSS
>Tant_cDNA
MASGILLNVKEEVTCPICLELLTEPLSLPCGHSFCQACITANHKESMLYKEEERSCPVCRISYQPENIQPNRHVANIVEKLREVKLSPEEGQKVDHCARHGEKLLLFCQEDSKVICWLCERSQEHRGHHTFLMEEVAQEYHVKLQTALEMLRQKQQEAEKLEADIREEKASWKIQIDYDKTNVSADFEQLREILDWEESNELQNLEKEEEDILKSLTKSETEMVQQTQYMRELISDLEHRLQGSMMELLQGVDGIIKRIENMTLKKPKTFHKNQRRVFRAPDLKGMLDMFRELTDVRRYWVDVTLAPNNISHAVIAEDKRQVSYQNPQIMYQAPGSSFGSLTNFNYCTGVLGSQSITSRKLTNFNYCTGVLGSQSITSGKHYWEVDVSKKSAWILGVCAGFQPDATYNIEQNENYQPKYGYWVI-------------------------------------------------------------GLQEGDKYSVFQD-----------GSSHTPFAPFIVPLSVIICPDRVGVFVDYEACTVSFFNITNHGFLIYKFSQCSFSKPVFPYLNPRKCTVPMTLCSPSS
>Rhes_cDNA
MASGILLNVKEEVTCPICLELLTEPLSLHCGHSFCQACITANHKKSMLYKEGERSCPVCRISYQPENIQPNRHVANIVEKLREVKLSPEEGQKVDHCARHGEKLLLFCQEDSKVICWLCERSQEHRGHHTFLMEEVAQEYHVKLQTALEMLRQKQQEAEKLEADIREEKASWKIQIDYDKTNVSADFEQLREILDWEESNELQNLEKEEEDILKSLTKSETEMVQQTQYMRELISELEHRLQGSMMDLLQGVDGIIKRIENMTLKKPKTFHKNQRRVFRAPDLKGMLDMFRELTDARRYWVDVTLATNNISHAVIAEDKRQVSSRNPQIMYQAPGTLF------------------TFPSLTNFNYCTGVLGSQSITSGKHYWEVDVSKKSAWILGVCAGFQSDAMYNIEQNENYQPKYGYWVI-------------------------------------------------------------GLQEGVKYSVFQD-----------GSSHTPFAPFIVPLSVIICPDRVGVFVDYEACTVSFFNITNHGFLIYKFSQCSFSKPVFPYLNPRKCTVPMTLCSPSS
>Baboon
MASGILLNVKEEVTCPICLELLTEPLSLPCGHSFCQACITANHRKSMLYKEGERSCPVCRISYQPENIQPNRHVANIVEKLREVKLSPEEGLKVDHCARHGEKLLLFCQEDSKVICWLCERSQEHRGHHTFLMEEVAQEYHVKLQTALEMLRQKQQEAEKLEADIREEKASWKIQIDYDKTNVSADFEQLREILDWEESNELQNLEKEEEDILKSLTKSETEMVQQTQYMRELISDLEHRLQGSMMELLQGVDGIIKRIENMTLKKPKTFHKNQRRVFRAPDLKGMLDMFRELTDVRRYWVDVTLAPNNISHAVIAEDKRQVSSRNPQITYQAPGTLF------------------SFPSLTNFNYCTGVLGSQSITSGKHYWEVDVSKKSAWILGVCAGFQPDAMYNIEQNENYQPKYGYWVI-------------------------------------------------------------GLQEGVKYSVFQD-----------GSSHTPFAPFIVPLSVIICPDRVGVFVDYEACTVSFFNITNHGFLIYKFSQCSFSKPVFPYLNPRKCTVPMTLCSPSS
>Gibbon
MASGILVNVKEKVTCPICLELLTQPLSLDCGHSFCQACLTANHKTSMPDE-GERSCPVCRISYQHKNIRPNRHVANIVEKLREVKLSPEEGQKVDHCARHGKKLLLFCQEDRKVICWLCERSQEHRGHHTFLTEEVAQEYQMKLQAALQMLRQKQQEAEELEADIREEKASWKTQIQYDKTNILADFEQLRHILDWVESNELQNLEKEEKDVLKRLMRSEIEMVQQTQSVRELISDLEHRLQGSVMELLQGVDGVIKRMKNVTLKKPETFPKNRRRVFRAADLKVMLEVLRELRDVRRYWVDVTVAPNNISYAVISEDMRQVSSPEPQIIFEAQGTIS--------------------QTFVNFNYCTGILGSQSITSGKHYWEVDVSKKSAWILGVCAGLQPDAMYNIEQNENYQPKYGYWVI-------------------------------------------------------------GLEEGVKCNAFQD-----------GSIHTPSAPFVVPLSVNICPDRVGVFLDYEACTVSFFNITDHGFLIYKFSHCSFSQPVFPYLNPRKCTVPMTLCSPSS
>Orangutan
MASGILVNVKEEVTCPICLELLTQPLSLDCGHSFCQACLTANHKKSTLDK-GERSCPVCRVSYQPKNIRPNRHVANIVEKLREVKLSPE-GQKVDHCARHGEKLLLFCKEDGKVICWLCERSQEHRGHHTFLTEEVAQKYQVKLQAALEMLRQKQQEAEELEADIREEKASWKTQIQYDKTSVLADFEQLRDILDWEESNELQNLEKEEEDILKSLTKSETEMVQQTQSVRELISDVEHRLQGSVMELLQGVDGIIKRMQNVTLKKPETFPKNQRRVFRAPNLKGMLEVFRELTDVRRYWVDVTVAPNDISYAVISEDMRQVSCPEPQIIYGAQGTTY--------------------QTYVNFNYCTGILGSQSITSGKHYWEVDVSKKSAWILGVCAGFQPDAMYNIEQNENYQPQYGYWVI-------------------------------------------------------------GLEEGVKCSAFQD-----------GSFHNPSAPFIVPLSVIICPDRVGVFLDYEACTVSFFNITNHGFLIYKFSHCSFSQPVFPYLNPRKCRVPMTLCSPSS
>Human
MASGILVNVKEEVTCPICLELLTQPLSLDCGHSFCQACLTANHKKSMLDK-GESSCPVCRISYQPENIRPNRHVANIVEKLREVKLSPE-GQKVDHCARHGEKLLLFCQEDGKVICWLCERSQEHRGHHTFLTEEVAREYQVKLQAALEMLRQKQQEAEELEADIREEKASWKTQIQYDKTNVLADFEQLRDILDWEESNELQNLEKEEEDILKSLTNSETEMVQQTQSLRELISDLEHRLQGSVMELLQGVDGVIKRTENVTLKKPETFPKNQRRVFRAPDLKGMLEVFRELTDVRRYWVDVTVAPNNISCAVISEDKRQVSSPKPQIIYGARGTRY--------------------QTFVNFNYCTGILGSQSITSGKHYWEVDVSKKTAWILGVCAGFQPDAMCNIEKNENYQPKYGYWVI-------------------------------------------------------------GLEEGVKCSAFQD-----------SSFHTPSVPFIVPLSVIICPDRVGVFLDYEACTVSFFNITNHGFLIYKFSHCSFSQPVFPYLNPRKCGVPMTLCSPSS
>Gorilla
MASGILVNVKEEVTCPICLELLTQPLSLDCGHSFCQACLTANHKKSMLDK-GESSCPVCRISYQPENIRPNRHVANIVEKLREVKLSPE-GQKVDHCARHGEKLLLFCQEDGKVICWLCERSQEHRGHHTFLTEEVAQEYQVKLQAALEMLRQKQQEAEELEADIREEKASWKTQIQYDKTNVLADFEQLRDILDWEESNELQNLEKEEEDILKRLTKSETEMVQQTQSVRELISDLEHRLQGSVMELLQGVDGVIKRMENVTLKKPETFPKNRRRVFRAPDLKGMLEVFRELTDVRRYWVDVTVAPNNISCAVISEDMRQVSSPKPQIIYGAQGTRY--------------------QTFMNFNYCTGILGSQSITSGKHYWEVDVSKKSAWILGVCAGFQPDATCNIEKNENYQPKYGYWVI-------------------------------------------------------------GLEEGVKCSAFQD-----------GSFHTPSAPFIVPLSVIICPDRVGVFLDYEACTVSFFNITNHGFLIYKFSHCSFSQPVFPYLNPRKCRVPMTLCSPSS
>Chimp
MASGILVNVKEEVTCPICLELLTQPLSLDCGHSFCQACLTANHKKSMLDK-GESSCPVCRISYQPENIRPNRHVANIVEKLREVKLSPE-GQKVDHCAHHGEKLLLFCQEDGKVICWLCERSQEHRGHHTFLTEEVAREYQVKLQAALEMLRQKQQEAEELEADIREEKASWKTQIQYDKTNVLADFEQLRDILDWEESNELQNLEKEEEDILKSLTKSETEMVQQTQSVRELISDLERRLQGSVMELLQGVDGVIKRMENVTLKKPETFPKNQRRVFRAPDLKGMLEVFRELTDVRRYWVDVTVAPNNISCAVISEDMRQVSSPKPQIIYGARGTRY--------------------QTFMNFNYCTGILGSQSITSGKHYWEVDVSKKSAWILGVCAGFQPDAMCNIEKNENYQPKYGYWVI-------------------------------------------------------------GLEEGVKCSAFQD-----------GSFHTPSAPFIVPLSVIICPDRVGVFLDYEACTVSFFNITNHGSLIYKFSHCSFSQPVFPYLNPRKCGVPMTLCSPSS
>Squirrel
MASRILGSIKEEVTCPICLELLTEPLSLDCGHSFCQACITANHKESMLHQ-GERSCPLCRLPYQSENLRPNRHLASIVERLREVMLRPEERQNVDHCARHGEKLLLFCEQDGNIICWLCERSQEHRGHNTFLVEEVAQKYREKLQVALETMRQKQQDAEKLEADVRQEQASWKIQIQNDKTNIMAEFKQLRDILDCEESNELQNLEKEEKNILKRLVQSENDMVLQTQSVRVLISDLERRLQGSVVELLQDVDGVIKRIEKVTLQKPKTFLNEKRRVFRAPDLKRMLQVLKELTEVQRYWAHVTLVPSHPSYTIISEDGRQVRYQKPIR-----------------------------HLLVKVQYFYGVLGSPSITSGKHYWEVDVSNKRAWTLGVCVSLKCTANQSVSGTENYQPKNGYWVI-------------------------------------------------------------GLRNAGNYRAFQSSFEFR--DFLAGSRLTLSPPLIVPLFMTICPNRVGVFLDYEARTISFFNVTSNGFLIYKFSDCHFSYPVFPYFNPMTCELPMTLCSPRS
>Howler
MASKILVNIKEEVTCPICLELLTEPLSLDCGHSFCQACITANHKESR-----ERSCPLCRVSYHSENLRPNRHLANIAERLREVMLSPEEGQKVDRCARHGEKLLLFCQQHGNVICWLCERSEEHRGHRTSLVEEVAQKYREKLQAALEMMRQKEQDAEMLEADVREEQASWKIQIENDKTSTLAEFKQLRDILDCEESNELQKLEKEEENLLKRLVQSENDMVLQTQSIRVLIADLERRLQGSVMELLQGVEGVIKRIKNVTLQKPETFLNEKRRVFQAPDLKGMLQVFKELKEVQCYWAHVTLIPNHPSCTVISEDKREVRYQEQIHH----------------------------HPSMEVKYFYGILGSPSITSGKHYWEVDVSNKSAWILGVCVSLKCIG--NFPGIENYQPQNGYWVIGLRNADNYSAFQDAVPETENYQPKNRN-RFTGLQNADNCSAFQNAFPGIQSYQPKKSHLFTGLQNLSNYNAFQNKVQYNYIDFQDDSLSTPSAPLIVPLFMTICPKRVGVFLDYEACTVSFFNVTSNGYLIYKFSNCQFSYPVFPYFSPMTCELPMTLCSPSS
>Spider
MASEILLNIKEEVTCPICLELLTEPLSLDCGHSFCQACITANHKESTLHQ-GERSCPLCRVSYQSENLRPNRHLANIAERLREVMLSPEEGQKVDRCARHGEKLLLFCQQHGNVICWLCERSQEHRGHSTFLVEEVAQKYQEKLQVALEMMRQKQQDAEKLEADVREEQASWKIQIENDKTNILAEFKQLRDILDCEESNELQNLEKEEENLLKTLAQSENDMVLQTQSMRVLIADLEHRLQGSVMELLQDVEGVIKRIKNVTLQKPKTFLNEKRRVFRAPDLKGMLQVFKELKEVQCYWAHVTLVPSHPSCTVISEDERQVRYQEQIH-----------------------------QPSVKVKYFCGVLGSPGFTSGKHYWEVDVSDKSAWILGVCVSLKCTA--NVPGIENYQPKNGYWVIGLQNANNYSAFQDAVPGTENYQPKNGNRRNKGLRNADNYSAFRDTF------QPINDSWVTGLRNVDNYNAFQDAVKYS--DFQDGSCSTPSAPLMVPLFMTICPKRVGVFLDCKACTVSFFNVTSNGCLIYKFSKCHFSYPVFPYFSPMICKLPMTLCSPSS
>Woolly
MASEILVNIKEEVTCPICLDLLTEPLSLDCGHSFCQACITADHKESTLHQ-GERSCPLCRVGYQSENLRPNRHLANIAERLREVMLSPEEGQKVDRCARHGEKLLLFCQQHGNVICWLCERSQEHRGHSTFLVEEVAQKYREKLQVALEMMREKQQDAEKLEADVREEQASWKIQIKNDKTNILAEFKQLRDILDCEESNELQNLEKEEENLLKILAQSENDMVLQTQSMRVLIADLEHRLQGSVMELLQGVEGIIKRTTNVTLQKPKTFLNEKRRVFRAPNLKGMLQVFKELKEVQCYWAHVTLVPSHPSCAVISEDQRQVRYQKQRH-----------------------------RPSVKAKYFYGVLGSPSFTSGKHYWEVDVSNKSAWILGVCVSLKCTA--NVPGIENYQPKNGYWVIGLQNADNYSAFQDAVPGTEDYQPKNGCWRNTGLRNADNYSAFQDVF------QPKNDYWVTGLWNADNYNAFQDAGKYS--DFQDGSCSTPFAPLIVPLFMTIRPKRVGVFLDYEACTVSFFNVTSNGCLIYKFSNCHFSCPVFPYFSPMTCKLPMTLCSPSS
>PMarmoset
MASRILVNIKEEVTCPICLELLTEPLSLDCGHSFCQACITANHKESTLHQ-GERSCPLCRMSYPSENLRPNRHLANIVERLKEVMLSPEEGQKVDHCARHGEKLLLFCQQDGNVICWLCERSQEHRGHHTFLVEEVAEKYQGKLQVALEMMRQKQQDAEKLEADVREEQASWKIQIQNDKTNIMAEFKQLRDILDCEESKELQNLEKEEKNILKRLVQSESDMVLQTQSIRVLISDLERRLQGSVMELLQGVDDVIKRIEKVTLQKPKTFLNEKRRVFRAPDLKGMLQAFKELTEVQRYWAHVTLVPSHPSCTVISEDERQVRYQVPIH-----------------------------QPLVKVKYFYGVLGSLSITSGKHYWEVDVSNKRGWILGVCGSWKCNAKWNVLRPENYQPKNGYWVI-------------------------------------------------------------GLRNTDNYSAFQDAVKYS--DVQDGSRSVSSGPLIVPLFMTICPNRVGVFLDYEACTISFFNVTSNGFLIYKFSNCHFSYPVFPYFSPTTCELPMTLCSPSS
>Tamarin
MASRILVNIKEEVTCPICLELLTEPLSLDCGHSFCQACITANHKESTPHQ-GERSCPLCRMSYPSENLRPNRHLANIVERLKEVMLSPEEGQKVGHCARHGEKLLLFCEQDGNVICWLCERSQEHRGHHTLLVEEVAEKYQEKLQVALEMMRQKQQDAEKLEADVREEQASWKIQIRNDKTNIMAEFKQLRDILDCEESKELQNLEKEEKNILKRLVQSESDMVLQTQSMRVLISDLERRLQGSVLELLQGVDDVIKRIETVTLQKPKTFLNEKRRVFRAPDLKAMLQAFKELTEVQRYWAHVTLVPSHPSYAVISEDERQVRYQFQIH-----------------------------QPSVKVNYFYGVLGSPSITSGKHYWEVDVTNKRDWILGICVSFKCNAKWNVLRPENYQPKNGYWVI-------------------------------------------------------------GLQNTNNYSAFQDAVKYS--DFQIGSRSTASVPLIVPLFMTIYPNRVGVFLDYEACTVSFFNVTNNGFLIYKFSNCHFSYPVFPYFSPMTCELPMTLCSPSS
>Titi
MASRILVNIKEEVTCPICLELLTEPLSLDCGHSFCQACITANHKESTLHQ-GERSCPLCRISYPSENLRPNRHLANIVERLREVVLSPEEGQKVDLCARHGEKLLLFCQQDGNVICWLCERSQEHRGHHTFLVEEVAQTYRENLQVVLEMMRQKHQDAEKLEADVREEQASWKIQIQNDKTNIMAEFKQLRDILDCEESNELQNLEKEEKNILKRLVQSENDMVLQTQSISVLISDLEHRLQGSVMELLQGVDGVIKRVKNVTLQKPKTFLNEKRRVFRVPDLKGMLQVSKELTEVQRYWAHVTLVASHPSRAVISEDERQVRYQEWIH-----------------------------QSSGRVKYFYGVLGSPSITSGKHYWEVDVSNKSAWILGVCVSLKCAANRNGPGVENYQPKNGYWVI-------------------------------------------------------------GLRNADNYSAFQDSVKYN--DFQDGSRSTTYAPLIVPLFMTICPNRVGVFLDYEACTVSFFNVTSNGFLIYKFSNCHFSYPVFPYFSPMTCELPMTLCSPRS
>Saki
MASRILMNIKEEVTCPICLELLTEPLSLDCGHSFCQACITANHKKSMLHQ-GERSCPLCRISYPSENLRPNRHLANIVERLREVMLSPEEGQKVDHCARHGEKLLLFCQQDGNVICWLCERSQEHRGHHTLLVEEVAQTYRENLQVALETMRQKQQDAEKLEADVREEQASWKIQIRDDKTNIMAEFKQLRDILDCEESNELQILEKEEKNILKRLTQSENDMVLQTQSMGVLISDLEHRLQGSVMELLQGVDEVIKRVKNVTLQKPKTFLNEKRRVFRAPDLKGMLQVFKELTEVQRYWVHVTLVPSHLSCAVISEDERQVRYQERIH-----------------------------QSFGKVKYFYGVLGSPSIRSGKHYWEVDVSNKSAWILGVCVSLKCTANRNGPRIENYQPKNGYWVI-------------------------------------------------------------GLWNAGNYSAFQDSVKYS--DFQDGSHSATYGPLIVPLFMTICPNRVGVFLDYEACTVSFFNVTSNGFLIYKFSNCRFSDSVFPYFSPMTCELPMTLCSPRS
1 change: 1 addition & 0 deletions phylo/data/phyml_protein_example.newick
@@ -0,0 +1 @@
(((((((((Spider:0.03308191,Woolly:0.03582163):0.01444277,Howler:0.06799737):0.02909882,(((PMarmoset:0.02787360,Tamarin:0.03681352):0.01865025,Squirrel:0.08629746):0.01112128,(Saki:0.03881905,Titi:0.04068062):0.01757714):0.00607569):0.20818264,(((Chimp:0.01029099,Gorilla:0.00446741):0.00330774,Human:0.01513926):0.00720972,(Gibbon:0.05851278,Orangutan:0.03164833):0.00204286):0.02732959):0.04351306,(Colobus:0.00814254,DLangur:0.00661586):0.00733489):0.00608429,Patas:0.02612272):0.00687099,(AGM_cDNA:0.00495553,Tant_cDNA:0.00344975):0.00775707):0.00140317,Baboon:0.00482829):0.0,Rhes_cDNA:0.01205729):0.0;