julianzaugg/PairDistances

PairDistances

PairDistances contains code for performing pairwise alignment of protein sequences. We avoid the need for an affine gap penalty by simply not inserting gaps. For each pair of sequences, we find the most likely alignment configuration and evolutionary time using a Markov process approach (see Thorne, "An Evolutionary Model for Maximum Likelihood Alignment of DNA Sequences").
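
To make the idea of a gap-free alignment configuration concrete, here is a minimal Python sketch. It assumes that a configuration is simply an offset of one sequence against the other; how PairDistances actually numbers its configurations is not documented here. The function names `ungapped_configurations` and `best_by_identity` are hypothetical, and the score is a plain identity count standing in for the likelihood model, not the Markov process approach itself.

```python
def ungapped_configurations(s1, s2):
    """Yield (offset, aligned_pairs) for every gap-free placement of s2 on s1.

    offset is the position in s1 where s2[0] would sit; a negative offset
    means s2 overhangs s1 on the left. Only overlapping residues are paired.
    (Assumed convention for illustration, not taken from PairDistances.)
    """
    for offset in range(-(len(s2) - 1), len(s1)):
        start = max(0, offset)
        end = min(len(s1), offset + len(s2))
        pairs = [(s1[i], s2[i - offset]) for i in range(start, end)]
        yield offset, pairs


def best_by_identity(s1, s2):
    """Return the (offset, pairs) with the most identical aligned residues.

    Identity count is a stand-in score; the tool itself maximises a likelihood.
    """
    return max(
        ungapped_configurations(s1, s2),
        key=lambda item: sum(a == b for a, b in item[1]),
    )


if __name__ == "__main__":
    offset, pairs = best_by_identity("PQWRRKD", "KKRK")
    print(offset, pairs)
```

Under this convention, two sequences of lengths n and m have n + m - 1 possible configurations, so the search space stays small even without any heuristics.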

At minimum, the user needs to provide a FASTA file of gap-free sequences and specify an output file location. A tab-delimited results file is generated; an example of the format is shown below:

Seq1 Seq2 Time LogLikelihood Configuration S1Length S2Length S1Sequence S2Sequence
P21538 P21538 0.0 -11.440250364624116 3.0 4 4 KKRK KKRK
P00546 P21538 0.9582527115139087 -32.41837403019155 5.0 7 4 PQWRRKD KKRK
P00546 P00546 0.0 -22.16493324085298 6.0 7 7 PQWRRKD PQWRRKD
P37261 P21538 0.1473260089123268 -17.801589031708815 3.0 4 4 KKRP KKRK
P37261 P00546 1.7000000000000002 -33.38128364660126 6.0 4 7 KKRP PQWRRKD
P37261 P37261 0.0 -11.603586264197588 3.0 4 4 KKRP KKRP
P25555 P21538 1.1278615505309812 -30.37755252356034 5.0 7 4 PVRRRLS KKRK
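
For downstream analysis, a results file in this format can be read with a few lines of Python. The sketch below assumes the file is tab-delimited with the header row shown above. The helper names `read_results` and `time_lookup` are hypothetical, and the sample data are the header and first two rows of the example, joined with tabs.

```python
import csv
import io

# Header and first two data rows copied from the example above, tab-joined.
SAMPLE = (
    "Seq1\tSeq2\tTime\tLogLikelihood\tConfiguration\t"
    "S1Length\tS2Length\tS1Sequence\tS2Sequence\n"
    "P21538\tP21538\t0.0\t-11.440250364624116\t3.0\t4\t4\tKKRK\tKKRK\n"
    "P00546\tP21538\t0.9582527115139087\t-32.41837403019155\t5.0\t7\t4\t"
    "PQWRRKD\tKKRK\n"
)


def read_results(handle):
    """Parse a tab-delimited results file into a list of typed dicts."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "seq1": rec["Seq1"],
            "seq2": rec["Seq2"],
            "time": float(rec["Time"]),
            "log_likelihood": float(rec["LogLikelihood"]),
            "configuration": float(rec["Configuration"]),
            "s1_sequence": rec["S1Sequence"],
            "s2_sequence": rec["S2Sequence"],
        })
    return rows


def time_lookup(rows):
    """Map each unordered pair of sequence names to its estimated time."""
    return {frozenset((r["seq1"], r["seq2"])): r["time"] for r in rows}


rows = read_results(io.StringIO(SAMPLE))
times = time_lookup(rows)
print(times[frozenset(("P21538", "P00546"))])
```

To read a real output file, pass an open file handle instead of the `io.StringIO` wrapper, for example `open("results.tsv", newline="")`, where `results.tsv` is a placeholder for whatever output location was specified.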

The user can provide a variety of input parameters to improve both the efficiency and accuracy of the pairwise alignment search. Details will be added here in the future; in the meantime, see the PairDistances.java file for brief descriptions.
