Cymon working #6

Open
wants to merge 88 commits into
base: master
88 commits
b8be055
Initial configuration
cymon Nov 18, 2016
2064d66
mcmc.writeProposalProbs can now return a dictionary of proposal:values
cymon Nov 18, 2016
a83025f
Added vim .swp to gitignore
cymon Nov 18, 2016
501e56b
Merge branch 'master' into cymon-working
cymon Nov 25, 2016
7083997
func.summariseMcmcPrams can now return a dictionary with makeDict arg…
cymon Nov 29, 2016
82a3e04
Changed gsl search paths in setup.py
cymon Nov 30, 2016
13adc1b
Merge branch 'master' into cymon-working
cymon Dec 23, 2016
2c406d9
Merge branch 'master' into cymon-working
cymon Dec 23, 2016
16b494d
Fixed makeDict in writeProposalProbs() to deal with multiple partitio…
cymon Jan 2, 2017
e79571d
Merge branch 'cymon-working' of https://github.com/cymon/p4-phylogene…
cymon Jan 2, 2017
ab16187
Merge branch 'master' into cymon-working
cymon Jan 16, 2017
766e6e3
Added new GSL package library paths for Ubuntu 16.04 and greater. Fixed
cymon Jan 18, 2017
0a4ff21
Merge branch 'cymon-working' of https://github.com/cymon/p4-phylogene…
cymon Feb 7, 2017
8e8800e
Return all stats from tail-area prob test. Add offset to writePhylip()
cymon Mar 7, 2017
5362b28
Merge branch 'cymon-working' of https://github.com/cymon/p4-phylogene…
cymon Mar 7, 2017
72aaf1b
Fixed print statement
cymon Mar 7, 2017
74893a1
Added empirical protein model stmtREV of Li et al 2014
cymon Mar 17, 2017
79b2b19
Missing comma
cymon Mar 17, 2017
f71214f
Merge branch 'master' into cymon-working
cymon Mar 17, 2017
71ff540
Merge branch 'master' of https://github.com/cymon/p4-phylogenetics in…
cymon Mar 17, 2017
fb5b715
Typo: really should test these things before committing...
cymon Mar 20, 2017
ba31dde
Merge branch 'cymon-working' of https://github.com/cymon/p4-phylogene…
cymon Mar 20, 2017
2d17c43
Handle aa ambiguities in Dayhoff recoding
cymon May 3, 2017
6d9b47b
Resolved merge conflict by ingnoring upstream.
cymon May 5, 2017
b382b60
Resolved merge conflict by ingnoring upstream.
cymon Jun 7, 2017
6ea768b
Merge branch 'master' into cymon-working
cymon Jun 7, 2017
398caab
Merge branch 'cymon-working' of https://github.com/cymon/p4-phylogene…
cymon Aug 30, 2017
2fab389
Merge branch 'master' into cymon-working
cymon Oct 18, 2017
f089868
Merge branch 'cymon-working' of https://github.com/cymon/p4-phylogene…
cymon Oct 18, 2017
d1299c2
Added option to start recoding protein data into group starting with …
cymon Dec 11, 2017
a0e538b
Cleaned up getMinmaxChiSqGroups() - it might actually work now...
cymon Dec 12, 2017
329a62e
Allow Dayhoff recoding by numbers to start with 0
cymon Dec 12, 2017
a9889e9
Merge branch 'master' into cymon-working
cymon Jan 29, 2018
eaefe23
Merge branch 'master' into cymon-working
cymon Feb 1, 2018
569baf1
Merge branch 'cymon-working' of https://github.com/cymon/p4-phylogene…
cymon Feb 1, 2018
aa0f4f0
Expanation of path modification in p4 start script
cymon Feb 2, 2018
82bde71
Merge branch 'master' into cymon-working
cymon Feb 19, 2018
311cd1a
Merge branch 'cymon-working' of https://github.com/cymon/p4-phylogene…
cymon Feb 19, 2018
83981d1
Calculate tree heights (lead to root cumulative br lens)
cymon Mar 23, 2018
bd6fc70
Merge branch 'master' into cymon-working
cymon Mar 23, 2018
8168e1d
Merge branch 'master' into cymon-working
cymon Apr 10, 2018
d7acf86
Merge branch 'master' into cymon-working
cymon Apr 11, 2018
52b0180
Build both python2 and python3 pf binaries
cymon Apr 11, 2018
c0c1d10
Merge branch 'master' into cymon-working
cymon Apr 23, 2018
c973cdc
Merge branch 'master' into cymon-working
cymon May 3, 2018
7b3e1f2
Resolved merge conflict on make_pf.sh
cymon Aug 31, 2018
2af6015
Added warning to guessing dna/aa's with Phylip format
cymon Nov 22, 2018
2b8725d
Merge branch 'master' into cymon-working
cymon Nov 22, 2018
0ed07fa
Merge branch 'master' into cymon-working
cymon Mar 25, 2019
993b719
Change default python from 2 to 3
cymon Mar 25, 2019
01b6de7
Added protien models prasREV and gnetREV for Li and Filipe
cymon Mar 25, 2019
2fe6648
Merge branch 'new-models' into cymon-working
cymon Mar 25, 2019
52ae60b
Added greater precision for the values in gnetREV and prasREV
cymon Mar 29, 2019
e17b6fc
Defaulting to python3
Apr 1, 2019
53b52a7
io.Bytes to io.String in treefilelite.py and added /usr/local/lib64 t…
cymon May 15, 2019
137d128
Merge branch 'master' into cymon-working
cymon May 21, 2019
35fc9e1
Fixed BytesI0/StringIO issues in treefilelite.py
cymon May 21, 2019
dac236f
Merge branch 'master' into cymon-working
cymon Jul 26, 2019
2616241
Merge branch 'master' into cymon-working
cymon Sep 13, 2019
35d4cdb
Configuration for Ceta. Check that git is installed before call it.
cymon Sep 13, 2019
32d6df4
Resolved merge conflicts with upstream.
cymon Oct 28, 2019
2bf4408
Resolved merge conflicts with upstream.
cymon Oct 28, 2019
2085ffe
Removed old ref to binary
cymon Oct 28, 2019
152636d
Resolved merge conflict by ingnoring upstream.
cymon Nov 12, 2019
6b6afa9
Added paths for weird include places on ceta
cymon Nov 12, 2019
e04a24c
Merge branch 'cymon-working' of https://github.com/cymon/p4-phylogene…
cymon Nov 12, 2019
16fc201
Merge branch 'master' into cymon-working
cymon Nov 18, 2019
7cb08a4
Merge branch 'master' into cymon-working
cymon Nov 28, 2019
a86d1ab
Added a 'stop' sample parameter to pnumbers.Numbers
cymon Nov 29, 2019
dc95c23
Merge branch 'master' into cymon-working
cymon Dec 19, 2019
04c29eb
Merge branch 'master' into cymon-working
cymon Jan 6, 2020
0c8b535
Merge branch 'master' into cymon-working
cymon Jan 30, 2020
f0835f0
Merge branch 'master' into cymon-working
cymon Feb 18, 2020
87d7f46
Merge branch 'master' into cymon-working
cymon Mar 4, 2020
2a51726
Merge branch 'master' into cymon-working
cymon Mar 5, 2020
a54c906
Merge branch 'master' into cymon-working
cymon Mar 12, 2020
4e1781b
Merge branch 'master' into cymon-working
cymon Apr 19, 2020
44d48f8
Merge branch 'master' into cymon-working
cymon Sep 9, 2020
e03175f
Merge branch 'master' into cymon-working
cymon Sep 10, 2020
038264c
Reverting func.py to master
cymon Sep 10, 2020
d7bc881
Merge branch 'master' into cymon-working
cymon Mar 12, 2021
07ba7b0
Merge branch 'master' into cymon-working
cymon Jan 7, 2022
7a0553f
Merge branch 'master' into cymon-working
cymon Feb 15, 2022
d8b7d48
Merge branch 'master' into cymon-working
cymon Apr 18, 2022
688f163
Resolved merge conflict by being confused.
cymon Apr 21, 2022
ff958f2
Merge branch 'master' into cymon-working
cymon Jul 4, 2022
950af23
Manual removed a left over >>>>HEAD
cymon Jul 4, 2022
7f2ffc6
Removed >>>>HEAD after failed merge
cymon Jul 4, 2022
10 changes: 10 additions & 0 deletions .buildconfig
@@ -0,0 +1,10 @@
[default]
name=Default
runtime=host
config-opts=
run-opts=
prefix=/home/cymon/.cache/gnome-builder/install/p4-phylogenetics.git.cymon/host
app-id=
postbuild=
prebuild=
default=true
3 changes: 2 additions & 1 deletion .gitignore
@@ -29,4 +29,5 @@ build/
# Sphinx documentation
share/sphinxdoc/_build/


make_pf.sh
setup.py
5 changes: 5 additions & 0 deletions Pf/defines.h
@@ -29,6 +29,11 @@
#define RMATRIX_STMTREV 118
#define RMATRIX_VT 119
#define RMATRIX_PMB 120
#define RMATRIX_PRASREV 121
#define RMATRIX_GNETREV 122

#define PIVEC_NONE 0
#define PIVEC_TUPLE 1

#define GAP_CODE -1
#define QMARK_CODE -2
7 changes: 6 additions & 1 deletion Pf/p4_model.c
@@ -384,7 +384,6 @@ void p4_newRMatrix(p4_model *aModel, int pNum, int mNum, int free, int spec)
else if(spec == RMATRIX_RTREV) {
rtRevRMatrix(aRMatrix->bigR);
}

else if(spec == RMATRIX_TMJTT94) {
tmjtt94RMatrix(aRMatrix->bigR);
}
@@ -412,6 +411,12 @@ void p4_newRMatrix(p4_model *aModel, int pNum, int mNum, int free, int spec)
else if(spec == RMATRIX_STMTREV) {
stmtREVRMatrix(aRMatrix->bigR);
}
else if(spec == RMATRIX_PRASREV) {
prasREVRMatrix(aRMatrix->bigR);
}
else if(spec == RMATRIX_GNETREV) {
gnetREVRMatrix(aRMatrix->bigR);
}
else if(spec == RMATRIX_VT) {
vtRMatrix(aRMatrix->bigR);
}
289 changes: 288 additions & 1 deletion Pf/proteinModels.c

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions Pf/proteinModels.h
@@ -16,3 +16,5 @@ void gcpREVRMatrix(double **r);
void stmtREVRMatrix(double **r);
void vtRMatrix(double **r);
void pmbRMatrix(double **r);
void prasREVRMatrix(double **r);
void gnetREVRMatrix(double **r);
15 changes: 7 additions & 8 deletions bin/p4
@@ -38,9 +38,11 @@ argvAfterDoubleDash = []
# back to being an empty string, the way it usually is when you run interactive
# python.

# Cymon seems to think this is needed to pull in the correct libraries:
# sys.path = [os.path.dirname(sys.path[0])] + sys.path
sys.path[0] = ''
# If you have multiple p4 repositories you will need this path modification
# to pull in the correct p4 libs: ensure that the p4 script is being executed
# from the <repo>/bin directory
sys.path = [os.path.dirname(sys.path[0])] + sys.path
#sys.path[0] = ''

if len(sys.argv) > 1: # Is there stuff on the command line?
for f in sys.argv[1:]:
@@ -109,7 +111,6 @@ if doDrawTreesWithNodeNumbers:
for t in var.trees:
t.draw(showInternalNodeNames=1, addToBrLen=0.2, width=None, showNodeNums=1, partNum=None, model=None)


if forceExitAtEnd:
sys.exit()
if exitAtEnd:
@@ -127,7 +128,7 @@ os.environ['PYTHONINSPECT'] = '1'
# Set the prompt (default is '>>> '). I suspect that doing this might
# screw up using the python interpreter in python-mode in emacs, but
# you'll want to confirm that.
sys.ps1 = 'p4> '
sys.ps1 = 'p4-gitc> '

# When the "from p4 import *" was done above, it read in any *.py
# files in ~/.p4. Now we want to execfile one more file, which of
@@ -153,12 +154,10 @@ if var.interactiveHelper:
elif var.interactiveHelper == 'ipython':
from IPython import start_ipython
if var.excepthookEditor:
sys.exit(start_ipython(argv=["-m", "p4.interactive.ipython", "-i"],
sys.exit(start_ipython(argv=["-m", "p4.interactive.ipython", "-i"],
user_ns=locals()))
else:
sys.exit(start_ipython(argv=["-i"], user_ns=locals()))



# Maybe the user just types 'p4' and wants to see a splash...
if len(sys.argv) == 1:
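A minimal sketch of what the path modification in `bin/p4` does, assuming the script is run from `<repo>/bin` so that the first entry of the search path is that directory. The helper name `with_repo_root_first` is hypothetical and not part of p4; it returns a new list instead of mutating `sys.path`.

```python
import os


def with_repo_root_first(search_path):
    """Return a copy of search_path with the parent of its first entry prepended.

    Assumes search_path[0] is the directory holding the start script
    (for example <repo>/bin), so its parent is the repository root.
    """
    script_dir = search_path[0]
    repo_root = os.path.dirname(script_dir)
    return [repo_root] + list(search_path)


if __name__ == "__main__":
    example = ["/home/user/p4-phylogenetics/bin", "/usr/lib/python3"]
    for entry in with_repo_root_first(example):
        print(entry)
```

With the repository root first on the path, `import p4` resolves to the checkout that contains the script rather than to any other installed copy.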
2 changes: 1 addition & 1 deletion p4/__init__.py
@@ -85,7 +85,7 @@

if 1:
verboseStartupFiles = False # Turn on for debugging...
if 0: # If you want it, turn it on.
if 1: # If you want it, turn it on.
try:
import __main__
exec(open(os.environ['P4_STARTUP']).read()) # an official python anti-idiom!
90 changes: 65 additions & 25 deletions p4/alignment_manip.py
@@ -1266,16 +1266,23 @@ def recodeDayhoff(self, firstLetter=False, symbols="123456"):
pass # They stay as they are.
elif c == 'x':
s.sequence[i] = '-'
elif c in 'bzj':
if ambigsBecomeGaps:
s.sequence[i] = '-'
else:
gm.append("Ambiguity character '%s' is not handled." % c)
gm.append("Perhaps set 'ambigsBecomeGaps' as True.")
raise P4Error(gm)
else:
# Maybe this should raise a P4Error?
print("skipping character '%s'" % c)
s.sequence = ''.join(s.sequence)
gm.append("Unknown character state '%s'" % c)
raise P4Error(gm)
s.sequence = ''.join(s.sequence)
self.dataType = 'standard'
self.equates = {}
self.dim = 6
self.symbols = symbols
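The ambiguity handling in this hunk can be sketched on a plain string as follows. This is an illustration only: the function `recode_dayhoff`, the exception `RecodeError`, and the group ordering in `DAYHOFF_GROUPS` are assumptions made for the sketch, and gaps and question marks are assumed to pass through unchanged.

```python
# Assumed ordering of the six Dayhoff groups; symbol i encodes group i.
DAYHOFF_GROUPS = ("c", "stpag", "ndeq", "hrk", "milv", "fyw")


class RecodeError(Exception):
    """Hypothetical stand-in for P4Error."""


def recode_dayhoff(seq, symbols="123456", ambigs_become_gaps=False):
    """Recode a protein string into Dayhoff groups.

    'x' always becomes a gap; 'b', 'z' and 'j' become gaps only when
    ambigs_become_gaps is set, otherwise they raise RecodeError.
    """
    lookup = {}
    for i, group in enumerate(DAYHOFF_GROUPS):
        for aa in group:
            lookup[aa] = symbols[i]
    out = []
    for c in seq.lower():
        if c in lookup:
            out.append(lookup[c])
        elif c in "-?":
            out.append(c)  # gaps and question marks stay as they are
        elif c == "x":
            out.append("-")
        elif c in "bzj":
            if not ambigs_become_gaps:
                raise RecodeError("Ambiguity character '%s' is not handled." % c)
            out.append("-")
        else:
            raise RecodeError("Unknown character state '%s'" % c)
    return "".join(out)


if __name__ == "__main__":
    print(recode_dayhoff("ACDx-b", ambigs_become_gaps=True))
```

Raising on unknown characters, rather than printing and skipping them, keeps every recoded sequence the same length as its input.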

def recodeProteinIntoGroups(self, groups, firstLetter=False):
def recodeProteinIntoGroups(self, groups, firstLetter=False, startAtZero=False):
"""Recode protein data into user-specified groups, in place.

A generalization of :meth:`p4.alignment.Alignment.recodeDayhoff`
@@ -1288,7 +1295,10 @@ def recodeProteinIntoGroups(self, groups, firstLetter=False):
It does not make a new alignment-- it does the re-coding 'in-place'.

If arg *firstLetter* is set, then the character is recoded as the
first letter of its group rather than as a number.
first letter of its group rather than as a number.

If arg *startAtZero* is set, groups are coded with numbers beginning
with zero; otherwise they begin with 1.
"""

gm = ['Alignment.recodeProteinIntoGroups()']
@@ -1308,7 +1318,10 @@ def recodeProteinIntoGroups(self, groups, firstLetter=False):
for s in theseSymbols:
assert s in self.symbols
assert theseSymbols.count(s) == 1
numeralSymbols = ['%i' % (i + 1) for i in range(nGroups)]
if startAtZero:
numeralSymbols = ['%i' % i for i in range(nGroups)]
else:
numeralSymbols = ['%i' % (i + 1) for i in range(nGroups)]
firstLetters = [gr[0] for gr in myGroups]

for s in self.sequences:
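The effect of `startAtZero` on the group symbols can be sketched on a plain string. The function `recode_into_groups` is hypothetical and only mirrors the numbering choice shown in this hunk; characters outside every group are assumed to pass through unchanged.

```python
def recode_into_groups(seq, groups, first_letter=False, start_at_zero=False):
    """Recode seq so each residue is replaced by the code of its group.

    Codes are the group's first letter when first_letter is set, otherwise
    the group index, counted from 0 when start_at_zero is set and from 1
    when it is not.
    """
    offset = 0 if start_at_zero else 1
    lookup = {}
    for i, group in enumerate(groups):
        group = group.lower()
        code = group[0] if first_letter else "%i" % (i + offset)
        for aa in group:
            lookup[aa] = code
    # Characters not in any group (gaps, for example) are left as they are.
    return "".join(lookup.get(c, c) for c in seq.lower())


if __name__ == "__main__":
    groups = ["agpst", "c", "denq", "fwy", "hkr", "ilmv"]
    print(recode_into_groups("acdf-", groups))
    print(recode_into_groups("acdf-", groups, start_at_zero=True))
```

Single-digit codes keep the alignment width unchanged only for up to nine groups counted from 1, or ten counted from 0.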
@@ -2653,6 +2666,8 @@ def getMinmaxChiSqGroups(self, percent_cutoff=0.05, min_bins=2, max_bins=20,
n_choices=1000, seed=42, verbose=False):
"""An interface for Susko and Roger's minmax-chisq program.

This returns the maximum number of bins (groups of amino-acids) that maintains composition homogeneity.

Susko, E. and Roger, A.J. (2007). On reduced amino acid alphabets for
phylogenetic inference. Mol. Biol. Evol. 24:2139-2150.

@@ -2706,31 +2721,56 @@ def getMinmaxChiSqGroups(self, percent_cutoff=0.05, min_bins=2, max_bins=20,
# Need the binning before pvalue falls below percent_cutoff
# They could be all <= 0.05 or all >= 0.05 because of the binning range
pvalues = [float(bin[0].split()[1]) for bin in results]
if not pvalues[0] >= percent_cutoff:
print("No p-value <= %s" % percent_cutoff)
return None
if not any(pv >= percent_cutoff for pv in pvalues):
print("No p-value >= %s" % percent_cutoff)
print("\nminmax-chisq output:\n%s" % stdout)
return None
if not any(pv <= percent_cutoff for pv in pvalues):
print("No p-value <= %s" % percent_cutoff)
print("\nminmax-chisq output:\n%s" % stdout)
return None
opt_bin = False
#Find the maximum number of groups that maintains homogeneity
#Loop through p-values and find the first value that is less than the
#cutoff, then pick the previous bin
#Find the first homogeneous bin
found_homogeneous = False
for i, bin_result in enumerate(results):
nbins, pvalue = bin_result[0].split()
if float(pvalue) <= percent_cutoff:
opt_bin = results[i - 1]
break
nbins, pvalue = opt_bin[0].split()
scores = opt_bin[1].split()
amino_acid_order = "A R N D C Q E G H I L K M F P S T W Y V".lower().split()
c = zip(scores, amino_acid_order)
groups = {}
for bin, amino in c:
sbin = str(bin)
groups[sbin] = groups.get(sbin, "") + amino
if verbose:
if verbose:
print("Doing bin %s, p-value = %s" % (bin_result, pvalue))
if not found_homogeneous:
if float(pvalue) >= percent_cutoff:
found_homogeneous = True
if verbose:
print("\tFound Homogen %s, p-value = %s" % (bin_result, pvalue))
continue
else:
#Find the next result at or below the cutoff and pick the previous bin
if float(pvalue) <= percent_cutoff:
opt_bin = results[i-1]
break
if not opt_bin:
print("Error: unable to find an optimal bin")
print("- try increasing number of bins, or lowering the % cutoff")
print("\nminmax-chisq output:\n%s" % stdout)
print("\nMaximum number of bins that maintains homogeneity: %s" % nbins)
print("\nGroups: %s" % ", ".join(groups.values()))

return groups.values()
return
else:
nbins, pvalue = opt_bin[0].split()
scores = opt_bin[1].split()
amino_acid_order = "A R N D C Q E G H I L K M F P S T W Y V".lower().split()
c = zip(scores, amino_acid_order)
groups = {}
for bin, amino in c:
sbin = str(bin)
groups[sbin] = groups.get(sbin, "") + amino
if verbose:
print("\nminmax-chisq output:\n%s" % stdout)
print("\nMaximum number of bins that maintains homogeneity: %s" % nbins)
print("\nGroups: %s" % ", ".join(groups.values()))
r = sorted(groups.values())
return r
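The bin selection in this hunk can be sketched in isolation. This assumes the minmax-chisq results have already been parsed into `(nbins, pvalue)` pairs in the order the program reports them; the function name `pick_max_homogeneous_bin` is hypothetical.

```python
def pick_max_homogeneous_bin(results, cutoff=0.05):
    """Return the last homogeneous (nbins, pvalue) pair before homogeneity is lost.

    Scans for the first pair with pvalue >= cutoff, then for the next pair
    with pvalue <= cutoff, and returns the pair just before that one.
    Returns None when no such transition exists.
    """
    found_homogeneous = False
    for i, (nbins, pvalue) in enumerate(results):
        if not found_homogeneous:
            if pvalue >= cutoff:
                found_homogeneous = True
            continue
        if pvalue <= cutoff:
            return results[i - 1]
    return None


if __name__ == "__main__":
    parsed = [(2, 0.9), (3, 0.5), (4, 0.2), (5, 0.01), (6, 0.0)]
    print(pick_max_homogeneous_bin(parsed))
```

Comparing each p-value against the cutoff directly, instead of relying on its truthiness, means a p-value of exactly 0.0 is still counted as falling below the cutoff.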

def getKosiolAISGroups(self, tree, n_bins, remove_files=False, verbose=True):
"""An interface for Kosiol's program AIS, for grouping amino acids.
Binary file added p4/pf.cpython-36m-x86_64-linux-gnu.so-old
Binary file not shown.
18 changes: 12 additions & 6 deletions p4/pnumbers.py
@@ -41,7 +41,7 @@ class Numbers(object):

"""

def __init__(self, inThing, col=0, skip=0):
def __init__(self, inThing, col=0, stop=0, skip=0):
self.data = []
self.bins = None
# self.binSize is a property
@@ -55,7 +55,7 @@ def __init__(self, inThing, col=0, skip=0):
if isinstance(inThing, numpy.ndarray):
inThing = list(inThing)
if inThing:
self.read(inThing, col, skip)
self.read(inThing, col, stop, skip)

@property
def binSize(self):
@@ -76,7 +76,7 @@ def binSize(self, binSize):
def binSize(self):
self._binSize = None

def read(self, inThing, col=0, skip=0):
def read(self, inThing, col=0, stop=0, skip=0):
"""Slurp in some more numbers.

You can use this repeatedly, eg if your numbers are in more
@@ -88,6 +88,7 @@ def read(self, inThing, col=0, skip=0):
try:
self.col = int(col)
self.skip = int(skip)
self.stop = int(stop)
except (ValueError, TypeError):
gm.append("Args col, stop and skip must be ints")
raise P4Error(gm)
@@ -100,7 +101,7 @@ def read(self, inThing, col=0, skip=0):
# gm.append("File '%s' has %i lines, " % (inThing, len(theLines)))
# gm.append("but skip is set to %i." % self.skip)
# raise P4Error(gm)
skipsDone = 0
count = 0
digitsPlusMinus = '0123456789+-'
for aLine in theLines:
ll = aLine.lstrip()
@@ -116,8 +117,11 @@ def read(self, inThing, col=0, skip=0):
elif ll[0] not in digitsPlusMinus:
pass
else:
if skipsDone < self.skip:
skipsDone += 1
if count < self.skip:
count += 1
continue
elif self.stop > 0 and count >= self.stop:
break
else:
splitLine = aLine.split()
try:
@@ -134,6 +138,8 @@ def read(self, inThing, col=0, skip=0):
gm.append("Line '%s'. " % aLine.rstrip())
gm.append("Can't make sense of '%s'" % theOne)
raise P4Error(gm)
count += 1

elif isinstance(inThing, list):
for thing in inThing:
try:
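The interaction of `skip` and `stop` in `Numbers.read()` can be sketched on a list of lines. The function `read_column` is hypothetical; it follows the counting shown in this diff, where `stop` is an absolute index over numeric lines (skipped ones included) and `stop=0` means no limit.

```python
def read_column(lines, col=0, stop=0, skip=0):
    """Collect floats from column col of the numeric lines in lines.

    Blank lines, '#' comments and lines not starting with a digit or sign
    are ignored. The first skip numeric lines are dropped, and reading ends
    once stop numeric lines have been counted (stop=0 disables the limit).
    """
    digits_plus_minus = "0123456789+-"
    data = []
    count = 0
    for line in lines:
        stripped = line.lstrip()
        if not stripped or stripped[0] == "#":
            continue
        if stripped[0] not in digits_plus_minus:
            continue
        if count < skip:
            count += 1
            continue
        if stop > 0 and count >= stop:
            break
        data.append(float(line.split()[col]))
        count += 1
    return data


if __name__ == "__main__":
    sample = ["# header"] + ["%i %i" % (i, i * 10) for i in range(10)]
    print(read_column(sample, col=1, skip=2, stop=5))
```

Because `stop` counts from the start of the numeric lines, `skip=2, stop=5` yields three values, not five.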
17 changes: 17 additions & 0 deletions p4/tree.py
@@ -1539,6 +1539,23 @@ def getSeqNumsAbove(self, nodeSpecifier):
seqNums.append(n.seqNum)
return seqNums

def treeHeight(self):
"""Return (mean, (min, max)) of the leaf-to-root path lengths (summed branch lengths)"""
theights = self.getLeafToRootHeights()
return (p4.func.mean(theights), (min(theights), max(theights)))

def getLeafToRootHeights(self):
"""Get all leaf to root branch lengths"""
theights = []
for nodeNum in self.iterLeavesNoRoot():
nodeHeight = 0
while self.node(nodeNum).parent:
if nodeNum != self.root.nodeNum:
nodeHeight += self.node(nodeNum).br.len
nodeNum = self.node(nodeNum).parent.nodeNum
theights.append(nodeHeight)
return theights

# def getAllChildrenNums(self, specifier):
# """Returns a list of the nodeNums of all children of the specified node
# Ambiguous, unused.
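The leaf-to-root height calculation added to `tree.py` can be sketched on a toy tree. The `Node` class and the function names below are hypothetical stand-ins for illustration; they are not p4's `Node` or `Tree` API.

```python
from statistics import mean


class Node:
    """Minimal tree node: br_len is the length of the branch to the parent."""

    def __init__(self, name, br_len=0.0, parent=None):
        self.name = name
        self.br_len = br_len
        self.parent = parent


def leaf_to_root_heights(leaves):
    """Sum branch lengths from each leaf up to the root."""
    heights = []
    for leaf in leaves:
        height = 0.0
        node = leaf
        while node.parent is not None:  # the root has no parent and no branch
            height += node.br_len
            node = node.parent
        heights.append(height)
    return heights


def tree_height(leaves):
    """Return (mean, (min, max)) of the leaf-to-root heights."""
    heights = leaf_to_root_heights(leaves)
    return mean(heights), (min(heights), max(heights))


if __name__ == "__main__":
    root = Node("root")
    inner = Node("x", 1.0, root)
    leaves = [Node("a", 0.5, inner), Node("b", 1.5, inner), Node("c", 2.0, root)]
    print(tree_height(leaves))
```

Walking up from each leaf revisits shared internal branches, which is simple and fast enough for a summary statistic on trees of typical size.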
4 changes: 2 additions & 2 deletions setup.py
@@ -44,8 +44,8 @@
#my_include_dirs = ["/my/weird/include"]
#my_lib_dirs = ["/my/weird/lib"]

my_include_dirs = []
my_lib_dirs = []
my_include_dirs = ["/share/apps/include"]
my_lib_dirs = ["/usr/local/lib64", "/share/apps/lib64"]

likelyDirs = [ "/usr",
"/usr/local",