Clean up RNA degradation process #364

ggsun · 2018-10-19T22:57:14Z

This PR fixes #270. I performed a general cleanup of the code for the calculateRequest() and evolveState() methods of the rna_degradation process. While doing so, a line profile (thanks, @tahorst!) showed that one particular line of code, which computes a summation, was consuming nearly 90% of the computation time for the entire method.

wcEcoli/models/ecoli/processes/rna_degradation.py

Line 272 in b499317

    self.writeToListener("RnaDegradationListener", "FractionActiveEndoRNases", sum(fracEndoRnaseSaturated))

I tried changing the summation function to np.sum, which actually made this line run slower! It turns out that the variable being summed (fracEndoRnaseSaturated, or frac_endornase_saturated in the changed version of the file) is a unitless units Unum object. Apparently np.sum cannot sum these objects in an optimized way, but it does so anyway without raising any errors or warnings.

Stripping the units off first to get a numpy ndarray, and then summing with np.sum, cuts the execution time of the entire calculateRequest() method from roughly 30 ms to 2 ms (see the new line profile), and the execution time of the entire simulation by 1-2 minutes depending on the platform.
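A minimal sketch of the pattern described above, for illustration only. The `UnitArray` class is a hypothetical stand-in for a unit-carrying array and is not the real units package; only the `.asNumber()` name is borrowed from the code in this PR. It shows why summing through the wrapper goes element by element, while stripping the wrapper first lets numpy sum the raw array in one call.

```python
import numpy as np


class UnitArray(object):
    """Hypothetical stand-in for a unit-carrying array; NOT the real units package."""

    def __init__(self, values):
        self._values = np.asarray(values, dtype=np.float64)

    def asNumber(self):
        # Return the bare numpy data, mirroring the .asNumber() calls in this PR
        return self._values

    def __getitem__(self, index):
        # Every element access allocates a new wrapper object
        return UnitArray(self._values[index])

    def __add__(self, other):
        other_values = other._values if isinstance(other, UnitArray) else other
        return UnitArray(self._values + other_values)

    __radd__ = __add__


frac_endornase_saturated = UnitArray(np.linspace(0.0, 1.0, 1000))

# Slow path: the builtin sum() walks the wrapper one element at a time,
# creating a new wrapper object for every element and every partial sum
slow_total = float(sum(frac_endornase_saturated).asNumber())

# Fast path: strip the wrapper once, then let numpy sum the raw ndarray
fast_total = float(np.sum(frac_endornase_saturated.asNumber()))
```

Both paths give the same total; only the amount of per-element Python overhead differs.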

I think this has important implications for our use of the units tool in general - we should make sure that we are using the methods that are built into the units package (units.sum(), units.dot() etc.), not numpy operations, if we are dealing with units objects.

tahorst

This looks great! The logic is a lot easier to follow, and it's good to see unnecessary computation removed and great to see the speed improvement. I also like the added functions, but I think it would be helpful to have some more documentation on their inputs and return values.

tahorst · 2018-10-20T00:34:24Z

models/ecoli/processes/rna_degradation.py

-		nTRNAsTotalToDegrade = np.round(sum(TargetEndoRNasesFullTRNA *
-				self.endoRnases.total() * 
-				self.KcatEndoRNases * (units.s * self.timeStepSec())).asNumber()
+		endornase_per_rna = total_endornase_counts.astype(np.float) / np.sum(rna_counts)


I'd expect this astype(np.float) is no longer necessary since we are importing division from __future__.
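A small sketch of the point above, using made-up counts. With true division in effect (the default in Python 3, or enabled with `from __future__ import division` at the top of a Python 2 module), dividing an integer array by an integer sum already yields floats, so the explicit cast adds nothing. Separately, the `np.float` alias was removed in NumPy 1.24, so the cast as written would fail on current NumPy.

```python
import numpy as np

# Hypothetical integer counts, for illustration only
total_endornase_counts = np.array([10, 20, 30])
rna_counts = np.array([100, 200, 700])

# True division: integer array / integer scalar gives a float64 array,
# so no astype() cast is needed before dividing
endornase_per_rna = total_endornase_counts / np.sum(rna_counts)
```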

tahorst · 2018-10-20T00:39:30Z

models/ecoli/processes/rna_degradation.py

@@ -1,10 +1,8 @@
 #!/usr/bin/env python


Could remove the shebang while you're updating the file, since this file just defines a class.

jmason42 · 2018-10-20T01:15:51Z

TBH I think units should be stripped out entirely once we get into model execution code.

ggsun · 2018-10-20T02:05:09Z

The evaluationTime plots before and after the change:
evaluationTime_before.pdf
evaluationTime_after.pdf

ggsun · 2018-10-20T02:16:37Z

I also confirmed that the simulation results are identical with the change.

prismofeverything

This is an impressive improvement to this file in a number of ways, and I love how much code you were able to remove. I have no substantial suggestions but appreciate all the work you did here. I suspect we could make similar gains by surveying the rest of the processes.

prismofeverything · 2018-10-22T23:54:22Z

models/ecoli/processes/rna_degradation.py

@@ -98,7 +102,7 @@ def initialize(self, sim, sim_data):
 		self.isRRna = sim_data.process.transcription.rnaData["isRRna"]
 		self.isTRna = sim_data.process.transcription.rnaData["isTRna"]

-		self.rnaLens = sim_data.process.transcription.rnaData['length'].asNumber()
+		self.rna_lengths = sim_data.process.transcription.rnaData['length'].asNumber()


Thank you for your heroic adjustment of all the variable names in this file; it makes the code way more sane to read.

prismofeverything · 2018-10-23T17:18:27Z

models/ecoli/processes/rna_degradation.py

+4. Update RNA fragments (assumption: fragments are represented as a pool of
+nucleotides) created because of endonucleolytic cleavage
+5. Compute total capacity of exoRNases and determine fraction of nucleotides
+that can be diggested


typo: digested

prismofeverything · 2018-10-23T18:01:43Z

models/ecoli/processes/rna_degradation.py

+		else:
+			frac_endornase_saturated = (
+				countsToMolar * rna_counts / (self.Km + (
+				countsToMolar * rna_counts))


A simple thing, but the expression countsToMolar * rna_counts is used four times here. Something like

molar_counts = countsToMolar * rna_counts

would clear things up here.
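A sketch of that suggestion, with plain floats standing in for the unit-carrying values in the real code. The function name and the example numbers are hypothetical.

```python
import numpy as np


def fraction_saturated(rna_counts, counts_to_molar, km):
    """Saturation fraction, computing the molar concentrations only once."""
    molar_counts = counts_to_molar * rna_counts
    return molar_counts / (km + molar_counts)


# Hypothetical inputs: three RNA species with the same Km
example_fractions = fraction_saturated(
    rna_counts=np.array([0.0, 2.0, 6.0]),
    counts_to_molar=0.5,
    km=np.array([1.0, 1.0, 1.0]),
)
```

When the molar concentration equals Km, as for the second species here, the fraction is exactly one half.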

prismofeverything · 2018-10-23T18:21:58Z

models/ecoli/processes/rna_degradation.py

+			"FractEndoRRnaCounts",
+			endornase_per_rna)
+
+		if self.EndoRNaseFunc:


Might be nice to document somewhere in this file the significance of the difference between EndoRNaseFunc and EndoRNaseCoop; it was unclear while trying to understand this code.

jmason42

Looks good. I am a bit leery of the way in which operational comments are mixed with modeling comments - more and more I feel that those should be abstracted apart - but regardless this is a vast improvement to one of the most egregious Processes.

jmason42 · 2018-10-23T18:49:51Z

models/ecoli/processes/rna_degradation.py

+		# Get total counts of RNAs including rRNAs and charged tRNAs
+		rna_counts = self.rnas.total().copy()
+		rna_counts[self.rrsaIdx] += self.ribosome30S.total()
+		rna_counts[[self.rrlaIdx, self.rrfaIdx]] += self.ribosome50S.total()


Are those inner brackets necessary?

It looks like they are. Removing them raises an error for passing two indices to a 1-d vector.
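A short sketch of the difference, assuming scalar indices and a made-up count (the real index attributes may be arrays). A list inside the brackets selects several elements of a 1-d array, while a bare comma-separated pair is read as one index per dimension.

```python
import numpy as np

rna_counts = np.zeros(5)
rrla_idx, rrfa_idx = 1, 3   # hypothetical scalar indices
ribosome_50s_count = 7.0    # hypothetical count

# Inner brackets: index with a list, adding to both elements at once
rna_counts[[rrla_idx, rrfa_idx]] += ribosome_50s_count

# Without them, numpy sees a 2-d index into a 1-d array and raises IndexError
try:
    rna_counts[rrla_idx, rrfa_idx] += ribosome_50s_count
    raised_index_error = False
except IndexError:
    raised_index_error = True
```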

jmason42 · 2018-10-23T18:50:42Z

models/ecoli/processes/rna_degradation.py

-			fracEndoRnaseSaturated = countsToMolar * rnasTotal / (self.Km + (countsToMolar * rnasTotal))
+		# Get counts of endoRNases
+		endornase_counts = self.endoRnases.total().copy()
+		total_kcat_endornase = units.dot(self.KcatEndoRNases, endornase_counts)


Again, I think units should be out of the picture at this point in the model. It's very difficult to keep track of which things are or aren't unit'd, due to pseudo-ducktyping. This has been our policy (I hope) everywhere else in the model.

jmason42 · 2018-10-23T18:52:57Z

models/ecoli/processes/rna_degradation.py

+			# Dissect RNAse specificity into mRNA, tRNA, and rRNA
+			mrna_specificity = np.dot(frac_endornase_saturated, self.isMRna)
+			trna_specificity = np.dot(frac_endornase_saturated, self.isTRna)
+			rrna_specificity = np.dot(frac_endornase_saturated, self.isRRna)


Assuming the vectors are boolean arrays, I wonder if this is the fastest way to do this operation. Pre-casting these variables to floats would probably improve evaluation times.

I tried out mrna_specificity = np.dot(frac_endornase_saturated, self.isMRna.astype(np.float)) but the differences seem negligible. Wouldn't np.dot cast the boolean to float internally anyway?

Yes - what I'm saying is that you can cast to float in initialize and then reuse those vectors. I'm sure it's at most a minor performance gain.

Oh, I see what you mean. Good point.

Also numpy doesn't always appear to cast in efficient ways. See #100.

With this process taking around 2 ms per time step with the changes, there's not that much to be gained by making more changes (< 5 sec per sim). You could focus on sections that take up a lot of time in the profiler (not sure if this is one of them) but it seems to be pretty efficient as is.

This additionally cuts down the runtime from ~2.0 ms to ~1.7 ms.
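A sketch of the pre-casting idea discussed in this thread, with made-up values. The boolean mask is converted to floats once (in the real code this would happen in initialize) and the float vector is then reused at every time step, so np.dot never has to convert the mask again.

```python
import numpy as np

# Hypothetical boolean mask marking which RNAs are mRNAs
is_mrna = np.array([True, False, True, False])

# Done once, up front: cast the mask to float
is_mrna_float = is_mrna.astype(np.float64)

# Done every time step: both operands are already float64
frac_endornase_saturated = np.array([0.2, 0.4, 0.1, 0.3])
mrna_specificity = np.dot(frac_endornase_saturated, is_mrna_float)

# Same value as the uncast version, which converts the mask on every call
mrna_specificity_uncast = np.dot(frac_endornase_saturated, is_mrna)
```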

jmason42 · 2018-10-23T18:54:40Z

models/ecoli/processes/rna_degradation.py

-		nExoRNases = self.exoRnases.counts()
-		exoCapacity = nExoRNases.sum() * self.KcatExoRNase * (units.s * self.timeStepSec())
-		NucleotideRecycling = self.fragmentBases.counts().sum()
+		n_exornases = self.exoRnases.counts()


This is a place where capitalization helps readability. E.g. n_exoRNases is much more readable.

jmason42 · 2018-10-23T18:55:34Z

models/ecoli/processes/rna_degradation.py

+					rna_counts
+					)
+
+		return n_rnas_to_degrade


I think it's our policy to end each file with a newline. I know Sublime can be configured to do this automatically. Unsure about PyCharm.

ggsun added 4 commits October 18, 2018 21:08

Reformat calculateRequest() method in rna_degradation

f9e9cd5

Strip units off before doing np.sum

bad8b00

Build functions for repeated routines

e67308c

Cleanup .evolveState() for rna degradation

5c786c0

ggsun added the performance and code refinement labels Oct 20, 2018

tahorst approved these changes Oct 20, 2018

Apply proposed changes

ce0ceee

prismofeverything approved these changes Oct 23, 2018

jmason42 reviewed Oct 23, 2018

Apply proposed changes

20d3217

ggsun force-pushed the rnadeg-cleanup branch from 93aecbe to 20d3217 Compare October 23, 2018 21:26

ggsun mentioned this pull request Oct 23, 2018

Slow performance of the odeint solver used in 2CS process #367

Closed

ggsun merged commit 27b6ef5 into master Oct 24, 2018

ggsun deleted the rnadeg-cleanup branch November 6, 2018 22:56
