-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate flame chart for performance analysis #895
Conversation
Looks cool 🙂 What is the format of Another implementation for the HTML rendering would be to communicate through WebSockets, so that the performance report is updated live! But it's more complicated. |
@alexsegura The format is quite straightforward, it's a JSON object representing a tree, with dictionary keys |
Is it possible to format the output in RunSnakeRun |
@benjello No, RunSnakeRun takes cProfile files as input, we don't know how to output that format. |
😍 |
@Morendil there is also this other visualisation tool https://jiffyclub.github.io/snakeviz/ |
1723b70
to
791c448
Compare
8e928db
to
3240f64
Compare
@Morendil I'd need some help for the review and rebase, great job though! |
dc3901d
to
d325456
Compare
Rebased |
d325456
to
11c2372
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great, I'm going to use this right now 🎉 🎉 🙂
openfisca_core/simulations.py
Outdated
@@ -106,13 +106,15 @@ def calculate(self, variable_name, period): | |||
period = periods.period(period) | |||
|
|||
self.tracer.enter_calculation(variable_name, period) | |||
self.tracer.record_start(time.time_ns() / (10**9)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the tracer call time.time_ns()
rather than the simulation?
I'd say that's its responsibility, as it is in charge of measuring performances. That would also avoid needlessly calling time
when we don't have a full tracer activated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If record_start
and enter_calculation
are called at the exact same time, and have really similar names, is there really a point to make them 2 different methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favour of moving time.time_ns()
to the tracer but:
- we need to test how sub-calculations measure their durations (aka
test_tracers.py::test_calculation_time_with_depth
) - time precision / os clocks differ (so, a simulation on bulk data might need some other time unit) 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record_start
and enter_calculation
have specific and different missions so I would keep them separated.
Besides, this is a first flame chart test based on timestamps but the chart values might take other performance information like the number of calls to the same variable which leads to a call to
enter_calculation
but norecord_start
(Even if, I know, it's bad to involve a possible future to a PR 😬).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to test how sub-calculations measure their durations (aka test_tracers.py::test_calculation_time_with_depth)
In that case, can we make the timestamp an optional argument, so that:
- the function by default uses the current timestamp
- it can be overrun to use a specific timestamp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record_start
andenter_calculation
have specific and different missions so I would keep them separated.
Keeping them separated as 2 different functions in the tracers
module makes total sense 👍 .
But do they really have different missions from the simulation's point of view ? The fact that the simulation needs to tell the tracer to first enter a calculation, then to record the time feels to me like the simulation is telling the tracer how to do its job. And this makes me feel like it's an SRP violation.
I also find that the names read very similar at first sight:
tracer.record_start
: Record that something has started. Probably a calculation?tracer.enter_calculation
: Make the tracer enter a calculation. So tell the tracer a calculation has started, how is this different?
I don't really understand how capturing more information would lead to call
enter_calculation
but notrecord_start
, since we decided callrecord_start
in all cases, even if we are just reading from the cache.
This is not a hill I'm willing to die on, but I think we could have a single exposed method explicitly called record_calculation_start
that would then call private methods to both record_start_time
and enter_calculation
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a commit containing the 1st suggested change on this branch, and another commit addressing my 2nd remark
on a separate branch trace-performance-suggestion
.
Feel free to adopt or drop the second one (I think it simplifies the tracer interface and keep responsibilities more separated, but up to you).
openfisca_core/simulations.py
Outdated
|
||
try: | ||
result = self._calculate(variable_name, period) | ||
self.tracer.record_calculation_result(result) | ||
return result | ||
|
||
finally: | ||
self.tracer.record_end(time.time_ns() / (10**9)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same 2 comments than for record_start
f378921
to
193b7ef
Compare
from collections import ChainMap | ||
import importlib.resources as pkg_resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❗ This removes the 3.6 backward compatibility
New in version 3.7.
From https://docs.python.org/3/library/importlib.html#module-importlib.resources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Morendil for suggesting pip install importlib_resources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, importlib_resources
still requires this line to be changed into
import importlib_resources as pkg_resources
return PerformanceLog(self) | ||
|
||
def _get_time_in_sec(self) -> float: | ||
return time.time_ns() / (10**9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❗ This is not 3.6 compatible
New in version 3.7.
From https://docs.python.org/3/library/time.html#time.time_ns
Connected to #877
New features
openfisca test myTest.yaml --performance
To test it:
openfisca test tests/formulas/irpp.yaml --performance -c openfisca_france
python -m http.server 5000
http://localhost:5000
When your yaml file contains multiple tests, only the last one is displayed in the flame chart.
You can use openfisca test --name_filter option to choose a specific test case.