-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Profile mem #678
Profile mem #678
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
Not sure how to add that, but memory_profiler
is based on psutil
, which would also give us .cpu_times()['iowait']
or .disk_io_counters()['read_time']
/ write_time
/ busy_time
etc...
ocrd/ocrd/processor/helpers.py
Outdated
@@ -69,7 +71,13 @@ def run_processor( | |||
log.debug("Processor instance %s (%s doing %s)", processor, name, otherrole) | |||
t0_wall = perf_counter() | |||
t0_cpu = process_time() | |||
processor.process() | |||
mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you wanna add multiprocess=True
in there for processors that use multiprocessing
or similar, or that delegate to shell calls. Perhaps also include_children=True
(not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you wanna add
multiprocess=True
in there for processors that usemultiprocessing
or similar, or that delegate to shell calls.
I will.
Perhaps also
include_children=True
(not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).
if include_children:
mem += sum(_get_child_memory(process, meminfo_attr))
https://github.com/pythonprofilers/memory_profiler/blob/master/memory_profiler.py#L135-L136
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps also
include_children=True
(not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).if include_children: mem += sum(_get_child_memory(process, meminfo_attr))
I saw that, but the question remains: what is in that sum? (We probably don't want to count shared and CoW pages repeatedly. But it may be hard to calculate exactly. IIRC standard ps
and top
and even time
do count repetitions, so they appear too large – don't "add up" – for multiprocessed programs, whereas smem
gets it right. This tells me we are actually interested in the PSS, the proportional set size, rather than the RSS.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but PSS vs RSS is a more general problem/discussion. Any multiprocessing coder will know about that. And RSS is still a valuable information. So let's stick to your current implementation.
This will conflict with #652 (where that part of Oh, since I saw this also uses |
Is that just an issue of resolving conflicts or do you mean there's an incompatible API-wise now?
I haven't, yet. I did notice that when I profile the current process ( |
It looks like this is only cosmetic (since the workflow server does not handle the retval either, it just adds exception handling, and does not use multiprocessing itself, at least in the current implementation). But I would have to check/test. |
ocrd/ocrd/processor/helpers.py
Outdated
@@ -69,7 +71,12 @@ def run_processor( | |||
log.debug("Processor instance %s (%s doing %s)", processor, name, otherrole) | |||
t0_wall = perf_counter() | |||
t0_cpu = process_time() | |||
processor.process() | |||
mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True, multiprocess=True, include_children=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True, multiprocess=True, include_children=True) | |
mem_usage = memory_usage(proc=processor.process, | |
interval=.1, timeout=None, timestamps=True, | |
# include sub-processes | |
multiprocess=True, include_children=True, | |
# get proportional set size instead of RSS | |
backend='psutil_pss') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is really that simple – let's make this configurable (PSS vs RSS measuring). Did you try this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, however we first need a mechanism to toggle this behavior. Simple idea would be to add CLI flags --mem-profile-rss
and --mem-profile-pss
.
@bertsky prefers a way to have this configurable via the logging configuration - @M3ssman do you have a good idea how that might work? That way, we could keep the CLI arguments small and the --help
output less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps for the moment it could be as simple as
mem_prof = os.getenv('OCRD_PROFILE', '')
if mem_prof:
...
backend='psutil_pss' if mem_prof == 'PSS' else 'psutil')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like environment variables in general, but when the choice is between environment variables and CLI flags, I prefer the latter, because at least it's consistent. Because otherwise you could get OCRD_PROFILE=PSS ocrd-dummy --profile-file my.prof ...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kba I agree, however I did not argue env var plus cli arg, but only the former. Perhaps we could also move --profile
and --profile-file
into env var control entirely.
Or we go the other direction and put everything into a single CLI arg, just with multi-faceted value: --profile mem=rss,cpu,file=foo.log
for example.
For the broader discussion of how to configure processors without adding to many options / parameters and what for, see this old thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, now that the wonky multiple-process-call behavior is fixed, I'll implement this via environment variable OCRD_PROFILE
and OCRD_PROFILE_FILE
OCRD_PROFILE
not set or empty: no profilingOCRD_PROFILE_FILE
set to a value (== current--profile-file
flag):- if
OCRD_PROFILE
is defined but empty: no profiling - if
OCRD_PROFILE
is not defined: setOCRD_PROFILE
toCPU
* iftoo strict, PROFILE_FILE should imply PROFILE^=CPUOCRD_PROFILE
is defined but does not containCPU
: raise error
- if
OCRD_PROFILE
containsCPU
, do CPU profiling (== current--profile
flag)OCRD_PROFILE
containsRSS
do RSS memory profilingOCRD_PROFILE
containsPSS
: do proportionate mem profilingIftoo strict,OCRD_PROFILE
contains bothPSS
andRSS
: raise errorPSS
takes precedence
Did I catch everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in 4fdfbf3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Environment variables look good to me.
In the readme, I would also suggest to reference an external link where the difference between RSS
and PSS
is explained or at least briefly described. May not be the best suggestion, but consider this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, sounds reasonable and complete AFAICS – thx!
I wonder if it would be much effort to incorporate gputil for (avg/peak) GPU memory usage statistics. (Or perhaps one could add it to memory_profiler as an alternative |
I'm afraid it's not that simple at all. GPutil only gives snapshots, no aggregation/statistics there. And it looks like one would need to use the respective ML framework's facilities for process/session measurements anyway. (See here and there for Pytorch. TF now also has things like get_memory_info / get_memory_usage / reset_memory_stats.) Let's treat that off-topic for now! |
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
* `RSS`: Enable RSS memory profiling | ||
* `PSS`: Enable proportionate memory profiling | ||
* `OCRD_PROFILE_FILE`: If set, then the CPU profile is written to this file for later peruse with a analysis tools like [snakeviz](https://jiffyclub.github.io/snakeviz/) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging is configured via configuration files, see [module logging](./ocrd_utils#ocr-d-module-logging). |
When running processors with
ocrd process
, we already measure the wall and CPU time of a processor run. This adds basic tracking of memory usage with https://github.com/pythonprofilers/memory_profiler.Currently, this just outputs a sparkline of memory usage and max/min memory, e.g.