CPU/GPU memory benchmarking utilities - Remove support for python 3.5 (now only 3.6+) #3186
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #3186      +/-   ##
==========================================
- Coverage   78.15%   77.64%   -0.51%
==========================================
  Files          98       99       +1
  Lines       16641    16795     +154
==========================================
+ Hits        13006    13041      +35
- Misses       3635     3754     +119
Continue to review full report at Codecov.
src/transformers/file_utils.py (Outdated)

Memory = namedtuple("Memory", ["bytes", "string"])
MemoryState = namedtuple("MemoryState", ["frame", "cpu", "gpu", "cpu_gpu"])
MemorySummary = namedtuple("MemorySummary", ["sequential", "cumulative", "total"])
Should we use the alternative, typed syntax for NamedTuples?
I like it a lot, it is slightly more powerful, and has the same Python requirements (until we use type hints, which require 3.6): https://docs.python.org/3/library/typing.html#typing.NamedTuple
Oh yes that's a lot better!
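For illustration, a minimal sketch of the two syntaxes under discussion, using the `Memory` fields from the snippet above:

```python
from collections import namedtuple
from typing import NamedTuple

# Current, untyped syntax
MemoryUntyped = namedtuple("Memory", ["bytes", "string"])

# Suggested typed syntax: same runtime behavior, plus declared field types
class Memory(NamedTuple):
    bytes: int
    string: str
```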
FWIW, we could also use a pd.DataFrame and do the diff code with .diff in about two lines. It would also make CSV saving cheap. I think examples might already have a pandas dependency implicitly through pytorch-lightning.
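A rough sketch of the DataFrame idea, under the assumption that each traced line yields absolute CPU/GPU readings; the data and frame labels are purely illustrative, not the PR's implementation:

```python
import pandas as pd

# Hypothetical absolute memory readings recorded per traced line (bytes)
records = [
    {"frame": "modeling.py:112", "cpu": 1_000_000, "gpu": 0},
    {"frame": "modeling.py:113", "cpu": 1_500_000, "gpu": 2_000_000},
    {"frame": "modeling.py:114", "cpu": 1_450_000, "gpu": 2_000_000},
]
df = pd.DataFrame(records).set_index("frame")

# Per-line increments (the "diff code") in one call; negatives mark releases
deltas = df.diff()

# CSV export comes for free
deltas.to_csv("memory_trace.csv")
```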
src/transformers/file_utils.py (Outdated)

@@ -496,3 +498,253 @@ def _resumable_file_manager():
                json.dump(meta, meta_file)

    return cache_path
should we create a new benchmarking_utils.py file for this?
indeed!
Love this. Excited to use it!
Left some comments which might be out of scope for this PR.
average_time = sum(runtimes) / float(len(runtimes)) / 3.0
dictionary[model_name]["results"][batch_size][slice_size] = average_time
if not no_memory:
    # model.add_memory_hooks()  # Forward method tracing (only for PyTorch models)
(nit) delete this?
@@ -250,15 +257,21 @@

def create_setup_and_compute(
I definitely lack context, but I feel like I would want to use this with 1 model, 1 batch size, and 1 slice size. The fact that the code takes lists of models, lists of batch sizes, and lists of slice sizes adds a fair amount of complexity (e.g. results[model_name]["memory"][batch_size][slice_size] would just be results["memory"]).
If the use case is comparing different runs, I guess the signature makes sense. Is that the reasoning?
Yes, that's the script we used for this blog post, for instance: https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2
I kept it as is and just added memory benchmarking in addition to speed (and a little more flexibility in the CL args).
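For concreteness, a small sketch of the nested result layout being discussed; the model name, batch sizes, slice sizes, and values below are purely illustrative:

```python
# List-based signature: results are keyed by model, batch size, and slice size
results = {
    "gpt2": {
        "results": {8: {128: 0.012, 512: 0.045}},     # average runtimes (illustrative)
        "memory": {8: {128: "1.2GB", 512: "2.4GB"}},  # memory figures (illustrative)
    }
}

# A single model / batch size / slice size would flatten this to e.g.
# results["results"] and results["memory"]
```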
examples/benchmarks.py (Outdated)

)
)
print(
    "\nLines with lowest memory consumption:\n"
Is this useful output? Are there negatives in cumulative?
There are (in particular on my Ubuntu machine, there are some memory releases during execution which make the output noisier).
Also, if you don't keep the output of the model, the last line is just a large release of memory that cancels out everything you've allocated, haha.
examples/benchmarks.py (Outdated)

)
)
print(
    "\nLines with lowest memory consumption:\n"
Could we share a def print_summary_statistics with _compute_pytorch?
Indeed
dictionary[model_name]["memory"][batch_size][slice_size] = "N/A"

if not no_speed:
    runtimes = timeit.repeat(lambda: inference(sequence), repeat=average_over, number=3)
(nit) number=3 feels like it should be exposed as a higher-level kwarg like nruns=3.
Yeah that's the previous code. We can keep it as is for now.
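For reference, a small sketch of how `timeit.repeat`'s `repeat` and `number` arguments interact, and why the averaging shown earlier divides by 3.0 (i.e. by `number`); the `inference` body here is just a placeholder:

```python
import timeit

def inference():
    pass  # placeholder for the model call being timed

average_over = 30  # number of timing samples collected
number = 3         # each sample executes the statement `number` times

# Returns `average_over` floats, each the *total* time of `number` executions
runtimes = timeit.repeat(inference, repeat=average_over, number=number)

# Dividing by `number` recovers the mean time of a single execution
average_time = sum(runtimes) / float(len(runtimes)) / number
```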
@@ -66,6 +67,47 @@ def num_parameters(self, only_trainable: bool = False) -> int:
        params = filter(lambda x: x.requires_grad, self.parameters()) if only_trainable else self.parameters()
        return sum(p.numel() for p in params)

    @staticmethod
    def _hook_rss_memory_pre_forward(module, *args, **kwargs):
what does rss refer to in this context?
"resident set size" as in the psutil RSS memory info https://psutil.readthedocs.io/en/latest/#psutil.Process.memory_info
I didn't comment much these PyTorch hooks because I feel like the tracing methods are more general (can be used both in PT and TF) and they give the same results. I kept them for now in the codebase though.
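For reference, a minimal example of reading the resident set size through psutil, which is the mechanism these hooks rely on:

```python
import os
import psutil

# Resident set size (RSS): non-swapped physical memory used by this process
process = psutil.Process(os.getpid())
rss_bytes = process.memory_info().rss
print("RSS: {:.1f} MiB".format(rss_bytes / 1024 ** 2))
```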
@sshleifer, regarding the pandas dependency: note that these tracing methods are in the library, not in the examples.
And we are very careful about the dependencies we add to the main library (also note that pytorch-lightning doesn't depend on pandas).
Great, LGTM. Glad the benchmarking script was upgraded with that!
class UsedMemoryState(NamedTuple):
    """ `UsedMemoryState` are named tuples with the following fields:
        - 'frame': a `Frame` namedtuple (see below) storing information on the current tracing frame (current file, location in current file)
        - 'cpu_memory': CPU RSS memory state *before* executing the line
        - 'gpu_memory': GPU used memory *before* executing the line (sum for all GPUs or for only `gpus_to_trace` if provided)
    """

    frame: Frame
    cpu_memory: int
    gpu_memory: int
That's clean
… (now only 3.6+) (huggingface#3186)

* memory benchmark rss
* have both forward pass and line-by-line mem tracing
* cleaned up tracing
* refactored and cleaning up API
* no f-strings yet...
* add GPU mem logging
* fix GPU memory monitoring
* style and quality
* clean up and doc
* update with comments
* Switching to python 3.6+
* fix quality
This PR adds some utilities to benchmark the (RAM) memory consumption of the models.
This is actually a generic utility that can work with any arbitrary Python code. Ex:
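A hypothetical usage sketch of tracing arbitrary code line by line; the function names `start_memory_tracing`/`stop_memory_tracing` are assumptions for illustration and may not match the PR's exact API, while the `MemorySummary` fields come from the snippet discussed above:

```python
# Hypothetical sketch -- function names are assumed, not confirmed by this thread
from transformers import start_memory_tracing, stop_memory_tracing

trace = start_memory_tracing("transformers")  # start line-by-line CPU/GPU tracing

# ... run any arbitrary python code here, e.g. a model forward pass ...

summary = stop_memory_tracing(trace)
print(summary.total)       # total memory increase over the traced region
print(summary.sequential)  # per-line memory states in execution order
print(summary.cumulative)  # per-line increases aggregated across repeated calls
```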
It is incorporated in the ./examples/benchmark.py script. Example of command-line run: