Create a message logging system for PyTorch with the following requirements:
- The C++ and Python APIs should match each other as closely as possible.
- All errors, warnings, and other messages generated by PyTorch should be emitted using the logging system API.
- Offer different message severity levels, including at least the following:
  - Info: Emits a message without creating a warning or error. By default, this gets printed to stdout.
  - Warning: Emits a message as a warning. If a warning is never caught, it gets printed to stderr by default.
  - Error: Emits a message as an error. If an error is never caught, the application prints the error to stderr and quits.
- Offer different message classes under each severity level.
  - Every message is emitted as an instance of a message class.
  - Each message class has both a C++ class and a Python class, and when a C++ message is propagated to Python, it is converted to its corresponding Python class.
  - Whenever it makes sense, the Python class should be one of the builtin Python error/warning classes. For instance, currently in PyTorch, the C++ error class `c10::Error` gets converted to the Python `RuntimeError` class.
- Adding new message classes and severity levels should be easy.
- Ability to turn warnings into errors. This is already possible with the Python `warnings` module filter, but the PyTorch docs should mention it and we should probably have unit tests for it. See documentation (a short example appears after this list).
- Settings to disable specific Warning or Info classes.
  - Disabling warnings in Python is already possible with the `warnings` module filter. See documentation. There is no similar system in C++ at the moment, and building one is probably low priority.
  - Filtering out Info messages would be nice to have because excessive printouts can degrade the user experience. Related to issue #68768.
- Settings to enable/disable emitting duplicate messages generated by multiple `torch.distributed` ranks. Related to issue #68768.
- Ability to make a particular Warning or Info message only emit once. Warn-once should be the default for most warnings.
  - Currently `TORCH_WARN_ONCE` does this in C++, but there is no Python equivalent.
  - Offer a filter to override warn-once and log-once, so that they always emit. The filter could work similarly to the Python `warnings` filter. This is a low priority feature.
  - TODO: `torch.set_warn_always()` currently controls some warnings (maybe only the ones from C++? I need to find out for sure).
- Settings can be changed from Python, C++, or environment variables.
  - Filtering warnings with Python command line arguments should remain possible. For instance, the following turns a `DeprecationWarning` into an error: `python -W error::DeprecationWarning your_script.py`
- Should integrate with Meta's internal logging system, which is glog.
  - TODO: What are all the requirements that define "integrating with glog"?
- Must be OSS-friendly, so it shouldn't require libraries (like glog) which may cause incompatibility issues for projects that use PyTorch.
- Continue using warning/error APIs and message classes that currently exist in PyTorch wherever possible. For instance, `TORCH_CHECK`, `TORCH_WARN`, and `TORCH_WARN_ONCE` should continue to be used in C++.
- TODO: Determine the requirements for the following concepts:
  - Log files? (default behavior and any settings)
Original issue: link
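As referenced in the requirements list, turning warnings into errors already works with the standard `warnings` module filter. A minimal sketch using only the builtin `warnings` API (no proposed `torch.*` functions are used here):

```python
import warnings

# Treat every UserWarning as an error for the rest of the process.
# The same effect is available from the command line with
# `python -W error::UserWarning your_script.py`.
warnings.filterwarnings("error", category=UserWarning)

try:
    warnings.warn("this would normally just print", UserWarning)
except UserWarning as e:
    print(f"warning was promoted to an error: {e}")
```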
Currently, it is challenging for PyTorch developers to provide messages that act consistently between Python and C++.
It is also challenging for PyTorch users to manage the messages that PyTorch emits. For instance, if a PyTorch user happens to be calling PyTorch functions that emit lots of messages, it can be difficult for them to filter out those messages so that their project's users don't get bombarded with warnings and printouts that they don't need to see.
At least the following message classes should be available. The name of the C++ class appears first in all the listed entries below, with the Python class to the right of it.
Each severity level has a default class. All other classes within a given severity level inherit from the corresponding default class.
NOTE: Most of the error classes below already exist in PyTorch. However, info classes do not currently exist. Also, only one type of warning currently exists in C++, and it is not implemented as a C++ class that can be inherited (as far as I understand).
- `c10::Error`
  - Python: `RuntimeError`
  - Default error class. Other error classes inherit from it.
- `c10::IndexError`
  - Python: `IndexError`
  - Emitted when attempting to access an element that is not present in a list-like object.
- `c10::ValueError`
  - Python: `ValueError`
  - Emitted when a function receives an argument with the correct type but an incorrect value.
- `c10::TypeError`
  - Python: `TypeError`
  - Emitted when a function receives an argument with an incorrect type.
- `c10::NotImplementedError`
  - Python: `NotImplementedError`
  - Emitted when a feature that is not implemented is called.
- `c10::LinAlgError`
  - Python: `torch.linalg.LinAlgError`
  - Emitted from the `torch.linalg` module when there is a numerical error.
- `c10::NondeterministicError`
  - Python: `torch.NondeterministicError`
  - Emitted when `torch.use_deterministic_algorithms(True)` and `torch.set_deterministic_debug_mode('error')` are set, and a nondeterministic operation is called.
- `c10::UserWarning`
  - Python: `UserWarning`
  - Default warning class. Other warning classes inherit from it.
- `c10::BetaWarning`
  - Python: `torch.BetaWarning`
  - Emitted when a beta feature is called. See PyTorch feature classifications.
  - TODO: This warning type might not be very useful; find out if we really want it.
- `c10::PrototypeWarning`
  - Python: `torch.PrototypeWarning`
  - Emitted when a prototype feature is called. See PyTorch feature classifications.
  - TODO: This warning type might not be very useful; find out if we really want it.
- `c10::NondeterministicWarning`
  - Python: `torch.NondeterministicWarning`
  - Emitted when `torch.use_deterministic_algorithms(True)` and `torch.set_deterministic_debug_mode('warn')` are set, and a nondeterministic operation is called.
- `c10::DeprecationWarning`
  - Python: `DeprecationWarning`
  - Emitted when a deprecated function is called.
  - TODO: `DeprecationWarning`s are ignored by default in Python, so we may actually want to use a different Python class for this.
- `c10::Info`
  - Python: `torch.Info`
  - Default info class. Other info classes inherit from it.
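As a rough illustration of the inheritance rule described above (each severity level has a default class from which the other classes in that level inherit), here is a minimal Python-side sketch. The class names that do not already exist in PyTorch (`torch.Info`, `torch.NondeterministicError`, `torch.BetaWarning`, and so on) are proposals from this document, not an existing API:

```python
# Hypothetical Python-side hierarchy for the classes proposed above.
# Error classes inherit from the default error class (RuntimeError),
# warning classes from UserWarning, and info classes from the Info base class.

class Info:
    """Default info class; other info classes would inherit from it."""

class NondeterministicError(RuntimeError):
    """Raised when deterministic algorithms are required and a nondeterministic op runs."""

class BetaWarning(UserWarning):
    """Warns that a beta feature (per PyTorch feature classifications) was called."""

class PrototypeWarning(UserWarning):
    """Warns that a prototype feature was called."""

class NondeterministicWarning(UserWarning):
    """Warns that a nondeterministic op ran under 'warn' debug mode."""
```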
In order to emit messages, developers can use the APIs defined in this section. These APIs all have a variable length argument list, `...` in C++ and `*args` in Python. When a message is emitted, these arguments are concatenated into a string, and the string becomes the body of the message.

In C++, the arguments in `...` must all have the `std::ostream& operator<<` function defined so that they can be concatenated.

In Python, each element in `*args` must either have a `__str__` function or be a callable that, when called, produces an object that has a `__str__` function. Providing part of the message body as a callable can improve performance in cases where the message would not be emitted. For example, in `torch.check(cond, lambda: expensive_function())`, if `cond` is true then the check passes, no message is needed, and `expensive_function()` is never called.
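A minimal sketch of how the proposed `torch.check` could defer string construction is shown below. `torch.check` does not exist yet, so the function name and behavior here are assumptions based on the description above:

```python
# Hypothetical implementation sketch of the proposed torch.check API.
def check(cond, *args):
    """Raise RuntimeError with the concatenated message if cond is false."""
    if cond:
        # On the fast path the message arguments are never evaluated or
        # stringified, so expensive callables cost nothing here.
        return
    parts = []
    for arg in args:
        # Callables are only invoked when the message is actually needed.
        value = arg() if callable(arg) else arg
        parts.append(str(value))
    raise RuntimeError("".join(parts))

def expensive_debug_summary():
    # Stand-in for a costly computation (e.g. formatting a large tensor).
    return "detailed diagnostic info"

check(True, "never built: ", expensive_debug_summary)   # no cost, no error
# check(False, "shape mismatch: ", expensive_debug_summary)  # would raise RuntimeError
```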
The APIs for raising errors all check a boolean condition, the `cond` argument in the following signatures, and throw an error if that condition is false. The error APIs are listed below, with the C++ signature on the left and the corresponding Python signature on the right.
- `TORCH_CHECK(cond, ...)` / `torch.check(cond, *args)`
  - C++ error: `c10::Error`
  - Python error: `RuntimeError`
- `TORCH_CHECK_INDEX(cond, ...)` / `torch.check_index(cond, *args)`
  - C++ error: `c10::IndexError`
  - Python error: `IndexError`
- `TORCH_CHECK_VALUE(cond, ...)` / `torch.check_value(cond, *args)`
  - C++ error: `c10::ValueError`
  - Python error: `ValueError`
- `TORCH_CHECK_TYPE(cond, ...)` / `torch.check_type(cond, *args)`
  - C++ error: `c10::TypeError`
  - Python error: `TypeError`
- `TORCH_CHECK_NOT_IMPLEMENTED(cond, ...)` / `torch.check_not_implemented(cond, *args)`
  - C++ error: `c10::NotImplementedError`
  - Python error: `NotImplementedError`
- `TORCH_CHECK_WITH(error_t, cond, ...)` / `torch.check_with(error_type, cond, *args)`
  - C++ error: Specified by the `error_t` argument
  - Python error: Specified by the `error_type` argument
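For illustration, here is how the proposed Python error APIs might be used in practice. The `torch.check_*` names are the proposals listed above, not functions that exist in PyTorch today:

```python
import torch

def gather_row(x: torch.Tensor, idx: int) -> torch.Tensor:
    # Hypothetical usage of the proposed check APIs.
    torch.check_type(isinstance(idx, int), "idx must be an int, got ", type(idx))
    torch.check_index(0 <= idx < x.shape[0], "idx ", idx, " is out of bounds")
    torch.check_value(x.numel() > 0, "expected a non-empty tensor")
    return x[idx]
```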
- `TORCH_WARN(...)` / `torch.warn(*args)`
  - C++ warning: `c10::UserWarning`
  - Python warning: `UserWarning`
- `TORCH_WARN_ONCE(...)` / `torch.warn_once(*args)`
  - C++ warning: `c10::UserWarning`
  - Python warning: `UserWarning`
  - For a given callsite, the warning is emitted only the first time it is reached.
- `TORCH_WARN_DEPRECATION(...)` / `torch.warn_deprecation(*args)`
  - C++ warning: `c10::DeprecationWarning`
  - Python warning: `DeprecationWarning`
- `TORCH_WARN_DEPRECATION_ONCE(...)` / `torch.warn_deprecation_once(*args)`
  - C++ warning: `c10::DeprecationWarning`
  - Python warning: `DeprecationWarning`
  - For a given callsite, the warning is emitted only the first time it is reached.
- `TORCH_WARN_WITH(warning_t, ...)` / `torch.warn_with(warning_type, *args)`
  - C++ warning: Specified by the `warning_t` argument
  - Python warning: Specified by the `warning_type` argument
- `TORCH_WARN_ONCE_WITH(warning_t, ...)` / `torch.warn_once_with(warning_type, *args)`
  - C++ warning: Specified by the `warning_t` argument
  - Python warning: Specified by the `warning_type` argument
  - For a given callsite, the warning is emitted only the first time it is reached.
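A usage sketch for the Python warning APIs proposed above (again, `torch.warn` and `torch.warn_once` are proposals, not existing functions):

```python
import torch

def resize_image(img, antialias=None):
    # Hypothetical usage of the proposed warn APIs.
    if antialias is None:
        # Emitted at most once per callsite, even if resize_image runs in a loop.
        torch.warn_once("antialias was not specified; defaulting to False")
        antialias = False
    if img.ndim != 3:
        torch.warn("expected a 3-dimensional image, got ", img.ndim, " dimensions")
    return img
```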
TODO: In C++, `TORCH_WARN_ONCE` is implemented as a macro that defines a local static variable to track whether the warning has been emitted from each callsite. It is not possible to implement it this way in Python, so we need to think of some other way to do it. The Python `warnings` module's `"default"` filter does prevent duplicate warnings from being emitted, but it acts a little differently: if two warning messages emitted from the same location differ even slightly (for instance, if the value of some variable is included in the message and that value differs between two different `warnings.warn` calls), then both warnings are emitted. `TORCH_WARN_ONCE` does not check whether messages differ. But we could probably implement `torch.warn_once` in a similar way to how the `warnings` module filter is implemented.
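One possible direction is sketched below, under the assumption that a callsite can be identified by filename and line number (similar in spirit to how the `warnings` module keys its per-module registries); the function name and registry are hypothetical:

```python
import inspect
import warnings

# Hypothetical callsite registry for a Python torch.warn_once.
_warned_callsites = set()

def warn_once(*args):
    """Emit a UserWarning only the first time this particular callsite runs."""
    caller = inspect.stack()[1]
    key = (caller.filename, caller.lineno)
    if key in _warned_callsites:
        return
    _warned_callsites.add(key)
    message = "".join(str(a() if callable(a) else a) for a in args)
    # stacklevel=2 attributes the warning to the caller rather than to warn_once.
    warnings.warn(message, UserWarning, stacklevel=2)
```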
Just like the error and warning APIs, the info APIs each have a variable length argument list, `...` in C++ and `*args` in Python. These arguments are concatenated into the info message.
- `TORCH_LOG_INFO(...)` / `torch.log_info(*args)`
  - C++ info class: `c10::Info`
  - Python info class: `torch.Info`
  - TODO: Is there a better name than `log_info`? I didn't want to call it `torch.info`, because `numpy.info` has completely different functionality, and `torch.log` is obviously already taken.
- `TORCH_LOG_INFO_WITH(info_t, ...)` / `torch.log_info_with(info_type, *args)`
  - C++ info class: Specified by the `info_t` argument
  - Python info class: Specified by the `info_type` argument
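A brief usage sketch for the proposed info-level API (the `torch.log_info` name is hypothetical, per the naming TODO above):

```python
import torch

def load_checkpoint(path):
    # Hypothetical usage of the proposed info-level API.
    torch.log_info("loading checkpoint from ", path)
    state = torch.load(path)
    torch.log_info("loaded ", len(state), " entries")
    return state
```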
Currently, when running subprocesses that use PyTorch, some messages are emitted by every running subprocess. See issue #68768 for specific examples. Avoiding emitting duplicate messages from each subprocess by default would give a better user experience.

In issue #68768, the duplicate messages related to `cpp_extension.load` can be modified to only be emitted by subprocess rank 0, simply by checking the node's rank first. For instance, where there is a `warnings.warn(...)` call, we can replace it with:

```python
if rank == 0:
    warnings.warn(...)
```

This successfully avoids duplicate warnings. A few concrete examples can be seen in this draft PR.
However, implementing the duplicate filter like this is not ideal. It would be better to have dedicated message system API calls for this. In the case of warnings, the following signature could be used:
- `torch.warn_rank(my_rank, *args, warn_rank=0)`
  - Args:
    - `my_rank`: Rank of the subprocess calling this function
    - `*args`: Warning message
    - `warn_rank`: Rank that should emit the message
  - The warning is only emitted if `my_rank == warn_rank` (a sketch appears below).
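A minimal sketch of the proposed `torch.warn_rank`, assuming a global "always emit duplicates" setting like the one described in the TODO below; both the function and the setting are hypothetical:

```python
import warnings

# Hypothetical global toggle; when True, every rank emits the warning.
_emit_duplicate_rank_messages = False

def warn_rank(my_rank, *args, warn_rank=0):
    """Emit a UserWarning only on the designated rank (or on all ranks if duplicates are enabled)."""
    if my_rank != warn_rank and not _emit_duplicate_rank_messages:
        return
    message = "".join(str(a() if callable(a) else a) for a in args)
    warnings.warn(message, UserWarning, stacklevel=2)
```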
TODO: Add APIs for the rest of the message classes, like `torch.log_info_rank()`, etc.

TODO: There should also be a global setting to enable emitting the duplicates. `torch.warn_rank` could check the setting, and if it's turned on, emit the warning for all ranks.
TODO: Should we have a `TORCH_WARN_RANK` (and others) in C++ as well? Is there an existing use case for it?
The rest of this document contains details about the current messaging API in PyTorch. This is included to give better context about what will change and what will stay the same in the new messaging system.
At the moment, PyTorch has some APIs in place to make a lot of aspects of message logging easy, from the perspective of a developer working on PyTorch. Messages can be either printouts, warnings, or errors.
Errors are created with the standard `raise` statement in Python (documentation). In C++, PyTorch offers macros for creating errors (which are listed later in this document). When a C++ function propagates to Python, any errors that were generated get converted to Python errors.

Warnings are created with `warnings.warn` in Python (documentation). In C++, PyTorch offers macros for creating warnings (which are listed later in this document). When a C++ function propagates to Python, any warnings that were generated get converted to Python warnings.

Printouts (or what are called "Info" severity messages in the new system) are created with plain `print` in Python and `std::cout` in C++.
PyTorch's C++ warning/error macros are declared in `c10/util/Exception.h`.
In C++, there are several different types of errors that can be used, but PyTorch developers typically don't deal with these error classes directly. Instead, they use macros that offer a concise interface for raising different error classes.
Each of the error macros evaluates a boolean conditional expression, `cond`. If the condition is false, the error is raised, and whatever extra arguments are in `...` get concatenated into the error message with `operator<<`.
| Macro | C++ Error class |
|---|---|
| `TORCH_CHECK(cond, ...)` | `c10::Error` |
| `TORCH_CHECK_WITH(error_t, cond, ...)` | caller specifies the `error_t` arg |
| `TORCH_CHECK_LINALG(cond, ...)` | `c10::LinAlgError` |
| `TORCH_CHECK_INDEX(cond, ...)` | `c10::IndexError` |
| `TORCH_CHECK_VALUE(cond, ...)` | `c10::ValueError` |
| `TORCH_CHECK_TYPE(cond, ...)` | `c10::TypeError` |
| `TORCH_CHECK_NOT_IMPLEMENTED(cond, ...)` | `c10::NotImplementedError` |
There is some documentation on the error macros here.
The reason why C++ preprocessor macros are used, rather than function calls, is to ensure that the compiler can optimize for the `cond == true` branch. In other words, if an error does not get raised, overhead is minimized.
The primary error class in C++ is `c10::Error`. Documentation and declaration are here. `c10::Error` is a subclass of `std::exception`.

There are other error classes which are child classes of `c10::Error`, defined here.
When these errors propagate to Python, they are each converted to a different Python error class:
| C++ error class | Python error class |
|---|---|
| `std::exception` | `RuntimeError` |
| `c10::Error` | `RuntimeError` |
| `c10::IndexError` | `IndexError` |
| `c10::ValueError` | `ValueError` |
| `c10::TypeError` | `TypeError` |
| `c10::NotImplementedError` | `NotImplementedError` |
| `c10::EnforceFiniteError` | `ExitException` |
| `c10::OnnxfiBackendSystemError` | `ExitException` |
| `c10::LinAlgError` | `torch.linalg.LinAlgError` |
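For example, out-of-bounds tensor indexing fails inside C++ and surfaces in Python as `IndexError`, which is the mapping shown in the table above for `c10::IndexError`; this snippet only illustrates that observable behavior:

```python
import torch

t = torch.tensor([1, 2, 3])
try:
    t[5]  # out-of-bounds access; the C++-side error surfaces as a Python IndexError
except IndexError as e:
    print("caught IndexError from C++:", e)
```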
When warnings propagate from C++ to Python, they are converted to a Python `UserWarning`. Whatever is in `...` will get concatenated into the warning message using `operator<<`.

- `TORCH_WARN(...)`
- `TORCH_WARN_ONCE(...)`
  - Definition
  - This macro only generates a warning the first time it is encountered during run time.
`c10::Error` and its subclasses are translated into their corresponding Python errors in `CATCH_CORE_ERRORS`. However, not all of the `c10::Error` subclasses in the table above appear here, which could just be an oversight.

`CATCH_CORE_ERRORS` is included within the `END_HANDLE_TH_ERRORS` macro that most Python-bound C++ functions use for handling errors. For instance, `THPVariable__is_view` uses the error handling macro here. There is also a similar `END_HANDLE_TH_ERRORS_PYBIND` macro that is used for pybind-based bindings.
There's also an extra error class in `CATCH_CORE_ERRORS`, `torch::PyTorchError`. I'm not sure yet why it exists and how it differs from `c10::Error`. `torch::PyTorchError` has several subclasses:

- `torch::IndexError`
- `torch::TypeError`
- `torch::ValueError`
- `torch::NotImplementedError`
- `torch::AttributeError`
- `torch::LinAlgError`
The conversion of warnings from C++ to Python is described here. The PyTorch Developer Podcast episode "Python exceptions" explains how C++ errors/warnings are converted to Python. TODO: listen to it again and take notes.