RFC-0026-logging-system.md

New PyTorch Logging System

Summary

Create a message logging system for PyTorch with the following requirements:

Consistency

  • The C++ and Python APIs should match each other as closely as possible.

  • All errors, warnings, and other messages generated by PyTorch should be emitted using the logging system API.

Severity level and message classes

  • Offer different message severity levels, including at least the following:

    • Info: Emits a message without creating a warning or error. By default, this gets printed to stdout.

    • Warning: Emits a message as a warning. If a warning is never caught, it gets printed to stderr by default.

    • Error: Emits a message as an error. If an error is never caught, the application will print the error to stderr and quit.

  • Offer different message classes under each severity level.

    • Every message is emitted as an instance of a message class.

    • Each message class has both a C++ class and a Python class, and when a C++ message is propagated to Python, it is converted to its corresponding Python class.

    • Whenever it makes sense, the Python class should be one of the builtin Python error/warning classes. For instance, currently in PyTorch, the C++ error class c10::Error gets converted to the Python RuntimeError class.

  • Adding new message classes and severity levels should be easy.

Configurability and filtering

  • Ability to turn warnings into errors. This is already possible with the Python warnings module filter, but the PyTorch docs should mention it and we should probably have unit tests for it. See documentation

  • Settings to disable specific Warning or Info classes

    • Disabling warnings in Python is already possible with the warnings module filter. See documentation. There is no similar system in C++ at the moment, and building one is probably low priority.

    • Filtering out Info messages would be nice to have because excessive printouts can degrade the user experience. Related to issue #68768

  • Settings to enable/disable emitting duplicate messages generated by multiple torch.distributed ranks. Related to issue #68768

  • Ability to make a particular Warning or Info message only emit once. Warn-once should be the default for most warnings.

    • Currently TORCH_WARN_ONCE does this in C++, but there is no Python equivalent

    • Offer a filter to override warn- and log-once, so that they always emit. The filter could work similarly to the Python warnings filter. This is a low priority feature.

    • TODO: torch.set_warn_always() currently controls some warnings (maybe only the ones from C++? I need to find out for sure.)

  • Settings can be changed from Python, C++, or environment variables

    • Filtering warnings with Python command line arguments should remain possible. For instance, the following turns a DeprecationWarning into an error: python -W error::DeprecationWarning your_script.py
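The Python side of this filtering already works with the standard library alone. The following is a minimal sketch using only the warnings module; `emit` and `classify` are placeholder names for illustration, and nothing here is PyTorch-specific.

```python
import warnings


def emit(category):
    # Placeholder for library code that emits a warning.
    warnings.warn("example message", category)


def classify(category):
    """Report what happens to a warning of `category` under the filters below."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")                     # baseline: record everything
        warnings.simplefilter("error", DeprecationWarning)  # escalate to an exception
        warnings.simplefilter("ignore", ResourceWarning)    # silence this class
        try:
            emit(category)
        except DeprecationWarning:
            return "raised"
        return "emitted" if caught else "ignored"


print(classify(DeprecationWarning))  # raised
print(classify(ResourceWarning))     # ignored
print(classify(UserWarning))         # emitted
```

Each `simplefilter` call inserts its rule at the front of the filter list, so the more specific rules added last take precedence over the "always" baseline.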

Compatibility

  • Should integrate with Meta's internal logging system, which is glog

    • TODO: What are all the requirements that define "integrating with glog"
  • Must be OSS-friendly, so it shouldn't require libraries (like glog) which may cause incompatibility issues for projects that use PyTorch

Other requirements

  • Continue using warning/error APIs and message classes that currently exist in PyTorch wherever possible. For instance, TORCH_CHECK, TORCH_WARN, and TORCH_WARN_ONCE should continue to be used in C++

  • TODO: Determine the requirements for the following concepts:

    • Log files? (default behavior and any settings)

Motivation

Original issue: link

Currently, it is challenging for PyTorch developers to provide messages that act consistently between Python and C++.

It is also challenging for PyTorch users to manage the messages that PyTorch emits. For instance, if a PyTorch user happens to be calling PyTorch functions that emit lots of messages, it can be difficult for them to filter out those messages so that their project's users don't get bombarded with warnings and printouts that they don't need to see.

Proposed Implementation

Message classes

At least the following message classes should be available. The name of the C++ class appears first in all the listed entries below, with the Python class to the right of it.

Each severity level has a default class. All other classes within a given severity level inherit from the corresponding default class.

NOTE: Most of the error classes below already exist in PyTorch. However, info classes do not currently exist. Also, only one type of warning currently exists in C++, and it is not implemented as a C++ class that can be inherited (as far as I understand).

Error message classes:

  • c10::Error - Python RuntimeError

    • Default error class. Other error classes inherit from it.
  • c10::IndexError - Python IndexError

    • Emitted when attempting to access an element that is not present in a list-like object.
  • c10::ValueError - Python ValueError

    • Emitted when a function receives an argument with correct type but incorrect value.
  • c10::TypeError - Python TypeError

    • Emitted when a function receives an argument with incorrect type.
  • c10::NotImplementedError - Python NotImplementedError

    • Emitted when a feature that is not implemented is called.
  • c10::LinAlgError - Python torch.linalg.LinAlgError

    • Emitted from the torch.linalg module when there is a numerical error.
  • c10::NondeterministicError - Python torch.NondeterministicError

    • Emitted when torch.use_deterministic_algorithms(True) and torch.set_deterministic_debug_mode('error') are set, and a nondeterministic operation is called.

Warning message classes:

  • c10::UserWarning - Python UserWarning

    • Default warning class. Other warning classes inherit from it.
  • c10::BetaWarning - Python torch.BetaWarning

    • Emitted when a beta feature is called. See PyTorch feature classifications.
    • TODO: This warning type might not be very useful--find out if we really want this
  • c10::PrototypeWarning - Python torch.PrototypeWarning

    • Emitted when a prototype feature is called. See PyTorch feature classifications.
    • TODO: This warning type might not be very useful--find out if we really want this
  • c10::NondeterministicWarning - Python torch.NondeterministicWarning

    • Emitted when torch.use_deterministic_algorithms(True) and torch.set_deterministic_debug_mode('warn') are set, and a nondeterministic operation is called.
  • c10::DeprecationWarning - Python DeprecationWarning

    • Emitted when a deprecated function is called.
    • TODO: DeprecationWarnings are ignored by default in Python, so we may actually want to use a different Python class for this.

Info message classes:

  • c10::Info - Python torch.Info
    • Default info class. Other info classes inherit from it.
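To illustrate how a class hierarchy interacts with filtering on the Python side, here is a small sketch. The `BetaWarning` and `PrototypeWarning` classes below are local stand-ins for the proposed torch.BetaWarning and torch.PrototypeWarning, which are assumptions of this RFC and not existing PyTorch classes; `count_emitted` is a helper invented for the example.

```python
import warnings


# Hypothetical stand-ins for the proposed torch.BetaWarning / torch.PrototypeWarning.
class BetaWarning(UserWarning):
    pass


class PrototypeWarning(UserWarning):
    pass


def count_emitted(category, *filters):
    """Emit one warning of `category` and return how many get through `filters`."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        for action, filtered_class in filters:
            warnings.simplefilter(action, filtered_class)
        warnings.warn("example", category)
    return len(caught)


# Filtering one subclass leaves its siblings alone.
print(count_emitted(BetaWarning, ("ignore", BetaWarning)))       # 0
print(count_emitted(PrototypeWarning, ("ignore", BetaWarning)))  # 1
# Filtering the default class also covers its subclasses.
print(count_emitted(BetaWarning, ("ignore", UserWarning)))       # 0
```

One caveat for the hierarchy above: Python's builtin DeprecationWarning derives from Warning, not from UserWarning, so a filter on UserWarning would not match it.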

Message APIs

In order to emit messages, developers can use the APIs defined in this section.

These APIs all have a variable length argument list, ... in C++ and *args in Python. When a message is emitted, these arguments are concatenated into a string, and the string becomes the body of the message.

In C++, the arguments in ... must all have the std::ostream& operator<< function defined so that they can be concatenated.

In Python, each element in *args must either have a __str__ function or be a callable that, when called, produces another object that has a __str__ function. Providing the body of a message as a callable can improve performance in cases where the message is never emitted. For example, in torch.check(cond, lambda: expensive_function()), expensive_function() is not called when cond is true.
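The lazy-message behavior can be sketched in a few lines. This is only an illustration under assumptions: `check` is a local stand-in for the proposed torch.check, and `build_message` and `expensive_description` are hypothetical helpers. Arguments are joined without a separator, mirroring how operator<< concatenation behaves in C++.

```python
def build_message(*args):
    """Concatenate args into one string, calling any callable to get its value."""
    parts = []
    for arg in args:
        value = arg() if callable(arg) else arg
        parts.append(str(value))
    return "".join(parts)


def check(cond, *args):
    # Hypothetical stand-in for the proposed torch.check.
    if not cond:
        raise RuntimeError(build_message(*args))


calls = []


def expensive_description():
    calls.append(1)  # record that the expensive work actually ran
    return "tensor summary"


check(True, "bad input: ", expensive_description)  # passes; the callable never runs
print(len(calls))  # 0

try:
    check(False, "bad input: ", expensive_description)
except RuntimeError as err:
    print(err)     # bad input: tensor summary
print(len(calls))  # 1
```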

Error APIs

The APIs for raising errors all check a boolean condition, the cond argument in the following signatures, and throw an error if that condition is false.

The error APIs are listed below, with the C++ signature on the left and the corresponding Python signature on the right.

TORCH_CHECK(cond, ...) - torch.check(cond, *args)

  • C++ error: c10::Error
  • Python error: RuntimeError

TORCH_CHECK_INDEX(cond, ...) - torch.check_index(cond, *args)

  • C++ error: c10::IndexError
  • Python error: IndexError

TORCH_CHECK_VALUE(cond, ...) - torch.check_value(cond, *args)

  • C++ error: c10::ValueError
  • Python error: ValueError

TORCH_CHECK_TYPE(cond, ...) - torch.check_type(cond, *args)

  • C++ error: c10::TypeError
  • Python error: TypeError

TORCH_CHECK_NOT_IMPLEMENTED(cond, ...) - torch.check_not_implemented(cond, *args)

  • C++ error: c10::NotImplementedError
  • Python error: NotImplementedError

TORCH_CHECK_WITH(error_t, cond, ...) - torch.check_with(error_type, cond, *args)

  • C++ error: Specified by error_t argument
  • Python error: Specified by error_type argument
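One way the Python side of these signatures could fit together is to build every specific check on top of `check_with`. The sketch below uses local functions as stand-ins for the proposed torch.check* APIs; `take` is a hypothetical caller, and support for callable message arguments is left out to keep the example short.

```python
def check_with(error_type, cond, *args):
    """Raise `error_type` with the concatenated args if `cond` is false."""
    if not cond:
        raise error_type("".join(str(a) for a in args))


def check(cond, *args):
    check_with(RuntimeError, cond, *args)


def check_index(cond, *args):
    check_with(IndexError, cond, *args)


def check_value(cond, *args):
    check_with(ValueError, cond, *args)


def check_type(cond, *args):
    check_with(TypeError, cond, *args)


def check_not_implemented(cond, *args):
    check_with(NotImplementedError, cond, *args)


def take(items, i):
    # Hypothetical caller: validate the index before using it.
    check_index(0 <= i < len(items), "index ", i, " out of range for length ", len(items))
    return items[i]


try:
    take([10, 20], 5)
except IndexError as err:
    print(err)  # index 5 out of range for length 2
```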

Warning APIs

TORCH_WARN(...) - torch.warn(*args)

  • C++ warning: c10::UserWarning
  • Python warning: UserWarning

TORCH_WARN_ONCE(...) - torch.warn_once(*args)

  • C++ warning: c10::UserWarning
  • Python warning: UserWarning
  • For a given callsite, the warning is emitted only upon the first time it is called.

TORCH_WARN_DEPRECATION(...) - torch.warn_deprecation(*args)

  • C++ warning: c10::DeprecationWarning
  • Python warning: DeprecationWarning

TORCH_WARN_DEPRECATION_ONCE(...) - torch.warn_deprecation_once(*args)

  • C++ warning: c10::DeprecationWarning
  • Python warning: DeprecationWarning
  • For a given callsite, the warning is emitted only upon the first time it is called.

TORCH_WARN_WITH(warning_t, ...) - torch.warn_with(warning_type, *args)

  • C++ warning: Specified by warning_t argument
  • Python warning: Specified by warning_type argument

TORCH_WARN_ONCE_WITH(warning_t, ...) - torch.warn_once_with(warning_type, *args)

  • C++ warning: Specified by warning_t argument
  • Python warning: Specified by warning_type argument
  • For a given callsite, the warning is emitted only upon the first time it is called.

TODO: In C++, TORCH_WARN_ONCE is implemented as a macro that defines a local static variable to track whether the warning has been emitted from each callsite. It is not possible to implement it this way in Python, so we need to think of some other way to do it. The Python warnings module's "default" filter does prevent duplicate warnings from being emitted, but it acts a little differently: if two warning messages emitted from the same location differ even slightly (for instance, if the value of some variable is included in the message and that value differs between two different warnings.warn calls), then both warnings are emitted. TORCH_WARN_ONCE does not check whether messages differ. But we could probably implement torch.warn_once in a similar way to how the warnings module filter is implemented.
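One possible approach, sketched here under assumptions, is to key a registry on the caller's file and line number, ignoring the message text. `warn_once` is a local stand-in for the proposed torch.warn_once, `_seen_callsites` and `step` are hypothetical names, and `sys._getframe` is a CPython implementation detail, so this is not a portable or definitive design.

```python
import sys
import warnings

_seen_callsites = set()  # (filename, line number) pairs that have already warned


def warn_once(*args, category=UserWarning):
    # Hypothetical stand-in for the proposed torch.warn_once.
    caller = sys._getframe(1)  # CPython-specific way to reach the caller's frame
    callsite = (caller.f_code.co_filename, caller.f_lineno)
    if callsite in _seen_callsites:
        return
    _seen_callsites.add(callsite)
    warnings.warn("".join(str(a) for a in args), category, stacklevel=2)


def step(i):
    warn_once("slow path taken at step ", i)  # message differs on every call


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # turn off the warnings module's own deduplication
    for i in range(3):
        step(i)

print(len(caught))        # 1
print(caught[0].message)  # slow path taken at step 0
```

Because the registry ignores the message body, the three differing messages collapse into one warning, which matches the TORCH_WARN_ONCE behavior described above.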

Info APIs

Just like the error and warning APIs, the info APIs each have a variable length argument list, ... in C++ and *args in Python. These arguments are concatenated into the info message.

TORCH_LOG_INFO(...) - torch.log_info(*args)

  • C++ info class: c10::Info
  • Python info class: torch.Info
  • TODO: Is there a better name than log_info? I didn't want to call it torch.info, because numpy.info has a completely different functionality. And obviously torch.log is already taken.

TORCH_LOG_INFO_WITH(info_t, ...) - torch.log_info_with(info_type, *args)

  • C++ info class: Specified by info_t argument
  • Python info class: Specified by info_type argument
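As a rough sketch of how the Python info APIs and per-class filtering might look, consider the following. Everything here is an assumption made for illustration: `Info` stands in for the proposed torch.Info, while `CompileInfo`, `disable_info`, and the boolean return value are hypothetical and not part of this proposal.

```python
import sys


class Info:
    """Hypothetical stand-in for the proposed torch.Info default class."""


class CompileInfo(Info):
    """Hypothetical subclass, used only to illustrate per-class filtering."""


_disabled_info_classes = set()


def disable_info(info_type):
    # Hypothetical setting: stop emitting `info_type` and its subclasses.
    _disabled_info_classes.add(info_type)


def log_info_with(info_type, *args, file=None):
    # Skip the message if its class, or any parent class, is disabled.
    if any(issubclass(info_type, disabled) for disabled in _disabled_info_classes):
        return False
    print("".join(str(a) for a in args), file=file or sys.stdout)
    return True


def log_info(*args, file=None):
    return log_info_with(Info, *args, file=file)


log_info("loaded ", 3, " kernels")          # printed to stdout
disable_info(CompileInfo)
log_info_with(CompileInfo, "compiling...")  # suppressed
log_info("still printed")                   # printed: only CompileInfo is disabled
```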

Multi-process messaging APIs

Currently, when running subprocesses that use PyTorch, some messages are emitted by every running subprocess. See issue #68768 for specific examples. Avoiding emitting duplicate messages from each subprocess by default would give a better user experience.

In issue #68768, the duplicate messages related to cpp_extension.load can be modified to only be emitted by subprocess rank 0, simply by checking the node's rank first. For instance, wherever there is a warnings.warn(...) call, we can replace it with:

if rank == 0:
    warnings.warn(...)

This successfully avoids duplicate warnings. A few concrete examples can be seen in this draft PR.

However, implementing the duplicate filter like this is not ideal. It would be better to have dedicated message system API calls for this. In the case of warnings, the following signature could be used:

torch.warn_rank(my_rank, *args, warn_rank=0)

  • Args:
    • my_rank - Rank of the subprocess calling this function
    • args - Warning message
    • warn_rank - Rank that should emit the message
  • The warning is only emitted if my_rank == warn_rank
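A minimal sketch of this signature follows, using a local function as a stand-in for the proposed torch.warn_rank. The `simulate` helper is hypothetical and fakes several ranks in one process; in a real job `my_rank` would typically come from something like torch.distributed.get_rank().

```python
import warnings


def warn_rank(my_rank, *args, warn_rank=0):
    # Hypothetical stand-in for the proposed torch.warn_rank.
    if my_rank != warn_rank:
        return
    warnings.warn("".join(str(a) for a in args), UserWarning, stacklevel=2)


def simulate(world_size):
    """Call warn_rank once per simulated rank and return the number of warnings emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        for rank in range(world_size):
            warn_rank(rank, "extension build may be slow")
    return len(caught)


print(simulate(4))  # 1
```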

TODO: Add APIs for the rest of the message classes, like torch.log_info_rank(), etc.

TODO: There should also be a global setting to enable emitting the duplicates. torch.warn_rank could check the setting, and if it's turned on, then it would emit the warning for all ranks.

TODO: Should we have a TORCH_WARN_RANK (and others) in C++ as well? Is there an existing use case for it?

PyTorch's current messaging API

The rest of this document contains details about the current messaging API in PyTorch. This is included to give better context about what will change and what will stay the same in the new messaging system.

At the moment, PyTorch has some APIs in place that make many aspects of message logging easy for a developer working on PyTorch. Messages can be either printouts, warnings, or errors.

Errors are created with the standard raise statement in Python (documentation). In C++, PyTorch offers macros for creating errors (which are listed later in this document). When a C++ function propagates to Python, any errors that were generated get converted to Python errors.

Warnings are created with warnings.warn in Python (documentation). In C++, PyTorch offers macros for creating warnings (which are listed later in this document). When a C++ function propagates to Python, any warnings that were generated get converted to Python warnings.

Printouts (or what is called "Info" severity messages in the new system) are created with just print in Python and std::cout in C++.

PyTorch's C++ warning/error macros are declared in c10/util/Exception.h.

PyTorch C++ Errors

In C++, there are several different types of errors that can be used, but PyTorch developers typically don't deal with these error classes directly. Instead, they use macros that offer a concise interface for raising different error classes.

C++ error macros

Each of the error macros evaluates a boolean conditional expression, cond. If the condition is false, the error is raised, and whatever extra arguments are in ... get concatenated into the error message with operator<<.

| Macro | C++ error class |
|---|---|
| TORCH_CHECK(cond, ...) | c10::Error |
| TORCH_CHECK_WITH(error_t, cond, ...) | caller specifies error_t arg |
| TORCH_CHECK_LINALG(cond, ...) | c10::LinAlgError |
| TORCH_CHECK_INDEX(cond, ...) | c10::IndexError |
| TORCH_CHECK_VALUE(cond, ...) | c10::ValueError |
| TORCH_CHECK_TYPE(cond, ...) | c10::TypeError |
| TORCH_CHECK_NOT_IMPLEMENTED(cond, ...) | c10::NotImplementedError |

There is some documentation on error macros here

The reason why C++ preprocessor macros are used, rather than function calls, is to ensure that the compiler can optimize for the cond == true branch. In other words, if an error does not get raised, overhead is minimized.

C++ error classes

The primary error class in C++ is c10::Error. Documentation and declaration are here. c10::Error is a subclass of std::exception.

There are other error classes which are child classes of c10::Error, defined here.

When these errors propagate to Python, they are each converted to a different Python error class:

| C++ error class | Python error class |
|---|---|
| std::exception | RuntimeError |
| c10::Error | RuntimeError |
| c10::IndexError | IndexError |
| c10::ValueError | ValueError |
| c10::TypeError | TypeError |
| c10::NotImplementedError | NotImplementedError |
| c10::EnforceFiniteError | ExitException |
| c10::OnnxfiBackendSystemError | ExitException |
| c10::LinAlgError | torch.linalg.LinAlgError |

PyTorch C++ Warnings

When warnings propagate from C++ to Python, they are converted to a Python UserWarning. Whatever is in ... will get concatenated into the warning message using operator<<.

  • TORCH_WARN(...)

  • TORCH_WARN_ONCE(...)

    • Definition
    • This macro only generates a warning the first time it is encountered during run time.

Implementation details

C++ to Python Error Translation

c10::Error and its subclasses are translated into their corresponding Python errors in CATCH_CORE_ERRORS.

However, not all of the c10::Error subclasses in the table above appear here, which could just be an oversight.

CATCH_CORE_ERRORS is included within the END_HANDLE_TH_ERRORS macro that most Python-bound C++ functions use for handling errors. For instance, THPVariable__is_view uses the error handling macro here. There is also a similar END_HANDLE_TH_ERRORS_PYBIND macro that is used for pybind-based bindings.

torch::PyTorchError

There's also an extra error class in CATCH_CORE_ERRORS, torch::PyTorchError. I'm not sure yet why it exists and how it differs from c10::Error. torch::PyTorchError has several subclasses:

  • torch::IndexError
  • torch::TypeError
  • torch::ValueError
  • torch::NotImplementedError
  • torch::AttributeError
  • torch::LinAlgError

C++ to Python Warning Translation

The conversion of warnings from C++ to Python is described here

Misc Notes

PyTorch Developer Podcast - Python exceptions explains how C++ errors/warnings are converted to Python. TODO: listen to it again and take notes.