[Feature Request]: Add with_exception_handling() for PTransforms in Python #31193
What would you like to happen?
Add a way to handle uncaught runtime exceptions thrown within a transform to the Python SDK, e.g. something like a with_exception_handling() method. This already exists for DoFns in DoFn.with_exception_handling, and the Java SDK appears to offer something similar for PTransforms: https://beam.apache.org/releases/javadoc/2.15.0/index.html?org/apache/beam/sdk/transforms/WithFailures.html
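For illustration, here is the existing per-ParDo API next to what the requested PTransform-level analogue might look like (MyCompositeTransform and the PTransform-level with_exception_handling() are the hypothetical API being requested, not existing Beam code):

```python
import json
import apache_beam as beam

with beam.Pipeline() as p:
    lines = p | beam.Create(['{"a": 1}', "not json"])

    # Existing API: per-ParDo dead-letter handling. Failed elements are
    # emitted on a second output instead of failing the bundle.
    good, bad = lines | beam.Map(json.loads).with_exception_handling()

    # Requested (hypothetical): the same pattern applied to a whole
    # composite PTransform, analogous to Java's WithFailures:
    #
    #   good, bad = lines | MyCompositeTransform().with_exception_handling()

    good | "LogGood" >> beam.Map(print)
    bad | "LogBad" >> beam.Map(print)  # (failed element, exception info) pairs
```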
Motivation
Google Cloud Dataflow automatically retries failed messages in streaming jobs; however, messages that cause runtime errors due to bad data or similar are retried indefinitely and block other messages from being processed, and there is no way to set a maximum number of retries per message. The only fix we've found is to drain and restart the pipeline to flush the bad messages, which is manual and risks losing data. While we try to parse and validate messages up-front as much as possible, bugs have slipped through to production and caused runtime errors (and obviously, we can't prevent 100% of bugs).
Being able to add a top-level error handler to the pipeline (or a root transform) would solve this, since in a worst-case scenario we could catch any failed messages/collections, log them, and not block the rest of the pipeline.
Right now, though, adding a top-level exception handler isn't possible. For instance, this example will not catch the raised error in Apache Beam 2.56.0, which is very unintuitive:
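(A minimal sketch of the pattern in question: the try/except wraps transform application, but applying a transform only builds the pipeline graph; the exception is raised later, when the pipeline actually runs.)

```python
import apache_beam as beam

def fail(element):
    raise RuntimeError(f"bad element: {element}")

with beam.Pipeline() as p:
    try:
        p | beam.Create([1, 2, 3]) | beam.Map(fail)
    except Exception:
        # Never reached: the RuntimeError is raised when the pipeline
        # runs on exiting the `with` block, outside this try/except.
        print("caught")
```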
Output (abridged; exact frames and formatting vary by runner and Beam version):
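```
Traceback (most recent call last):
  ...
RuntimeError: bad element: 1 [while running 'Map(fail)']
```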
The only solution we've found is to add this sort of error handling separately to every pipeline step, which isn't maintainable: if we have hundreds of DoFns, adding try-except blocks or per-step handlers (as sketched below) to all of them individually is labor-intensive.
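For concreteness, the per-step workaround using the existing ParDo-level API looks roughly like this (element values and step labels are illustrative):

```python
import json
import apache_beam as beam

with beam.Pipeline() as p:
    raw = p | beam.Create(['{"v": 2}', "oops", '{"v": "x"}'])

    # Every single step needs its own handler...
    parsed, bad_parse = (
        raw | "Parse" >> beam.Map(json.loads).with_exception_handling())
    summed, bad_sum = (
        parsed
        | "AddOne" >> beam.Map(lambda d: d["v"] + 1).with_exception_handling())

    # ...and the per-step dead letters must be collected by hand.
    failures = (bad_parse, bad_sum) | beam.Flatten()
    failures | "LogFailures" >> beam.Map(print)
    summed | "LogResults" >> beam.Map(print)
```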
Related Issues
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components