Fix StreamConstraintsException introduced in jackson 2.15 #31580
@@ -67,6 +68,13 @@ public void testDecodeEncodeEqual() throws Exception {
    }
  }

  @Test
  public void testLargeRow() throws Exception {
Before:
com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
at com.fasterxml.jackson.core.StreamReadConstraints.validateStringLength(StreamReadConstraints.java:324)
at com.fasterxml.jackson.core.util.ReadConstrainedTextBuffer.validateStringLength(ReadConstrainedTextBuffer.java:27)
...
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3740)
at org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder.decode(TableRowJsonCoder.java:59)
...
at org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqual(CoderProperties.java:97)
at org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoderTest.testLargeRow(TableRowJsonCoderTest.java:75)
After:
testLargeRow 0.615s passed
Also tested with an example pipeline (read a TableRow with a large row from BigQuery, then ReShuffle it).
          com.fasterxml.jackson.core.StreamReadConstraints.builder()
              .maxStringLength(newLimit)
              .build());
    } catch (ClassNotFoundException e) {
This makes sure that pinning jackson-databind 2.14 does not break the pipeline; i.e., in case there are other regressions in Jackson 2.15, users can still pin 2.14.
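The guard discussed in this comment can be illustrated without Jackson on the classpath. The following is a minimal sketch, not the code from this PR: `OptionalClassGuard` and `applyIfPresent` are hypothetical names, and the sketch assumes the optional class is probed by name before any code that depends on it runs.

```java
/** Hypothetical helper (not part of Beam or Jackson) showing an optional-class guard. */
final class OptionalClassGuard {
  private OptionalClassGuard() {}

  /**
   * Runs {@code action} only if {@code className} can be loaded.
   *
   * @return true if the class was found and the action ran, false otherwise
   */
  static boolean applyIfPresent(String className, Runnable action) {
    try {
      // Class.forName throws ClassNotFoundException when the class is absent,
      // which is what would happen for a class that only exists in a newer
      // version of a library than the one pinned on the classpath.
      Class.forName(className);
    } catch (ClassNotFoundException e) {
      // Older library version: skip the override instead of failing.
      return false;
    }
    action.run();
    return true;
  }
}
```

A caller would pass the fully qualified name of the class that may be missing, plus a `Runnable` that performs the version-specific configuration; with an older pinned version the call simply returns `false`.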
   * overwrite the default buffer size limit to 100 MB, and exposes this interface for higher limit.
   * If needed, call this method during pipeline run time, e.g. in DoFn.setup.
   */
  public static void increaseDefaultStreamReadConstraints(int newLimit) {
I was considering adding a pipeline option to configure this. However, overrideDefaultStreamReadConstraints needs to be called as early as a static context to take effect for ObjectMapper instances used in arbitrary places, which makes a call from a non-static context ineffective, so I ended up with this solution.
After all this is a "band-aid" fix for a bad upstream breaking change. Feel free to comment if there is a better solution.
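The ordering concern in this comment can be shown with a toy model. This is a sketch under a stated assumption, not Jackson's implementation: it assumes a reader snapshots the process-wide default limit when it is constructed, so readers created before the override keep the old limit. `DefaultLimitModel`, `overrideDefault`, and `Reader` are hypothetical names.

```java
/** Toy model (hypothetical, not Jackson code) of a process-wide default limit. */
final class DefaultLimitModel {
  private DefaultLimitModel() {}

  // Process-wide default; 20,000,000 matches the limit in the stack trace above.
  private static volatile int defaultMaxStringLength = 20_000_000;

  /** Replaces the default used by readers constructed after this call. */
  static void overrideDefault(int newLimit) {
    defaultMaxStringLength = newLimit;
  }

  /** Reader that captures the default once, at construction time (assumption). */
  static final class Reader {
    private final int maxStringLength = defaultMaxStringLength;

    boolean accepts(int stringLength) {
      return stringLength <= maxStringLength;
    }
  }
}
```

In this model, a `Reader` built before `overrideDefault` is called still rejects the 20,054,016-character string from the stack trace, while one built afterwards accepts it. That is the reason for raising the limit as early as possible.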
Added to the 2.57.0 milestone because at least one regression has been found (after #31473): a pipeline fails when it decodes a TableRow that contains a string larger than 20 MB (this can happen when reading from a ReShuffle).
Let us update CHANGES.md as well. Thanks a lot!
* Fix StreamConstraintsException introduced in jackson 2.15
* Fix spotless
* Fix checkstyle
* Add changes.md
Fix a regression noted in #26743