
Fix StreamConstraintsException introduced in jackson 2.15 #31580

Merged: 4 commits into apache:master, Jun 13, 2024


@Abacn (Contributor) commented Jun 12, 2024

Fix a regression noted in #26743


Review comment on TableRowJsonCoderTest.java:
@@ -67,6 +68,13 @@ public void testDecodeEncodeEqual() throws Exception {
}
}

@Test
public void testLargeRow() throws Exception {
@Abacn (Contributor Author):

Before:

com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
	at com.fasterxml.jackson.core.StreamReadConstraints.validateStringLength(StreamReadConstraints.java:324)
	at com.fasterxml.jackson.core.util.ReadConstrainedTextBuffer.validateStringLength(ReadConstrainedTextBuffer.java:27)
	...
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3740)
	at org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder.decode(TableRowJsonCoder.java:59)
	...
	at org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqual(CoderProperties.java:97)
	at org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoderTest.testLargeRow(TableRowJsonCoderTest.java:75)

After:

testLargeRow 0.615s passed

Also tested with an example pipeline (read a TableRow containing a large row from BigQuery, then Reshuffle it).
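The failure in the "Before" trace can be reproduced outside Beam. The following is a minimal sketch, assuming jackson-databind 2.15.1 or later and Java 11+ on the classpath; the class and method names (LargeStringRepro, parses, mapperWithLimit) are hypothetical. It parses a JSON object holding a 20,054,016-character string (the length from the trace) with a default ObjectMapper, then with a mapper whose own JsonFactory allows longer strings. Note that this raises the limit on a single factory, which is a different technique from the JVM-wide default override the PR uses.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonFactoryBuilder;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

/** Hypothetical repro of the large-string StreamConstraintsException. */
public class LargeStringRepro {
  // Same string length as reported in the "Before" stack trace.
  static final int LARGE_LEN = 20_054_016;

  /** A one-field JSON object whose string value is LARGE_LEN characters long. */
  static String largeJson() {
    return "{\"f\":\"" + "x".repeat(LARGE_LEN) + "\"}";
  }

  /** Returns true if the mapper parses the JSON, false if Jackson rejects it. */
  static boolean parses(ObjectMapper mapper, String json) {
    try {
      mapper.readTree(json);
      return true;
    } catch (JsonProcessingException e) {
      // With Jackson 2.15+ default limits this is a StreamConstraintsException.
      System.out.println("Rejected: " + e.getOriginalMessage());
      return false;
    }
  }

  /** An ObjectMapper whose own factory allows strings up to maxStringLength chars. */
  static ObjectMapper mapperWithLimit(int maxStringLength) {
    JsonFactory factory =
        new JsonFactoryBuilder()
            .streamReadConstraints(
                StreamReadConstraints.builder().maxStringLength(maxStringLength).build())
            .build();
    return new ObjectMapper(factory);
  }

  public static void main(String[] args) {
    String json = largeJson();
    System.out.println("default mapper parses: " + parses(new ObjectMapper(), json));
    System.out.println(
        "100 MiB mapper parses: " + parses(mapperWithLimit(100 * 1024 * 1024), json));
  }
}
```

A per-factory limit only helps where you control how the ObjectMapper is built; mappers constructed elsewhere (for example inside a coder) still use the defaults.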

com.fasterxml.jackson.core.StreamReadConstraints.builder()
.maxStringLength(newLimit)
.build());
} catch (ClassNotFoundException e) {
@Abacn (Contributor Author):

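This makes sure that pinning jackson-databind 2.14 does not break the pipeline (StreamReadConstraints does not exist before Jackson 2.15, hence the ClassNotFoundException catch). That is, if there are other regressions in Jackson 2.15, users can still pin 2.14.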
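The catch (ClassNotFoundException e) branch in the fragment above implies a class-loading check before the Jackson 2.15 API is touched; the exact try body is not shown here. Below is a minimal JDK-only sketch of that kind of runtime guard, assuming the goal is simply to skip the override when StreamReadConstraints is not on the classpath. OptionalClassGuard and isClassPresent are hypothetical names.

```java
/** Hypothetical runtime feature check for an optional class. */
public class OptionalClassGuard {
  /** Returns true if the named class can be loaded; does not run its static initializer. */
  static boolean isClassPresent(String className) {
    try {
      Class.forName(className, false, OptionalClassGuard.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // StreamReadConstraints was added in jackson-core 2.15; it is absent if 2.14 is pinned.
    String name = "com.fasterxml.jackson.core.StreamReadConstraints";
    if (isClassPresent(name)) {
      System.out.println("jackson-core >= 2.15: safe to configure StreamReadConstraints");
    } else {
      System.out.println("jackson-core < 2.15 (or absent): skip the override");
    }
  }
}
```

Passing false as the second argument to Class.forName avoids running the class's static initializer just to probe for it.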

* Overrides the default buffer size limit to 100 MB, and exposes this method for setting a higher limit.
* If needed, call this method at pipeline run time, e.g. in DoFn.setup.
*/
public static void increaseDefaultStreamReadConstraints(int newLimit) {
@Abacn (Contributor Author) commented Jun 12, 2024:

I was considering adding a pipeline option to configure this. However, overrideDefaultStreamReadConstraints has to be called early, in a static context, for it to take effect on ObjectMappers created in arbitrary places, since each mapper's JsonFactory picks up the default constraints when it is constructed. Called later from a non-static context, it is ineffective, so I ended up with this solution.

After all, this is a "band-aid" fix for a bad upstream breaking change. Feel free to comment if there is a better solution.
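The timing argument can be checked directly. A minimal sketch, assuming jackson-databind 2.15 or later on the classpath: an ObjectMapper created before StreamReadConstraints.overrideDefaultStreamReadConstraints keeps the limit it captured at construction, while one created afterwards sees the new default. OverrideTiming and limitsAroundOverride are hypothetical names, and the 100 MiB value in main only mirrors the "100 MB" in the Javadoc above; it is not taken from the PR's code.

```java
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

/** Hypothetical demo: the global override only affects mappers created afterwards. */
public class OverrideTiming {
  /** Max string length enforced by the factory behind this mapper. */
  static int limitOf(ObjectMapper mapper) {
    return mapper.getFactory().streamReadConstraints().getMaxStringLength();
  }

  /**
   * Creates one mapper before and one after a global override and returns both limits.
   * Mutates JVM-wide Jackson defaults.
   */
  static int[] limitsAroundOverride(int newLimit) {
    ObjectMapper early = new ObjectMapper(); // captures the defaults current right now
    StreamReadConstraints.overrideDefaultStreamReadConstraints(
        StreamReadConstraints.builder().maxStringLength(newLimit).build());
    ObjectMapper late = new ObjectMapper(); // picks up the overridden defaults
    return new int[] {limitOf(early), limitOf(late)};
  }

  public static void main(String[] args) {
    int[] limits = limitsAroundOverride(100 * 1024 * 1024);
    System.out.println("mapper created before override: " + limits[0]);
    System.out.println("mapper created after override:  " + limits[1]);
  }
}
```

Because the early mapper is unaffected, any mapper held in a static field that initializes before the override keeps the old limit. That is why the override has to run as early as possible.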

@Abacn modified the milestone: 2.57.0 Release, Jun 12, 2024
@Abacn (Contributor Author) commented Jun 12, 2024:

Added to the 2.57.0 milestone because at least one regression has been found (after #31473): a pipeline fails when decoding a TableRow that contains a string longer than 20 MB (which can happen when reading from a Reshuffle).

@Abacn marked this pull request as ready for review June 12, 2024 21:22

@liferoad (Collaborator) left a comment:

Let us update CHANGES.md as well. Thanks a lot!

Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".

@Abacn merged commit fd5b1de into apache:master Jun 13, 2024 (28 checks passed)
Abacn added a commit to Abacn/beam that referenced this pull request Jun 13, 2024
* Fix StreamConstraintsException introduced in jackson 2.15

* Fix spotless

* Fix checkstyle

* Add changes.md
@Abacn deleted the fixjacksonregress branch June 13, 2024 14:59
kennknowles added a commit that referenced this pull request Jun 13, 2024