Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configurable processing limits for JSON parser (StreamReadConstraints) #637

Closed
cowtowncoder opened this issue Aug 23, 2020 · 11 comments
Closed
Labels
2.16 Issue planned (at earliest) for 2.16 performance Issue related to performance problems or enhancements
Milestone

Comments

@cowtowncoder
Copy link
Member

cowtowncoder commented Aug 23, 2020

(note: related to/inspired by FasterXML/jackson-databind#2816)

Some aspects of input document are prone to possible abuse, so that malicious sender can create specifically crafted documents to try to overload server. This includes things like:

  1. Ridiculously deeply nested documents (it only takes 2 characters to create Array context so 8k document can have 4000 levels of nesting) -- implemented via Add StreamReadConstraints.maxNestingDepth() to constraint max nesting depth (default: 1000) #943
  2. Very long documents (in general): while streaming decoder has no problems with length per se, higher level databind can easily exhaust memory even with somewhat normal amplification (that is, input content of 10 megabytes can result in a data structure of 100 megabytes) -- implemented via Add configurable limit for the maximum number of bytes/chars of content to parse before failing #1046
  3. Very long names: Jackson's name decoding is optimized for short names and while it can handle any length (within bounds of total memory available) performance characteristics are not great beyond, say, couple of thousands of characters -- implemented via Add configurable limit for the maximum length of Object property names to parse before failing (default max: 50,000 chars) #1047
  4. Very long numbers: textual length of tens of thousands of numbers might be problematic for some uses -- implemented via Add numeric value size limits via StreamReadConstraints (fixes sonatype-2022-6438) -- default 1000 chars #827
  5. (possibly?) Huge number of properties per JSON Object -- for some use cases construction of data structures with tens or hundreds of thousands of distinct keys can be problematic -- not implemented, no immediate plans; can create new issue if this is still desired (ditto for big Arrays)
  6. Extra long text segments (megabytes of content) can similarly become problematic -- this could also result from broken encoding (missing closing quote) -- implemented via Add StreamReadConstraints limit for longest textual value to allow (default: 5M) #863

and although streaming parser can typically handle many of these cases quite well, they can be very problematic for higher-level processing -- and even for streaming, for highly parallel processing.

So. It would be good to create a configurable set of options that:

  1. Default to same safe set of limits for likely problematic cases (like limit nesting to what is known to typically fit in wrt stack frames; limit maximum property names)
  2. Leave more speculative limits (text length) to unlimited (or very high)
  3. Offer a simple way to configure limits (possibly only per JsonFactory, although it'd be really nice if per-parser overrides were possible)

Further reading: related material.

Here are some relevant links:

@cowtowncoder
Copy link
Member Author

cowtowncoder commented Feb 13, 2022

Since 2.13 has been released marking as 2.14 (earliest) due to need for API additions and possible compatibility concerns for some maximum limits.

Also: I think this issue is still important even if databind tackles some of more immediate concerns (via FasterXML/jackson-databind#2816).

@cowtowncoder
Copy link
Member Author

cowtowncoder commented Feb 14, 2022

Some thinking out aloud:

  • Limits for total input document size can probably be checked at points when input blocks are read (blocking) or fed (async): this is what Woodstox does and appears to work beautifully with minimal overhead. Limits probably need to be in units of "whatever input unit is, byte/char"
  • Limit for nesting is a much bigger challenge but my first instinct is to integrate this to JsonStreamContext/JsonReadContext (2.x) (TokenStreamContext in 3.0) -- that is, check it when trying to add next context. This can get tricky due to inability to throw checked (in 2.x) JsonProcessingException but we'll cross that bridge if and when we get that far.... there's also the obvious challenge of how to pass limit configuration in the first place.

As to configuration settings, there probably needs to be a shared base type (StreamProcessingLimits), as concrete implementation, and with a singleton "default" or "empty" instance to feed in case where no explicit configuration is used. And then with format-specific extensions allowed.
Should probably use Builder-style construction, limits instances fully immutable (obviously) to be shareable.

One immediate concern I have, however, is that of adding state to JsonReadContext: seems like we need both current level (int) and configuration settings; latter to be passed when creating "root" instance, then passed to children.
Alternative would be to require config to be passed on every "createXxx()" call -- this would prevent need to hold on to a reference (save 1 field) but would be much more intrusive as callers would need to be changed.
So probably better to add 2 fields.

@cowtowncoder
Copy link
Member Author

Will not happen for 2.14 but should be the first thing for 2.15.

@cowtowncoder
Copy link
Member Author

Implemented for inclusion in 2.15:

@pjfanning
Copy link
Member

@cowtowncoder are there any more of these that you want to do for 2.15?

@pjfanning
Copy link
Member

The total size could readily be limited using stream wrappers like those in https://github.com/pjfanning/json-size-limiter/tree/main/src/main/java/com/github/pjfanning/json/util . That code is borrowed. It should also be possible to find ByteBuffer equivalents. There is an argument that many web service frameworks have support for limiting the acceptable size of inputs - and that users should prefer those approaches. The earlier that the large input size is spotted, the better

@cowtowncoder
Copy link
Member Author

Yes, ideally we'd be able to target (1) by JsonStreamContext, and (2) for input and output sizes.
But I don't know how timing would work.

I think existence of stream wrappers is useful and can work for now even if eventually in-core limits were added.

But deeper nesting checks should definitely be added in core at some point. Just not sure if timing works for 2.15.

And field names... that's an interesting thing. Not commonly reported as attack but conceivably it would be vector same way (... and more) as max String token length.

@pjfanning
Copy link
Member

It seems like (2) is something that can be left to users. I think it is reasonable for jackson to concentrate on the cases where comparatively fewer bytes in input can lead to bad outcomes.

@cowtowncoder
Copy link
Member Author

@pjfanning It could be, altough I implemented this with Woodstox few years back and it really is rather easy to add. Then again Jackson-core supports decorators so it's... even easier to wrap by users.

But I agree it's not a high priority at this point.

@cowtowncoder
Copy link
Member Author

Filed #1046 for max document length -- I think it is worth adding, after thinking it through. Hope to work on it for 2.16.

@cowtowncoder cowtowncoder changed the title Add configurable "document/processing limits" for JSON parser Add configurable processing limits for JSON parser (StreamReadConstraints) Jun 12, 2023
@cowtowncoder cowtowncoder added the 2.16 Issue planned (at earliest) for 2.16 label Oct 26, 2023
@cowtowncoder
Copy link
Member Author

Out of 6 ideas, 5 now implemented and included in 2.16.0-rc1; no plans yet for last one (max # of properties per Object); can file a new issue if need be.

Closing this issue as completed.

@cowtowncoder cowtowncoder added this to the 2.16.0-rc1 milestone Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.16 Issue planned (at earliest) for 2.16 performance Issue related to performance problems or enhancements
Projects
None yet
Development

No branches or pull requests

2 participants