Make Object and JsonElement deserialization iterative #1912

Merged

@Marcono1234 (Collaborator) commented Jun 18, 2021

Often when Object and JsonElement are deserialized the format of the JSON data is unknown and it might come from an untrusted source. To avoid a StackOverflowError from maliciously crafted JSON, deserialize Object and JsonElement iteratively instead of recursively.
Note that in most cases Gson already catches StackOverflowError so users of current Gson versions would not encounter them.

Concept based on FasterXML/jackson-databind@51fd2fa, but the implementation is not based on it.
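
To illustrate the general idea of iterative deserialization (this is not the code from this pull request), here is a minimal sketch that uses an explicit stack instead of recursive calls. It handles only a toy subset, nested arrays of non-negative integers, and the class name `IterativeArrayParser` is hypothetical. With this approach the nesting depth is bounded by heap memory rather than by the thread's call stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch (not Gson code): parses input such as "[1,[2,[3]]]" without recursion. */
final class IterativeArrayParser {
    static List<Object> parse(String json) {
        Deque<List<Object>> parents = new ArrayDeque<>(); // enclosing arrays that are still open
        List<Object> root = null;
        List<Object> current = null; // array currently being filled; null outside the root
        int i = 0;
        while (i < json.length()) {
            char c = json.charAt(i);
            if (c == '[') {
                if (current == null && root != null) {
                    throw new IllegalArgumentException("Unexpected '[' after root at index " + i);
                }
                List<Object> nested = new ArrayList<>();
                if (current == null) {
                    root = nested;
                } else {
                    current.add(nested);
                    parents.push(current); // remember where to continue after the matching ']'
                }
                current = nested;
                i++;
            } else if (c == ']') {
                if (current == null) {
                    throw new IllegalArgumentException("Unbalanced ']' at index " + i);
                }
                current = parents.isEmpty() ? null : parents.pop();
                i++;
            } else if (c == ',' || Character.isWhitespace(c)) {
                i++; // lenient: separators are not validated in this sketch
            } else if (c >= '0' && c <= '9' && current != null) {
                int start = i;
                while (i < json.length() && json.charAt(i) >= '0' && json.charAt(i) <= '9') {
                    i++;
                }
                current.add(Long.parseLong(json.substring(start, i)));
            } else {
                throw new IllegalArgumentException("Unexpected character at index " + i);
            }
        }
        if (root == null || current != null) {
            throw new IllegalArgumentException("Unterminated or empty input");
        }
        return root;
    }
}
```

Calling `IterativeArrayParser.parse` on input with 100,000 nested `[` characters builds the nested lists without growing the call stack. Recursive methods of the result, such as `toString()`, can still overflow on such a structure.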

@google-cla google-cla bot added the cla: yes label Jun 18, 2021
@Marcono1234 Marcono1234 force-pushed the marcono1234/iterative-deserialization branch from 851728d to 539952e Compare June 19, 2021 12:39
@Marcono1234 (Collaborator Author)

@eamonnmcmanus, what do you think about these changes? They would probably fix #2109 and #2111.
This is also related to this oss-fuzz comment.

@eamonnmcmanus (Member) left a comment

Thanks! I think this is a great idea, and I appreciate your taking the trouble to make it work. The current business of catching StackOverflowError has always felt very wobbly to me.

I'm running this against Google's internal tests to check whether they show anything up.

My main comment is that we should probably still have some limit on the maximum depth. Otherwise we could still be exposing users to denial-of-service attacks. If every { causes a new LinkedTreeMap to be allocated then an input string with a million of them is probably going to allocate on the order of 30M. We can imagine that being enough to cause a server to fall over.

We could have a hardcoded limit on nesting, say 1000, or we could make that a parameter settable by GsonBuilder.
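
A nesting limit of this kind can be sketched independently of Gson. The following is a minimal illustration, assuming a pre-scan over the raw JSON text. `NestingLimit` and `checkDepth` are hypothetical names, not Gson APIs, and the limit of 1000 is simply the value suggested above.

```java
/** Hypothetical sketch (not a Gson API): rejects JSON text that nests deeper than a limit. */
final class NestingLimit {
    /** Returns the deepest nesting level found, or throws if it exceeds maxDepth. */
    static int checkDepth(String json, int maxDepth) {
        int depth = 0;
        int deepest = 0;
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character so \" does not end the string
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                depth++;
                if (depth > maxDepth) {
                    throw new IllegalArgumentException("Nesting deeper than " + maxDepth);
                }
                deepest = Math.max(deepest, depth);
            } else if ((c == '}' || c == ']') && depth > 0) {
                depth--;
            }
        }
        return deepest;
    }
}
```

Brackets inside string literals are ignored, so `{"a":"[[["}` counts as depth 1. The scan does not validate the JSON otherwise.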

* the next element is neither of those.
*/
private JsonElement tryBeginNesting(JsonReader in, JsonToken peeked) throws IOException {
if (peeked == JsonToken.BEGIN_ARRAY) {
Member

Same comment about switch here, and about LinkedList below. Is it possible to reduce the amount of code duplication between this class and ObjectTypeAdapter?

Collaborator Author

> Is it possible to reduce the amount of code duplication between this class and ObjectTypeAdapter?

The code indeed looks pretty similar. Maybe this can be done by having a separate internal abstract IterativeDeserializer or similar with subclasses for Object and JsonElement deserialization, but I am not sure if this wouldn't be more complex (and possibly more inefficient). Do you have a specific implementation in mind?

Member

I don't think it's necessary to do anything about this right away, unless there is some way that people could switch between the two to circumvent any depth (etc) limitations we are trying to introduce here.

@Marcono1234 (Collaborator Author)

> My main comment is that we should probably still have some limit on the maximum depth. Otherwise we could still be exposing users to denial-of-service attacks.

Sounds reasonable, but maybe it shouldn't be depth then, but instead total count. Otherwise an adversary could circumvent the depth restriction by creating one JSON array and placing all the JSON objects inside it. So instead the total number of encountered JSON arrays and objects could be tracked for these deserializers per deserialization call. What do you think?
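
The difference between a depth limit and a total-count limit can be made concrete with a small sketch. It assumes a scan over the raw JSON text. `ContainerBudget` and `countContainers` are hypothetical names, not part of Gson.

```java
/** Hypothetical sketch (not a Gson API): limits the total number of JSON arrays and objects. */
final class ContainerBudget {
    /** Returns the number of '{' and '[' outside string literals, or throws above maxContainers. */
    static int countContainers(String json, int maxContainers) {
        int count = 0;
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                count++;
                if (count > maxContainers) {
                    throw new IllegalArgumentException(
                            "More than " + maxContainers + " JSON arrays and objects");
                }
            }
        }
        return count;
    }
}
```

For `[{},{},{}]` the nesting depth is only 2, but the count is 4, so a flat array of many empty objects passes a depth limit while a count limit rejects it.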

> If every { causes a new LinkedTreeMap to be allocated then an input string with a million of them is probably going to allocate on the order of 30M.

Maybe there is a related issue: When ArrayList increases its capacity the new backing array length is based on the current array length. Possibly that could be abused as well to deserialize JSON arrays with just enough items to cause the capacity to get increased. But that issue probably cannot be easily solved without using a custom List implementation or manually constructing the list before copying the elements to a properly sized ArrayList.
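
The capacity growth can be worked through with a small simulation. This sketch assumes the growth policy of current OpenJDK `ArrayList` implementations (lazy initial capacity of 10, then roughly 1.5x growth). That policy is an implementation detail and is not guaranteed by the `List` contract. `ArrayListGrowth` is a hypothetical name.

```java
/** Hypothetical sketch: simulates backing-array growth under an assumed OpenJDK-like policy. */
final class ArrayListGrowth {
    /** Capacity of the backing array after n single-element add() calls on an empty list. */
    static int capacityAfterAdds(int n) {
        int capacity = 0; // assumed: no array is allocated until the first add()
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                // assumed policy: first allocation is 10 slots, afterwards grow by 50%
                capacity = (capacity == 0) ? 10 : capacity + (capacity >> 1);
            }
        }
        return capacity;
    }
}
```

Under these assumptions a JSON array with 11 elements ends up in a 15-slot backing array, and one with 16 elements in a 22-slot array. Copying into `new ArrayList<>(list)` or calling `trimToSize()` afterwards would drop the unused slots, at the cost of an extra copy.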

@eamonnmcmanus (Member)

> Sounds reasonable, but maybe it shouldn't be depth then, but instead total count. Otherwise an adversary could circumvent the depth restriction by creating one JSON array and placing all the JSON objects inside it. So instead the total number of encountered JSON arrays and objects could be tracked for these deserializers per deserialization call. What do you think?

I see what you mean. Instead of causing {{{{{{{{... to be decoded, Dr Evil could cause [{},{},{},{},...] to be decoded, with similar effect. It's probably a fool's errand to try to avoid problems like these. We might just have an overall configurable limit on how big a string we are prepared to decode, but it's pretty trivial for users to check that size themselves. There's nothing new here: if client code is prepared to decode any arbitrary string of any length then it's already potentially vulnerable to OutOfMemoryError DoS. But we do already catch OutOfMemoryError in appropriate places. So the main consequence of this PR would be that we wouldn't really need to catch StackOverflowError as well.
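
For client code that wants to enforce such a size check itself, one option is to wrap the input in a reader that fails once a character budget is exceeded. This is a minimal sketch. `SizeLimitedReader` is a hypothetical class, not something Gson provides.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical sketch: a Reader that throws once more than maxChars characters were read. */
final class SizeLimitedReader extends FilterReader {
    private final long maxChars;
    private long consumed;

    SizeLimitedReader(Reader in, long maxChars) {
        super(in);
        this.maxChars = maxChars;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            count(1);
        }
        return c;
    }

    @Override
    public int read(char[] buffer, int offset, int length) throws IOException {
        int n = super.read(buffer, offset, length);
        if (n > 0) {
            count(n);
        }
        return n;
    }

    // Characters passed over via skip() are not counted in this sketch.
    private void count(int n) throws IOException {
        consumed += n;
        if (consumed > maxChars) {
            throw new IOException("Input exceeds " + maxChars + " characters");
        }
    }
}
```

Such a reader could be passed to `Gson.fromJson(Reader, Class)` in place of the raw input. The limit bounds the input size only, so it is a coarse guard against memory exhaustion rather than a precise one.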

@Marcono1234 (Collaborator Author)

Is this pull request fine like this or are there any other things you want to have changed?

@eamonnmcmanus (Member) left a comment

This looks good. Sorry for the delay. I checked it with all of Google's internal tests and didn't find any problems.

@eamonnmcmanus eamonnmcmanus merged commit 2d01d6a into google:master Jun 23, 2022
@Marcono1234 Marcono1234 deleted the marcono1234/iterative-deserialization branch June 27, 2022 11:35
sgammon referenced this pull request in elide-dev/elide Jan 15, 2023

This PR contains the following updates:

| Package | Change |
|---|---|
| [com.google.code.gson:gson-parent](https://togithub.com/google/gson) | `2.9.0` -> `2.10.1` |
| [com.google.code.gson:gson](https://togithub.com/google/gson) | `2.9.0` -> `2.10.1` |

---

### Release Notes

<details>
<summary>google/gson</summary>

### [`v2.10`](https://togithub.com/google/gson/blob/HEAD/CHANGELOG.md#Version-210)

- Support for serializing and deserializing Java records, on Java ≥ 16.
([https://github.com/google/gson/pull/2201](https://togithub.com/google/gson/pull/2201))
- Add `JsonArray.asList` and `JsonObject.asMap` view methods
([https://github.com/google/gson/pull/2225](https://togithub.com/google/gson/pull/2225))
- Fix `TypeAdapterRuntimeTypeWrapper` not detecting reflective
`TreeTypeAdapter` and `FutureTypeAdapter`
([https://github.com/google/gson/pull/1787](https://togithub.com/google/gson/pull/1787))
- Improve `JsonReader.skipValue()`
([https://github.com/google/gson/pull/2062](https://togithub.com/google/gson/pull/2062))
- Perform numeric conversion for primitive numeric type adapters
([https://github.com/google/gson/pull/2158](https://togithub.com/google/gson/pull/2158))
- Add `Gson.fromJson(..., TypeToken)` overloads
([https://github.com/google/gson/pull/1700](https://togithub.com/google/gson/pull/1700))
- Fix changes to `GsonBuilder` affecting existing `Gson` instances
([https://github.com/google/gson/pull/1815](https://togithub.com/google/gson/pull/1815))
- Make `JsonElement` conversion methods more consistent and fix javadoc
([https://github.com/google/gson/pull/2178](https://togithub.com/google/gson/pull/2178))
- Throw `UnsupportedOperationException` when `JsonWriter.jsonValue` is
not supported
([https://github.com/google/gson/pull/1651](https://togithub.com/google/gson/pull/1651))
- Disallow `JsonObject` `Entry.setValue(null)`
([https://github.com/google/gson/pull/2167](https://togithub.com/google/gson/pull/2167))
- Fix `TypeAdapter.toJson` throwing AssertionError for custom
IOException
([https://github.com/google/gson/pull/2172](https://togithub.com/google/gson/pull/2172))
- Convert null to JsonNull for `JsonArray.set`
([https://github.com/google/gson/pull/2170](https://togithub.com/google/gson/pull/2170))
- Fixed nullSafe usage.
([https://github.com/google/gson/pull/1555](https://togithub.com/google/gson/pull/1555))
- Validate `TypeToken.getParameterized` arguments
([https://github.com/google/gson/pull/2166](https://togithub.com/google/gson/pull/2166))
- Fix [#1702](https://togithub.com/google/gson/issues/1702):
Gson.toJson creates CharSequence which does not implement toString
([https://github.com/google/gson/pull/1703](https://togithub.com/google/gson/pull/1703))
- Prefer existing adapter for concurrent `Gson.getAdapter` calls
([https://github.com/google/gson/pull/2153](https://togithub.com/google/gson/pull/2153))
- Improve `ArrayTypeAdapter` for `Object[]`
([https://github.com/google/gson/pull/1716](https://togithub.com/google/gson/pull/1716))
- Improve `AppendableWriter` performance
([https://github.com/google/gson/pull/1706](https://togithub.com/google/gson/pull/1706))

### [`v2.9.1`](https://togithub.com/google/gson/blob/HEAD/CHANGELOG.md#Version-291)

- Make `Object` and `JsonElement` deserialization iterative rather than
recursive
([https://github.com/google/gson/pull/1912](https://togithub.com/google/gson/pull/1912))
- Added parsing support for enum that has overridden toString() method
([https://github.com/google/gson/pull/1950](https://togithub.com/google/gson/pull/1950))
- Removed support for building Gson with Gradle
([https://github.com/google/gson/pull/2081](https://togithub.com/google/gson/pull/2081))
- Removed obsolete `codegen` hierarchy
([https://github.com/google/gson/pull/2099](https://togithub.com/google/gson/pull/2099))
- Add support for reflection access filter
([https://github.com/google/gson/pull/1905](https://togithub.com/google/gson/pull/1905))
- Improve `TypeToken` creation validation
([https://github.com/google/gson/pull/2072](https://togithub.com/google/gson/pull/2072))
- Add explicit support for `float` in `JsonWriter`
([https://github.com/google/gson/pull/2130](https://togithub.com/google/gson/pull/2130),
[https://github.com/google/gson/pull/2132](https://togithub.com/google/gson/pull/2132))
- Fail when parsing invalid local date
([https://github.com/google/gson/pull/2134](https://togithub.com/google/gson/pull/2134))

Also many small improvements to javadoc.

</details>

@renatoathaydes

An interesting "side effect" of this change: in our tests, we actually started getting StackOverflowError after upgrading GSON. The reason is that we process the Java objects that GSON returns. Because GSON now lets much deeper structures through before it fails, our own code, which is still recursive, throws a StackOverflowError outside of the try/catch we had for it (we expected GSON to fail first, as it did before).

I understand this is now our bug (as in, us, the lib consumers) :D but in any case, I wanted to mention this because other users of GSON may now face the same issue, which could cause their services to crash.

In our case, luckily we had a test for this case so I can fix it before it goes to production.
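
A consumer in this situation could replace its own recursion with an explicit work stack, mirroring what this pull request does inside Gson. The following is a minimal sketch, assuming the deserialized value is built from `Map` and `Collection` instances (as is the case when deserializing to `Object`). `DepthWalker` is a hypothetical name, and computing the depth stands in for whatever processing the real code performs.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.Map;

/** Hypothetical sketch: walks a nested Map/Collection structure without recursion. */
final class DepthWalker {
    private static final class Frame {
        final Object value;
        final int depth;

        Frame(Object value, int depth) {
            this.value = value;
            this.depth = depth;
        }
    }

    /** Returns the deepest nesting level of maps and collections; 0 for a scalar or null. */
    static int maxDepth(Object root) {
        Deque<Frame> pending = new ArrayDeque<>(); // work list replaces the call stack
        pending.push(new Frame(root, 0));
        int deepest = 0;
        while (!pending.isEmpty()) {
            Frame frame = pending.pop();
            Collection<?> children = null;
            if (frame.value instanceof Map) {
                children = ((Map<?, ?>) frame.value).values();
            } else if (frame.value instanceof Collection) {
                children = (Collection<?>) frame.value;
            }
            if (children == null) {
                continue; // scalar or null: nothing to descend into
            }
            int depth = frame.depth + 1;
            deepest = Math.max(deepest, depth);
            for (Object child : children) {
                pending.push(new Frame(child, depth));
            }
        }
        return deepest;
    }
}
```

A check like `DepthWalker.maxDepth(value) > limit` could also serve as a guard before handing the value to code that is still recursive.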

@Marcono1234 (Collaborator Author) commented Jul 7, 2024

Yes, you are right; unfortunately the changes in this pull request do not help against a StackOverflowError that occurs when the deserialized object is used afterwards.

#2588 proposes a nesting limit so that such deeply nested JSON already fails during parsing. Hopefully that would help for your use case as well.
