Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve type validation of numeric values #334

Closed
oguzhanunlu opened this issue Oct 14, 2020 · 8 comments
Closed

Improve type validation of numeric values #334

oguzhanunlu opened this issue Oct 14, 2020 · 8 comments

Comments

@oguzhanunlu
Copy link
Contributor

oguzhanunlu commented Oct 14, 2020

Issue
TypeFactory uses jackson-databind's JsonNode.isIntegralNumber and JsonNode.isNumber to see if the node's type is integer or number, respectively. This result is used when comparing node type and schema type which is not precise enough in some cases.

if (node.isIntegralNumber())
return JsonType.INTEGER;
if (node.isNumber())
return JsonType.NUMBER;

Scenario
Assume that we have a schema where a property's type is integer.
Given a DecimalNode(v) where v is a value that can fit into Long, json-schema-validator thinks that node type is number and this results in type failure as number found, integer expected

Proposal
Update line 72-73 in the snippet above as following (canConvertToLong() is defined for DecimalNode at jackson-databind)

if (node.isIntegralNumber() || node.canConvertToLong())
    return JsonType.INTEGER; 

so that we can have a validator with better precision

@stevehu
Copy link
Contributor

stevehu commented Oct 14, 2020

@oguzhanunlu Thanks a lot for raising the issue and provide a solution. Let me add some background information here so that others can see where this issue is from and why we should adopt your solution as a Java-specific implementation.

Warning

The precise treatment of the “integer” type may depend on the implementation of your JSON Schema validator. JavaScript (and thus also JSON) does not have distinct types for integers and floating-point values. Therefore, JSON Schema can not use type alone to distinguish between integers and non-integers. The JSON Schema specification recommends, but does not require, that validators use the mathematical value to determine whether a number is an integer, and not the type alone. Therefore, there is some disagreement between validators on this point. For example, a JavaScript-based validator may accept 1.0 as an integer, whereas the Python-based jsonschema does not.

Clever use of the multipleOf keyword (see Multiples) can be used to get around this discrepancy. For example, the following likely has the same behavior on all JSON Schema implementations:

{ "type": "number", "multipleOf": 1.0 }

Given the above quote from the official site, I think we should leverage the Java language feature to support the Long type; however, the change might make the validator different than the Javascript implementation. As this is a community-driven implementation, I want to know the opinion of @everybody. The best solution is to support both and use the configuration to differentiate it. We just need to make the decision of which implementation is the default.

@oguzhanunlu
Copy link
Contributor Author

oguzhanunlu commented Oct 14, 2020

Thanks for the quick response @stevehu !

Looking at the implementation of DecimalNode.canConvertToLong in jackson-databind, it seems that floating values smaller than maximum long, e.g. 1.4, will be considered as integer, if we use the proposal I shared above, which would be wrong, so we need to use a better logical predicate to check if the node value can fit into a Long and it doesn't have a fractional value.

Instead of the initial proposal, perhaps something as following:

if (node.isIntegralNumber() || (node.canConvertToLong() && !node.asText().contains('.')))
    return JsonType.INTEGER; 

@oguzhanunlu
Copy link
Contributor Author

Hi @stevehu , is there any update on this one? Or is there anything I can do to proceed here?

@stevehu
Copy link
Contributor

stevehu commented Nov 5, 2020

@oguzhanunlu I think we should go ahead with the implementation. Just make sure to keep the existing behaviour as default and the new feature can be enabled from the configuration. Could you submit a PR for this? Thanks.

@oguzhanunlu
Copy link
Contributor Author

sure @stevehu , I'll open a PR, thanks!

@oguzhanunlu
Copy link
Contributor Author

hey @stevehu , fyi, I just opened the PR and wanted to share here as well in case you don't get notified

@stevehu
Copy link
Contributor

stevehu commented Nov 11, 2020

@oguzhanunlu Thanks a lot for your help. I am working with customers these days and very busy. I am reviewing in my spare time and it will take a bit longer.

@oguzhanunlu
Copy link
Contributor Author

I understand it @stevehu , no worries at all! I hope jackson-databind publishes 2.12.0 in the meantime.

stevehu pushed a commit that referenced this issue Jan 14, 2022
* Improve type validation of integrals

Instead of doing string comparison, use new jackson method to determine if a number is an integral.

The `javaSemantics` config option was added in PR #343 which partially addressed issue #334. In the notes for this PR:
> Once jackson-databind 2.12.0 is out, I'll replace my solution with a call to canConvertToExactIntegral

jackson-databind has been updated to 2.12.1 so this is available but the change has not yet been made.

PR #450 which addressed #446 missed this location which is used when calling `JsonSchemaFactory.getSchema`.

Issue #344 requested coercion of various types but the only type implemented in PR #379 was lossless narrowing, set with configuration option `losslessNarrowing`. I believe that setting is unnecessary now as this implementation of `javaSemantics` addresses the original issue, but left it for backwards compatibility. It also allows you to set `javaSemantics=false` and `losslessNarrowing=true` to achieve only this specific case rather than anything that `javaSemantics` is used for in the future. At this time, these properties do exactly the same thing.

- Change from string comparison to `canConvertToExactIntegral` for `javaSemantics` and `losslessNarrowing`
- Add missing documentation around `losslessNarrowing`
- Add more test cases around integrals

* Update changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants