Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major performance regression caused by unevaluatedProperties #564

Closed
dreis2211 opened this issue Apr 29, 2022 · 9 comments
Closed

Major performance regression caused by unevaluatedProperties #564

dreis2211 opened this issue Apr 29, 2022 · 9 comments

Comments

@dreis2211
Copy link
Contributor

dreis2211 commented Apr 29, 2022

Hi,

I've seen an internal customer project being on an ancient version (0.1.3) and wanted to upgrade things. After the update to 1.0.69 (latest at the point of writing) things still seem to work, but show a major performance regression.

A little microbenchmark that validates a typical json for us shows the following numbers:
1.0.69

Benchmark                         Mode  Cnt      Score     Error  Units
MyBenchmark.testValidate          avgt   10  46884,301 ± 764,721  ns/op

1.0.67

Benchmark                         Mode  Cnt      Score     Error  Units
MyBenchmark.testValidate          avgt   10  26214,556 ± 425,732  ns/op

0.1.3

Benchmark                         Mode  Cnt      Score     Error  Units
MyBenchmark.testValidate         avgt   10  14705,189 ± 986,189  ns/op

The regression between 1.0.67 and 1.0.69 can be largely traced back to the work on unevaluated properties in #534 & #544 , I think. Ideally, when the schema doesn't contain this keyword - like in our case - this overhead should be avoided.

The regression between 0.1.3 and 1.0.67 is probably caused by numerous things, but one bigger portion is likely driven by the CollectorContext infrastructure and its repeated Map.get calls. This is also why the unevaluated properties is so expensive I think.

The purple highlights in the following (reversed) flamegraph of 1.0.69 show the CollectorContext being one of the main drivers.
image

For 1.0.67 it shows the following - less but still a significant time is spent on the CollectorContext.
image

Cheers,
Christoph

@dreis2211
Copy link
Contributor Author

dreis2211 commented Apr 29, 2022

Just out of curiousity I fired our example against Everit 1.14.1 (latest at the time of writing) vs. Networknt (1.0.67).

Benchmark                                 Mode  Cnt      Score      Error  Units
MyBenchmark.testValidateEverit            avgt   10  17213,067 ± 2434,057  ns/op
MyBenchmark.testValidateNetworkNt         avgt   10  24811,879 ± 2222,995  ns/op

At least for our very simple example, the regressions apparently put this library into a position where it's not the fastest one anymore and invalidate the 3 year old claim in the README.

It is the fastest Java JSON Schema Validator as far as I know

With 1.0.69 even by quite a margin.

Don't get me wrong. I think Networknt is still great and its API is more intuitive, but I would appreciate if you could spent a few dev-cycles on the validation performance to reclaim the # 1 position.

@stevehu
Copy link
Contributor

stevehu commented Apr 30, 2022

@dreis2211 I have rerun the performance test and successfully reproduced what you have observed. Thanks a lot for raising it and I will try to get it fixed.

@prashanthjos
Copy link
Member

Hi @dreis2211 thank you for bringing this up I will take a look into this. I think we can avoid multiple calls to the CollectorContext by a simple check like hasEvaluatedProperties, this check is similar to hasRequiredProperties method in JsonSchema.java.

@stevehu is there anyway I can run the performance test setup you already have. I want to check the perf tests after I have fixed this issue.

Thanks in advance.

@dreis2211
Copy link
Contributor Author

dreis2211 commented May 2, 2022

As shown above the unevaluatedProperties optimization only reduces things, but networknt has not the fastest schema validator anymore even if this is fixed. Feel free to repurpose this ticket to apply some more general performance work or create a new ticket for further work and keep this focussed at the unevaluated properties stuff.

@dreis2211
Copy link
Contributor Author

@prashanthjos I guess @stevehu just ran https://github.com/networknt/json-schema-validator-perftest . Although it's not a perfect benchmark it shows the regressions as well.

@stevehu
Copy link
Contributor

stevehu commented May 2, 2022

we have a performance test repo that I am currently working on to upgrade to the latest and hooked with JProfiler. Please be aware that the test case was focused on V4 only as other libraries won't support the latest version of the specifications.

I am going to remove the Fge as it is not active anymore and the developer has given up the repo. Also, I will upgrade the test to V7 as the everit supports up to V7.

Once we finish this upgrade, we might add other active libraries.

https://github.com/networknt/json-schema-validator-perftest

@stevehu
Copy link
Contributor

stevehu commented May 3, 2022

I have updated the perftest repo and rerun the tests in both IDE and java -jar. The following is the link to the test result.

https://github.com/networknt/json-schema-validator-perftest#results-2022-05-02

I have no idea why the networknt validator is twice as fast as running with java -jar.

@ThanksForAllTheFish
Copy link

I am going to remove the Fge as it is not active anymore and the developer has given up the repo. Also, I will upgrade the test to V7 as the everit supports up to V7.

Hopefully it does not sound too blunt, but what is the purpose of comparing performance for v7? There are 2 drafts after that, which puts (imo) v7 in the realm of legacy. I understand advertisement, I guess, but if that is the reason I would focus on supporting draft2020 and be the first one in java (might be a weaker claim, since other libs can catch up, but for people like me who look at the latest it could mean I know where to look when the next json schema drafts come out)

@fdutton
Copy link
Contributor

fdutton commented Apr 30, 2023

@dreis2211 Please try 1.0.81 and let us know how we are doing.

@fdutton fdutton closed this as completed Apr 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants