Major performance regression caused by unevaluatedProperties #564

dreis2211 · 2022-04-29T07:45:27Z

Hi,

I've seen an internal customer project being on an ancient version (0.1.3) and wanted to upgrade things. After the update to 1.0.69 (latest at the point of writing) things still seem to work, but show a major performance regression.

A little microbenchmark that validates a typical json for us shows the following numbers:
1.0.69

Benchmark                         Mode  Cnt      Score     Error  Units
MyBenchmark.testValidate          avgt   10  46884,301 ± 764,721  ns/op

1.0.67

Benchmark                         Mode  Cnt      Score     Error  Units
MyBenchmark.testValidate          avgt   10  26214,556 ± 425,732  ns/op

0.1.3

Benchmark                         Mode  Cnt      Score     Error  Units
MyBenchmark.testValidate         avgt   10  14705,189 ± 986,189  ns/op

The regression between 1.0.67 and 1.0.69 can be largely traced back to the work on unevaluated properties in #534 & #544 , I think. Ideally, when the schema doesn't contain this keyword - like in our case - this overhead should be avoided.

The regression between 0.1.3 and 1.0.67 is probably caused by numerous things, but one bigger portion is likely driven by the CollectorContext infrastructure and its repeated Map.get calls. This is also why the unevaluated properties is so expensive I think.

The purple highlights in the following (reversed) flamegraph of 1.0.69 show the CollectorContext being one of the main drivers.

For 1.0.67 it shows the following - less but still a significant time is spent on the CollectorContext.

Cheers,
Christoph

The text was updated successfully, but these errors were encountered:

dreis2211 · 2022-04-29T10:50:47Z

Just out of curiousity I fired our example against Everit 1.14.1 (latest at the time of writing) vs. Networknt (1.0.67).

Benchmark                                 Mode  Cnt      Score      Error  Units
MyBenchmark.testValidateEverit            avgt   10  17213,067 ± 2434,057  ns/op
MyBenchmark.testValidateNetworkNt         avgt   10  24811,879 ± 2222,995  ns/op

At least for our very simple example, the regressions apparently put this library into a position where it's not the fastest one anymore and invalidate the 3 year old claim in the README.

It is the fastest Java JSON Schema Validator as far as I know

With 1.0.69 even by quite a margin.

Don't get me wrong. I think Networknt is still great and its API is more intuitive, but I would appreciate if you could spent a few dev-cycles on the validation performance to reclaim the # 1 position.

stevehu · 2022-04-30T01:26:55Z

@dreis2211 I have rerun the performance test and successfully reproduced what you have observed. Thanks a lot for raising it and I will try to get it fixed.

prashanthjos · 2022-05-02T11:50:28Z

Hi @dreis2211 thank you for bringing this up I will take a look into this. I think we can avoid multiple calls to the CollectorContext by a simple check like hasEvaluatedProperties, this check is similar to hasRequiredProperties method in JsonSchema.java.

@stevehu is there anyway I can run the performance test setup you already have. I want to check the perf tests after I have fixed this issue.

Thanks in advance.

dreis2211 · 2022-05-02T11:57:04Z

As shown above the unevaluatedProperties optimization only reduces things, but networknt has not the fastest schema validator anymore even if this is fixed. Feel free to repurpose this ticket to apply some more general performance work or create a new ticket for further work and keep this focussed at the unevaluated properties stuff.

dreis2211 · 2022-05-02T12:48:22Z

@prashanthjos I guess @stevehu just ran https://github.com/networknt/json-schema-validator-perftest . Although it's not a perfect benchmark it shows the regressions as well.

stevehu · 2022-05-02T13:11:04Z

we have a performance test repo that I am currently working on to upgrade to the latest and hooked with JProfiler. Please be aware that the test case was focused on V4 only as other libraries won't support the latest version of the specifications.

I am going to remove the Fge as it is not active anymore and the developer has given up the repo. Also, I will upgrade the test to V7 as the everit supports up to V7.

Once we finish this upgrade, we might add other active libraries.

https://github.com/networknt/json-schema-validator-perftest

stevehu · 2022-05-03T23:51:06Z

I have updated the perftest repo and rerun the tests in both IDE and java -jar. The following is the link to the test result.

https://github.com/networknt/json-schema-validator-perftest#results-2022-05-02

I have no idea why the networknt validator is twice as fast as running with java -jar.

ThanksForAllTheFish · 2022-05-27T06:21:41Z

I am going to remove the Fge as it is not active anymore and the developer has given up the repo. Also, I will upgrade the test to V7 as the everit supports up to V7.

Hopefully it does not sound too blunt, but what is the purpose of comparing performance for v7? There are 2 drafts after that, which puts (imo) v7 in the realm of legacy. I understand advertisement, I guess, but if that is the reason I would focus on supporting draft2020 and be the first one in java (might be a weaker claim, since other libs can catch up, but for people like me who look at the latest it could mean I know where to look when the next json schema drafts come out)

fdutton · 2023-04-30T16:10:11Z

@dreis2211 Please try 1.0.81 and let us know how we are doing.

dreis2211 mentioned this issue Apr 29, 2022

Remove unused variable in JsonNodeUtil #566

Merged

caseycrites mentioned this issue Mar 29, 2023

Using failFast makes validation slower #685

Closed

felixkrull-neuland mentioned this issue Apr 21, 2023

Significant performance regression in 1.0.80 #721

Closed

fdutton closed this as completed Apr 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Major performance regression caused by unevaluatedProperties #564

Major performance regression caused by unevaluatedProperties #564

dreis2211 commented Apr 29, 2022 •

edited

Loading

dreis2211 commented Apr 29, 2022 •

edited

Loading

stevehu commented Apr 30, 2022

prashanthjos commented May 2, 2022

dreis2211 commented May 2, 2022 •

edited

Loading

dreis2211 commented May 2, 2022

stevehu commented May 2, 2022

stevehu commented May 3, 2022

ThanksForAllTheFish commented May 27, 2022

fdutton commented Apr 30, 2023

Major performance regression caused by unevaluatedProperties #564

Major performance regression caused by unevaluatedProperties #564

Comments

dreis2211 commented Apr 29, 2022 • edited Loading

dreis2211 commented Apr 29, 2022 • edited Loading

stevehu commented Apr 30, 2022

prashanthjos commented May 2, 2022

dreis2211 commented May 2, 2022 • edited Loading

dreis2211 commented May 2, 2022

stevehu commented May 2, 2022

stevehu commented May 3, 2022

ThanksForAllTheFish commented May 27, 2022

fdutton commented Apr 30, 2023

dreis2211 commented Apr 29, 2022 •

edited

Loading

dreis2211 commented Apr 29, 2022 •

edited

Loading

dreis2211 commented May 2, 2022 •

edited

Loading