Bring JSON parser on par with circe-fs2 #491

Merged
satabin merged 11 commits into main from json/chunk-acc
Jul 17, 2023

@satabin satabin commented Jun 26, 2023

The goal of this PR is to bring performance on par with circe-fs2, so that users can switch fearlessly to fs2-data (see circe/circe-fs2#425).
The existing two-step approach has a major drawback that prevents this: it builds intermediate Token values, which are immediately consumed to build the AST using an instance of Builder.
For people coming from circe-fs2, this can be avoided by adding a new pipe, ast.parse, that directly transforms the input byte/char/string stream into the AST.

To achieve this, this PR takes inspiration from the Facade approach in jawn and introduces a ChunkAccumulator in the JSON parser. One simple accumulator (TokenChunkAccumulator) wraps the existing VectorBuilder[Token] and makes it possible to build the token stream. Another implementation (BuilderChunkAccumulator[Json]) builds the AST directly from the parser without instantiating any Token. In the future, it can even be extended to implement filtering at the source, without constructing the Tokens that are not selected. The abstraction is kept internal for now, so that we have more latitude to change the API before stabilizing it.
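To make the idea concrete, here is a minimal sketch of the accumulator pattern described above. It is not the internal fs2-data API: the `ChunkAccumulator` trait shape, the toy `Token` and `Value` types, and the `drive` helper (a stand-in for the parser) are all assumptions made for illustration. It only shows how one parser loop can feed either a token accumulator or an AST-building accumulator.

```scala
import scala.collection.immutable.VectorBuilder

// Hypothetical toy model, NOT the fs2-data internals.
sealed trait Token
object Token {
  case object StartArray extends Token
  case object EndArray extends Token
  final case class Num(value: String) extends Token
}

sealed trait Value
object Value {
  final case class Num(value: String) extends Value
  final case class Arr(items: Vector[Value]) extends Value
}

// Callbacks the parser would invoke; each implementation decides what is emitted.
trait ChunkAccumulator[Res] {
  def startArray(): Unit
  def endArray(): Unit
  def number(s: String): Unit
  // Returns what was accumulated so far and resets the accumulator.
  def chunk(): Vector[Res]
}

// Emits one token per parser event (wraps a VectorBuilder[Token]).
final class TokenAccumulator extends ChunkAccumulator[Token] {
  private val out = new VectorBuilder[Token]
  private def push(t: Token): Unit = { out += t; () }
  def startArray(): Unit = push(Token.StartArray)
  def endArray(): Unit = push(Token.EndArray)
  def number(s: String): Unit = push(Token.Num(s))
  def chunk(): Vector[Token] = { val res = out.result(); out.clear(); res }
}

// Builds values directly, without ever instantiating a Token.
final class ValueAccumulator extends ChunkAccumulator[Value] {
  private val out = new VectorBuilder[Value]
  // One builder per currently open array, innermost first.
  private var stack: List[VectorBuilder[Value]] = Nil
  private def emit(v: Value): Unit = stack match {
    case parent :: _ => parent += v; ()
    case Nil         => out += v; ()
  }
  def startArray(): Unit = { stack = (new VectorBuilder[Value]) :: stack }
  def endArray(): Unit = stack match {
    case current :: rest =>
      stack = rest
      emit(Value.Arr(current.result()))
    case Nil =>
      throw new IllegalStateException("unbalanced array end")
  }
  def number(s: String): Unit = emit(Value.Num(s))
  def chunk(): Vector[Value] = { val res = out.result(); out.clear(); res }
}

object AccumulatorDemo {
  // Stand-in for the parser: replays the events for the input `[1, [2]]`.
  def drive[Res](acc: ChunkAccumulator[Res]): Vector[Res] = {
    acc.startArray()
    acc.number("1")
    acc.startArray()
    acc.number("2")
    acc.endArray()
    acc.endArray()
    acc.chunk()
  }
}
```

With this shape, `AccumulatorDemo.drive(new TokenAccumulator)` yields the flat token sequence, while `AccumulatorDemo.drive(new ValueAccumulator)` yields a single nested value, with no intermediate tokens allocated on that second path.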

  • Implement the parser in terms of ChunkAccumulator
  • Implement the simple TokenChunkAccumulator
  • Implement the AST building BuilderChunkAccumulator
    • implement ast.parse
  • Add scalafix rule to migrate from .through(tokens).through(ast.values) to .through(ast.parse)
  • Add ast.parse documentation
  • Add cookbook to migrate from circe-fs2
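The migration targeted by the scalafix rule could look like the sketch below. It assumes the fs2-data-json and fs2-data-json-circe modules are on the classpath (the circe interop supplying the `Builder[Json]` instance), and it assumes `ast.parse` takes the effect, input, and AST types as type parameters; the `MigrationSketch` object and the sample input are hypothetical.

```scala
import fs2.{Fallible, Stream}
import fs2.data.json._
import fs2.data.json.circe._ // assumption: provides the implicit Builder[Json]
import io.circe.Json

object MigrationSketch {
  // Two top-level JSON values in a single input string (hypothetical sample).
  private val input: Stream[Fallible, String] =
    Stream.emit("""{"a": 1} [true, null]""")

  // Before: tokenize first, then build AST values from the tokens.
  val twoSteps: Either[Throwable, List[Json]] =
    input
      .through(tokens[Fallible, String])
      .through(ast.values[Fallible, Json])
      .compile
      .toList

  // After: build AST values directly from the input.
  val direct: Either[Throwable, List[Json]] =
    input
      .through(ast.parse[Fallible, String, Json])
      .compile
      .toList
}
```

Both pipelines are expected to produce the same values; only the intermediate `Token` stream is skipped in the second one.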

Using the new abstraction, here are the results of the benchmark.

Benchmark                                          Mode  Cnt     Score    Error  Units
JsonParserBenchmarks.parseCirceFs2                 avgt   10  5197.451 ± 18.083  us/op
JsonParserBenchmarks.parseJsonFs2DataParse         avgt   10  5259.336 ± 32.579  us/op
JsonParserBenchmarks.parseJsonFs2DataTokens        avgt   10  3531.900 ± 15.076  us/op
JsonParserBenchmarks.parseJsonFs2DataTokensValues  avgt   10  6091.062 ± 94.753  us/op
JsonParserBenchmarks.parseJsonJawn                 avgt   10  1945.402 ± 10.970  us/op
JsonValueBenchmarks.parseEscapedString             avgt   10     2.742 ±  0.021  us/op
JsonValueBenchmarks.parseSimpleString              avgt   10     2.588 ±  0.027  us/op

@satabin satabin requested a review from a team as a code owner June 26, 2023 17:36
@satabin satabin added enhancement New feature or request json labels Jun 26, 2023
@satabin satabin added this to the 1.8.0 milestone Jun 26, 2023
This abstraction makes it possible to create chunks of various types out
of an input stream. One possible accumulator is the `Token` accumulator,
which generates the token stream.

Leveraging this abstraction, we can then implement a pipe that directly
builds AST values instead of going through an intermediate `Token`
representation.
Inspired by the `Facade` abstraction from jawn, we can build the AST
directly, without emitting intermediate tokens, which makes it faster.

satabin commented Jul 2, 2023

@ybasket this PR is ready to be reviewed!

@satabin satabin added the documentation Improvements or additions to documentation label Jul 2, 2023

@ybasket ybasket left a comment


Nice work and sorry for the time it took me to review it!

The fact that it now returns `Unit` makes it clearer that it is
side-effectful.
@satabin satabin merged commit 136b493 into main Jul 17, 2023
12 checks passed
@satabin satabin deleted the json/chunk-acc branch July 17, 2023 17:12
Labels
documentation Improvements or additions to documentation enhancement New feature or request json