Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

templated input adapters #1950

Merged
merged 9 commits into from
May 14, 2020
Merged

templated input adapters #1950

merged 9 commits into from
May 14, 2020

Conversation

FrancoisChabot
Copy link
Contributor

@FrancoisChabot FrancoisChabot commented Feb 19, 2020

This addresses #1457.

This change templates the lexer and parser on the input adapter. This accelerates parsing by:

  • removing a virtual function call that was being done on each character.
  • removing an extra indirection each time a character is pulled from the input
  • Allows the compiler to unroll loops more effectively.

Notes:

  • I jumped through a few hoops to try and interfere as little as possible with the existing codebase. (notably making input_adapter a function with a bunch of overloads). This PR introduces enough disruption as it is.
  • This will cause a bit of code bloat in executables making use of multiple types of input adapters, but that's kinda by design, since the parsing algorithm is now going to be different when parsing continuous buffers vs std::istream for example. Adding a compilation flag that forces every input_adapter() call to return a std::shared_ptr<input_adapter_protocol> would allow someone to force the current behavior, but I'm sure anyone would actually want that.

Preliminary benchmark results (the benchmark code was completely unaltered):

Running ./json_benchmarks
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.

Before:

ParseFile/jeopardy             756273922 ns  756252923 ns          1   70.0573MB/s
ParseFile/canada                40617999 ns   40616005 ns         17   52.8553MB/s
ParseFile/citm_catalog          13887657 ns   13886150 ns         49   118.621MB/s
ParseFile/twitter                6637993 ns    6637624 ns         99   90.7341MB/s
ParseFile/floats               331636695 ns  331623524 ns          2   65.1949MB/s
ParseFile/signed_ints          260474009 ns  260455959 ns          3   89.2653MB/s
ParseFile/unsigned_ints        253288270 ns  253277160 ns          3   91.8672MB/s
ParseFile/small_signed_ints    134354774 ns  134344022 ns          5   87.9442MB/s
ParseString/jeopardy           639711775 ns  639663043 ns          1   82.8264MB/s
ParseString/canada              39393165 ns   39385726 ns         18   54.5063MB/s
ParseString/citm_catalog        13733981 ns   13733392 ns         51   119.941MB/s
ParseString/twitter              6728397 ns    6727613 ns        104   89.5204MB/s
ParseString/floats             324777625 ns  324759540 ns          2   66.5729MB/s
ParseString/signed_ints        249387091 ns  249374529 ns          3    93.232MB/s
ParseString/unsigned_ints      246544683 ns  246511734 ns          3   94.3884MB/s
ParseString/small_signed_ints  131169641 ns  131153913 ns          5   90.0833MB/s

After:

ParseFile/jeopardy             699275653 ns  698337652 ns          1   75.8673MB/s
ParseFile/canada                39039109 ns   39038367 ns         18   54.9913MB/s
ParseFile/citm_catalog          12542385 ns   12541780 ns         53   131.336MB/s
ParseFile/twitter                6008025 ns    6007541 ns        106    100.25MB/s
ParseFile/floats               308449194 ns  308446408 ns          2   70.0938MB/s
ParseFile/signed_ints          240709926 ns  240703930 ns          3   96.5904MB/s
ParseFile/unsigned_ints        237952035 ns  237947858 ns          3   97.7855MB/s
ParseFile/small_signed_ints    124701521 ns  124700671 ns          5   94.7451MB/s
ParseString/jeopardy           575891841 ns  575879105 ns          1   92.0002MB/s
ParseString/canada              38789711 ns   38788899 ns         13   55.3449MB/s
ParseString/citm_catalog        12558964 ns   12558833 ns         55   131.158MB/s
ParseString/twitter              6084682 ns    6084677 ns         99   98.9796MB/s
ParseString/floats             301731229 ns  301725636 ns          2   71.6551MB/s
ParseString/signed_ints        237410417 ns  237406140 ns          3   97.9321MB/s
ParseString/unsigned_ints      231609294 ns  231595549 ns          3   100.468MB/s
ParseString/small_signed_ints  119460428 ns  119458165 ns          6    98.903MB/s

Pull request checklist

Read the Contribution Guidelines for detailed information.

  • Changes are described in the pull request, or an existing issue is referenced.
  • The test suite compiles and runs without error.
  • Code coverage is 100%. Test cases can be added by editing the test suite.
  • The source code is amalgamated; that is, after making changes to the sources in the include/nlohmann directory, run make amalgamate to create the single-header file single_include/nlohmann/json.hpp. The whole process is described here.

@FrancoisChabot
Copy link
Contributor Author

The 4 code violations are all from effectively unaltered code. I could fix them, but I don't think this PR is the place to do that.

@coveralls
Copy link

coveralls commented Feb 19, 2020

Coverage Status

Coverage increased (+0.0009%) to 99.842% when pulling a4266bb on FrancoisChabot:issues/1457 into a414e35 on nlohmann:develop.

@lgtm-com
Copy link

lgtm-com bot commented Feb 19, 2020

This pull request introduces 2 alerts when merging 2e2cf02 into 973c52d - view on LGTM.com

new alerts:

  • 1 for Overloaded assignment does not return 'this'
  • 1 for Missing return statement

@FrancoisChabot
Copy link
Contributor Author

FrancoisChabot commented Feb 20, 2020

The 2 remaining Travis failures seem like CI setup issues unrelated to my change (it fails to build CMake somehow). This should be ready for review at this point.

@FrancoisChabot
Copy link
Contributor Author

Is there anything else preventing this from going through? Not that I'm any rush to have that go through, but I can't guarantee that I'll have the availability to address further changes indefinitely.

@nlohmann
Copy link
Owner

I’ll check soon. Sorry for the delay.

Copy link
Owner

@nlohmann nlohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments.

include/nlohmann/json.hpp Show resolved Hide resolved
include/nlohmann/json.hpp Show resolved Hide resolved
include/nlohmann/json.hpp Show resolved Hide resolved
include/nlohmann/json.hpp Show resolved Hide resolved
include/nlohmann/json.hpp Show resolved Hide resolved
include/nlohmann/json.hpp Show resolved Hide resolved
@nlohmann
Copy link
Owner

I think it would be also a good idea to provide an example for a user-define input adapter. What do you think?

@FrancoisChabot
Copy link
Contributor Author

I think it would be also a good idea to provide an example for a user-define input adapter. What do you think?

So I've been giving this is bit of thought before diving in.

It's certainly possible to write a custom input adapter, but it's far from the ideal interface here. The "correct" answer would be to support input iterators in general, and have the user express their inputs as a pair of custom input iterators of char or wchar (basically the same way boost::spirit handles user-defined data sources).

I'm kinda tempted to just add that support in instead of creating an example that would be deprecated as soon as that went.

Thoughts?

@abrownsword
Copy link

Yes, please offer support for user defined input_adapters. There are other (now closed as stale) issues for custom input_adapter and custom output_adapter. These would be very very useful for a bunch of purposes -- custom containers, custom filesystem interfaces (e.g. through zlib), filters to do things like stripping non-standard comments, and so forth.

Pretty please with sugar on top?

@nlohmann
Copy link
Owner

@FrancoisChabot sorry for not checking back on this earlier! I really like the changes, but do you think merging now makes sense, when the request for user-defined input adapters is still open?

@FrancoisChabot
Copy link
Contributor Author

Yes, I think custom input adapters are a separate issue.

Copy link
Owner

@nlohmann nlohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@nlohmann nlohmann self-assigned this May 13, 2020
@nlohmann nlohmann linked an issue May 13, 2020 that may be closed by this pull request
@nlohmann nlohmann added this to the Release 3.8.0 milestone May 13, 2020
@nlohmann nlohmann merged commit 0857140 into nlohmann:develop May 14, 2020
@nlohmann
Copy link
Owner


🔖 Release item

This issue/PR will be part of the next release of the library. This template helps preparing the release notes.

Type

  • ✨ New Feature
  • 🐛 Bug Fix
  • ⚡️ Improvement
  • 🔨 Further Change
  • 🔥 Deprecated function

Description


@nlohmann
Copy link
Owner

@FrancoisChabot Thanks a lot!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Substantial performance penalty caused by polymorphic input adapter
4 participants