Fair play with JSON #335

Merged
merged 10 commits into from
May 10, 2021

Conversation

9il
Contributor

@9il 9il commented May 9, 2021

  • DAW JSON Link now uses checked parsing, like the other libraries (this is also the default mode in DAW).
  • RapidJSON now uses IEEE number parsing (like almost all other libraries; otherwise we would need to add a note that it is inaccurate).
  • Added a DOM suffix for the Mir libraries.
  • Updated the README with notes about the top libraries.
  • Use numbers with exponents. This is important. Numbers between 0 and 1 are very easy to parse precisely. Numbers with significant exponents are more interesting to parse, and libraries may use quite different approaches to do this. For example, Rapid and Mir have their own implementations, while simdjson falls back to a C standard library call.
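
As a rough illustration of why exponents matter, here is a minimal sketch of a naive parser next to the C library conversion. `naive_parse` and `reference_parse` are hypothetical names for this example and are not taken from any of the benchmarked libraries. The naive version rounds at each multiplication and again in `std::pow`, so for long mantissas or large exponents its result can differ from `strtod` by a few ULPs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical naive parser (illustration only): accumulates digits into a
// double, then scales by a power of ten. Each step may round.
// Accepts only the form [-]digits[.digits][(e|E)[+|-]digits].
double naive_parse(const std::string& text) {
    std::size_t i = 0;
    bool negative = false;
    if (i < text.size() && text[i] == '-') { negative = true; ++i; }

    double mantissa = 0.0;
    int scale = 0;  // decimal exponent contributed by fractional digits
    while (i < text.size() && text[i] >= '0' && text[i] <= '9') {
        mantissa = mantissa * 10.0 + (text[i] - '0');
        ++i;
    }
    if (i < text.size() && text[i] == '.') {
        ++i;
        while (i < text.size() && text[i] >= '0' && text[i] <= '9') {
            mantissa = mantissa * 10.0 + (text[i] - '0');
            --scale;
            ++i;
        }
    }
    int exponent = 0;
    if (i < text.size() && (text[i] == 'e' || text[i] == 'E')) {
        ++i;
        bool exp_negative = false;
        if (i < text.size() && (text[i] == '+' || text[i] == '-')) {
            exp_negative = (text[i] == '-');
            ++i;
        }
        while (i < text.size() && text[i] >= '0' && text[i] <= '9') {
            exponent = exponent * 10 + (text[i] - '0');
            ++i;
        }
        if (exp_negative) exponent = -exponent;
    }
    assert(i == text.size() && "unsupported number syntax");
    const double value = mantissa * std::pow(10.0, exponent + scale);
    return negative ? -value : value;
}

// Reference conversion through the C standard library.
double reference_parse(const std::string& text) {
    return std::strtod(text.c_str(), nullptr);
}
```

Comparing `naive_parse(s)` with `reference_parse(s)` over the numbers in a test file is one way to see how often the shortcut goes wrong once exponents are involved.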

@9il 9il force-pushed the master branch 2 times, most recently from 2c651d3 to fde5bb3 Compare May 9, 2021 06:09
Collaborator

@nuald nuald left a comment

Frankly, I'm not quite sure it's worth testing JSON precision, as it's definitely not the best format if precision is important (and moreover, it's not fully compatible with IEEE 754 anyway as, for example, it doesn't support specific values like negative zero). However, it could be useful information, so thank you for your efforts.

@9il
Contributor Author

9il commented May 9, 2021

> Frankly, I'm not quite sure it's worth testing JSON precision, as it's definitely not the best format if precision is important (and moreover, it's not fully compatible with IEEE 754 anyway as, for example, it doesn't support specific values like negative zero). However, it could be useful information, so thank you for your efforts.

The goal isn't to change Mir's position. It is unlikely to change, except that the distance to Rapid may shrink to zero. Also, during the last decade programming languages have improved number parsing and printing a lot. Ryu is a revolution. JSON is very frequently used for data transfer and serialization (there are "better" formats, but who cares). An adequate library can guarantee that it can write a non-special number to JSON, read it back, and get the same value. If it can't, then it isn't fair. Likely the winners will be simdjson and serde. After this MR it will be clear that they are the fastest correct JSON parsing libraries. Mir needs a test with file or stream input to show its low memory consumption. At the least I can add a special note in my local README and link to the benchmark, but the benchmark should be fair.

I assume you may want a clever C++ configuration; I am not sure I can do it well. If you wish to preserve the unfair configs, I can just add additional notes and leave them for the future.
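
The write-then-read guarantee can be checked with a few lines. This is a minimal sketch, not code from the benchmark: `write_number` and `round_trips` are hypothetical helpers, and the writer uses `"%.17g"` rather than Ryu (17 significant digits are enough to identify any finite IEEE 754 double, although the text is usually longer than Ryu's shortest form). It assumes the default "C" locale so that the decimal separator is a dot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Formats a finite double with 17 significant digits (hypothetical helper).
std::string write_number(double value) {
    char buffer[32];
    const int length = std::snprintf(buffer, sizeof buffer, "%.17g", value);
    assert(length > 0 && static_cast<std::size_t>(length) < sizeof buffer);
    return std::string(buffer, static_cast<std::size_t>(length));
}

// Returns true when writing and re-reading yields the identical bit pattern.
// The reader here is strtod; substitute the parser under test to check it.
bool round_trips(double value) {
    const std::string text = write_number(value);
    const double parsed = std::strtod(text.c_str(), nullptr);
    return std::memcmp(&value, &parsed, sizeof value) == 0;
}
```

Comparing bit patterns instead of using `==` keeps the check strict about the sign of zero as well.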

@beached
Contributor

beached commented May 10, 2021

Isn't the memory usage already measured twice, to separate the library usage from the string data?

@9il
Contributor Author

9il commented May 10, 2021

Yep. On the other hand, most of the high-performance libraries require the whole input as a string. Mir parses data in chunks of a few KB. It can read the 108 MiB JSON file and build a complete Amazon Ion DOM using only 16 MiB (it is so small because of symbol tables, etc.). The total memory consumption would be 16 MiB, not 108+16 MiB.
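
To make the chunked idea concrete, here is a minimal sketch that scans a stream in fixed-size chunks while keeping only a small state between chunks. It is not Mir's implementation: `DepthScanner` and both functions are hypothetical, and the scanner only tracks nesting depth instead of building a DOM. The point is that peak memory is the chunk size plus the scanner state, independent of the input size.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// State that survives chunk boundaries (hypothetical, illustration only).
struct DepthScanner {
    std::size_t depth = 0;
    std::size_t max_depth = 0;
    bool in_string = false;
    bool escaped = false;

    void feed(const char* data, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) {
            const char c = data[i];
            if (in_string) {
                // Brackets inside strings must not change the depth.
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') in_string = false;
            } else if (c == '"') {
                in_string = true;
            } else if (c == '{' || c == '[') {
                if (++depth > max_depth) max_depth = depth;
            } else if (c == '}' || c == ']') {
                assert(depth > 0 && "unbalanced input");
                --depth;
            }
        }
    }
};

// Reads the stream chunk by chunk; the whole input is never held in memory.
std::size_t max_nesting_depth(std::istream& input, std::size_t chunk_size = 4096) {
    assert(chunk_size > 0);
    std::vector<char> chunk(chunk_size);
    DepthScanner scanner;
    while (input) {
        input.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = input.gcount();
        if (got > 0) scanner.feed(chunk.data(), static_cast<std::size_t>(got));
    }
    return scanner.max_depth;
}

// Convenience wrapper for trying the scanner on an in-memory string.
std::size_t max_nesting_depth_of_string(const std::string& json, std::size_t chunk_size) {
    std::istringstream input(json);
    return max_nesting_depth(input, chunk_size);
}
```

Passing a `std::ifstream` to `max_nesting_depth` gives the file-input case; a tiny `chunk_size` is useful for checking that tokens split across chunk boundaries are handled.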

@beached
Contributor

beached commented May 10, 2021

That's really cool, and I know a lot of people want a feature like that for things like web servers. I thought about that path, but with mmap available on all POSIX systems and VirtualAlloc on Windows, it was less effort to let the OS handle the paging of the file in those cases. It's not for all use cases, but it does simplify a lot.
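
For readers unfamiliar with the mmap route, a minimal POSIX-only sketch follows. `MappedFile` is a hypothetical wrapper written for this example, not part of DAW JSON Link; it maps a file read-only and exposes it as a `std::string_view` (C++17) that a string-based parser could consume while the OS pages the data in on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string_view>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only mapping of a whole file (hypothetical helper, POSIX only).
// An empty or unreadable file leaves the object in the "not ok" state.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat info {};
        if (::fstat(fd, &info) == 0 && info.st_size > 0) {
            void* address = ::mmap(nullptr, static_cast<std::size_t>(info.st_size),
                                   PROT_READ, MAP_PRIVATE, fd, 0);
            if (address != MAP_FAILED) {
                data_ = static_cast<const char*>(address);
                size_ = static_cast<std::size_t>(info.st_size);
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_ != nullptr) ::munmap(const_cast<char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return data_ != nullptr; }

    // View over the mapped bytes; valid only while this object is alive.
    std::string_view view() const {
        assert(ok());
        return std::string_view(data_, size_);
    }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The trade-off matches the comment above: the parser still sees one contiguous buffer, but resident memory is managed by the kernel rather than by the library.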

@nuald
Collaborator

nuald commented May 10, 2021

Precision flags affect performance, so it would be better to use them in separate tests. I'm going to submit a PR to the original branch of this PR.

@beached
Contributor

beached commented May 10, 2021

Most use cases do not care about a 0-2 ULP difference from strtod, but do care about perf. Those that do care should have an option to pay for that, though.
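
A ULP difference can be measured directly from the bit patterns. The sketch below is an illustration with hypothetical names (`ordered_bits`, `ulp_distance`); it assumes IEEE 754 binary64 doubles and finite inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Maps a finite double to an integer so that adjacent doubles become adjacent
// integers and both zeros map to 0.
std::int64_t ordered_bits(double value) {
    std::int64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    // Negative doubles are stored as sign plus magnitude; flip them so the
    // integer ordering matches the numeric ordering.
    return bits < 0 ? std::numeric_limits<std::int64_t>::min() - bits : bits;
}

// Number of representable doubles between a and b (0 when they are equal).
std::uint64_t ulp_distance(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    const std::int64_t x = ordered_bits(a);
    const std::int64_t y = ordered_bits(b);
    return x > y ? static_cast<std::uint64_t>(x) - static_cast<std::uint64_t>(y)
                 : static_cast<std::uint64_t>(y) - static_cast<std::uint64_t>(x);
}
```

For example, `ulp_distance(parsed, std::strtod(text, nullptr)) <= 2` would express the tolerance mentioned above, where `parsed` is the value produced by the library under test.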

@nuald
Collaborator

nuald commented May 10, 2021

@9il Please merge 9il#1 and update this PR.

@nuald nuald merged commit f9f2a37 into kostya:master May 10, 2021
@9il
Contributor Author

9il commented May 10, 2021

> @9il Please merge 9il#1 and update this PR.

Awesome, thank you! Could you please update the whole JSON benchmark with the stats for the new JSON file, which has adjusted exponents, when you have a chance?

@nuald
Collaborator

nuald commented May 10, 2021

Yes, I have a scheduled maintenance update on all tests; I just need to finish adding the Primes tests first. I think I'll publish the up-to-date numbers this week.

@dumblob

dumblob commented May 11, 2021

> Most use cases do not care about 0-2ulp diff from strtod, but do care about perf. Those that do, should have an option to pay for that though.

As a side note: IEEE 754 binary floating point has worse problems than a 2 ULP difference. It's a nightmare; see e.g. vlang/v#5180 (comment).

5 participants