Fair play with JSON #335

9il · 2021-05-09T05:15:33Z

DAW Json Link now uses checked parsing like the other libraries (it is also the default mode in DAW)
Rapid JSON now uses IEEE number parsing (like almost all other libraries; otherwise we would need to add a note that it is inaccurate)
Added a DOM suffix for the Mir libraries
Updated the README with notes about the top libraries.
Use numbers with exponents. This is important. Numbers between 0 and 1 are very easy to parse precisely. Numbers with significant exponents are much more interesting to parse, and libraries may take quite different approaches. For example, Rapid and Mir have their own implementations, while simdjson falls back to a C stdlib call.
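To illustrate why exponents matter, here is a minimal Python sketch. It is not taken from any of the benchmarked libraries. `naive_parse` is a hypothetical helper that scales an integer mantissa by a power of ten in floating point, and its result is compared with Python's built-in `float()`, which is correctly rounded. Short decimals with small exponents go through exactly, while large exponents add extra rounding steps, so the last bits may differ.

```python
def naive_parse(text: str) -> float:
    """Hypothetical shortcut: integer mantissa scaled by a float power of ten."""
    mantissa, _, exp = text.lower().partition("e")
    int_part, _, frac_part = mantissa.partition(".")
    digits = int(int_part + frac_part)  # e.g. "1.2345" -> 12345
    exponent = (int(exp) if exp else 0) - len(frac_part)
    if exponent >= 0:
        # float(10 ** k) is exact only for k <= 22; beyond that it is rounded,
        # and the multiplication rounds a second time.
        return float(digits) * float(10 ** exponent)
    return float(digits) / float(10 ** -exponent)


def agrees_with_builtin(text: str) -> bool:
    """True when the shortcut matches the correctly rounded built-in parser."""
    return naive_parse(text) == float(text)


if __name__ == "__main__":
    for sample in ["0.5", "0.25", "1.5e3", "1.2345e+30", "9.87654321e-200"]:
        print(sample, agrees_with_builtin(sample))
```

Running the script prints, for each sample, whether the shortcut agrees with the built-in parser. The sketch does not handle overflow, underflow, or mantissas longer than 53 bits, which is where real parsers spend most of their effort.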

nuald

Frankly, I'm not quite sure it's worth testing JSON precision as it's definitely not the best format if precision is important (and moreover, it's not fully compatible with IEEE 754 anyway as, for example, it doesn't have the specific values supported like negative zero). However, it could be useful information, so thank you for your efforts.

README.md

json/test_dawjsonlink.cpp

json/test_rapid.cpp

json/test_rapid_sax.cpp

9il · 2021-05-09T16:31:21Z

> Frankly, I'm not quite sure it's worth testing JSON precision as it's definitely not the best format if precision is important (and moreover, it's not fully compatible with IEEE 754 anyway as, for example, it doesn't have the specific values supported like negative zero). However, it could be useful information, so thank you for your efforts.

The goal isn't to change Mir's position. That is unlikely to change, except that the gap to Rapid may shrink to zero. Also, over the last decade programming languages have improved number parsing and printing a lot. Ryu is a revolution. JSON is very frequently used for data transfer and serialization (there are "better" formats, but who cares). An adequate library can guarantee that it can write any non-special number to JSON, read it back, and get the same value. If it can't, then it isn't fair. The winners will likely be simdjson and serde. After this MR it will be clear that they are the fastest correct JSON parsing libraries. Mir needs a test with file or stream input to show its low memory consumption. At the very least I can add a special note in my local readme and link to the benchmark, but the benchmark should be fair.
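The round-trip guarantee described above can be sketched with Python's standard `json` module, used here only as a stand-in for the benchmarked libraries. The helper name `round_trips` is made up for this example. It writes a finite double to JSON text, reads it back, and compares the raw bit patterns.

```python
import json
import struct


def round_trips(value: float) -> bool:
    """Write a finite double to JSON and check it reads back bit-for-bit equal."""
    text = json.dumps(value)
    restored = json.loads(text)
    # Compare the 8-byte encodings rather than using ==, so the check is exact.
    return struct.pack("<d", restored) == struct.pack("<d", value)


if __name__ == "__main__":
    samples = [0.1, 1.2345e+30, 5e-324, 1.7976931348623157e308, -2.5e-7]
    for x in samples:
        print(json.dumps(x), round_trips(x))

    # NaN and the infinities are the "special" numbers: standard JSON has no
    # syntax for them, and strict mode refuses to write them.
    try:
        json.dumps(float("nan"), allow_nan=False)
    except ValueError as err:
        print("rejected:", err)
```

Python prints floats using the shortest decimal string that parses back to the same double, which is the same property that Ryu provides, so every finite sample is expected to round-trip.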

I assume you may want a clever C++ configuration, but I am not sure I can do it well. If you wish to preserve the unfair configs, I can just add additional notes and leave them for the future.

beached · 2021-05-10T12:24:43Z

Isn't the memory usage already measured twice to separate the library usage from the string data?

9il · 2021-05-10T13:13:34Z

Yep. On the other hand, most high-performance libraries require a string as input. Mir parses data in chunks of a few KB. It can read the 108 MiB JSON file and build a complete Amazon Ion DOM using only 16 MiB (it is that small because of symbol tables, etc.). The total memory consumption would be 16 MiB, not 108+16 MiB.
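As a rough illustration of chunked processing (this is not Mir's implementation), the Python sketch below scans a JSON stream in small chunks and keeps only a few state variables between chunks, so memory use does not grow with the input size. The function name `max_depth_chunked` and the 4096-character default are assumptions made for the example, and it computes only the maximum nesting depth rather than building a DOM.

```python
import io


def max_depth_chunked(stream, chunk_size: int = 4096) -> int:
    """Return the maximum nesting depth of JSON text read chunk by chunk."""
    depth = 0
    deepest = 0
    in_string = False   # scanner state must survive chunk boundaries
    escaped = False
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if in_string:
                if escaped:
                    escaped = False          # character after a backslash
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch in "[{":
                depth += 1
                deepest = max(deepest, depth)
            elif ch in "]}":
                depth -= 1
    return deepest


if __name__ == "__main__":
    doc = '{"a": [1, {"b": "]}\\"["}], "c": []}'
    # A tiny chunk size forces strings and escapes to straddle chunk boundaries.
    print(max_depth_chunked(io.StringIO(doc), chunk_size=3))
```

The same pattern works with a file opened in text mode in place of `io.StringIO`. The sketch does not validate the JSON; it only tracks brackets and string boundaries.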

beached · 2021-05-10T14:16:55Z

That's really cool and I know a lot of people want a feature like that for things like web servers. I thought about that path, but with mmap being on all POSIX systems and VirtualAlloc on Windows, it was less effort to let the OS handle the paging of the file in those cases. It's not for all use cases, but it does simplify a lot.
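The memory-mapping approach can be sketched with Python's standard `mmap` module, which wraps the platform facilities on both POSIX and Windows. This is only an illustration of letting the OS page the file in on demand, not how DAW Json Link is written. The helper `count_occurrences_mapped` is a made-up name, and it searches for a byte pattern instead of parsing.

```python
import mmap
import os
import tempfile


def count_occurrences_mapped(path: str, needle: bytes) -> int:
    """Count non-overlapping occurrences of needle in a memory-mapped file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded by the OS as touched.
        # Note: mapping an empty file raises an error on some platforms.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(needle)
            while pos != -1:
                count += 1
                pos = mm.find(needle, pos + len(needle))
            return count


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        os.write(fd, b'{"x": 1, "y": {"x": 2}, "z": "x"}')
        os.close(fd)
        print(count_occurrences_mapped(path, b'"x"'))
    finally:
        os.remove(path)
```

The file contents are never copied into a Python string, so resident memory is decided by the OS page cache rather than by the size of the input.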

nuald · 2021-05-10T16:49:52Z

Precision flags affect performance, so it would be better to use them in separate tests. I'm going to submit a PR into the original branch for that PR.

beached · 2021-05-10T16:52:01Z

Most use cases do not care about 0-2ulp diff from strtod, but do care about perf. Those that do, should have an option to pay for that though.
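For readers unfamiliar with the unit, a difference in ulps can be measured by counting how many representable doubles lie between two values. The Python sketch below is one way to do that. The names `float_ordinal` and `ulps_between` are made up for the example, and Python's correctly rounded `float()` would play the role of the `strtod` reference.

```python
import struct


def float_ordinal(x: float) -> int:
    """Map a finite double to an integer that preserves ordering."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    magnitude = bits & 0x7FFF_FFFF_FFFF_FFFF
    # Negative doubles are stored sign-magnitude, so mirror them below zero.
    return -magnitude if bits >> 63 else magnitude


def ulps_between(a: float, b: float) -> int:
    """Number of representable doubles separating a and b (0 if identical)."""
    return abs(float_ordinal(a) - float_ordinal(b))


if __name__ == "__main__":
    print(ulps_between(0.1 + 0.2, 0.3))
    print(ulps_between(1.0, 1.0 + 2.0 ** -52))
```

A benchmark that wanted to report accuracy could apply `ulps_between` to each parsed value and its reference value, and record the maximum.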

nuald · 2021-05-10T17:03:07Z

@9il Please merge 9il#1 and update this PR.

Additional tests have been added.

9il · 2021-05-10T17:24:36Z

> @9il Please merge 9il#1 and update this PR.

Awesome, thank you! Could you please update the whole JSON benchmark with the stats for the new JSON file, which has adjusted exponents, when you have a chance?

nuald · 2021-05-10T17:29:18Z

Yes, I have a scheduled maintenance update on all tests; I just need to finish adding the Primes tests first. I think I'll publish the up-to-date numbers this week.

dumblob · 2021-05-11T20:45:51Z

> Most use cases do not care about 0-2ulp diff from strtod, but do care about perf. Those that do, should have an option to pay for that though.

As a side note: IEEE 754 binary float is worse than just a 2 ulp diff. It's a nightmare; see e.g. vlang/v#5180 (comment).

9il added 5 commits May 9, 2021 12:04

DAW Json Link shoudl be checked like others

005b08d

Rapid JSON should use IEEE number parsing

1254629

add DOM suffix for Mir libraries

38fa475

update readme for JSON tests

f0c7921

use numbers with exponents

76e2241

9il force-pushed the master branch 2 times, most recently from 2c651d3 to fde5bb3 Compare May 9, 2021 06:09

add notes about json libraries

413fcb0

9il force-pushed the master branch from fde5bb3 to 413fcb0 Compare May 9, 2021 06:41

nuald reviewed May 9, 2021


update links

79759ea

9il mentioned this pull request May 10, 2021

Would you add asdf to langeuage benchmark :) libmir/asdf#2

Closed

Additional tests have been added.

f14c9b7

9il added 2 commits May 11, 2021 00:12

Merge pull request #1 from nuald/additional_tests

5960dd4

Additional tests have been added.

update notes

4aa13ae

nuald merged commit f9f2a37 into kostya:master May 10, 2021
