Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace strtold with non locale dependent version #337

Closed
wants to merge 14 commits into from

Conversation

whackashoe
Copy link
Contributor

@whackashoe whackashoe commented Oct 16, 2016

Hi!

I've taken and reworked a strtod implementation from TI in order to provide this. It's rewritten in an attempt to be solid C++11 rather than C.

Using this fixes #302. It does not support octal or hexadecimal which JSON doesn't recognize in its number type anyway. This seems to be a bit more efficient (possibly due to non support of octal/hexadecimal) though I wouldn't be surprised if these gains disappear once the last few bugs are fixed heh.

Not all tests are passing yet but it's pretty close, perhaps someone can figure out the ones I haven't.

BEFORE:

./json_benchmarks
parse jeopardy.json                           1  1681452236 ns/op
parse canada.json                            50    43518542 ns/op
parse citm_catalog.json                      50    24345478 ns/op
parse twitter.json                          100    12713148 ns/op
parse numbers/floats.json                     5   447398631 ns/op
parse numbers/signed_ints.json                5   280521961 ns/op
parse numbers/unsigned_ints.json              5   278368046 ns/op
dump jeopardy.json                            1  1390675617 ns/op
dump jeopardy.json with indent                1  1578243282 ns/op
./json_benchmarks 21.516s

AFTER:

./json_benchmarks
parse jeopardy.json                           1  1619098873 ns/op
parse canada.json                            50    36922996 ns/op
parse citm_catalog.json                      50    25375433 ns/op
parse twitter.json                          100    13150869 ns/op
parse numbers/floats.json                     5   376901576 ns/op
parse numbers/signed_ints.json                5   300091862 ns/op
parse numbers/unsigned_ints.json              5   297152348 ns/op
dump jeopardy.json                            1  1390073209 ns/op
dump jeopardy.json with indent                1  1616301685 ns/op
./json_benchmarks 20.828s

UNIT TESTS:

(these are run with CXX=clang++ make check)

src/unit-class_parser.cpp:37

src/unit-class_parser.cpp:258: FAILED:
  CHECK( json::parser("0.999").parse() == json(0.999) )
with expansion:
  0.999 == 0.999

regression tests
  issue #89 - nonstandard integer type
src/unit-regression.cpp:36

src/unit-regression.cpp:150: FAILED:
  CHECK( j.get<float>() == 4294967296.0f )
with expansion:
  4294967040.0f == 4294967296.0f

src/unit-regression.cpp:160: FAILED:
  CHECK( j.get<float>() == -2147483650.0f )
with expansion:
  -2147483520.0f == -2147483648.0f

regression tests
  issue #186 miloyip/nativejson-benchmark: floating-point parsing
src/unit-regression.cpp:36

src/unit-regression.cpp:316: FAILED:
  CHECK( j.get<double>() == 2.2250738585072009e-308 )
with expansion:
  0.0 == 0.0

src/unit-regression.cpp:319: FAILED:
  CHECK( j.get<double>() == 0.99999999999999989 )
with expansion:
  1.0 == 1.0

src/unit-regression.cpp:322: FAILED:
  CHECK( j.get<double>() == 1.00000000000000022 )
with expansion:
  1.0 == 1.0

src/unit-regression.cpp:325: FAILED:
  CHECK( j.get<double>() == 72057594037927928.0 )
with expansion:
  72057594037927920.0 == 72057594037927928.0

src/unit-regression.cpp:328: FAILED:
  CHECK( j.get<double>() == 9223372036854774784.0 )
with expansion:
  9223372036854775808.0
  ==
  9223372036854774784.0

src/unit-regression.cpp:331: FAILED:
  CHECK( j.get<double>() == 10141204801825834086073718800384.0 )
with expansion:
  10141204801825832960173811957760.0
  ==
  10141204801825834086073718800384.0

src/unit-regression.cpp:334: FAILED:
  CHECK( j.get<double>() == 5708990770823838890407843763683279797179383808.0 )
with expansion:
  5708990770823843327184944562486185035640602624.0
  ==
  5708990770823838890407843763683279797179383808.0

compliance tests from nativejson-benchmark
  roundtrip
src/unit-testsuites.cpp:103

src/unit-testsuites.cpp:309: FAILED:
  CHECK( j.dump() == json_string )
with expansion:
  "[0.0]" == "[2.2250738585072e-308]"
with message:
  filename := test/data/json_roundtrip/roundtrip29.json

===============================================================================
test cases:      35 |      32 passed |  3 failed
assertions: 8905168 | 8905157 passed | 11 failed

@nlohmann
Copy link
Owner

Thanks for the PR! I'll check it as soon as I find the time.

@nlohmann
Copy link
Owner

Just one question: Can the code be licensed under the MIT license?

@whackashoe
Copy link
Contributor Author

Yeah I think it's fine, it's basically all rewritten so I don't think there is much of an issue that way... plus if they attempted to sue for using a similar variable name or if statement we'd probably become famous.

If you want to be extra safe we can change the comments to be unique as they are just copied verbatim, though I think that'd be horrid as their comments are quite nice.

standard floating point number parsing function based on the type
supplied via the first parameter. Set this to @a
static_cast<number_float_t*>(nullptr).
while (nl_isspace(*fst))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace is already skipped by the lexer, so I think it is safe to skip this loop.

if (cp == '-' or cp == '+')
{
++fst;
successful_parse = true;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lexer discards any string that violates JSON's specification for numbers. Therefore, we do not need additional verification here.

@whackashoe
Copy link
Contributor Author

Since it is relying on other conditions now with these 2 changes, I think we can strip out endptr and errno as well.

this changes a few things. the name of the function to better represent
what it does (string to json number) - we aren't strtoul compliant so
best not to lie about that.

we remove template code and just use one function. this makes it a bit
simpler and we can always add it back if it buys us anything. the
penalty for casting down from long double seems to be inconsequential.
@whackashoe
Copy link
Contributor Author

The unit-regression failures seem to be just an off by 1 error, if you look at expected and result values as hex:

0x4370000000000000 == 0x436fffffffffffff

0x43e0000000000000 == 0x43dfffffffffffff

if (*str == 'e' or *str == 'E')
{
cp = *++str;
bool negative_exp = cp == '-'; // read in exponent sign (+/-)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

negative_exp could be const bool

while (nl_isdigit(cp = *str))
{
result = result * 10 + (cp - '0');
++str;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't str be incremented in the while condition just like in the loop below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we may not want to increment if nl_isdigit fails so it must be in body.

supplied via the first parameter. Set this to @a
static_cast<number_float_t*>(nullptr).
int count = 0; // exponent calculation
if (! nl_isdigit(cp))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use not rather than !

{1.e1L, 1.e2L, 1.e4L, 1.e8L, 1.e16L, 1.e32L, 1.e64L, 1.e128L, 1.e256L}
};

if (exp > std::numeric_limits<long double>::max_exponent10)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting here, there a now comments in the code. Please briefly describe each conditional and loop.

@whackashoe
Copy link
Contributor Author

@nlohmann could you add help-wanted tag to this, I am too busy right now to do anything with this right now but it is very close.

@nlohmann nlohmann added the state: help needed the issue needs help to proceed label Nov 2, 2016
@nlohmann
Copy link
Owner

Hey @whackashoe do you need help with this?

@whackashoe
Copy link
Contributor Author

@nlohmann Yes. I haven't had time to figure out the last issues with this, though iirc its basically at a level where it works except for an edge case (maybe more than one).

@nlohmann
Copy link
Owner

I'll have a look. I may open a feature branch myself for the moment to fix the conflicts and fiddle around a bit.

nlohmann added a commit that referenced this pull request Nov 12, 2016
- took current state of #337
- adjusted parser to reject incomplete numbers
- simplified number parsing code as it could rely on correct numbers
@nlohmann
Copy link
Owner

I opened a feature branch. I changed the parser to reject incomplete numbers. This helped to simplify the number parser. There are still three tests failing - I'll check tomorrow.

{
if (plus_or_minus)
{
*--str;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a '*' here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look! @nlohmann want to incorp these into your feature branch?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes no sense. Please have a look at https://github.com/nlohmann/json/blob/feature/strtold/src/json.hpp.re2c#L8139 where I started an overworked version of this PR.


@param[in] type the @ref number_float_t in use
*--str;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@nlohmann
Copy link
Owner

Please have a look at https://github.com/nlohmann/json/blob/feature/strtold/src/json.hpp.re2c#L8139 where I started an overworked version of this PR, and also the discussion in #302.

@TurpentineDistillery
Copy link

I see the code parses to long double, and then the return value is cast to float_number_t.
This will yield different results than parsing into float_number_t directly (unless it is long double, of course). See issue #186

@TurpentineDistillery
Copy link

Apparently implementing strtod correctly is harder than it seems.
Here's mozilla implementation

@nlohmann
Copy link
Owner

@whackashoe Any idea how to proceed here? I agree with @TurpentineDistillery that correct floating-point handling is really hard...

@nlohmann nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Dec 14, 2016
@nlohmann
Copy link
Owner

What about this PR? Is it likely that the remaining issues can be fixed? There is also PR #378 dealing with the same issue.

@whackashoe
Copy link
Contributor Author

It seems like it will need significant changes to work. You can close this if it's dirtying up the queue, I don't have time to work on it for a while.

@nlohmann
Copy link
Owner

Thanks for checking back. I close this for now.

@nlohmann nlohmann closed this Dec 28, 2016
falbrechtskirchinger added a commit to falbrechtskirchinger/json that referenced this pull request Mar 6, 2022
Move definition of common macros (e.g., JSON_HAS_CPP_*) into macros.hpp
header file and update unit test sources.

Incidentally enables the regression tests for nlohmann#2546 and nlohmann#3070.

A CHECK_THROWS_WITH_AS in nlohmann#3070 was disabled, which is tracked here: nlohmann#337
falbrechtskirchinger added a commit to falbrechtskirchinger/json that referenced this pull request Mar 6, 2022
Move definition of common macros (e.g., JSON_HAS_CPP_*) into macros.hpp
header file and update unit test sources.

Incidentally enables the regression tests for nlohmann#2546 and nlohmann#3070.

A CHECK_THROWS_WITH_AS in nlohmann#3070 was disabled, which is tracked here: nlohmann#337
falbrechtskirchinger added a commit to falbrechtskirchinger/json that referenced this pull request Mar 6, 2022
Move definition of common macros (e.g., JSON_HAS_CPP_*) into macros.hpp
header file and update unit test sources.

Incidentally enables the regression tests for nlohmann#2546 and nlohmann#3070.

A CHECK_THROWS_WITH_AS in nlohmann#3070 was disabled, which is tracked here: nlohmann#337
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state: help needed the issue needs help to proceed state: please discuss please discuss the issue or vote for your favorite option
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parsing of floats locale dependent
4 participants