Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

locale-independent str-to-num #379

Conversation

TurpentineDistillery
Copy link

This implements #302

// this is a helper to determine whether to
// parse the token into floating-point or
// integral type. We wouldn't need it if
// we had separate token types for interal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: interal

std::string tempstr;
std::array<char, 64> buf;
do {
if (decimal_point_char == '.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do/while(0) instead of a pair of nested if statements seems a bit too tricky.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do{...}while(0) was originally the technique to group statements in C macros, but it can also be used as a local scope that you can break out of, kind of a disciplined scoped goto. I just have really strong aversion to nesting control flow and modifying state; to a fault perhaps : )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm familiar. I don't particularly like nesting either, but this saves only one level of nesting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is now: http://pastebin.com/UVTzUF6u
Simpler with no lambda capture, no lambda call at the end, fewer break/return statements: http://pastebin.com/vQu7y0CA
Personally, I don't think avoiding the one level of indentation for 10 lines of code is worth the extra complexity.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just about avoiding an extra level of control flow, however. I usually prefer the lambda style for the following reasons:

  • manifests intent by declaring what we're about to compute and what the inputs are (especially handy in in-person code reviews)
  • instead of the result being assigned as a side-effect in various control-flow branches, all control flow is "joined" into the return value, and the result of the lambda can be assigned to a const variable.
  • there's no (or may be minimal) runtime overhead.

In short-enough cases, however, this approach would probably be considered an overkill by many (but not by me - there’s no lambda small enough!)

Stylistic issues aside, if I could only manage to make setlocale affect the behavior of snprintf in the unit test on all compilers... That's even before any json-specific checks are triggered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setlocale is not thread specific or thread safe, so I would recommend avoiding it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setlocale with C locales is the equivalent of std::locale::global with C++ locales, which is not thread-specific or thread-safe either.

We're not calling it it from the library code, however, but from the unit test to simulate the current state of a locale-specific application. All locale-dependent functions, including std::to_string() or printf() or cout << are not thread-safe in that sense, as a rogue thread may change C or C++ locale from under your feet by changing the locale in-flight because "they" needed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, missed that you were referring to test code.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.6%) to 99.415% when pulling e41a956 on TurpentineDistillery:feature/locale_independent_str_to_num into bc28942 on nlohmann:develop.

@brief parse floating point number
public:
template<typename T>
struct maybe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class isn't used in this PR.


@param[in] type the @ref number_float_t in use
throw if can't parse the entire string as a number,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If exceptions are being removed, this comment needs to be updated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing; will clean up in the next push.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.09%) to 99.91% when pulling 1c029b9 on TurpentineDistillery:feature/locale_independent_str_to_num into bc28942 on nlohmann:develop.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 6fba52b on TurpentineDistillery:feature/locale_independent_str_to_num into 79fa8b2 on nlohmann:develop.

@nlohmann
Copy link
Owner

nlohmann commented Jan 2, 2017

I shall have a look at the PR this week.

@nlohmann nlohmann self-assigned this Jan 2, 2017
@nlohmann nlohmann added this to the Release 2.1.0 milestone Jan 2, 2017
@nlohmann
Copy link
Owner

nlohmann commented Jan 4, 2017

@TurpentineDistillery Could you update the PR or resolve the conflicts, please?

@TurpentineDistillery
Copy link
Author

Will do.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling c236b59 on TurpentineDistillery:feature/locale_independent_str_to_num into 9f6c86f on nlohmann:develop.

@nlohmann
Copy link
Owner

In lieu of better approaches, here are the number using Google Benchmark:

$ ./benchmark-develop --benchmark_filter="parse.*" --benchmark_report_aggregates_only=true --benchmark_repetitions=30 
Run on (8 X 2900 MHz CPU s)
2017-01-11 22:07:32
Benchmark                                                         Time           CPU Iterations
-----------------------------------------------------------------------------------------------
parse data/jeopardy/jeopardy.json_mean                   1024555051 ns 1023570733 ns          1   51.8754MB/s
parse data/jeopardy/jeopardy.json_stddev                   47953641 ns   47365176 ns          0   2.47509MB/s
parse data/nativejson-benchmark/canada.json_mean           63870560 ns   63820661 ns         11   33.7019MB/s
parse data/nativejson-benchmark/canada.json_stddev          2750435 ns    2732684 ns          0    1.5052MB/s
parse data/nativejson-benchmark/citm_catalog.json_mean     12256813 ns   12245026 ns         52   135.116MB/s
parse data/nativejson-benchmark/citm_catalog.json_stddev     812191 ns     804674 ns          0   9.08961MB/s
parse data/nativejson-benchmark/twitter.json_mean           8013096 ns    8005479 ns         91   75.3725MB/s
parse data/nativejson-benchmark/twitter.json_stddev          345569 ns     341982 ns          0   3.32091MB/s
parse data/numbers/floats.json_mean                       386869345 ns  386530517 ns          2   55.9661MB/s
parse data/numbers/floats.json_stddev                       9113975 ns    9023610 ns          0   1.37988MB/s
parse data/numbers/signed_ints.json_mean                  116659793 ns  116580717 ns          6   199.814MB/s
parse data/numbers/signed_ints.json_stddev                  4985557 ns    4946958 ns          0   9.07039MB/s
parse data/numbers/unsigned_ints.json_mean                115276698 ns  115201656 ns          6    202.08MB/s
parse data/numbers/unsigned_ints.json_stddev                2588632 ns    2572249 ns          0   4.71005MB/s

$ ./benchmark-parse-pr --benchmark_filter="parse.*" --benchmark_report_aggregates_only=true --benchmark_repetitions=30 
Run on (8 X 2900 MHz CPU s)
2017-01-11 22:10:50
Benchmark                                                         Time           CPU Iterations
-----------------------------------------------------------------------------------------------
parse data/jeopardy/jeopardy.json_mean                   1035173889 ns 1034375333 ns          1   51.2314MB/s
parse data/jeopardy/jeopardy.json_stddev                   15369943 ns   15138550 ns          0   774.835kB/s
parse data/nativejson-benchmark/canada.json_mean           62640302 ns   62605706 ns         11   34.3258MB/s
parse data/nativejson-benchmark/canada.json_stddev          1964575 ns    1955608 ns          0   1.13517MB/s
parse data/nativejson-benchmark/citm_catalog.json_mean     13005561 ns   12994596 ns         52   126.821MB/s
parse data/nativejson-benchmark/citm_catalog.json_stddev     285190 ns     282784 ns          0   2.81027MB/s
parse data/nativejson-benchmark/twitter.json_mean           8288143 ns    8279872 ns         82   72.8825MB/s
parse data/nativejson-benchmark/twitter.json_stddev          360370 ns     356344 ns          0    3.3744MB/s
parse data/numbers/floats.json_mean                       366096955 ns  365863167 ns          2   59.1376MB/s
parse data/numbers/floats.json_stddev                       9687455 ns    9619537 ns          0    1.6742MB/s
parse data/numbers/signed_ints.json_mean                  127861852 ns  127771639 ns          6   182.021MB/s
parse data/numbers/signed_ints.json_stddev                  2265199 ns    2245588 ns          0   3.28731MB/s
parse data/numbers/unsigned_ints.json_mean                123481147 ns  123403006 ns          6   188.589MB/s
parse data/numbers/unsigned_ints.json_stddev                1737543 ns    1722133 ns          0   2.64446MB/s

@nlohmann nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Jan 11, 2017
@TurpentineDistillery
Copy link
Author

From the theoretical standpoint, the numbers are expected to be about the same, except that integral types are parsed with std::strto[u]ll, which may be slower than doing that "manually", but not by much.

@nlohmann nlohmann modified the milestones: Release 2.1.0, Release 3.0.1 Jan 24, 2017
nlohmann added a commit that referenced this pull request Feb 5, 2017
@nlohmann
Copy link
Owner

I forked the branch to https://github.com/nlohmann/json/tree/TurpentineDistillery-feature/locale_independent_str_to_num and made some changes. Once Travis is done checking, I shall merge it to develop. Thanks @TurpentineDistillery for the effort and @gregmarr for the comments!

@nlohmann nlohmann removed the state: please discuss please discuss the issue or vote for your favorite option label Feb 13, 2017
@nlohmann nlohmann modified the milestones: Release 2.1.1, Release 3.0.1 Feb 13, 2017
@nlohmann
Copy link
Owner

Closed with #450.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants