number_converter is slow #422

Crzyrndm · 2019-11-17T02:46:18Z

Related to #421
It was noticed during benchmarking that, at least with MSVC 16 (VS 2019), number_converter::stold has a very significant impact on load times: switching to strtod gained 20% throughput. Internally, number_converter uses istringstream.
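To reproduce this kind of comparison locally, a rough micro-benchmark can time an istringstream-based parse against strtod over the same inputs. This is a minimal sketch, not xlnt's benchmark: parse_with_stream only approximates what number_converter does (a fresh stream imbued with the "C" locale on every call), and time_parser, parse_with_strtod and make_inputs are hypothetical helpers. No timings are claimed here; they will vary by compiler and standard library.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Roughly what an istringstream-based converter does per call: build a stream,
// imbue the classic "C" locale so '.' is the decimal point, then extract.
double parse_with_stream(const std::string &s)
{
    std::istringstream stream(s);
    stream.imbue(std::locale::classic());
    double value = 0.0;
    stream >> value;
    return value;
}

// Faster, but uses the decimal point of the current C locale (set via setlocale).
double parse_with_strtod(const std::string &s)
{
    return std::strtod(s.c_str(), nullptr);
}

// Times one parser over all inputs and returns the sum of the parsed values,
// so the work is not optimised away and the parsers can be cross-checked.
template <typename Parse>
double time_parser(const char *label, const std::vector<std::string> &inputs, Parse parse)
{
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (const auto &s : inputs)
        sum += parse(s);
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    std::printf("%-14s %10.3f ms\n", label, elapsed.count());
    return sum;
}

// Inputs like "0.25", "1.25", ...; exactly representable doubles for any realistic count.
std::vector<std::string> make_inputs(std::size_t count)
{
    std::vector<std::string> inputs;
    inputs.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        inputs.push_back(std::to_string(i) + ".25");
    return inputs;
}
```

For example, `time_parser("strtod", make_inputs(1000000), parse_with_strtod)` prints the elapsed time. In the default "C" locale both parsers should return the same sum, which is a cheap sanity check that the faster path isn't changing results.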

strtod breaks when the current locale uses ',' as the decimal point, so it isn't an option, but it does show that an alternative to the current string->double routine is needed. strtod_l is a widely supported extension (POSIX 2002 (?)) that would fit the requirements.

In the PR, MSVC 2015+ uses _strtod_l, which is also non-standard, but it is documented and Microsoft goes out of its way not to break backward compatibility.
POSIX platforms are less clear-cut. The main issue is that the platforms can't seem to agree on which headers to include for the necessary declarations, so they still use the original number_converter implementation.

The best solution would be an external library that either just does locale-independent double parsing (note: this is a hard problem, and many libraries have issues with it) or wraps all the platform nastiness away. A mediocre solution is to detect known platforms and #ifdef in the appropriate strtod_l machinery, leaving the slow number_converter as the fallback for unknown toolchains (as has been done in the linked PR).
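For reference, the "detect known platforms, fall back otherwise" approach looks roughly like this. It is a sketch under assumptions, not the code from the PR: parse_double_c_locale is a hypothetical name, glibc only declares strtod_l when _GNU_SOURCE is defined (g++ defines it by default), and macOS declares strtod_l and newlocale via <xlocale.h>, which is exactly the header disagreement mentioned above. Anything else (musl, other BSDs) takes the slow istringstream fallback.

```cpp
#include <cstdlib> // included first: on macOS, <xlocale.h> only declares strtod_l if <stdlib.h> was seen
#include <locale>
#include <sstream>
#include <string>

#if defined(_MSC_VER)
#include <locale.h> // _create_locale
#include <stdlib.h> // _strtod_l
#elif defined(__APPLE__)
#include <xlocale.h> // newlocale, strtod_l
#elif defined(__GLIBC__)
#include <locale.h> // newlocale (POSIX 2008)
#include <stdlib.h> // strtod_l (GNU extension, needs _GNU_SOURCE)
#endif

// Parses text with '.' as the decimal point regardless of the current C locale.
double parse_double_c_locale(const char *text)
{
#if defined(_MSC_VER)
    // Created once and intentionally never freed.
    static const _locale_t c_locale = _create_locale(LC_NUMERIC, "C");
    return _strtod_l(text, nullptr, c_locale);
#elif defined(__APPLE__) || defined(__GLIBC__)
    // Only LC_NUMERIC matters for strtod_l; created once, never freed.
    static const locale_t c_locale = newlocale(LC_NUMERIC_MASK, "C", static_cast<locale_t>(0));
    return strtod_l(text, nullptr, c_locale);
#else
    // Unknown toolchain: fall back to the slow but portable stream approach.
    std::istringstream stream(text);
    stream.imbue(std::locale::classic());
    double value = 0.0;
    stream >> value;
    return value;
#endif
}
```

Caching the locale handle in a function-local static matters: creating a locale object per call would throw away much of the speedup.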

tfussell · 2019-11-17T12:26:54Z

Thanks for looking into this. My original focus was on correctness over performance, but with such significant gains possible from small changes, this is probably worth looking into further. I've always liked the nlohmann/json codebase, as it has high-quality code and many of the same problems as xlnt. It looks like their approach to this problem was basically to convert the '.' decimal separator into the locale's decimal point (e.g. a comma) and then use strtod. Here's the PR: nlohmann/json#450
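A minimal sketch of that idea, assuming the input always uses '.' and the locale's decimal point is a single character: look up the C locale's decimal point with std::localeconv() and rewrite the input before calling std::strtod. strtod_any_locale is a hypothetical name; this is not nlohmann/json's implementation.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Converts a '.'-decimal string with std::strtod under any C locale by first
// rewriting '.' into the locale's decimal point (assumed to be one character).
double strtod_any_locale(std::string text)
{
    const std::lconv *conv = std::localeconv();
    const char decimal_point =
        (conv != nullptr && conv->decimal_point != nullptr && conv->decimal_point[0] != '\0')
            ? conv->decimal_point[0]
            : '.';

    // Only pay for the scan when the locale actually differs from "C".
    if (decimal_point != '.')
    {
        for (char &c : text)
        {
            if (c == '.')
                c = decimal_point;
        }
    }
    return std::strtod(text.c_str(), nullptr);
}
```

localeconv() reports the same C locale that strtod consults, so the two stay consistent; the catch is that neither is safe against a concurrent setlocale call from another thread.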

paulharris · 2019-11-17T12:57:14Z

For performance (at the cost of additional library requirements), you could look at the Boost.Spirit Qi library for fast text-to-double/float parsing. I also found this interesting library recently for converting from a float to text (shortest accurate representation): https://github.com/ulfjack/ryu
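ryu itself isn't shown here. As a std-only point of comparison, C++17's std::to_chars called without a precision argument also produces the shortest text that parses back to the same double. The sketch below (format_round_trip is a hypothetical name) uses it when the standard library advertises floating-point support via __cpp_lib_to_chars, and otherwise falls back to "%.17g", which round-trips but isn't always shortest (0.1 becomes 0.10000000000000001) and, like strtod, follows the C locale's decimal point.

```cpp
#include <cstdio>
#include <string>
#include <system_error>
#if defined(__has_include)
#if __has_include(<charconv>)
#include <charconv>
#endif
#endif

// Formats a double so that parsing the text back gives the identical value.
std::string format_round_trip(double value)
{
#if defined(__cpp_lib_to_chars) && __cpp_lib_to_chars >= 201611L
    // No format/precision argument: shortest representation that round-trips.
    char buffer[32];
    const auto result = std::to_chars(buffer, buffer + sizeof(buffer), value);
    if (result.ec == std::errc())
        return std::string(buffer, result.ptr);
#endif
    // Fallback: 17 significant digits always round-trip for IEEE doubles,
    // but the output is not always the shortest, and it follows the C locale.
    char fallback[32];
    std::snprintf(fallback, sizeof(fallback), "%.17g", value);
    return fallback;
}
```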


Crzyrndm · 2019-11-18T05:31:18Z

@tfussell
That sounds much less hacky than what strtod_l very quickly turned into. I'll see if I can whip something up.

@paulharris
The primary issue with any external library dealing with floats (parsing or formatting) is correctness. I know enough to know that I don't know enough, hence I'm not keen on straying outside C / std / system libraries.

The other option that is part of std is C++17's to_chars / from_chars, but AFAIK only MSVC has implemented them so far. That would still require preprocessor detection (luckily, the std feature-test macros are well documented). I will test std::from_chars and see if the trickery is worthwhile, but I don't expect a gain on the same order as dropping istringstream (though it could be quite noticeable once the cost of scanning for commas is factored in...)
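A sketch of the feature-macro detection, relying only on the standard __cpp_lib_to_chars macro (defined once floating-point to_chars/from_chars is available) and falling back to strtod otherwise. parse_double is a hypothetical helper. Note that from_chars is stricter than strtod: it never consults the locale, and it rejects leading whitespace and a leading '+'.

```cpp
#include <cstdlib>
#include <string>
#include <system_error>
#if defined(__has_include)
#if __has_include(<charconv>)
#include <charconv>
#endif
#endif

// Parses the whole of `text` as a '.'-decimal double into `out`.
// Returns false if the text is empty, malformed, or has trailing characters.
bool parse_double(const std::string &text, double &out)
{
#if defined(__cpp_lib_to_chars) && __cpp_lib_to_chars >= 201611L
    // Locale-independent; no leading whitespace or '+' accepted.
    const char *first = text.data();
    const char *last = first + text.size();
    const auto result = std::from_chars(first, last, out);
    return result.ec == std::errc() && result.ptr == last;
#else
    // Fallback: std::strtod honours the current C locale's decimal point.
    if (text.empty())
        return false;
    char *end = nullptr;
    out = std::strtod(text.c_str(), &end);
    return end == text.c_str() + text.size();
#endif
}
```

Requiring the whole string to be consumed keeps both branches behaving the same on inputs like "3.25abc", which would otherwise be silently accepted as 3.25.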

Crzyrndm · 2019-11-18T06:59:03Z

Pushed the suggested change to #421, as it cleans up quite a hacky section.
Somewhat surprisingly, for the happy case at least (parsing in a '.' locale), it is even faster than _strtod_l was.

Crzyrndm · 2019-11-18T07:35:36Z

Testing MSVC's std::from_chars shaved off another 1-2%, which suggests to me that you're unlikely to see massive gains.

As it happens, xlnt has issues when the locale doesn't use '.' as the decimal point. There are a handful of places during parsing where xml::parser::attribute<double> is used, which internally just uses a stringstream without imbuing the "C" locale, so...
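The fix is to give those streams the classic "C" locale explicitly. Since a de-DE locale may not be installed, the sketch below fakes one with a hypothetical numpunct facet (comma_decimal) that uses ',' as the decimal point. parse_with_locale and parse_xml_double are illustrative names, not xml::parser's API.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet mimicking a ',' decimal point locale such as de-DE,
// so the behaviour can be shown without that locale being installed.
struct comma_decimal : std::numpunct<char>
{
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
};

// Parses a double using whatever locale the stream is given.
double parse_with_locale(const std::string &text, const std::locale &loc)
{
    std::istringstream stream(text);
    stream.imbue(loc);
    double value = 0.0;
    stream >> value;
    return value;
}

// The fix for attribute-style parsing: always imbue the classic "C" locale,
// so '.' is the decimal point regardless of the global locale.
double parse_xml_double(const std::string &text)
{
    return parse_with_locale(text, std::locale::classic());
}
```

With the comma facet, "1.5" stops parsing at the '.', and the stream silently yields 1 rather than failing, which is why this kind of bug is easy to miss.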

Crzyrndm · 2019-11-18T07:47:13Z

And fixing those pushed the benchmark maybe 1-2% the other way using the "de-DE" (German) locale. We should probably add a test so that doesn't break again... (EDIT: test added, then removed; the CI environment doesn't know about the locales, and I can't be bothered right now.)
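One way to get such a test past CI without installed locales is to build the ',' locale in code from a custom std::numpunct facet and install it as the global C++ locale only for the duration of the test. scoped_global_locale is a hypothetical helper that restores the previous global locale even if the test throws. Caveat: std::locale::global only calls setlocale when the locale has a name, and a locale built from a custom facet is unnamed, so the C locale used by strtod is typically left untouched and only iostream-based parsing is exercised.

```cpp
#include <locale>

// Hypothetical test helper: makes `loc` the global C++ locale (what newly
// constructed streams pick up) and restores the previous one on destruction.
class scoped_global_locale
{
public:
    explicit scoped_global_locale(const std::locale &loc)
        : previous_(std::locale::global(loc)) // global() returns the old locale
    {
    }

    ~scoped_global_locale() { std::locale::global(previous_); }

    scoped_global_locale(const scoped_global_locale &) = delete;
    scoped_global_locale &operator=(const scoped_global_locale &) = delete;

private:
    std::locale previous_;
};
```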

tfussell · 2019-12-19T22:14:24Z

I think this is fixed now. Going to close it.

Crzyrndm mentioned this issue Nov 17, 2019

Accelerated worksheet parsing #421

Merged

tfussell self-assigned this Nov 17, 2019

tfussell added the performance label Nov 17, 2019

tfussell closed this as completed Dec 19, 2019

Crzyrndm mentioned this issue Mar 1, 2020

microbenchmarks for double<->string conversion, serialisation improvements #447

Merged
