Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize tm formatting #2602

Merged
merged 2 commits into from
Nov 25, 2021
Merged

Conversation

phprus
Copy link
Contributor

@phprus phprus commented Nov 14, 2021

Optimize tm formatting for:

  • C-locale and alternative representation.
  • %z format (timezone offset from UTC).

@vitaut
Copy link
Contributor

vitaut commented Nov 17, 2021

Thanks for the PR. Could you provide any benchmark results showing the effects of this change?

void on_utc_offset() { format_localized('z'); }
void on_utc_offset() {
#if defined(_WIN32)
_tzset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't call _tzset or tzset because it changes global state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_tzset / tzset is used internally in std::time_put.

Without calling them, it is impossible to get the timezone correctly (if the timezone is not stored in struct std::tm).

Windows: https://docs.microsoft.com/en-us/cpp/c-runtime-library/daylight-dstbias-timezone-and-tzname?view=msvc-140 (Remarks section)
Solaris: https://docs.oracle.com/cd/E86824_01/html/E54766/ctime-3c.html

On Solaris and Windows tzset is threadsafe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vitaut, What do you think about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should limit system-specific logic here and suggest just detecting if tm_gmtoff is available and otherwise go through format_localized as before. The presence of tm_gmtoff can be detected in a portable way using SFINAE.

@phprus
Copy link
Contributor Author

phprus commented Nov 17, 2021

Benchmark: https://github.com/phprus/fmt-bench/tree/optimize-tm-formatting-5

*BSD, Apple, glibc: Acceleration 10-20 times (%z).
This is std::put_time and ostrstream overhead.

@phprus
Copy link
Contributor Author

phprus commented Nov 17, 2021

cc @alexezeder

@phprus
Copy link
Contributor Author

phprus commented Nov 17, 2021

Other formatters (%EY, ...) on all OS with C-locale: 10-20 times

@alexezeder
Copy link
Contributor

cc @alexezeder

I'm not 100% sure why I was mentioned here, but here are the results on my machine. 😏

Currently, there is no option to run benchmarks for PRs in fmt_bnchmrk_gnrtr (I doubt it would be implemented someday). But the only one runner there uses the same version of GCC - 11.

GCC 11 + libstdc++ (C++20)

Benchmark old time new time
FMTFormatter_z 261 ns 26.3 ns
FMTFormatterCompile_z 246 ns 11.6 ns
FMTFormatter_Y 263 ns 26.1 ns
FMTFormatterCompile_Y 244 ns 10.5 ns
Raw output
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=20 ..
$ ./fmt_test_old
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
FMTFormatter_z               261 ns          261 ns      2692446
FMTFormatterCompile_z        246 ns          246 ns      2851517
FMTFormatter_Y               263 ns          263 ns      2662867
FMTFormatterCompile_Y        244 ns          244 ns      2867342
$ ./fmt_test_new
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
FMTFormatter_z              26.3 ns         26.3 ns     26136296
FMTFormatterCompile_z       11.6 ns         11.6 ns     61292230
FMTFormatter_Y              26.1 ns         26.1 ns     26920284
FMTFormatterCompile_Y       10.5 ns         10.5 ns     67130168

Clang 13 + libc++ (C++20)

Benchmark old time new time
FMTFormatter_z 110 ns 30.6 ns
FMTFormatterCompile_z 90.0 ns 10.4 ns
FMTFormatter_Y 110 ns 30.5 ns
FMTFormatterCompile_Y 88.4 ns 12.7 ns
Raw output
CC='clang' CXX='clang++' CXXFLAGS='-stdlib=libc++' cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=20 ..
$ ./fmt_test_old
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
FMTFormatter_z               110 ns          110 ns      6403692
FMTFormatterCompile_z       90.0 ns         90.0 ns      7797814
FMTFormatter_Y               110 ns          110 ns      6353394
FMTFormatterCompile_Y       88.4 ns         88.4 ns      7909662
$ ./fmt_test_new
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
FMTFormatter_z              30.6 ns         30.6 ns     23115238
FMTFormatterCompile_z       10.4 ns         10.4 ns     67346905
FMTFormatter_Y              30.5 ns         30.4 ns     22898049
FMTFormatterCompile_Y       12.7 ns         12.7 ns     55214092

@phprus
Copy link
Contributor Author

phprus commented Nov 17, 2021

@alexezeder, I mentioned you since you wrote a comment about performance regression (1031eed#commitcomment-60128058)

This PR improves performance in those cases (C-locale and timezone offset).

void on_utc_offset() { format_localized('z'); }
void on_utc_offset() {
#if defined(_WIN32)
_tzset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should limit system-specific logic here and suggest just detecting if tm_gmtoff is available and otherwise go through format_localized as before. The presence of tm_gmtoff can be detected in a portable way using SFINAE.

Comment on lines 1134 to 1136
if (!is_classic_ && ns != numeric_system::standard)
return format_localized('Y', 'E');
write_year(tm_year());
Copy link
Contributor

@vitaut vitaut Nov 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's swap the branches to get rid of negation:

    if (is_classic_ || ns == numeric_system::standard)
      return write_year(tm_year());
    format_localized('Y', 'E');

and the same throughout the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tm_gmtoff and SFINAE - Done.
Solaris specific code is removed.
Windows specific code I suggest to include in fmt, because Windows is more popular.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong.
Inside ucrt, the internal function __tzset () is used, which is called only once. When using _tzset (), performance is 10 times slower.

I've added a workaround, but I'm not sure if I need to switch to custom code.

Perf with workaround:

2021-11-22T00:37:43+05:00
Running fmt_test_old
Run on (2 X 2494 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 4096 KiB (x2)
  L3 Unified 16384 KiB (x1)
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
FMTFormatter_z                1384 ns         1360 ns       448000
FMTFormatterCompile_z         1321 ns         1318 ns       497778
FMTFormatter_z_strftime        312 ns          314 ns      2240000
FMTFormatter_z_time_put       1217 ns         1221 ns       640000
FMTFormatter_Y                1385 ns         1381 ns       497778
FMTFormatterCompile_Y         1300 ns         1311 ns       560000
FMTFormatter_Y_strftime        301 ns          300 ns      2240000
FMTFormatter_Y_time_put       1214 ns         1221 ns       640000
2021-11-22T00:37:50+05:00
Running fmt_test_new
Run on (2 X 2494 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 4096 KiB (x2)
  L3 Unified 16384 KiB (x1)
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
FMTFormatter_z                 123 ns          123 ns      5600000
FMTFormatterCompile_z         71.1 ns         69.8 ns      8960000
FMTFormatter_z_strftime        312 ns          314 ns      2240000
FMTFormatter_z_time_put       1214 ns         1221 ns       640000
FMTFormatter_Y                 121 ns          120 ns      5600000
FMTFormatterCompile_Y         67.9 ns         68.4 ns     11200000
FMTFormatter_Y_strftime        303 ns          305 ns      2357895
FMTFormatter_Y_time_put       1203 ns         1200 ns       560000

@@ -49,7 +49,14 @@ std::string system_strftime(const std::string& format, const std::tm* timeptr,
os.imbue(loc);
facet.put(os, os, ' ', timeptr, format.c_str(),
format.c_str() + format.size());
#ifdef _WIN32
// Workaround to early ucrt bug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

early?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early version.

I have not tested all versions. But in the latest version of the Windows SDK I have installed with MSVS, '%z' returns '+0000'. In the old version: '-0000'.

If remove platform-dependent code, this change will not be necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are keeping this, let's tweak it a bit for clarity:

// Workaround a bug in older versions of ucrt.

BTW is "ucrt" the correct term here? Is it Microsoft C runtime library? Maybe calling it Microsoft CRT here would be clearer?

@vitaut
Copy link
Contributor

vitaut commented Nov 21, 2021

@phprus, thanks for the benchmark. Could you add strftime there for comparison?

@phprus
Copy link
Contributor Author

phprus commented Nov 21, 2021

@vitaut, I will do a performance test (on Windows and with raw strftime, std::time_put) in the coming days.

@phprus
Copy link
Contributor Author

phprus commented Nov 21, 2021

include/fmt/chrono.h Outdated Show resolved Hide resolved
include/fmt/chrono.h Outdated Show resolved Hide resolved
@@ -49,7 +49,14 @@ std::string system_strftime(const std::string& format, const std::tm* timeptr,
os.imbue(loc);
facet.put(os, os, ' ', timeptr, format.c_str(),
format.c_str() + format.size());
#ifdef _WIN32
// Workaround to early ucrt bug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are keeping this, let's tweak it a bit for clarity:

// Workaround a bug in older versions of ucrt.

BTW is "ucrt" the correct term here? Is it Microsoft C runtime library? Maybe calling it Microsoft CRT here would be clearer?

@vitaut
Copy link
Contributor

vitaut commented Nov 24, 2021

benchmark updated

Awesome, thanks!

@phprus
Copy link
Contributor Author

phprus commented Nov 25, 2021

Sorry for delay.

UCRT is are official name for Microsoft C Runtime start from Visual Studio 2015 (https://docs.microsoft.com/en-us/cpp/porting/upgrade-your-code-to-the-universal-crt?view=msvc-170)

I replaced the abbreviation UCRT to another version of the name: Universal CRT.

@vitaut vitaut merged commit 5abe9e8 into fmtlib:master Nov 25, 2021
@vitaut
Copy link
Contributor

vitaut commented Nov 25, 2021

Merged, thank you for another great PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants