-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize tm formatting #2602
Optimize tm formatting #2602
Conversation
Thanks for the PR. Could you provide any benchmark results showing the effects of this change? |
include/fmt/chrono.h
Outdated
void on_utc_offset() { format_localized('z'); } | ||
void on_utc_offset() { | ||
#if defined(_WIN32) | ||
_tzset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't call _tzset or tzset because it changes global state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_tzset
/ tzset
is used internally in std::time_put
.
Without calling them, it is impossible to get the timezone correctly (if the timezone is not stored in struct std::tm
).
Windows: https://docs.microsoft.com/en-us/cpp/c-runtime-library/daylight-dstbias-timezone-and-tzname?view=msvc-140 (Remarks section)
Solaris: https://docs.oracle.com/cd/E86824_01/html/E54766/ctime-3c.html
On Solaris and Windows tzset is threadsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vitaut, What do you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should limit system-specific logic here and suggest just detecting if tm_gmtoff
is available and otherwise go through format_localized
as before. The presence of tm_gmtoff
can be detected in a portable way using SFINAE.
Benchmark: https://github.com/phprus/fmt-bench/tree/optimize-tm-formatting-5 *BSD, Apple, glibc: Acceleration 10-20 times ( |
cc @alexezeder |
Other formatters ( |
I'm not 100% sure why I was mentioned here, but here are the results on my machine. 😏 Currently, there is no option to run benchmarks for PRs in fmt_bnchmrk_gnrtr (I doubt it would be implemented someday). But the only one runner there uses the same version of GCC - 11. GCC 11 + libstdc++ (C++20)
Raw output
Clang 13 + libc++ (C++20)
Raw output
|
@alexezeder, I mentioned you since you wrote a comment about performance regression (1031eed#commitcomment-60128058) This PR improves performance in those cases (C-locale and timezone offset). |
include/fmt/chrono.h
Outdated
void on_utc_offset() { format_localized('z'); } | ||
void on_utc_offset() { | ||
#if defined(_WIN32) | ||
_tzset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should limit system-specific logic here and suggest just detecting if tm_gmtoff
is available and otherwise go through format_localized
as before. The presence of tm_gmtoff
can be detected in a portable way using SFINAE.
include/fmt/chrono.h
Outdated
if (!is_classic_ && ns != numeric_system::standard) | ||
return format_localized('Y', 'E'); | ||
write_year(tm_year()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's swap the branches to get rid of negation:
if (is_classic_ || ns == numeric_system::standard)
return write_year(tm_year());
format_localized('Y', 'E');
and the same throughout the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tm_gmtoff
and SFINAE - Done.
Solaris specific code is removed.
Windows specific code I suggest to include in fmt, because Windows is more popular.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong.
Inside ucrt
, the internal function __tzset () is used, which is called only once. When using _tzset (), performance is 10 times slower.
I've added a workaround, but I'm not sure if I need to switch to custom code.
Perf with workaround:
2021-11-22T00:37:43+05:00
Running fmt_test_old
Run on (2 X 2494 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x2)
L1 Instruction 32 KiB (x2)
L2 Unified 4096 KiB (x2)
L3 Unified 16384 KiB (x1)
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
FMTFormatter_z 1384 ns 1360 ns 448000
FMTFormatterCompile_z 1321 ns 1318 ns 497778
FMTFormatter_z_strftime 312 ns 314 ns 2240000
FMTFormatter_z_time_put 1217 ns 1221 ns 640000
FMTFormatter_Y 1385 ns 1381 ns 497778
FMTFormatterCompile_Y 1300 ns 1311 ns 560000
FMTFormatter_Y_strftime 301 ns 300 ns 2240000
FMTFormatter_Y_time_put 1214 ns 1221 ns 640000
2021-11-22T00:37:50+05:00
Running fmt_test_new
Run on (2 X 2494 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x2)
L1 Instruction 32 KiB (x2)
L2 Unified 4096 KiB (x2)
L3 Unified 16384 KiB (x1)
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
FMTFormatter_z 123 ns 123 ns 5600000
FMTFormatterCompile_z 71.1 ns 69.8 ns 8960000
FMTFormatter_z_strftime 312 ns 314 ns 2240000
FMTFormatter_z_time_put 1214 ns 1221 ns 640000
FMTFormatter_Y 121 ns 120 ns 5600000
FMTFormatterCompile_Y 67.9 ns 68.4 ns 11200000
FMTFormatter_Y_strftime 303 ns 305 ns 2357895
FMTFormatter_Y_time_put 1203 ns 1200 ns 560000
test/chrono-test.cc
Outdated
@@ -49,7 +49,14 @@ std::string system_strftime(const std::string& format, const std::tm* timeptr, | |||
os.imbue(loc); | |||
facet.put(os, os, ' ', timeptr, format.c_str(), | |||
format.c_str() + format.size()); | |||
#ifdef _WIN32 | |||
// Workaround to early ucrt bug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
early?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early version.
I have not tested all versions. But in the latest version of the Windows SDK I have installed with MSVS, '%z' returns '+0000'. In the old version: '-0000'.
If remove platform-dependent code, this change will not be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are keeping this, let's tweak it a bit for clarity:
// Workaround a bug in older versions of ucrt.
BTW is "ucrt" the correct term here? Is it Microsoft C runtime library? Maybe calling it Microsoft CRT here would be clearer?
@phprus, thanks for the benchmark. Could you add |
76da43b
to
2597fc2
Compare
@vitaut, I will do a performance test (on Windows and with raw |
2597fc2
to
0b7c3db
Compare
@vitaut, benchmark updated: https://github.com/phprus/fmt-bench/tree/optimize-tm-formatting-5 |
test/chrono-test.cc
Outdated
@@ -49,7 +49,14 @@ std::string system_strftime(const std::string& format, const std::tm* timeptr, | |||
os.imbue(loc); | |||
facet.put(os, os, ' ', timeptr, format.c_str(), | |||
format.c_str() + format.size()); | |||
#ifdef _WIN32 | |||
// Workaround to early ucrt bug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are keeping this, let's tweak it a bit for clarity:
// Workaround a bug in older versions of ucrt.
BTW is "ucrt" the correct term here? Is it Microsoft C runtime library? Maybe calling it Microsoft CRT here would be clearer?
Awesome, thanks! |
Sorry for delay. UCRT is are official name for Microsoft C Runtime start from Visual Studio 2015 (https://docs.microsoft.com/en-us/cpp/porting/upgrade-your-code-to-the-universal-crt?view=msvc-170) I replaced the abbreviation UCRT to another version of the name: Universal CRT. |
0b7c3db
to
d7b90d7
Compare
Merged, thank you for another great PR! |
Optimize tm formatting for:
%z
format (timezone offset from UTC).