I wonder how memorization will cope with DST clock shifts?
In the Benchmark results section, the test results contain a warning:
Was debug mode enabled intentionally?
Introduction
As part of the Lua/SQL types consistency task, we need to approach the datetime implementation in the Tarantool SQL engine. But before doing so (remember, we want Lua and SQL to be consistent), we have to add date/time/timestamp type support to Lua and be able to store those types in Tarantool box.
It might be easy to pick the Unicode ICU date/time parsing implementation (ICU is already integrated into the Tarantool core for collation support), but the problem is that there are reasonable doubts that ICU is fast enough.
We needed to find the best date/time parsing code that we could use from the Lua side.
For obvious performance reasons, none of the popular pure Lua date/time implementations could serve us well enough. Neither Penlight pl.date (which is deprecated at the moment, BTW) nor Thijs Schreijer's ("Tieske") LuaDate could provide the performance levels we would like to have in a builtin Tarantool module.
We need to select among C/C++ implementations that we could use as the basis for an FFI-based module.
Possible candidates
In the benchmarks repository bench-timestamp we used the following C/C++ date/time parsers for our experiments:
c-dt by Christian Hansen (chansen/c-dt), which is largely a precursor to his excellent Perl5 p5-time-moment module, which @Mons recommended us to look into (thanks, @Mons!). We had to patch c-dt slightly to properly integrate it into the whole cmake build process (making sure that we use the same set of compilers, with the same set of optimization settings selected);
Google Civil Time (cctz) C++ implementation;
the industry-standard Unicode organization unicode-org/icu C++ implementation;
and, as a bonus (and simply "because we can"), a simple re2c-based reimplementation of the c-dt datetime parser, which shows the pure beauty of deterministic finite automata for parsing regular grammars :) (see timing details below).
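To make the "deterministic finite automaton" idea concrete, here is a minimal hand-written sketch. It is not re2c output and not code from the benchmark repository; the function name `is_iso_date` and the state numbering are assumptions made for illustration. It recognizes only the date part, in either extended (`YYYY-MM-DD`) or basic (`YYYYMMDD`) form, and checks shape only, not calendar validity.

```cpp
#include <string>

// Character classes the automaton distinguishes.
enum CharClass { DIGIT = 0, DASH = 1, OTHER = 2 };

static CharClass classify(char c) {
    if (c >= '0' && c <= '9') return DIGIT;
    if (c == '-') return DASH;
    return OTHER;
}

// Hypothetical recognizer: true if `s` is shaped like YYYY-MM-DD or YYYYMMDD.
bool is_iso_date(const std::string& s) {
    // next_state[state][class]; -1 means "reject".
    // States 0..3 read the year, state 4 decides between the two forms,
    // states 5..9 read "-MM-DD", states 10..12 read "MMDD", 13 accepts.
    static const int next_state[14][3] = {
        {1, -1, -1}, {2, -1, -1}, {3, -1, -1}, {4, -1, -1},
        {10, 5, -1},
        {6, -1, -1}, {7, -1, -1}, {-1, 8, -1}, {9, -1, -1}, {13, -1, -1},
        {11, -1, -1}, {12, -1, -1}, {13, -1, -1},
        {-1, -1, -1},
    };
    int state = 0;
    for (char c : s) {
        state = next_state[state][classify(c)];
        if (state < 0) return false;  // no transition: input rejected
    }
    return state == 13;  // accept only if we stopped in the final state
}
```

Each input character costs one table lookup and there is no backtracking, which is the property that makes generated DFA parsers attractive for this kind of fixed grammar.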
Googletest and Google benchmark
We use the Google Test and Google Benchmark frameworks as drivers for running our unit tests and benchmarks.
NB! We have not yet properly (seamlessly) integrated them into the build process, and the repository is not yet self-contained, so one should install googletest and libbenchmark-dev as prerequisites from elsewhere.
Examples: parsing of ISO-8601 date/time format
We have built unit tests and benchmarks that use the same set of predefined date/time literals, represented in ISO-8601 format. Below we show the code that we need to use for each implementation.
c-dt
c-dt is flexible about the multiple date/time formats it accepts. You do not need to worry about which format to select for parsing a date; the choice is made automagically.
The trick is that there is no aggregate function for parsing date + time + timezone represented in the same literal, but there are separate dt_parse_iso_date(), dt_parse_iso_time_* and dt_parse_iso_zone_* functions, which parse the corresponding parts of the input text and which may be composed into a full timestamp parser (see parse_datetime_extended() in bench-cdt.cpp).
Essentially, parsing of a single literal will look like the following:
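As a library-free illustration of this composition pattern, the sketch below chains three sub-parsers, each of which reports how many characters it consumed (0 on failure). The names `Timestamp`, `parse_date`, `parse_time`, `parse_zone` and `parse_datetime` are hypothetical stand-ins, not the c-dt functions, and the sketch handles only the extended format with a `T` separator.

```cpp
#include <cstddef>
#include <string>

struct Timestamp {
    int year = 0, month = 0, day = 0;
    int second_of_day = 0;   // seconds since local midnight
    int nanosecond = 0;
    int offset_minutes = 0;  // UTC offset in minutes
};

// Reads exactly n decimal digits starting at pos.
static bool digits(const std::string& s, std::size_t pos, std::size_t n, int* out) {
    if (pos + n > s.size()) return false;
    int v = 0;
    for (std::size_t i = pos; i < pos + n; ++i) {
        if (s[i] < '0' || s[i] > '9') return false;
        v = v * 10 + (s[i] - '0');
    }
    *out = v;
    return true;
}

static bool is_char(const std::string& s, std::size_t pos, char c) {
    return pos < s.size() && s[pos] == c;
}

// "YYYY-MM-DD": returns characters consumed, 0 on failure.
std::size_t parse_date(const std::string& s, std::size_t pos, Timestamp* ts) {
    if (!digits(s, pos, 4, &ts->year) || !is_char(s, pos + 4, '-') ||
        !digits(s, pos + 5, 2, &ts->month) || !is_char(s, pos + 7, '-') ||
        !digits(s, pos + 8, 2, &ts->day))
        return 0;
    if (ts->month < 1 || ts->month > 12 || ts->day < 1 || ts->day > 31) return 0;
    return 10;
}

// "hh:mm:ss" with an optional ".fraction" of 1 to 9 digits.
std::size_t parse_time(const std::string& s, std::size_t pos, Timestamp* ts) {
    int h = 0, m = 0, sec = 0;
    if (!digits(s, pos, 2, &h) || !is_char(s, pos + 2, ':') ||
        !digits(s, pos + 3, 2, &m) || !is_char(s, pos + 5, ':') ||
        !digits(s, pos + 6, 2, &sec))
        return 0;
    if (h > 23 || m > 59 || sec > 59) return 0;
    ts->second_of_day = h * 3600 + m * 60 + sec;
    ts->nanosecond = 0;
    std::size_t used = 8;
    if (is_char(s, pos + used, '.')) {
        int frac = 0;
        std::size_t n = 0;
        std::size_t p = pos + used + 1;
        while (n < 9 && p + n < s.size() && s[p + n] >= '0' && s[p + n] <= '9') {
            frac = frac * 10 + (s[p + n] - '0');
            ++n;
        }
        if (n == 0) return 0;
        for (std::size_t i = n; i < 9; ++i) frac *= 10;  // scale to nanoseconds
        ts->nanosecond = frac;
        used += 1 + n;
    }
    return used;
}

// "Z", "+hh:mm" or "-hh:mm".
std::size_t parse_zone(const std::string& s, std::size_t pos, Timestamp* ts) {
    if (is_char(s, pos, 'Z')) { ts->offset_minutes = 0; return 1; }
    if (!is_char(s, pos, '+') && !is_char(s, pos, '-')) return 0;
    int h = 0, m = 0;
    if (!digits(s, pos + 1, 2, &h) || !is_char(s, pos + 3, ':') ||
        !digits(s, pos + 4, 2, &m))
        return 0;
    if (h > 23 || m > 59) return 0;
    ts->offset_minutes = (s[pos] == '-') ? -(h * 60 + m) : (h * 60 + m);
    return 6;
}

// Composition: date, 'T', time, zone; the whole input must be consumed.
bool parse_datetime(const std::string& s, Timestamp* ts) {
    std::size_t pos = 0, n = 0;
    if ((n = parse_date(s, pos, ts)) == 0) return false;
    pos += n;
    if (!is_char(s, pos, 'T')) return false;
    pos += 1;
    if ((n = parse_time(s, pos, ts)) == 0) return false;
    pos += n;
    if ((n = parse_zone(s, pos, ts)) == 0) return false;
    pos += n;
    return pos == s.size();
}
```

Calling `parse_datetime("2015-02-18T10:50:31.521345123+10:00", &ts)` fills `ts` with the date fields, the second of day, the nanoseconds and a 600-minute offset. Returning the consumed length from each step is what lets the caller advance a single cursor through the literal.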
CCTZ
Neither Google CCTZ nor ICU has an automagical way to parse arbitrary timestamp formats: you have to select a format string before applying it at runtime.
For example, parsing the same literal "2015-02-18T10:50:31.521345123+10:00" in CCTZ will look like:
The advantage here (at least compared to c-dt above) is that CCTZ does have a timezone database, so you can use the symbolic form of a timezone in a timestamp, i.e. instead of "UTC+03:00" you could simply say "Europe/Moscow".
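To illustrate the format-string-driven style that CCTZ and ICU share, here is a minimal sketch that uses only the C++ standard library (`std::get_time`), not the CCTZ API. The helper name `parse_with_format` is an assumption for illustration, and `std::get_time` handles neither fractional seconds nor UTC offsets, so the literal is shortened accordingly.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: parses `input` according to an explicit
// strftime-style `format`. Fills `out` only when parsing succeeds.
bool parse_with_format(const std::string& input, const char* format, std::tm* out) {
    std::tm tm = {};
    std::istringstream in(input);
    in >> std::get_time(&tm, format);
    if (in.fail()) return false;  // input did not match the chosen format
    *out = tm;
    return true;
}
```

With the format `"%Y-%m-%dT%H:%M:%S"` the input `"2015-02-18T10:50:31"` parses, while `"18/02/2015"` is rejected: the caller has to know the layout up front, which is the trade-off compared with the c-dt approach.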
ICU
The Unicode organization's ICU is the most powerful, but (expectedly) sometimes the slowest of all the implementations mentioned here.
We still need to know what kind of format we are about to parse, so it is not as convenient as c-dt and is similar to CCTZ.
One caveat though: you had better precompile the format beforehand, otherwise the full cycle of format preparation followed by parsing would be too slow.
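The "precompile beforehand" advice is ordinary loop-invariant hoisting. The sketch below shows the two shapes, with `std::regex` as a stand-in for an expensive-to-build format object; it is not ICU code, and the function names are assumptions for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Naive shape: the pattern is compiled again for every input.
std::size_t count_valid_naive(const std::vector<std::string>& inputs) {
    std::size_t ok = 0;
    for (const auto& s : inputs) {
        const std::regex pattern(R"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})");
        if (std::regex_match(s, pattern)) ++ok;
    }
    return ok;
}

// Hoisted shape: the pattern is compiled once, outside the loop.
std::size_t count_valid_hoisted(const std::vector<std::string>& inputs) {
    const std::regex pattern(R"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})");
    std::size_t ok = 0;
    for (const auto& s : inputs) {
        if (std::regex_match(s, pattern)) ++ok;
    }
    return ok;
}
```

Both functions return the same count; only the amount of setup work per iteration differs, which is the distinction a benchmark pair such as ICU_Parse1 and ICU_Parse1_Inv is meant to expose.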
Benchmark results
We ran the default release-mode executable, compiled with gcc 8.3, which gave the following numbers.
(Be warned that your mileage may vary significantly)...
Please compare the *_Parse1 numbers (parsing of a single literal):
CDT_Parse1 uses Christian Hansen's c-dt;
CCTZ_Parse1 uses Google CCTZ;
ICU_Parse1 uses Unicode ICU;
ICU_Parse1_Inv is a modified version with all invariants moved out of the loop;
RE_Parse1 is a bonus version, where the c-dt implementation has been replaced with a simple DFA generated by RE2C.
Speed comparison ratio
So, essentially, we have this table comparing the average times for each implementation:
As we see, RE2C is the fastest, c-dt is good enough, while CCTZ and ICU are 20x to 100x slower.
Tarantool plan of actions
Due to its maturity and completeness;