Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extremely inefficient instantiating a distant local date #12

Open
autarch opened this issue Nov 12, 2016 · 8 comments
Open

Extremely inefficient instantiating a distant local date #12

autarch opened this issue Nov 12, 2016 · 8 comments

Comments

@autarch
Copy link
Member

autarch commented Nov 12, 2016

Migrated from rt.cpan.org #47671 (status was 'open')

Requestors:

From mschwern@cpan.org (@schwern) on 2009-07-08 02:01:30:

$ time perl -wle 'use DateTime; print DateTime->new( year => 3058,
time_zone => "local" )'
3058-01-01T00:00:00

real 0m9.558s
user 0m8.619s
sys 0m0.165s

This occurs whether time_zone is "local" or set to a zone like
"America/Chicago".

Profiling reveals that this makes tens of thousands of method calls.
The big hog is 10000 calls to DateTime::set_time_zone which is being
called by add_duration() which all traces back to
DateTime::TimeZone::_generate_spans_until_match looping over OlsenDB
calls 2000+ times.

This all traces back to DateTime->offset called by
DateTime::_handle_offset_modifier and DateTime::_calc_local_rd

Its possible one could hold off doing whatever it is that calc_local_rd
does, considering that information is not necessary just to instantiate
an object, but the root problem is _generate_spans_until_match's
algorithm. I'm not sure what its purpose is, but it appears to be
iterative by year resulting in Y iterations where Y is the year being
search for (minus some arbitrary lower limit).

If the whole purpose is to determine the time zone offset, this
algorithm is much more efficiently done as a binary search. Even if we
assume a 64 bit year it will take at most 64 iterations to zero in on
the right year. And so on for the month, day, hour, minute and second.

If its just searching the Olsen DB for applicable zone changes I don't
pretend to understand how that works but there's got to be a better way
to do it. C libraries can do it in milliseconds and Perl is just not
that slow. Perhaps with some sort of... database?

@autarch
Copy link
Member Author

autarch commented Nov 12, 2016

From autarch@urth.org (@autarch) on 2009-07-08 02:08:38:

On Tue, 7 Jul 2009, Michael G Schwern via RT wrote:

Its possible one could hold off doing whatever it is that calc_local_rd
does, considering that information is not necessary just to instantiate
an object, but the root problem is _generate_spans_until_match's
algorithm. I'm not sure what its purpose is, but it appears to be
iterative by year resulting in Y iterations where Y is the year being
search for (minus some arbitrary lower limit).

It's generating all the time zone changes from X years out (by default,
each time zone ships with 10 years of pre-calculated data. The rest is
generated as needed.

Note that this only affects zones with DST rules in effect.

If the whole purpose is to determine the time zone offset, this
algorithm is much more efficiently done as a binary search. Even if we
assume a 64 bit year it will take at most 64 iterations to zero in on
the right year. And so on for the month, day, hour, minute and second.

The problem being that there's nothing to search until it's generated.

While I'm sure this could be made faster, I'm not sure it's worth too much
effort. Project Olson time zones that far in the future is pretty silly.
Time zones are political entities, and it's almost certain that rules in
effect now won't be in effect in the year 3058.

Just use UTC.

-dave

/============================================================
http://VegGuide.org http://blog.urth.org
Your guide to all that's veg House Absolute(ly Pointless)
============================================================
/

@autarch
Copy link
Member Author

autarch commented Nov 12, 2016

From mschwern@cpan.org (@schwern) on 2009-07-08 03:02:01:

On Tue Jul 07 22:08:38 2009, autarch@urth.org wrote:

The problem being that there's nothing to search until it's generated.

Are you saying it regenerates the database at run time every time? How
about doing it once when DateTime::TimeZone is installed?

While I'm sure this could be made faster, I'm not sure it's worth too
much
effort. Project Olson time zones that far in the future is pretty silly.
Time zones are political entities, and it's almost certain that rules in
effect now won't be in effect in the year 3058.

Just use UTC.

What the hell kind of answer is that?! I'll take "yeah, that sucks but
I don't have the tuits to optimize it. Here's some thoughts on how to
do it..." but not "use UTC". Come on. While time zones are annoying,
you can't just blow them off! Especially as the maintainer of the most
heavily used date module in Perl!

While I agree worrying about the niggling details of historical time
zone changes 1000 years from now is silly, THE SUN WILL STILL BE IN THE
SKY and that's what a time zone offset basically reflects. And while
optimizing the code might be hard we're not talking milliseconds here.
That's 10 seconds to go forward a mere 1000 years. Even calculating
2109 takes a full second. 2150 is 1.3 seconds. That's unacceptable.

Another simple optimization would be to cache what the latest Olsen
entry is and then just not bother looking beyond that.

Somewhat apropos, for dates far in the past, before time zones, I've
seen 64 bit Linuxen use the lat/long for the city in question which is
pretty clever.

@autarch
Copy link
Member Author

autarch commented Nov 12, 2016

From autarch@urth.org (@autarch) on 2009-07-08 03:37:39:

On Tue, 7 Jul 2009, Michael G Schwern via RT wrote:

  Queue: DateTime

Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=47671 >

On Tue Jul 07 22:08:38 2009, autarch@urth.org wrote:

The problem being that there's nothing to search until it's generated.

Are you saying it regenerates the database at run time every time? How
about doing it once when DateTime::TimeZone is installed?

Storing the changes in memory uses up a fair bit of memory.

Generating 1000+ years worth would make loading each time zone take much
much more memory. It does it at runtime to optimize for the common case of
not needing to look too far in the future.

Just use UTC.

What the hell kind of answer is that?! I'll take "yeah, that sucks but
I don't have the tuits to optimize it. Here's some thoughts on how to
do it..." but not "use UTC". Come on. While time zones are annoying,
you can't just blow them off! Especially as the maintainer of the most
heavily used date module in Perl!

It's the right answer. Using Olson time zones with far future dates just
doesn't make any sense. These time zones exist for the purpose of dealing
with the dictates of modern political entities.

While I agree worrying about the niggling details of historical time
zone changes 1000 years from now is silly, THE SUN WILL STILL BE IN THE
SKY and that's what a time zone offset basically reflects. And while

Yes, that's what the offset reflects. However, the DST changes reflect
the random whims of idiot politicians.

If you just want an offset, use a static offset like '-0500'. That will be
fast, since there's no DST rules to deal with.

Another simple optimization would be to cache what the latest Olsen
entry is and then just not bother looking beyond that.

That doesn't make any sense. The last Olson entry is purely arbitrary
based on how many years of pre-calculation the module ships with, and when
I happen to generate it.

Or do you mean the offset without DST? Then someone else will complain
that it's broken at some arbitrary date (like 11 years out).

For people who need to calculate far future dates, there are other
solutions (UTC or some other fixed offset) that work. But either way, you
as the user need to make a decision about what you want. I can't make that
decision for you, I just provide the options.

If you want to take a stab at optimizing the changes generation, that's
fine with me, but personally I don't think it's wortwhile, since there's
no sane use case for Olson time zones in the far future.

Somewhat apropos, for dates far in the past, before time zones, I've
seen 64 bit Linuxen use the lat/long for the city in question which is
pretty clever.

A DT::TZ::LatLong would be very cool.

Actually, the Olson database approximates this, since pre-modern times it
uses local mean time, which is an offset based off the lat/long of the
named city.

-dave

/============================================================
http://VegGuide.org http://blog.urth.org
Your guide to all that's veg House Absolute(ly Pointless)
============================================================
/

@autarch
Copy link
Member Author

autarch commented Nov 12, 2016

From schwern@pobox.com on 2009-07-08 09:57:18:

autarch@urth.org via RT wrote:

On Tue Jul 07 22:08:38 2009, autarch@urth.org wrote:

The problem being that there's nothing to search until it's generated.
Are you saying it regenerates the database at run time every time? How
about doing it once when DateTime::TimeZone is installed?

Storing the changes in memory uses up a fair bit of memory.

Generating 1000+ years worth would make loading each time zone take much
much more memory. It does it at runtime to optimize for the common case of
not needing to look too far in the future.

Generating 1000+ years of what? The Olsen database doesn't go forward that
far. And who says it has to all be in memory? There's any number of
techniques to leave things on disk.

The C code figured this out 20 years ago!

Just use UTC.
What the hell kind of answer is that?! I'll take "yeah, that sucks but
I don't have the tuits to optimize it. Here's some thoughts on how to
do it..." but not "use UTC". Come on. While time zones are annoying,
you can't just blow them off! Especially as the maintainer of the most
heavily used date module in Perl!

It's the right answer. Using Olson time zones with far future dates just
doesn't make any sense. These time zones exist for the purpose of dealing
with the dictates of modern political entities.

While I agree worrying about the niggling details of historical time
zone changes 1000 years from now is silly, THE SUN WILL STILL BE IN THE
SKY and that's what a time zone offset basically reflects. And while

Yes, that's what the offset reflects. However, the DST changes reflect
the random whims of idiot politicians.

You're absolutely correct, it sucks and its broken. However, you maintain a
date/time library which deals with time zones. You have to deal with the
reality. Full stop.

If you just want an offset, use a static offset like '-0500'. That will be
fast, since there's no DST rules to deal with.

I, in fact, don't want an offset at all. I just want a DateTime object
representing the date I gave it. Why is it calculating the offset just to
instantiate? Can't it wait until I ask for it?

Another simple optimization would be to cache what the latest Olsen
entry is and then just not bother looking beyond that.

That doesn't make any sense. The last Olson entry is purely arbitrary
based on how many years of pre-calculation the module ships with, and when
I happen to generate it.

Or do you mean the offset without DST? Then someone else will complain
that it's broken at some arbitrary date (like 11 years out).

Ok, maybe I'm missing something here. If I hand you something like "July 9th,
2009 America/Chicago" and ask what the offset is taking into account DST you
look at the last DST rule before that date and apply it. Let's say the last
rule change was 2007 (which I think it was) so you look at that and apply it
and that's that, right?

Why would the algorithm be any different for 3009? What's all this extra
calculation you're talking about? Are you generating rules for every year up
to 3009 or something? Wouldn't that just be repeating the same calculation
over and over?

For people who need to calculate far future dates, there are other
solutions (UTC or some other fixed offset) that work. But either way, you
as the user need to make a decision about what you want. I can't make that
decision for you, I just provide the options.

If you want to take a stab at optimizing the changes generation, that's
fine with me, but personally I don't think it's wortwhile, since there's
no sane use case for Olson time zones in the far future.

Look, all I know is every other datetime library that goes past 2038 doesn't
take 10 seconds to make this calculation. Something is wrong here.

Somewhat apropos, for dates far in the past, before time zones, I've
seen 64 bit Linuxen use the lat/long for the city in question which is
pretty clever.

A DT::TZ::LatLong would be very cool.

Actually, the Olson database approximates this, since pre-modern times it
uses local mean time, which is an offset based off the lat/long of the
named city.

That sounds like it.

The past has a vote, but not a veto.
-- Mordecai M. Kaplan

@autarch
Copy link
Member Author

autarch commented Nov 12, 2016

From mschwern@cpan.org (@schwern) on 2009-07-08 19:57:25:

I've dug into the DateTime::TimeZone code and now I understand why we're
talking past each other. I'm talking in terms of Olsen rules database
(after this year this time zone changes to DST at 2am on the 1st Sunday)
and you're talking in terms of these "span" things which cache the
calculation, one for every DST change. Now I see why you'd think it has
to be expensive and why you'd think it was untenable to store.

Since they're linear, and since the only way to search through them is
binary, I can see why TimeZone is locked into having to generate them
all and why it can't just leave them on disk. What are they indexed by?
Julian days? What's "rd" as in "ymd2rd"?

I understand the approach, but there's got to be a better way to do it
that doesn't involve calculating each DST change up to the current year.
Cached perhaps. I also think its possible to make use of the fact that
a DateTime object already knows what year it is to perhaps just cache
this all in a DBM file.

@autarch
Copy link
Member Author

autarch commented Nov 12, 2016

From autarch@urth.org (@autarch) on 2009-07-08 21:21:33:

On Wed, 8 Jul 2009, Michael G Schwern via RT wrote:

  Queue: DateTime

Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=47671 >

I've dug into the DateTime::TimeZone code and now I understand why we're
talking past each other. I'm talking in terms of Olsen rules database
(after this year this time zone changes to DST at 2am on the 1st Sunday)
and you're talking in terms of these "span" things which cache the
calculation, one for every DST change. Now I see why you'd think it has
to be expensive and why you'd think it was untenable to store.

Since they're linear, and since the only way to search through them is
binary, I can see why TimeZone is locked into having to generate them
all and why it can't just leave them on disk. What are they indexed by?
Julian days? What's "rd" as in "ymd2rd"?

They're indexed by UTC days.

I understand the approach, but there's got to be a better way to do it
that doesn't involve calculating each DST change up to the current year.
Cached perhaps. I also think its possible to make use of the fact that
a DateTime object already knows what year it is to perhaps just cache
this all in a DBM file.

Probably the quickest, most efficient way would be to have the time zone
generator spit out some code that encapsulates the decisions that need to
be made, one when given UTC values, another for local. This would require
some changes to DateTime as well, since right now it passes in just the
rata die day and seconds.

-dave

/============================================================
http://VegGuide.org http://blog.urth.org
Your guide to all that's veg House Absolute(ly Pointless)
============================================================
/

@autarch
Copy link
Member Author

autarch commented Nov 12, 2016

From schwern@pobox.com on 2009-07-08 23:43:42:

autarch@urth.org via RT wrote:

<URL: https://rt.cpan.org/Ticket/Display.html?id=47671 >

On Wed, 8 Jul 2009, Michael G Schwern via RT wrote:

  Queue: DateTime

Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=47671 >

I've dug into the DateTime::TimeZone code and now I understand why we're
talking past each other. I'm talking in terms of Olsen rules database
(after this year this time zone changes to DST at 2am on the 1st Sunday)
and you're talking in terms of these "span" things which cache the
calculation, one for every DST change. Now I see why you'd think it has
to be expensive and why you'd think it was untenable to store.

Since they're linear, and since the only way to search through them is
binary, I can see why TimeZone is locked into having to generate them
all and why it can't just leave them on disk. What are they indexed by?
Julian days? What's "rd" as in "ymd2rd"?

They're indexed by UTC days.

Presumably this is the number of UTC days since some epoch? What's the epoch?

I understand the approach, but there's got to be a better way to do it
that doesn't involve calculating each DST change up to the current year.
Cached perhaps. I also think its possible to make use of the fact that
a DateTime object already knows what year it is to perhaps just cache
this all in a DBM file.

Probably the quickest, most efficient way would be to have the time zone
generator spit out some code that encapsulates the decisions that need to
be made, one when given UTC values, another for local. This would require
some changes to DateTime as well, since right now it passes in just the
rata die day and seconds.

Ah HA! That's the mysterious "rd".

Alligator sandwich, and make it snappy!

@autarch
Copy link
Member Author

autarch commented Nov 12, 2016

From autarch@urth.org (@autarch) on 2009-07-09 01:32:56:

Can we take this conversation to the datetime@perl.org list?

/============================================================
http://VegGuide.org http://blog.urth.org
Your guide to all that's veg House Absolute(ly Pointless)
============================================================
/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant