
TimeZone memory usage is doubled with nanosecond precision #1699

Closed · sffc opened this issue Aug 6, 2021 · 11 comments

@sffc (Collaborator) commented Aug 6, 2021

In all practical uses, the time zone offset is sufficient storing minutes or seconds. However, we currently allow a TimeZone to store an offset down to nanoseconds. According to @FrankYFTang, this doubles the memory usage of a TimeZone instance.

@justingrant (Collaborator)

What's the desired outcome if TimeZone precision were reduced? Fewer bytes occupied on disk or in memory by TZDB data? Smaller in-memory size of TimeZone instances? Less engineering complexity in optimization? Lower runtime cost to unpack history data? Something else?

If it's TZDB size, then that's controlled by TZDB's resolution, not Temporal's. Also, regardless of the precision of Temporal's public API or TZDB's most granular transition, I'd assume that we'd want to optimize the storage of TZDB history data to leverage the fact that the vast majority of TZDB offset changes happen on UTC-hour boundaries (~20 bits for 100+ years of UTC hours) and have hour-aligned offsets (7 bits).
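
As a rough sanity check of that bit math, a sketch only (this is not TZDB's or any engine's actual storage format):

```js
// ~20 bits are enough to index any UTC hour across 100+ years of history.
const hoursIn100Years = 100 * 365.25 * 24;          // 876,600 hours
console.log(Math.ceil(Math.log2(hoursIn100Years))); // 20 bits
```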

If the issue is TimeZone instances in memory, I'm not sure I understand the problem. Won't each TimeZone instance just contain a pointer to (or a cached copy of) the relevant TZDB data, whose size (per the above) is already unrelated to Temporal's public API granularity? Even if each instance unpacks the optimized history data for performance reasons, per the above the precision is determined by TZDB, not Temporal.

I'm not opposed to considering reducing granularity of the public TimeZone API if it would yield a significant benefit, but before making a normative change like that I'd want to better understand the problem.

@sffc (Collaborator, Author) commented Aug 6, 2021

My understanding is that it's about TimeZone instances in memory.

IANA time zones probably have a pointer into the data as you described, but offset time zones need all the bits.

@FrankYFTang (Contributor)

Millisecond precision means we need ceil(log2(2 * 86400 * 1000)) = 28 bits to represent the offset, which fits into 4 bytes, so we can use 4 bytes per object. Nanosecond precision means we need ceil(log2(2 * 86400 * 10^9)) = 48 bits to represent the offset, which would take 6-8 bytes per object.

The point is: this is a pointless waste. No practical time zone in the world needs nanosecond precision for its offset, yet the spec forces the implementation to store this wasteful info in memory. The current spec allows someone to craft a time zone as `let tz = new Temporal.TimeZone("+11:59:57.987654321")`, and the implementation has to store all of that info in memory despite there being no practical real-world usage.
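
As a quick check of those numbers (plain arithmetic, not implementation code):

```js
// Bits needed to represent an offset in the range ±1 day at a given
// sub-second precision, per the formula above.
const bitsForOffset = (unitsPerSecond) =>
  Math.ceil(Math.log2(2 * 86400 * unitsPerSecond));

console.log(bitsForOffset(1e3)); // 28 (millisecond precision, fits in 4 bytes)
console.log(bitsForOffset(1e9)); // 48 (nanosecond precision, needs 6-8 bytes)
```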

@justingrant (Collaborator) commented Aug 6, 2021

AFAIK, offset time zones are relatively rarely used in real-world applications, so I'm not sure optimizing them is a high priority.

EDIT: And if it turns out that they do get a lot of use, then I could easily imagine an optimization which only allocates 7 bits for all modern offsets (which are always on 15-minute boundaries), and 6-8 bytes for more-granular offsets, with the 8th bit used to differentiate the two cases.
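
Something like this sketch, say (all names hypothetical, just to illustrate the idea):

```js
// Tagged encoding: modern offsets lie on 15-minute boundaries between -12:00
// and +14:00, so a biased quarter-hour count fits in 7 bits. The 8th bit
// flags the rare sub-quarter-hour case, which falls back to a full
// nanosecond-precision value (6-8 bytes).
const QUARTER_HOUR_NS = 15n * 60n * 1_000_000_000n;
const WIDE_FLAG = 0x80;

function encodeOffset(offsetNs) {
  if (offsetNs % QUARTER_HOUR_NS === 0n) {
    const quarters = Number(offsetNs / QUARTER_HOUR_NS) + 48; // bias: -12:00 -> 0
    return { tag: quarters };                 // 0..104, fits in 7 bits
  }
  return { tag: WIDE_FLAG, wide: offsetNs };  // rare case: 6-8 extra bytes
}

console.log(encodeOffset(-3600n * 1_000_000_000n)); // { tag: 44 } (UTC-01:00)
```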

TL;DR - This seems like a low-priority use case that can nonetheless be optimized if needed. I'm not sure this meets the very high bar for post-Stage-3 changes.

@sffc (Collaborator, Author) commented Aug 7, 2021

> I could easily imagine an optimization which only allocates 7 bits for all modern offsets (which are always on 15-minute boundaries), and 6-8 bytes for more-granular offsets, with the 8th bit used to differentiate the two cases.

I think Frank is trying to minimize the stack size of these objects. The largest case always dominates, even if it is not often used.

@justingrant (Collaborator)

> I think Frank is trying to minimize the stack size of these objects. The largest case always dominates, even if it is not often used.

Ahh, so all TimeZone instances are fixed-length inside V8, and we can't use variable-length data structures? If so, then I (finally!) understand the problem. Thanks for patiently explaining it to me. I agree that reducing the precision of offsets could be helpful for reducing the memory footprint of ZonedDateTime and TimeZone instances. Is this a correct understanding of the problem?

One more question: custom userland timezones presumably require a pointer to the userland-defined TimeZoneProtocol object. Are those pointers smaller than the 6-8 bytes required for a nanosecond-precision offset?

@ptomato (Collaborator) commented Sep 2, 2021

Meeting, Sept. 2: We see enough use cases or potential use cases to keep this, and don't see a problem with spending a few more bits in these probably-rare objects.

ptomato closed this as completed Sep 2, 2021
@FrankYFTang (Contributor) commented Sep 4, 2021

Just be aware that during the meeting, one argument was that a time zone offset in a string with sub-second precision would be expected to work. However, the current spec text rounds every fractional second inside TimeZoneUTCOffsetFraction to 0, because TimeZoneUTCOffsetFraction contains "." (the DecimalSeparator) as its first character, and

13.44 ParseTemporalTimeZoneString ( isoString )

takes TimeZoneUTCOffsetFraction as the variable fraction, appends "000000000", takes the first 9 characters (which will be one "." and 8 digits), and converts the result by calling ToIntegerOrInfinity, which always turns it into 0 because of the leading "." character. This is probably not intentional, but the spec text as-is supports the time zone offset only up to second precision, never sub-second precision, when the value comes from a string. See #1795 for details.
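
In other words, roughly (a simplified model of those steps, not the actual spec pseudocode):

```js
// TimeZoneUTCOffsetFraction keeps its leading "." (the DecimalSeparator), so
// the take-the-first-9-characters step captures the "." plus only 8 digits:
const fraction = ".987654321";
const firstNine = (fraction + "000000000").slice(0, 9); // ".98765432"
// ToIntegerOrInfinity(".98765432"): ToNumber yields 0.98765432, which
// truncates toward zero, so the entire sub-second part is lost.
const nanoseconds = Math.trunc(Number(firstNine)); // 0
```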

@FrankYFTang (Contributor)

I posted the previous note NOT to reopen the discussion, but just to point out the parsing issue in the current spec text. I think it is not intentional, judging from the surrounding text, but it would need a normative PR to make sub-second precision from strings work inside the time zone offset.

@HKalbasi

It also makes computations slower and more complex. If there were no sub-second time zones (at least no built-in ones), implementations could store seconds and nanoseconds separately and use only the seconds part for most calculations. But now they need a division to calculate seconds from nanoseconds in most operations, or have to handle time zones without a sub-second part as a special-case fast path, as in the sketch below. I don't mean that it should be changed at this point.
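
A minimal sketch of that split storage (hypothetical field names, not any engine's actual layout):

```js
// Store seconds and sub-second nanoseconds separately; every built-in zone
// has subNs === 0, so the common path never needs the division.
class OffsetParts {
  constructor(totalNs) { // totalNs: the offset as a BigInt of nanoseconds
    this.seconds = Number(totalNs / 1_000_000_000n); // fits in 32 bits
    this.subNs = Number(totalNs % 1_000_000_000n);   // 0 for real-world zones
  }
}

const tz = new OffsetParts(-3600n * 1_000_000_000n); // UTC-01:00
console.log(tz.seconds, tz.subNs); // -3600 0
```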

> We see enough use cases or potential use cases to keep this

Are they documented somewhere? Again, I'm not requesting a change; I'm just interested in the rationale behind sub-second time zones.

@ptomato (Collaborator) commented Feb 18, 2022

I don't think there's anything preventing implementations from storing seconds and nanoseconds as separate fields if they want to. It doesn't matter that they are stored that way in the specification, as long as the observable results are the same.

You can see the whole discussion from that meeting on Sept. 2 here: https://github.com/tc39/proposal-temporal/blob/main/meetings/agenda-minutes-2021-09-02.md#storing-nanoseconds-as-time-zone-offsets-1699
