Sessions to tz-naive? Schedule times to UTC? #42

Closed
gerrymanoim opened this issue Jul 7, 2021 · 9 comments

Comments

@gerrymanoim
Owner

Another issue following this latest pandas release is that the ExchangeCalendar constructor is now raising a FutureWarning.

The cause is within the creation of the schedule DataFrame, due to conversion of DatetimeIndex values with dtype 'datetime64[ns, UTC]' (self._opens etc.) to 'datetime64[ns]'.

        self.schedule = DataFrame(
            index=_all_days,
            data=OrderedDict(
                [
                    ("market_open", self._opens),
                    ("break_start", self._break_starts),
                    ("break_end", self._break_ends),
                    ("market_close", self._closes),
                ]
            ),
            dtype="datetime64[ns]",
        )

One option would be to change the target dtype to "datetime64[ns, UTC]" thereby defining the schedule columns in terms of UTC. This additional context would be meaningful although I suspect it would open a can of worms with respect to changes it would necessitate within exchange_calendars and its clients. For now I'll add a fix to PR #41 that retains the existing behaviour.
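
The two options can be sketched with plain pandas. This is a minimal illustration, not the library's actual constructor: the dates and times are made up, and only two of the four schedule columns are shown. Dropping the timezone explicitly with `tz_localize(None)` keeps the existing tz-naive columns, whereas passing the UTC-aware values through unchanged gives UTC columns.

```python
import pandas as pd

# Illustrative data only: session labels plus UTC-aware open/close times.
sessions = pd.DatetimeIndex(["2021-07-06", "2021-07-07"])
opens = pd.DatetimeIndex(["2021-07-06 13:30", "2021-07-07 13:30"], tz="UTC")
closes = pd.DatetimeIndex(["2021-07-06 20:00", "2021-07-07 20:00"], tz="UTC")

# Option A (existing behaviour): drop the timezone explicitly, so no
# implicit tz-aware -> tz-naive dtype conversion is left to the constructor.
naive_schedule = pd.DataFrame(
    index=sessions,
    data={
        "market_open": opens.tz_localize(None),
        "market_close": closes.tz_localize(None),
    },
)

# Option B: keep the UTC context on the schedule columns.
utc_schedule = pd.DataFrame(
    index=sessions,
    data={"market_open": opens, "market_close": closes},
)

print(naive_schedule.dtypes)
print(utc_schedule.dtypes)
```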

Originally posted by @maread99 in #40 (comment)

@maread99
Collaborator

maread99 commented Jul 7, 2021

I included a fix to the merged #41 that addressed this FutureWarning. I believe it closes this issue.

@gerrymanoim
Owner Author

gerrymanoim commented Jul 8, 2021

I think what you did closes the immediate problem, but I'd like to think more about defining the target dtype to be datetime64[ns, UTC].

Morally everything here should be in UTC I think? Especially internally.

@gerrymanoim gerrymanoim changed the title FutureWarning in ExchangeCalendar constructor dateimte[ns, UTC] in ExchangeCalendar constructor Jul 8, 2021
@maread99
Collaborator

maread99 commented Jul 8, 2021

I did wonder if you opened this for the UTC question.

I've been half expecting to come across something that would give me a mini-eureka moment of understanding as to why sessions are UTC and opens, closes etc are tz-naive? Haven't had it yet.

pandas_market_calendars defines them the other way around which IMHO seems to be far more meaningful. opens, closes etc. are UTC times and would surely benefit from having that context(?). However, session labels are nothing more than an arbitrary timestamp to represent a trading day. Indeed, the ExchangeCalendar doc notes that a session label should not be considered a specific point in time. In which case, why assign them a timezone? I'd suggest that defining session labels in terms of UTC adds context that isn't meaningful, and indeed confuses their purpose. Are there advantages to defining sessions as UTC that I'm just not seeing?

I suspect that changing opens, closes etc. to UTC and/or sessions to tz-naive would involve far more than changing a few lines of code. Also, I imagine it would break a lot of exchange_calendars users (for a start, referencing a tz-naive DatetimeIndex with a tz-aware datetime has been deprecated). Perhaps one for release 4.0?

All that said, if you decide you would like to make sessions tz-naive and/or make open, closes etc. UTC then I'd more than happily have a look at putting together a PR.

@gerrymanoim
Owner Author

> I've been half expecting to come across something that would give me a mini-eureka moment of understanding as to why sessions are UTC and opens, closes etc are tz-naive? Haven't had it yet.

I'd say a lot of this is an artifact of how the original trading calendars library (https://github.com/quantopian/trading_calendars) was created, its use in zipline, and how far back this was all done (the earliest commits in trading calendars are from 2013, and that was after this was spun out of zipline). So some of this grew organically, and some of it reflects the lack of, or difficulty with, tz support back in the mid 2010s. Prior to this fork, we also maintained compatibility with py27 and much older versions of pandas and numpy.

My personal views are probably that exchange_calendars should always represent times in UTC, internally and externally. This makes the interface more uniform and (I argue) makes it easier to work with downstream (especially if you're doing something with multiple timezones, your data is probably in UTC already and you get "consistent" times across markets). Also it is arguably easier to convert everything into one tz as you need (say you want all of this in EST).
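
As a rough sketch of the "convert everything into one tz as you need" point: with open times held in UTC, a single `tz_convert` call expresses them all in another zone. The exchange codes and times below are illustrative values typed in by hand, not output from the library, and "America/New_York" stands in for "EST".

```python
import pandas as pd

# Illustrative UTC open times for two markets on the same day.
opens_utc = pd.Series(
    pd.DatetimeIndex(["2021-07-07 13:30", "2021-07-07 07:00"], tz="UTC"),
    index=["XNYS", "XLON"],
)

# One call converts every market's open to New York local time.
opens_ny = opens_utc.dt.tz_convert("America/New_York")
print(opens_ny)
```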

> Are there advantages to defining sessions as UTC that I'm just not seeing?

The main advantage (in my mind) is just consistency. You know whenever you work with the library or are relying on data coming from the library that the timezone is in UTC, you never have to think about it. This definitely is a reflection of my own experience/personal preference, so I'm definitely open to other opinions (though maybe there's no "correct" answer).

I agree that this would be a bigger change, both in work and interface, and is more suitable to a longer term 4.0 release and not something we should do lightly.

@maread99
Collaborator

maread99 commented Jul 9, 2021

> My personal views are probably that exchange_calendars should always represent times in UTC, internally and externally. This makes the interface more uniform and (I argue) makes it easier to work with downstream (especially if you're doing something with multiple timezones, your data is probably in UTC already and you get "consistent" times across markets). Also it is arguably easier to convert everything into one tz as you need (say you want all of this in EST).

Big +1 on this. I'm working on a library that creates price data sets that can include instruments trading in different timezones (will be open source). All the internals are UTC and it would certainly be easier to work with exchange_calendars if opens, closes etc. were UTC.

With regards to defining session labels as UTC:

> The main advantage (in my mind) is just consistency. You know whenever you work with the library or are relying on data coming from the library that the timezone is in UTC, you never have to think about it. This definitely is a reflection of my own experience/personal preference, so I'm definitely open to other opinions (though maybe there's no "correct" answer).

Interesting. My experience led me in the opposite direction, wishing they were tz-naive! For me I think it comes down to the labels not being times, and defining them as UTC confuses this. Maybe this doesn't outweigh advantages of across-the-board consistency. It would be interesting if other users were to offer their thoughts on this one. I'm due to have a good tidy up of the library I'm working on - I'll add to this conversation anything I come across that informed my view.

In the meantime, would it be reasonable to start a Release 4.0 issue with a todo along the lines of...

Change opens, closes, break_starts and break_ends to UTC.

?

@maread99 maread99 mentioned this issue Jul 22, 2021
49 tasks
@maread99 maread99 changed the title dateimte[ns, UTC] in ExchangeCalendar constructor Sessions to tz-naive? Schedule times to UTC? Oct 5, 2021
@maread99
Collaborator

maread99 commented Oct 5, 2021

Summary of notes above in favour of tz-naive sessions:

  • defining sessions as UTC adds context that isn't meaningful, confusing their purpose. A session is broadly analogous with a date, but not a time.
  • tz-naive sessions would bring exchange_calendars and pandas_market_calendars in line.

Work on the 'side' option and new TestBase has again had me thinking that sessions should be tz-naive...

Internals
Rather than the internals being made simpler with everything in UTC, I find if anything it complicates it by requiring the likes of calendar definitions and test interrogations to all be put in terms of UTC. I've found myself trying to simplify these by allowing dates to be defined as simple strings which are later converted to UTC timestamps by the base class.
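
A minimal sketch of that kind of simplification, assuming a hypothetical helper named `to_utc_sessions` (not part of exchange_calendars): definitions stay as plain date strings and are converted to UTC labels in one place.

```python
import pandas as pd


def to_utc_sessions(dates):
    """Hypothetical helper: convert plain date strings to UTC session labels."""
    return pd.DatetimeIndex(dates, tz="UTC")


# Definitions are written as simple strings...
holiday_strings = ["2021-12-25", "2022-01-01"]

# ...and only converted to UTC timestamps where they are consumed.
holidays = to_utc_sessions(holiday_strings)
print(holidays)
```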

Moving forwards I'd anticipate the internals looking at session nanos rather than the timestamps (discussion #87). This would make the question more a matter of how sessions should be represented externally.

Externals
I struggle to see a use case where users will have their session inputs already defined as UTC timestamps (more likely to be tz-naive dates or strings). Most ExchangeCalendar methods now provide for parsing strings and non-UTC timestamps although users still need to convert to UTC if directly interrogating schedule.index.

As for output, I can't see any case where a session that's spat out would be compared with a time. Indeed, if a user were comparing a session with a time then it's likely they haven't understood the concept of a session. In my experience, session output is more likely to be used to compare against tz-naive dates. In this case users have to convert the session output back to tz-naive (otherwise you get the 'cannot compare a tz-naive timestamp with a tz-aware timestamp' error).
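
The conversion burden described above can be illustrated as follows. The values are made up; the point is only that an ordering comparison between a tz-aware session label and a tz-naive date raises a `TypeError` in pandas until the timezone is dropped.

```python
import pandas as pd

session = pd.Timestamp("2021-10-05", tz="UTC")  # a UTC session label
date = pd.Timestamp("2021-10-06")               # a user's tz-naive date

# Ordering comparisons between tz-aware and tz-naive timestamps raise.
try:
    session < date
    raised = False
except TypeError:
    raised = True

# The user has to drop the timezone before comparing.
is_before = session.tz_localize(None) < date
print(raised, is_before)
```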

In short, IMHO defining sessions as UTC is forcing consistency on concepts that aren't consistent. Consequently, it:

  • confuses what a session is.
  • adds work both internally and externally.

@maread99
Collaborator

maread99 commented Oct 7, 2021

This test might be a good example of how having tz-aware sessions can give rise to confusion. The test mistakenly assigns a local timezone to every expected holiday and then tries to verify that those expected holidays are not sessions by looking for them in all_sessions. By assigning a local timezone to them it's impossible for an expected holiday to be found in all_sessions (as they are UTC), i.e. the test will always pass, even if an expected holiday was actually a session.
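
The failure mode can be reproduced in a few lines. This is a sketch with made-up dates, and "America/New_York" is an assumed stand-in for whichever local timezone the test used. The "expected holiday" is deliberately included in the sessions, yet the check with a locally localized timestamp still reports it as absent.

```python
import pandas as pd

# UTC session labels; 2021-07-05 is deliberately (and wrongly) a session here.
all_sessions = pd.DatetimeIndex(
    ["2021-07-02", "2021-07-05", "2021-07-06"], tz="UTC"
)
expected_holiday = "2021-07-05"

# Mistaken check: midnight local time is a different instant from midnight
# UTC, so the lookup can never match a UTC session label.
localized = pd.Timestamp(expected_holiday, tz="America/New_York")
passes_vacuously = localized not in all_sessions

# Check against a UTC label: the erroneous session is detected.
utc_label = pd.Timestamp(expected_holiday, tz="UTC")
found_as_session = utc_label in all_sessions
print(passes_vacuously, found_as_session)
```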

@gerrymanoim
Owner Author

> In the meantime, would it be reasonable to start a Release 4.0 issue with a todo along the lines of...
> Change opens, closes, break_starts and break_ends to UTC.

Yep! I'm convinced of your point actually, this is definitely confusing what a session is.

@maread99
Collaborator

Changes made in #179 v4 PR. Merged with master in #201.
