-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
date_range / creating trading indices #138
Comments
Originally posted by @Stryder-Git in #136 (comment) Hi @maread99 and @rsheftel, I haven't used schedules with breaks in my own research, so that is something I haven't considered. I only set out to make it faster and wanted to preserve the behaviour of the original function. But your suggestions come at a good time because I actually just finished a newer version, which still behaves the exact same way, is slightly more efficient and is the result of finding some edge cases when doing more thorough testing. Running these tests I found a couple of peculiarities that I would adjust if I were to change the behaviour of the function, which you may be interested in, additional to your suggestions:
Both of these are the cause of the call to pd.date_range in the original function, and might not be what you want/expect. So I am definitely interested in looking into how to implement this as effectively as possible. But I wanted your thoughts on it (and give others the chance to point out what they might want to be different) before working on changing the original behaviour. I also added the test I am looking forward to your thoughts. |
Originally posted by @Stryder-Git in #136 (comment) To plant some seeds for ideas, my initial thoughts on each of these situations:
|
Hi @Stryder-Git, @rsheftel, I haven't contributed to It is still very much a draft, I've only tested it thoroughly with a couple of calendars and with the 24 hour calendars I've only tested it lightly - I wouldn't be surprised if you can break it quite easily. I wrote it based on If you don't have / want to install pydantic then just comment out the decorator. With regards to the specific points you raised:
This is exactly what I did. It obviously makes it considerably slower.
Personally, if a user asks for the index to be
Having said that, you'll see that my attempt interprets As an aside, from looking at your test I suspect you'd find the hypothesis package of interest if you're not already familiar with it - you only need to define what type of inputs a function can receive and then if the function can be broken within those constraints, it'll find a way to break it (in my experience, usually with edge cases that I would never have imagined). |
Hi @maread99, @rsheftel I have briefly read through your code and you seem to have already taken it some steps further than I have. Regarding breaks, splitting the calculation will definitely hamper the performance, but it may be an interesting optimization problem to consider and, with the vectorized calculation, can still be done at a decent speed. I guess the exact behaviour of Implementing our ideas would cause a different behaviour of the original function and possibly some functions and/or classes to be added to the project. Before I spend a more serious amount of time, it would be great to get some other users opinions as well, in particular @rsheftel's. But, knowing myself, I will probably end up playing around with some ideas anyway and will keep you posted if I happen to create something I like. |
@maread99 and @Stryder-Git , this is all great work. I know the community and myself appreciate it. Here are my thoughts:
For the examples given above, I am comfortable with the behavior you both have been describing. It could lead to some confusion, but I think the instances would be rare and the code would be consistent with less special cases so that is a plus. Thanks both for taking the time to do this, if/when you submit the final PR I would be happy to merge it. |
@Stryder-Git, I agree with all your most recent comments and certainly look forward to seeing your revised implementation. @rsheftel, regarding...
...as @Stryder-Git noted, I guess it comes down to clear documentation. On the meaning of the
...with the last interval considered as the last interval with a left side earlier than the close (i.e. right side may align with close or may fall after the close). I believe the above, together with |
I agree with and am fine with the implementation you suggested with one change. To be consistent with the pandas date_range function and documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html) the default should be None and None should have the same meaning as "both" in your list above. I also strongly agree with @Stryder-Git comment that we need more tests. That is also the best way for people to be sure of how the implementation works. Other than that I would say go ahead and submit the PR and I will merge it in. |
Thanks for your reactions, @rsheftel and @maread99, and sorry for the late response. I am working on the enhanced version at the moment, paying attention to the points that I outlined here. Currently all tests pass, except for the tests using calendars with breaks. @rsheftel, please see the first bulletpoint.
---> Do you agree that aligning the indices along break_start and break_end makes more sense than how it has been handled originally, and that I can change those tests?
Extra Points To be as consistent as possible I have only changed two things about the possible inputs to the function.
The new interpretation of the possible values of
This allows for an extra configuration that that I don't remember being possible previously, while still allowing the same results with the old inputs for Since I still want to do some tests and add a way to handle some edge cases when getting overlapping indices, it isn't ready for a PR yet, but here is the code if you want to have a look. You can also see the docstring there for a permutation analysis. But I will explain the kwargs more clearly in the actual PR/ final docstring. |
Yes.
We should definitely have a warning, otherwise I am sure we will get the question over and over in the issues or on Reddit
The problem with the futures calendars is that to make them start properly on a Sunday, the open needs to be greater than the close. We will have to think about how to handle this. |
When implementing the warning, I did some more digging and made changes to handle some edge cases and get a much cleaner result. The edge cases are where the start of the trading session E.g.:
The same logic applies to market_open == break_start, and market_open == market_close (which I actually haven't encountered yet) |
Thank you everyone for your hard work. This is merged into master and updated to v3.0 in 5634385 with corresponding PyPi release |
@Stryder-Git, I've only just got round to having a look at your new If you're interested, the PR's here in a bit of a devlopement queue over at
Worth noting that, if you were interested in doing something similar with the nanos, you could simplify what I've done a lot by not offering the separate |
Hi @Stryder-Git, great to see a new take on evaluating a trading index, especially if its quicker!
I've copied your
date_range
function into a Jupyter Notebook and this...returned...
...which is the same as the return from the existing function.
As you're working on this now I thought I'd mention a couple of things that occured to me when I was looking at using
date_range
. You may or may not want to consider them...For reference:
schedule.iloc[0]
->
closed
on "left"), or at a push 05:25 (on-frequency based on break end), although 05:20 seems rather meaningless in light of there having been a trading break? Personally, I'd consider this a bug in the existing function.Maybe some food for thought if you hadn't already considered these points.
Originally posted by @maread99 in #136 (comment)
The text was updated successfully, but these errors were encountered: