Feat/window transformer #1269
…ng) + testing + formatting
Codecov Report
Base: 93.94% // Head: 94.02% // Increases project coverage by +0.08%
Additional details and impacted files
@@ Coverage Diff @@
## master #1269 +/- ##
==========================================
+ Coverage 93.94% 94.02% +0.08%
==========================================
Files 80 82 +2
Lines 8708 8893 +185
==========================================
+ Hits 8181 8362 +181
- Misses 527 531 +4
merging two branches history
…g window transformations to darts/dataprocessing/transformers/window_transformer.py
This looks like a good start @eliane-maalouf. My main comments are:
- I'm not sure I understand the behavior with multivariate series. I'd have expected a window transform applied to a multivariate series to return a new multivariate series with windowing applied component-wise.
- I see you iterate over series (that's OK) and components when extracting the values. It would be nicer for performance reasons to extract the values for all components at once.
- How about moving the core of the windowing logic to a function TimeSeries.window_transform()? It would allow users to simply get windowed series without instantiating a transformer, similar to TimeSeries.map(). Then this transformer could simply call TimeSeries.window_transform() (see the Mapper transformer). WDYT?
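The three suggestions above can be illustrated with a minimal sketch in plain pandas. It is not the darts implementation: `window_transform_frame`, `SketchWindowTransformer`, and the output column naming are hypothetical, and a DataFrame stands in for a multivariate series. The rolling statistic is computed for all columns in one call, and the transformer class only delegates to the function.

```python
import pandas as pd


def window_transform_frame(
    df: pd.DataFrame, window: int, function: str = "mean"
) -> pd.DataFrame:
    """Hypothetical helper: apply one rolling statistic to every column at once."""
    # Rolling over the whole frame is component-wise; no Python loop over columns.
    rolled = getattr(df.rolling(window=window, min_periods=1), function)()
    # Hypothetical naming scheme, chosen only for this sketch.
    return rolled.add_prefix(f"rolling_{function}_{window}_")


class SketchWindowTransformer:
    """Hypothetical thin wrapper: holds parameters and delegates to the helper."""

    def __init__(self, window: int, function: str = "mean"):
        self.window = window
        self.function = function

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        return window_transform_frame(df, self.window, self.function)


frame = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]},
    index=pd.date_range("2022-01-01", periods=4, freq="D"),
)
result = SketchWindowTransformer(window=2).transform(frame)
print(result)
```

Keeping the logic in a standalone function means users can call it directly, while the wrapper stays small enough to plug into a pipeline.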
Two optional keys can be provided for more flexibility: 'series_id' and 'comp_id'.
The 'series_id' key specifies the index of the series in the input sequence of series to which the
transformation should be applied.
The 'comp_id' key specifies the index of the component of the series to which the transformation
Could we preferentially support the case where components are given by their string names?
Maybe mention that series_id and component could also be lists?
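One way to accept an index, a string name, or a list of either is to normalise the selector into a list of component names up front. The helper below is a hypothetical sketch of that idea, not code from this PR; `resolve_components` and its error message are assumptions.

```python
from typing import List, Sequence, Union

Selector = Union[int, str, Sequence[Union[int, str]]]


def resolve_components(selector: Selector, names: List[str]) -> List[str]:
    """Hypothetical helper: map an index, a name, or a list of either to names."""
    # A single int or str is wrapped so that the loop below handles one shape only.
    items = [selector] if isinstance(selector, (int, str)) else list(selector)
    resolved = []
    for item in items:
        # Integers are treated as positions; strings are taken as names directly.
        name = names[item] if isinstance(item, int) else item
        if name not in names:
            raise ValueError(f"unknown component: {name!r}")
        resolved.append(name)
    return resolved


component_names = ["price", "volume"]
print(resolve_components("price", component_names))
print(resolve_components(["volume", 0], component_names))
```

After this step the rest of the code only has to deal with names, which also keeps the output component names predictable.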
Co-authored-by: Julien Herzen <julien@unit8.co>
… window transformations on a given series
Those are some good foundations for the windowing @eliane-maalouf, thanks for that! I think the core of it, relying on pandas and separating the stochastic series over several columns, is good.
There are still quite a few things to change, and some behaviour that might need correction (I think I spotted a couple of bugs).
As an overall comment, I would really recommend that you simplify your code as much as possible. For instance, you do many checks, many of which can either be skipped (delegated to another part of the code responsible for raising the exception) or drastically simplified. I've put quite a few examples of what that could look like in the comments. Each line of code is a small debt for our future selves who'll have to maintain it, so it's good to minimise it when we can. Small is often beautiful :)
Also, in terms of structure, I think you could separate two steps a bit more in the TimeSeries function:
- First, do all the required checks (this should hopefully fit in just a few lines of code).
- Then, assume everything is checked and correct, and apply the logic.
This way you can avoid intermingling computation code with correctness- or type-checking code.
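The check-then-compute structure described above could look roughly like the following. This is a minimal standard-library sketch on plain lists, not the TimeSeries code under review; `_check_window_args` and `rolling_mean` are hypothetical names.

```python
from typing import List, Sequence


def _check_window_args(values: Sequence[float], window: int) -> None:
    """Step 1: all validation lives here and raises early."""
    if not isinstance(window, int) or isinstance(window, bool):
        raise TypeError("window must be an int")
    if window < 1:
        raise ValueError("window must be >= 1")
    if len(values) == 0:
        raise ValueError("values must not be empty")


def rolling_mean(values: Sequence[float], window: int) -> List[float]:
    """Trailing rolling mean; shorter windows are used at the start."""
    _check_window_args(values, window)

    # Step 2: inputs are assumed valid from here on, so only logic remains.
    out = []
    for end in range(1, len(values) + 1):
        chunk = values[max(0, end - window):end]
        out.append(sum(chunk) / len(chunk))
    return out


print(rolling_mean([1, 2, 3, 4], window=2))
```

With the checks gathered in one helper, the computation reads top to bottom without any type or range tests in the loop.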
…mer_example.ipynb
@hrzn what do you think of this new version?
This looks really great @eliane-maalouf ! I think it's nearly ready to be merged :) I've added a bunch of tiny comments, nothing important. I have also pushed 1 commit to make the transformer be listed in the documentation page.
Could you remove the draft notebook before merging? We can later transform it into a nice user-facing example notebook :) I've created a new task for that
transformed_ts = self.series_multi_det.window_transform(
    transforms=window_transformations, keep_non_transformed=True
)
self.assertEqual(len(transformed_ts.components), 6)
(nitpicking) it would be a tiny bit better to test the actual component names (incl the original ones)
Not sure I understand, so I proposed something based on what I understood. Using sets to work with the column names was in fact messing up the order, so it's a good thing you asked me to check the component names.
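The ordering issue mentioned here is a general Python behaviour: a `set` removes duplicates but gives no ordering guarantee, whereas `dict.fromkeys` keeps first-seen order (guaranteed since Python 3.7). A minimal sketch, with made-up component names that are not taken from the PR:

```python
# Hypothetical component names, used only for illustration.
original = ["price", "volume"]
generated = ["rolling_mean_3_price", "rolling_mean_3_volume", "price"]

# A set drops the duplicate "price" but its iteration order is not guaranteed.
unordered = set(original + generated)

# dict.fromkeys drops duplicates and preserves the order of first appearance.
ordered = list(dict.fromkeys(original + generated))

print(ordered)
```

Asserting on the exact list of component names, rather than only on its length, is what exposes this kind of reordering in a test.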
Thanks @hrzn for the feedback. I removed the draft notebook and included your other comments.
nice to see that this feature is in!
Fixes #1079.
Draft for code review.
Summary
Implement window features generation as a transformer that can be called as a standalone transformation, from a pipeline, or from a forecasting model.
Other Information
Implementation started by @adamkells in #1203