Refactor initialization for simpler maintenance #730
Codecov Report
@@          Coverage Diff          @@
##            main   #730    +/-   ##
=====================================
  Coverage   95.0%  95.0%
=====================================
  Files         59     59
  Lines       6014   6020     +6
=====================================
+ Hits        5717   5725     +8
+ Misses       297    295     -2
Co-authored-by: Matthew Gidden <matthew.gidden@gmail.com>
This LGTM but best if @danielhuppmann approves!
Looks good, a few suggestions inline.
Also, maybe move `_validate_complete_index(df)` above `format_data()` to have all components defined (in the order in which they are called) before the actual function.
df = df.stack(dropna=True)
df.name = "value"
df.index.names = df.index.names[:-1] + [time_col]
df, time_col, extra_cols = _format_data_to_series(df, index)

# cast value column to numeric
Might also put these two checks into their own function `_check_data_integrity()` ...
Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>
LGTM - thanks @coroa! Will merge after tests pass.
Please confirm that this PR has done the following:
- Documentation Added
- Name of contributors Added to AUTHORS.rst

Description of PR
On top of #729. Rebased to `main`.
Splits `utils.format_data` into 6 different functions:
- `_convert_r_columns(df)` - Check and convert R-style year columns
- `_knead_data(df, **kwargs)` - Replace, rename and concat according to user arguments
- `_format_from_database(df)` - Post-process database results
- `_intuit_column_groups(df, index)` - Check and categorise columns in dataframe
- `_format_data_to_series(df, index)` - Convert a long or wide pandas dataframe to a series with the required columns
- `_validate_complete_index(df)`
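
To illustrate what the first helper in this list is for, here is a minimal sketch of an R-style column conversion. It is not the code from this PR: the function name `convert_r_style_columns` is hypothetical, and it assumes that "R-style year columns" means labels such as `X2005`, which is how R prefixes column names that start with a digit.

```python
import pandas as pd


def convert_r_style_columns(df):
    """Hypothetical sketch: rename columns like "X2005" to the integer 2005.

    Assumption: an R-style year column is the letter "X" followed only by digits.
    All other column labels are returned unchanged.
    """

    def _strip_x(col):
        if isinstance(col, str) and col.startswith("X") and col[1:].isdigit():
            return int(col[1:])
        return col

    return df.rename(columns=_strip_x)


wide = pd.DataFrame(
    {"model": ["m"], "X2005": [1.0], "X2010": [2.0], "Xenon": [3.0]}
)
converted = convert_r_style_columns(wide)
```

Labels that merely start with "X" (such as `Xenon` here) are left alone, so the conversion only touches columns that look like years.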
Functionally it should be neutral. Make sure to look at individual commits to follow the refactoring trail.
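
For readers following the refactoring trail, the wide-to-series step quoted in the review (the `df.stack(dropna=True)` lines) can be sketched in isolation as below. This is a simplified illustration under stated assumptions, not the body of `_format_data_to_series`: the function name `wide_to_series` and the sample data are hypothetical, and `stack().dropna()` is used in place of `stack(dropna=True)` because the `dropna` argument is deprecated in recent pandas releases.

```python
import pandas as pd


def wide_to_series(df, time_col="year"):
    """Hypothetical sketch: turn a wide frame (one column per time step)
    into a long series named "value" with the time step as last index level."""
    # stack() moves the column labels into a new innermost index level;
    # dropna() then discards missing values, similar to stack(dropna=True)
    series = df.stack().dropna()
    series.name = "value"
    # the stacked level has no name yet, so label it with the time column
    series.index.names = list(series.index.names[:-1]) + [time_col]
    return series


wide = pd.DataFrame(
    {2005: [1.0, 3.0], 2010: [2.0, float("nan")]},
    index=pd.MultiIndex.from_tuples(
        [("m", "a"), ("m", "b")], names=["model", "scenario"]
    ),
)
series = wide_to_series(wide)
```

The missing 2010 value for scenario `b` is dropped, so the resulting series has one entry per observed data point rather than one per cell of the wide table.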