feat: make add_lags dataframe-agnostic #661
sklego/pandas_utils.py
Outdated
- combos = (df[col].shift(-lag).rename(col + str(lag)) for col in cols for lag in lags)
+ answer = df.with_columns([nw.col(col).shift(-lag).alias(col + str(lag)) for col in cols for lag in lags])

- answer = pd.concat([df, *combos], axis=1)
this diff is surprisingly pleasing
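To make the semantics of the diff above concrete, here is a minimal stdlib-only sketch of what both versions compute: for every column and every lag, a new column named `col + str(lag)` holding the values shifted by `-lag`. The names `shift` and `add_lags_sketch`, and the dict-of-lists table, are hypothetical stand-ins for illustration; this is not the sklego or narwhals implementation.

```python
from typing import Dict, List, Optional, Sequence

Column = List[Optional[float]]


def shift(values: Sequence[Optional[float]], periods: int) -> Column:
    """Shift values by `periods` positions, padding the gap with None.

    Positive periods move values towards the end, negative periods
    towards the start (the same direction convention as a dataframe shift).
    """
    n = len(values)
    if periods == 0:
        return list(values)
    k = min(abs(periods), n)  # never pad more than the column length
    pad: Column = [None] * k
    if periods > 0:
        return pad + list(values[: n - k])
    return list(values[k:]) + pad


def add_lags_sketch(
    table: Dict[str, Column], cols: Sequence[str], lags: Sequence[int]
) -> Dict[str, Column]:
    """Hypothetical helper: add one shifted copy of each column per lag."""
    if any(col not in table for col in cols):
        # Assumption: mirrors the KeyError message visible in the diff.
        raise KeyError("The column does not exist")
    out = dict(table)
    for col in cols:
        for lag in lags:
            # Same naming and direction as the diff: shift(-lag), col + str(lag).
            out[col + str(lag)] = shift(table[col], -lag)
    return out


result = add_lags_sketch({"x": [1, 2, 3, 4]}, ["x"], [1, -1])
```

Here `result["x1"]` holds the next row's value of `x` and `result["x-1"]` the previous row's value, with `None` where no neighbouring row exists.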
- def test_log_step(capsys, test_df):
+ def test_log_step(capsys, data):
      """Base test of log_step without any arguments to the logger"""
+     test_df = pd.DataFrame(data)
I haven't worked on log_step in this PR, so for now I'm keeping the test_log_step tests pandas-only.
a3dab2e to 539e657
🤔 not sure what's going on with tests in CI
EDIT: same thing is happening for PRs against main: #662
  raise KeyError("The column does not exist")

- combos = (df[col].shift(-lag).rename(col + str(lag)) for col in cols for lag in lags)
+ answer = df.with_columns(nw.col(col).shift(-lag).alias(col + str(lag)) for col in cols for lag in lags)
Any plan for the .name namespace? Asking for a friend 😂
I can take a look myself during the weekend 😇
definitely! nw.col(col).shift(-lag).name.suffix(str(lag)) would be even nicer
.github/workflows/test.yml
Outdated
@@ -28,6 +28,6 @@ jobs:
 with:
 python-version: ${{ matrix.python-version }}
 - name: Install dependencies
- run: python -m pip install -e ".[test-all]"
+ run: python -m pip install "cvxpy<1.5.0" -e ".[test-all]"
If you merge main there should be no need for this pinning now
5717d84 to b8d5fca
* placeholder to develop narwhals features
* feat: make `ColumnDropper` dataframe-agnostic (#655)
* feat: make ColumnDropped dataframe-agnostic
* use narwhals[polars] in pyproject.toml, link to list of supported libraries
* note that narwhals is used for cross-dataframe support
* test refactor
* docstrings
---------
Co-authored-by: FBruzzesi <francesco.bruzzesi.93@gmail.com>
* feat: make ColumnSelector dataframe-agnostic (#659)
* columnselector with test rufformatted
* adding whitespace
* fixed the fit and transform
* removed intendation in examples
* font:false
* feat: make `add_lags` dataframe-agnostic (#661)
* make add_lags dataframe-agnostic
* try getting tests to run?
* patch: cvxpy 1.5.0 support (#663)
---------
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
* Make `RegressionOutlier` dataframe-agnostic (#665)
* make regression outlier df-agnostic
* need to use eager-only for this one
* pass native to check_array
* remove cudf, link to check_X_y
* feat: Make InformationFilter dataframe-agnostic
* Make Timegapsplit dataframe-agnostic (#668)
* make timegapsplit dataframe-agnostic
* actually, include cuDF
* feat: make FairClassifier data-agnostic (#669)
* start all over
* fixture working
* wip
* passing tests - again
* pre-commit complaining
* changed fixture on test_demographic_parity
* feat: Make PandasTypeSelector selector dataframe-agnostic (#670)
* make pandas dtype selector df-agnostic
* bump version
* 3.8 compat
* Update sklego/preprocessing/pandastransformers.py
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
* fixup pyproject.toml
* unify (and test!) error message
* deprecate
* update readme
* undo contribution.md change
---------
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
* format typeselector and bump version
* feat: Make grouped and hierarchical dataframe-agnostic (#667)
* feat: make grouped and hierarchical dataframe-agnostic
* add pyarrow
* narwhals grouped_transformer
* grouped transformer eureka
* hierarchical narwhalified
* so close but so far
* return series instead of DataFrame for y
* grouped WIP
* merge branch and fix grouped
* future annotations
* format
* handling negative indices
* solve conflicts
* hacking C
* fairness: change C values in tests
---------
Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>
Co-authored-by: Magdalena Anopsy <74981211+anopsy@users.noreply.github.com>
Co-authored-by: Dea María Léon <deamarialeon@gmail.com>
Before working on a large PR, please check with @FBruzzesi or @koaning to confirm that they agree with the direction of the PR. This discussion should take place in a GitHub issue before working on the PR, unless it's a minor change like spelling in the docs.
Description
add_lags is now dataframe-agnostic - this one was surprisingly easy
Fixes #(issue)
Type of change
Checklist:
If you feel your PR is ready for a review, ping @FBruzzesi or @koaning.