feat: make `add_lags` dataframe-agnostic #661

MarcoGorelli · 2024-05-10T09:05:19Z

Before working on a large PR, please check with @FBruzzesi or @koaning to confirm that they agree with the direction of the PR. This discussion should take place in a GitHub issue before working on the PR, unless it's a minor change like spelling in the docs.

Description

add_lags is now dataframe-agnostic - this one was surprisingly easy

Fixes #(issue)

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the style guidelines (ruff)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (also to the readme.md)
I have added tests that prove my fix is effective or that my feature works
I have added tests to check whether the new feature adheres to the sklearn convention
New and existing unit tests pass locally with my changes

If you feel your PR is ready for a review, ping @FBruzzesi or @koaning.

MarcoGorelli · 2024-05-10T09:06:14Z

sklego/pandas_utils.py

-    combos = (df[col].shift(-lag).rename(col + str(lag)) for col in cols for lag in lags)
+    answer = df.with_columns([nw.col(col).shift(-lag).alias(col + str(lag)) for col in cols for lag in lags])

-    answer = pd.concat([df, *combos], axis=1)


this diff is surprisingly pleasing
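
To make the sign and naming convention in the diff above concrete, here is a minimal, library-free sketch of what `shift(-lag)` combined with `col + str(lag)` produces. `shift` and `add_lags_sketch` are hypothetical helpers written only for illustration; they are not the sklego or Narwhals implementation. Representing missing values as `None` is an assumption, and any later handling of those missing values is outside this sketch.

```python
from typing import Dict, List, Optional, Sequence

Column = List[Optional[float]]


def shift(values: Sequence[Optional[float]], n: int) -> Column:
    """Shift values by n positions, filling vacated slots with None.

    Positive n moves values towards later rows; negative n moves them
    towards earlier rows.
    """
    size = len(values)
    if n == 0:
        return list(values)
    if abs(n) >= size:
        return [None] * size
    if n > 0:
        return [None] * n + list(values[: size - n])
    return list(values[-n:]) + [None] * (-n)


def add_lags_sketch(
    table: Dict[str, Column], cols: Sequence[str], lags: Sequence[int]
) -> Dict[str, Column]:
    """Hypothetical stand-in: append one shifted copy per (col, lag) pair."""
    out = dict(table)
    for col in cols:
        for lag in lags:
            # Same convention as the diff: shift by -lag, name it col + str(lag).
            out[col + str(lag)] = shift(table[col], -lag)
    return out


table = {"x": [1, 2, 3, 4]}
result = add_lags_sketch(table, cols=["x"], lags=[1, -1])
print(result)
# {'x': [1, 2, 3, 4], 'x1': [2, 3, 4, None], 'x-1': [None, 1, 2, 3]}
```

Under this convention a positive lag pulls later rows up into the current row, and a negative lag pushes earlier rows down, which is why the new column for `lag=-1` is named `x-1`.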

MarcoGorelli · 2024-05-10T09:06:50Z

tests/test_pandas_utils/test_pandas_utils.py

-def test_log_step(capsys, test_df):
+def test_log_step(capsys, data):
    """Base test of log_step without any arguments to the logger"""
+    test_df = pd.DataFrame(data)


I haven't worked on log_step in this PR, so for now I'm just keeping the test_log_step tests to pandas

MarcoGorelli · 2024-05-10T09:36:33Z

FAILED tests/test_estimators/test_demographic_parity.py::test_standard_checks[check_fit2d_predict1d] - TypeError: Clarabel: unrecognized solver setting 'max_iters'.

🤔 not sure what's going on with tests in CI

EDIT: same thing is happening for PRs against main: #662

FBruzzesi · 2024-05-10T09:37:07Z

sklego/pandas_utils.py

        raise KeyError("The column does not exist")

-    combos = (df[col].shift(-lag).rename(col + str(lag)) for col in cols for lag in lags)
+    answer = df.with_columns(nw.col(col).shift(-lag).alias(col + str(lag)) for col in cols for lag in lags)


Any plan for the .name namespace? Asking for a friend 😂
I can take a look myself during the weekend 😇

MarcoGorelli · 2024-05-10

definitely! nw.col(col).shift(-lag).name.suffix(str(lag)) would be even nicer

FBruzzesi · 2024-05-10T11:35:01Z

.github/workflows/test.yml

@@ -28,6 +28,6 @@ jobs:
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
-      run: python -m pip install -e ".[test-all]"
+      run: python -m pip install "cvxpy<1.5.0" -e ".[test-all]"


If you merge main there should be no need for this pinning now

FBruzzesi

Thanks Marco 👋🏼 ! This is moving fast!

* placeholder to develop narwhals features
* feat: make `ColumnDropper` dataframe-agnostic (#655)
* feat: make ColumnDropped dataframe-agnostic
* use narwhals[polars] in pyproject.toml, link to list of supported libraries
* note that narwhals is used for cross-dataframe support
* test refactor
* docstrings

---------

Co-authored-by: FBruzzesi <francesco.bruzzesi.93@gmail.com>

* feat: make ColumnSelector dataframe-agnostic (#659)
* columnselector with test rufformatted
* adding whitespace
* fixed the fit and transform
* removed intendation in examples
* font:false
* feat: make `add_lags` dataframe-agnostic (#661)
* make add_lags dataframe-agnostic
* try getting tests to run?
* patch: cvxpy 1.5.0 support (#663)

---------

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

* Make `RegressionOutlier` dataframe-agnostic (#665)
* make regression outlier df-agnostic
* need to use eager-only for this one
* pass native to check_array
* remove cudf, link to check_X_y
* feat: Make InformationFilter dataframe-agnostic
* Make Timegapsplit dataframe-agnostic (#668)
* make timegapsplit dataframe-agnostic
* actually, include cuDF
* feat: make FairClassifier data-agnostic (#669)
* start all over
* fixture working
* wip
* passing tests - again
* pre-commit complaining
* changed fixture on test_demographic_parity
* feat: Make PandasTypeSelector selector dataframe-agnostic (#670)
* make pandas dtype selector df-agnostic
* bump version
* 3.8 compat
* Update sklego/preprocessing/pandastransformers.py

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

* fixup pyproject.toml
* unify (and test!) error message
* deprecate
* update readme
* undo contribution.md change

---------

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

* format typeselector and bump version
* feat: Make grouped and hierarchical dataframe-agnostic (#667)
* feat: make grouped and hierarchical dataframe-agnostic
* add pyarrow
* narwhals grouped_transformer
* grouped transformer eureka
* hierarchical narwhalified
* so close but so far
* return series instead of DataFrame for y
* grouped WIP
* merge branch and fix grouped
* future annotations
* format
* handling negative indices
* solve conflicts
* hacking C
* fairness: change C values in tests

---------

Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>
Co-authored-by: Magdalena Anopsy <74981211+anopsy@users.noreply.github.com>
Co-authored-by: Dea María Léon <deamarialeon@gmail.com>

make add_lags dataframe-agnostic

539e657

MarcoGorelli force-pushed the add-lags-df-agnostic branch from a3dab2e to 539e657 Compare May 10, 2024 09:22

MarcoGorelli changed the title ~~Make add_lags dataframe-agnostic~~ feat: make add_lags dataframe-agnostic May 10, 2024

try getting tests to run?

4c87a1c

FBruzzesi reviewed May 10, 2024

MarcoGorelli mentioned this pull request May 10, 2024

debug: no-op #662

Closed

MarcoGorelli marked this pull request as ready for review May 10, 2024 09:44

MarcoGorelli mentioned this pull request May 10, 2024

Add .name.suffix narwhals-dev/narwhals#130

Closed

FBruzzesi reviewed May 10, 2024

patch: cvxpy 1.5.0 support (koaning#663)

b8d5fca

MarcoGorelli force-pushed the add-lags-df-agnostic branch from 5717d84 to b8d5fca Compare May 10, 2024 12:05

FBruzzesi approved these changes May 10, 2024

FBruzzesi merged commit 28c102b into koaning:narwhals-development May 10, 2024
16 checks passed

FBruzzesi mentioned this pull request May 10, 2024

[FEATURE] Narwhals migration for dataframe-agnostic codebase #658

Closed

FBruzzesi mentioned this pull request May 18, 2024

feat: Narwhals for dataframe-agnostic codebase #671

Merged
