make FairClassifier data-agnostic #669

DeaMariaLeon · 2024-05-13T19:18:53Z

Before working on a large PR, please check with @FBruzzesi or @koaning to confirm that they agree with the direction of the PR. This discussion should take place in a Github issue before working on the PR, unless it's a minor change like spelling in the docs.

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.

Trying to make FairClassifier data-agnostic. Issue #658
The issue is that equal_opportunity_score is used to test it. But this is a public function, so Narwhals can't be added there yet. I did try, but a lot of changes would have been needed (I think).

That's why I added this to tests/test_estimators/test_equal_opportunity.py:

(line 121)
if isinstance(X, pl.DataFrame):
    X = pd.DataFrame(X.to_dict())
    y = pd.Series(y.to_list())

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the style guidelines (ruff)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (also to the readme.md)
I have added tests that prove my fix is effective or that my feature works
I have added tests to check whether the new feature adheres to the sklearn convention
New and existing unit tests pass locally with my changes

If you feel your PR is ready for a review, ping @FBruzzesi or @koaning.
@MarcoGorelli

MarcoGorelli · 2024-05-13T19:40:49Z

hey

> But this is a public function, so Narwhals can't be added there yet.

Not sure I understand, could you clarify what you mean please?

Does it not work to do

--- a/sklego/metrics.py
+++ b/sklego/metrics.py
@@ -152,7 +152,7 @@ def equal_opportunity_score(sensitive_column, positive_target=1):
         """Remember: X is the thing going *in* to your pipeline."""
         sensitive_col = X[:, sensitive_column] if isinstance(X, np.ndarray) else X[sensitive_column]
 
-        if not np.all((sensitive_col == 0) | (sensitive_col == 1)):
+        if not ((sensitive_col == 0) | (sensitive_col == 1)).all():

?
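A minimal sketch of why the method form in the diff above is attractive: it relies only on `==`, `|` and `.all()`, which NumPy arrays, pandas Series and polars Series all provide. The helper name `is_binary` is hypothetical and not part of sklego; pandas and polars are treated as optional here.

```python
import numpy as np


def is_binary(col):
    """Return True if every value in `col` is 0 or 1.

    Uses only operators and the `.all()` method, so it does not
    need to know which library `col` comes from.
    """
    mask = (col == 0) | (col == 1)
    return bool(mask.all())


# Build the same column in every library that happens to be installed.
columns = {"numpy": np.array([0, 1, 1, 0])}
try:
    import pandas as pd

    columns["pandas"] = pd.Series([0, 1, 1, 0])
except ImportError:  # pandas is optional for this sketch
    pass
try:
    import polars as pl

    columns["polars"] = pl.Series([0, 1, 1, 0])
except ImportError:  # polars is optional for this sketch
    pass

results = {name: is_binary(col) for name, col in columns.items()}
not_binary = is_binary(np.array([0, 2, 1]))
```

Every entry in `results` should be `True`, while `not_binary` is `False` because of the value 2.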

FBruzzesi

Ciao Dea! Thanks for the contribution. As hinted by Marco, please take a look at the metrics module, which the fairness estimators are supposed to interoperate with.

If there is no way to work around this, please add a few tests checking how .predict(..) behaves. In fact we do not implement any predict method but inherit from LinearClassifierMixin, which uses the array api standard internally.
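One way such a test could look, as a rough sketch only: check that `.predict(...)` returns the same labels whichever container holds the features. `TinyLinearClassifier` and `check_predict_container_agnostic` are hypothetical stand-ins written for this illustration, not sklego or scikit-learn code; in the real test suite the fitted fairness classifier would take the stand-in's place.

```python
import numpy as np


class TinyLinearClassifier:
    """Hypothetical stand-in for a fitted linear classifier (not sklego code)."""

    def __init__(self, coef, intercept=0.0):
        self.coef_ = np.asarray(coef, dtype=float)
        self.intercept_ = float(intercept)

    def predict(self, X):
        # Convert whatever container we receive into a plain array first.
        scores = np.asarray(X, dtype=float) @ self.coef_ + self.intercept_
        return (scores > 0).astype(int)


def check_predict_container_agnostic(model, X_numpy, converters):
    """Assert that predictions match the NumPy baseline for every container."""
    expected = model.predict(X_numpy)
    for name, convert in converters.items():
        got = np.asarray(model.predict(convert(X_numpy)))
        assert got.shape == expected.shape, name
        assert np.array_equal(got, expected), name
    return expected


X = np.array([[2.0, 1.0], [1.0, 2.0], [3.0, 0.0], [0.0, 3.0]])

# Containers to try; pandas and polars are optional in this sketch.
converters = {"list": lambda a: a.tolist()}
try:
    import pandas as pd

    converters["pandas"] = lambda a: pd.DataFrame(a, columns=["x1", "x2"])
except ImportError:
    pass
try:
    import polars as pl

    converters["polars"] = lambda a: pl.DataFrame({"x1": a[:, 0], "x2": a[:, 1]})
except ImportError:
    pass

labels = check_predict_container_agnostic(
    TinyLinearClassifier(coef=[1.0, -1.0]), X, converters
)
```

In a pytest suite the loop over `converters` would more naturally become a `pytest.mark.parametrize` over the supported dataframe libraries.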

DeaMariaLeon · 2024-05-14T11:19:03Z

Ciao and thanks to both of you!

I meant that equal_opportunity_score is part of the public API and can be used with other classifiers. I thought I shouldn't change it if others are, or could be, using this function. Especially because I had tried Marco's solution earlier, but together with something dumb that was breaking metrics. 🙈

Long story short, it seems to work. Thanks for the quick feedback.

FBruzzesi

It's awesome that it can work with both arrays and dataframes 👌🚀

Two more requests for tests:

Could you please remove the sensitive_classification_dataset fixture, and
Update the test_demographic_parity with the sensitive_classification_dataset_equalopportunity dataset.

DemographicParityClassifier is also based on the _FairClassifier class.

DeaMariaLeon · 2024-05-14T14:47:06Z

sensitive_classification_dataset is also used here (edit: line 68...), with a LogisticRegression:
https://github.com/koaning/scikit-lego/blob/main/tests/test_metrics/test_equal_opportunity.py
Should I really remove it?

Removing it breaks the tests below. To avoid that, I would have to convert from polars to pandas before "feeding" the data to LogisticRegression. Am I missing something?

ERROR tests/test_metrics/test_equal_opportunity.py::test_p_percent_numpy
ERROR tests/test_metrics/test_equal_opportunity.py::test_warning_is_logged
ERROR tests/test_metrics/test_p_percent.py::test_p_percent_pandas
ERROR tests/test_metrics/test_p_percent.py::test_p_percent_numpy
ERROR tests/test_metrics/test_p_percent.py::test_warning_is_logged

@FBruzzesi

DeaMariaLeon · 2024-05-14T16:01:41Z

Just to clarify, I did all the requested changes, except:

> Could you please remove the sensitive_classification_dataset fixture

That way the errors I showed earlier are gone. (Until new instructions 🤓)

FBruzzesi

Sorry, I got caught up with life! Thanks for the adjustments and the heads-up!

I am OK with keeping it as is for now and iterating later on. I tried to remove the numpy-specific parts and use methods directly (similarly to what Marco suggested), but I couldn't get all the tests to pass.

cc: @koaning do you have any strong opinions on this matter?

* placeholder to develop narwhals features
* feat: make `ColumnDropper` dataframe-agnostic (#655)
* feat: make ColumnDropped dataframe-agnostic
* use narwhals[polars] in pyproject.toml, link to list of supported libraries
* note that narwhals is used for cross-dataframe support
* test refactor
* docstrings
---------
Co-authored-by: FBruzzesi <francesco.bruzzesi.93@gmail.com>
* feat: make ColumnSelector dataframe-agnostic (#659)
* columnselector with test rufformatted
* adding whitespace
* fixed the fit and transform
* removed intendation in examples
* font:false
* feat: make `add_lags` dataframe-agnostic (#661)
* make add_lags dataframe-agnostic
* try getting tests to run?
* patch: cvxpy 1.5.0 support (#663)
---------
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
* Make `RegressionOutlier` dataframe-agnostic (#665)
* make regression outlier df-agnostic
* need to use eager-only for this one
* pass native to check_array
* remove cudf, link to check_X_y
* feat: Make InformationFilter dataframe-agnostic
* Make Timegapsplit dataframe-agnostic (#668)
* make timegapsplit dataframe-agnostic
* actually, include cuDF
* feat: make FairClassifier data-agnostic (#669)
* start all over
* fixture working
* wip
* passing tests - again
* pre-commit complaining
* changed fixture on test_demographic_parity
* feat: Make PandasTypeSelector selector dataframe-agnostic (#670)
* make pandas dtype selector df-agnostic
* bump version
* 3.8 compat
* Update sklego/preprocessing/pandastransformers.py
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
* fixup pyproject.toml
* unify (and test!) error message
* deprecate
* update readme
* undo contribution.md change
---------
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
* format typeselector and bump version
* feat: Make grouped and hierarchical dataframe-agnostic (#667)
* feat: make grouped and hierarchical dataframe-agnostic
* add pyarrow
* narwhals grouped_transformer
* grouped transformer eureka
* hierarchical narwhalified
* so close but so far
* return series instead of DataFrame for y
* grouped WIP
* merge branch and fix grouped
* future annotations
* format
* handling negative indices
* solve conflicts
* hacking C
* fairness: change C values in tests
---------
Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>
Co-authored-by: Magdalena Anopsy <74981211+anopsy@users.noreply.github.com>
Co-authored-by: Dea María Léon <deamarialeon@gmail.com>

DeaMariaLeon added 4 commits May 13, 2024 17:26

start all over

cc89f91

fixture working

e444e00

wip

f521ea5

passing tests - again

9b97417

FBruzzesi reviewed May 13, 2024

pre-commit complaining

90359fb

DeaMariaLeon requested a review from FBruzzesi May 14, 2024 11:57

FBruzzesi requested changes May 14, 2024

FBruzzesi mentioned this pull request May 14, 2024

[FEATURE] Narwhals migration for dataframe-agnostic codebase #658

Closed

changed fixture on test_demographic_parity

bf491d2

DeaMariaLeon requested a review from FBruzzesi May 14, 2024 16:01

FBruzzesi approved these changes May 16, 2024

FBruzzesi merged commit 8d33f1c into koaning:narwhals-development May 18, 2024
16 checks passed

DeaMariaLeon deleted the fair5 branch May 18, 2024 09:57
