make FairClassifier data-agnostic #669

Merged
FBruzzesi merged 6 commits into koaning:narwhals-development from the fair5 branch on May 18, 2024

DeaMariaLeon
Contributor

Before working on a large PR, please check with @FBruzzesi or @koaning to confirm that they agree with the direction of the PR. This discussion should take place in a Github issue before working on the PR, unless it's a minor change like spelling in the docs.

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.

Trying to make FairClassifier data-agnostic. Issue #658
The issue is that equal_opportunity_score is used to test it. But this is a public function, so Narwhals can't be added there yet. I did try, but a lot of changes would need to be made (I think).

That's why I added this to tests/test_estimators/test_equal_opportunity.py (around line 121):

if isinstance(X, pl.DataFrame):
    X = pd.DataFrame(X.to_dict())
    y = pd.Series(y.to_list())

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the style guidelines (ruff)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (also to the readme.md)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added tests to check whether the new feature adheres to the sklearn convention
  • New and existing unit tests pass locally with my changes

If you feel your PR is ready for a review, ping @FBruzzesi or @koaning.
@MarcoGorelli

@MarcoGorelli
Contributor

hey

But this is a public function, so Narwhals can't be added there yet.

Not sure I understand, could you clarify what you mean please?

Does it not work to do

--- a/sklego/metrics.py
+++ b/sklego/metrics.py
@@ -152,7 +152,7 @@ def equal_opportunity_score(sensitive_column, positive_target=1):
         """Remember: X is the thing going *in* to your pipeline."""
         sensitive_col = X[:, sensitive_column] if isinstance(X, np.ndarray) else X[sensitive_column]
 
-        if not np.all((sensitive_col == 0) | (sensitive_col == 1)):
+        if not ((sensitive_col == 0) | (sensitive_col == 1)).all():

?
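
A minimal sketch of why the method form in the diff above is more portable: calling `.all()` on the boolean result relies only on the object's own method, so the same check can run on a numpy column and on a dataframe column. The helper names `select_column` and `is_binary` are hypothetical and are not part of sklego; the pandas and polars part only runs if those libraries happen to be installed.

```python
import numpy as np


def select_column(X, sensitive_column):
    """Pick the sensitive column from a 2D numpy array or a dataframe-like object."""
    # Hypothetical helper mirroring the column lookup shown in the diff.
    if isinstance(X, np.ndarray):
        return X[:, sensitive_column]
    return X[sensitive_column]


def is_binary(sensitive_col):
    """Return True when every value is 0 or 1, using the object's own .all()."""
    # Hypothetical helper: the method call avoids passing the column to np.all.
    return bool(((sensitive_col == 0) | (sensitive_col == 1)).all())


X_np = np.array([[0.5, 1], [1.2, 0], [0.3, 1]])
print(is_binary(select_column(X_np, 1)))  # second column holds only 0s and 1s
print(is_binary(select_column(X_np, 0)))  # first column holds other values

# Optional: the same two helpers applied to dataframes, if the libraries are installed.
for module_name in ("pandas", "polars"):
    try:
        module = __import__(module_name)
    except ImportError:
        continue
    df = module.DataFrame({"x": [0.5, 1.2, 0.3], "sensitive": [1, 0, 1]})
    print(module_name, is_binary(select_column(df, "sensitive")))
```

Whether this is enough for the real metric depends on the rest of the function, which this sketch does not cover.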

Collaborator

@FBruzzesi FBruzzesi left a comment

Ciao Dea! Thanks for the contribution. As hinted by Marco, please take a look at the metrics module, which the fairness estimators are supposed to interoperate with.

If there is no way to work around this, please add a few tests checking how .predict(..) behaves. In fact we do not implement any predict method but inherit from LinearClassifierMixin, which uses the array api standard internally.

@DeaMariaLeon
Contributor Author

Ciao and thanks to both of you!

I meant that equal_opportunity_score is part of the API, and can be used with other classifiers. I thought that I shouldn't change that part if others are using, or can use, this function. Especially because I had earlier tried Marco's solution, but I had also done something dumb that was breaking metrics. 🙈

Long story short, it seems to work. Thanks for the quick feedback.

@DeaMariaLeon DeaMariaLeon requested a review from FBruzzesi May 14, 2024 11:57
Collaborator

@FBruzzesi FBruzzesi left a comment

It's awesome that it can work with both arrays and dataframes 👌🚀

Two more requests for tests:

  • Could you please remove the sensitive_classification_dataset fixture, and
  • Update the test_demographic_parity with the sensitive_classification_dataset_equalopportunity dataset.

DemographicParityClassifier is also based on the _FairClassifier class.

@DeaMariaLeon
Contributor Author

DeaMariaLeon commented May 14, 2024

sensitive_classification_dataset is also used here (edit: line 68...), on a LogisticRegression:
https://github.com/koaning/scikit-lego/blob/main/tests/test_metrics/test_equal_opportunity.py
Should I really remove it?

Removing it breaks the tests below. To avoid that, I would have to convert from polars to pandas before feeding the data to LogisticRegression. Am I missing something?

ERROR tests/test_metrics/test_equal_opportunity.py::test_p_percent_numpy
ERROR tests/test_metrics/test_equal_opportunity.py::test_warning_is_logged
ERROR tests/test_metrics/test_p_percent.py::test_p_percent_pandas
ERROR tests/test_metrics/test_p_percent.py::test_p_percent_numpy
ERROR tests/test_metrics/test_p_percent.py::test_warning_is_logged

@FBruzzesi

@DeaMariaLeon
Contributor Author

Just to clarify, I did all the requested changes - except:

Could you please remove the sensitive_classification_dataset fixture

That way the errors I showed earlier are gone. (Until new instructions 🤓)

@DeaMariaLeon DeaMariaLeon requested a review from FBruzzesi May 14, 2024 16:01
Collaborator

@FBruzzesi FBruzzesi left a comment

Sorry I got caught up with life! Thanks for the adjustments and the heads-up!

I am OK with keeping it as is for now and iterating later on. I tried to remove the numpy-specific parts and use methods directly (similar to what Marco suggested), but I couldn't manage to make all the tests pass.

cc: @koaning do you have any strong opinions on this matter?

@FBruzzesi FBruzzesi merged commit 8d33f1c into koaning:narwhals-development May 18, 2024
16 checks passed
@DeaMariaLeon DeaMariaLeon deleted the fair5 branch May 18, 2024 09:57
koaning pushed a commit that referenced this pull request May 24, 2024
* placeholder to develop narwhals features

* feat: make `ColumnDropper` dataframe-agnostic (#655)

* feat: make ColumnDropped dataframe-agnostic

* use narwhals[polars] in pyproject.toml, link to list of supported libraries

* note that narwhals is used for cross-dataframe support

* test refactor

* docstrings

---------

Co-authored-by: FBruzzesi <francesco.bruzzesi.93@gmail.com>

* feat: make ColumnSelector dataframe-agnostic (#659)

* columnselector with test rufformatted

* adding whitespace

* fixed the fit and transform

* removed intendation in examples

* font:false

* feat: make `add_lags` dataframe-agnostic (#661)

* make add_lags dataframe-agnostic

* try getting tests to run?

* patch: cvxpy 1.5.0 support (#663)

---------

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

* Make `RegressionOutlier` dataframe-agnostic (#665)

* make regression outlier df-agnostic

* need to use eager-only for this one

* pass native to check_array

* remove cudf, link to check_X_y

* feat: Make InformationFilter dataframe-agnostic

* Make Timegapsplit dataframe-agnostic (#668)

* make timegapsplit dataframe-agnostic

* actually, include cuDF

* feat: make FairClassifier data-agnostic (#669)

* start all over

* fixture working

* wip

* passing tests - again

* pre-commit complaining

* changed fixture on test_demographic_parity

* feat: Make PandasTypeSelector selector dataframe-agnostic (#670)

* make pandas dtype selector df-agnostic

* bump version

* 3.8 compat

* Update sklego/preprocessing/pandastransformers.py

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

* fixup pyproject.toml

* unify (and test!) error message

* deprecate

* update readme

* undo contribution.md change

---------

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

* format typeselector and bump version

* feat: Make grouped and hierarchical dataframe-agnostic (#667)

* feat: make grouped and hierarchical dataframe-agnostic

* add pyarrow

* narwhals grouped_transformer

* grouped transformer eureka

* hierarchical narwhalified

* so close but so far

* return series instead of DataFrame for y

* grouped WIP

* merge branch and fix grouped

* future annotations

* format

* handling negative indices

* solve conflicts

* hacking C

* fairness: change C values in tests

---------

Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>
Co-authored-by: Magdalena Anopsy <74981211+anopsy@users.noreply.github.com>
Co-authored-by: Dea María Léon <deamarialeon@gmail.com>