make FairClassifier data-agnostic #669
hey
Not sure I understand, could you clarify what you mean please? Does it not work to do this?

--- a/sklego/metrics.py
+++ b/sklego/metrics.py
@@ -152,7 +152,7 @@ def equal_opportunity_score(sensitive_column, positive_target=1):
         """Remember: X is the thing going *in* to your pipeline."""
         sensitive_col = X[:, sensitive_column] if isinstance(X, np.ndarray) else X[sensitive_column]
-        if not np.all((sensitive_col == 0) | (sensitive_col == 1)):
+        if not ((sensitive_col == 0) | (sensitive_col == 1)).all():
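For readers following along, here is a minimal sketch of why the method-based check is friendlier to non-NumPy inputs. The helper name `is_binary` and the sample data are hypothetical and not part of sklego; the idea is only that the check relies on `==`, `|`, and an `.all()` method on the result instead of calling `np.all` on it. The pandas part is optional and skipped if pandas is not installed.

```python
import numpy as np


def is_binary(sensitive_col):
    """Return True when every value in the column is 0 or 1.

    Hypothetical helper: it calls `.all()` on the comparison result
    instead of wrapping it in `np.all(...)`, so it only needs the
    column type to support `==`, `|`, and `.all()`.
    """
    mask = (sensitive_col == 0) | (sensitive_col == 1)
    return bool(mask.all())


# Made-up data: column 0 is continuous, column 1 is a binary sensitive attribute.
X = np.array([[0.5, 1], [1.2, 0], [0.3, 1]])

print(is_binary(X[:, 1]))  # binary column
print(is_binary(X[:, 0]))  # continuous column

try:
    import pandas as pd  # optional, only used to show the same check on a Series
except ImportError:
    pd = None

if pd is not None:
    df = pd.DataFrame(X, columns=["feature", "sensitive"])
    print(is_binary(df["sensitive"]))
```

Whether this is enough for every dataframe library supported through Narwhals is not something this sketch establishes; it only illustrates the difference between the two lines in the diff.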
Ciao Dea! Thanks for the contribution. As hinted by Marco, please take a look at the `metrics` module, which the fairness estimators are supposed to interoperate with.
If there is no way to work around this, please add a few tests checking how `.predict(..)` behaves. In fact, we do not implement any `predict` method but inherit it from `LinearClassifierMixin`, which uses the array API standard internally.
Ciao and thanks to both of you! I meant that
Long story short, it seems to work. Thanks for the quick feedback.
It's awesome that it can work with both arrays and dataframes 👌🚀
Two more requests for tests:
- Could you please remove the `sensitive_classification_dataset` fixture, and
- Update the `test_demographic_parity` with the `sensitive_classification_dataset_equalopportunity` dataset.

`DemographicParityClassifier` is also based on the `_FairClassifier` class.
That breaks the tests below. To avoid that, I would have to convert from polars to pandas before "feeding" the data to
Just to clarify, I did all the requested changes, except:
That way the errors I showed earlier are gone. (Until new instructions 🤓)
Sorry, I got caught up with life! Thanks for the adjustments and the heads up!
I am OK with keeping it as is for now and iterating later on. I tried to remove the numpy-specific parts and use methods directly (similarly to what Marco suggested), but I couldn't get all the tests to pass.
cc: @koaning do you have any strong opinions on this matter?
* placeholder to develop narwhals features
* feat: make `ColumnDropper` dataframe-agnostic (#655)
* feat: make ColumnDropped dataframe-agnostic
* use narwhals[polars] in pyproject.toml, link to list of supported libraries
* note that narwhals is used for cross-dataframe support
* test refactor
* docstrings
---------
Co-authored-by: FBruzzesi <francesco.bruzzesi.93@gmail.com>
* feat: make ColumnSelector dataframe-agnostic (#659)
* columnselector with test rufformatted
* adding whitespace
* fixed the fit and transform
* removed intendation in examples
* font:false
* feat: make `add_lags` dataframe-agnostic (#661)
* make add_lags dataframe-agnostic
* try getting tests to run?
* patch: cvxpy 1.5.0 support (#663)
---------
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
* Make `RegressionOutlier` dataframe-agnostic (#665)
* make regression outlier df-agnostic
* need to use eager-only for this one
* pass native to check_array
* remove cudf, link to check_X_y
* feat: Make InformationFilter dataframe-agnostic
* Make Timegapsplit dataframe-agnostic (#668)
* make timegapsplit dataframe-agnostic
* actually, include cuDF
* feat: make FairClassifier data-agnostic (#669)
* start all over
* fixture working
* wip
* passing tests - again
* pre-commit complaining
* changed fixture on test_demographic_parity
* feat: Make PandasTypeSelector selector dataframe-agnostic (#670)
* make pandas dtype selector df-agnostic
* bump version
* 3.8 compat
* Update sklego/preprocessing/pandastransformers.py
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
* fixup pyproject.toml
* unify (and test!) error message
* deprecate
* update readme
* undo contribution.md change
---------
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
* format typeselector and bump version
* feat: Make grouped and hierarchical dataframe-agnostic (#667)
* feat: make grouped and hierarchical dataframe-agnostic
* add pyarrow
* narwhals grouped_transformer
* grouped transformer eureka
* hierarchical narwhalified
* so close but so far
* return series instead of DataFrame for y
* grouped WIP
* merge branch and fix grouped
* future annotations
* format
* handling negative indices
* solve conflicts
* hacking C
* fairness: change C values in tests
---------
Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>
Co-authored-by: Magdalena Anopsy <74981211+anopsy@users.noreply.github.com>
Co-authored-by: Dea María Léon <deamarialeon@gmail.com>
Before working on a large PR, please check with @FBruzzesi or @koaning to confirm that they agree with the direction of the PR. This discussion should take place in a GitHub issue before working on the PR, unless it's a minor change like spelling in the docs.
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
Trying to make FairClassifier data-agnostic. Issue #658
The issue is that `equal_opportunity_score` is used to test it. But this is a public function, so Narwhals can't be added there yet. I did try, but a lot of changes would need to be made (I think). That's why I added this to `tests/test_estimators/test_equal_opportunity.py`:
Type of change
Checklist:
If you feel your PR is ready for a review, ping @FBruzzesi or @koaning.
@MarcoGorelli