
Unable to generate Report for Multi-class Classification with single output class #1275

Open
yudhiesh opened this issue Sep 2, 2024 · 0 comments


yudhiesh commented Sep 2, 2024

I have simulated data drift that results in the model predicting the same class over and over again. When I run a Report on the reference and current data, I get an error. Here is the current code:

from evidently.pipeline.column_mapping import ColumnMapping

from evidently.report import Report
from evidently.metrics import ClassificationQualityMetric

column_mapping = ColumnMapping()

column_mapping.target = 'label'
column_mapping.prediction = ['prob_NEGATIVE', 'prob_NEUTRAL', 'prob_POSITIVE']
column_mapping.text_features = ['text']
column_mapping.numerical_features = []
column_mapping.task = 'classification'
column_mapping.categorical_features = []

performance_report = Report(metrics=[
    ClassificationQualityMetric()
])

performance_report.run(reference_data=test_df, current_data=data_drift_df, column_mapping=column_mapping)
performance_report.show()

Here is the current/reference data example:

text | label | prob_NEGATIVE | prob_NEUTRAL | prob_POSITIVE | predicted_label | predicted_sentiment
« C’est de loin la méthode de contraception la... | 0 | 0.219654 | 0.071736 | 0.708610 | 2 | POSITIVE
« Je prends de la doxy depuis un certain temps... | 0 | 0.307037 | 0.108540 | 0.584423 | 2 | POSITIVE
« En 8 heures de prise d'un comprimé, j'ai eu ... | 0 | 0.159101 | 0.039321 | 0.801578 | 2 | POSITIVE
« Cela a changé ma vie. Je peux travailler eff... | 2 | 0.172600 | 0.040159 | 0.787241 | 2 | POSITIVE
« Cela a changé ma vie. L’anxiété a disparu, e... | 2 | 0.172715 | 0.037171 | 0.790113 | 2 | POSITIVE

I get the following error, which stems from scikit-learn:

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-57-7c6b02163273> in <cell line: 1>()
----> 1 performance_report.show()

13 frames

/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight, normalize)
    338             return np.zeros((n_labels, n_labels), dtype=int)
    339         elif len(np.intersect1d(y_true, labels)) == 0:
--> 340             raise ValueError("At least one label specified must be in y_true")
    341 
    342     if sample_weight is None:

ValueError: At least one label specified must be in y_true

It seems that the labels are not getting propagated down to the metrics that use probabilities, such as ROC AUC, as described in this Stack Overflow thread. I noticed a similar issue before that was fixed.
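
For reference, here is a minimal standalone sketch (with made-up values that mimic this dataset, not Evidently's internals) of the condition scikit-learn is complaining about: confusion_matrix raises this exact ValueError whenever the labels argument shares no values with y_true.

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 2, 2])  # hypothetical integer targets, as in the 'label' column
y_pred = np.array([2, 2, 2, 2])  # drifted model predicting a single class

# None of the class names below appear in y_true, so this reproduces:
# ValueError: At least one label specified must be in y_true
confusion_matrix(y_true, y_pred, labels=['NEGATIVE', 'NEUTRAL', 'POSITIVE'])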

I am currently using evidently==0.4.19.
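
One workaround I may try (a sketch based on my assumption that the mismatch is between the integer values in the target column and the string class names taken from the prob_* column names, not a confirmed fix): rename the probability columns to the class labels and map the target to the same names, so that the prediction column names match the values that actually appear in the target.

# Assumed label encoding; adjust if the dataset uses a different mapping
label_to_class = {0: 'NEGATIVE', 1: 'NEUTRAL', 2: 'POSITIVE'}
prob_columns = {'prob_NEGATIVE': 'NEGATIVE', 'prob_NEUTRAL': 'NEUTRAL', 'prob_POSITIVE': 'POSITIVE'}

for df in (test_df, data_drift_df):
    df['label'] = df['label'].map(label_to_class)
    df.rename(columns=prob_columns, inplace=True)

column_mapping.target = 'label'
column_mapping.prediction = ['NEGATIVE', 'NEUTRAL', 'POSITIVE']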
