Error using TestSuites for numerical data #1122

Open
jeric250 opened this issue May 23, 2024 · 3 comments

Comments

@jeric250

Hi there, first time opening an issue so bear with me (and let me know if more info is needed).

Basic information:
Package version used: 0.4.20
Operating system and version: macOS VSCode
Programming language and version used: Python 3.12.2

Code snippet:

from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns

data_drift_dataset_tests = TestSuite(tests=[
    TestShareOfDriftedColumns(stattest='psi'),
])

# ref_df: represents reference pandas DataFrame data (only numerical features)
# curr_df: represents current pandas DataFrame data (only numerical features)
data_drift_dataset_tests.run(reference_data=ref_df, current_data=curr_df)
data_drift_dataset_tests

The above code is based on Evidently documentation: https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/how_to_specify_stattest_for_a_testsuite.ipynb

Error message:
[screenshot of the error traceback]

The above code snippet takes in only numerical data in a pandas DataFrame (dtypes 'float64' and 'int64'). When I use the exact same code on only categorical data (dtypes 'object' and 'category'), it works fine and a report is generated.

I checked whether the numerical data used contain any weird values, and it doesn't seem to be the case. For example, to find records with non-numeric values:
ref_df[~ref_df.applymap(np.isreal).all(1)]
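
As a complement to the check above, a dtype-based inspection can be a quicker way to spot columns that pandas does not treat as numeric. This is only a minimal sketch and not part of Evidently; the helper name `non_numeric_columns` and the sample frame are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_columns(df: pd.DataFrame) -> list:
    """Return names of columns whose dtype pandas does not consider numeric."""
    return [col for col in df.columns if not is_numeric_dtype(df[col])]


# Hypothetical sample data: two numeric columns and one string column.
sample = pd.DataFrame({"AGE": [32, 40], "SCORE": [0.5, 1.5], "CITY": ["a", "b"]})
offending = non_numeric_columns(sample)
```

An empty result for both ref_df and curr_df would suggest the column values themselves are not the source of a string dtype.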

What am I missing? Any advice?

@elenasamuylova
Collaborator

Hi @jeric250, could you try to run pd.to_numeric on your input columns?

@jeric250
Author

jeric250 commented May 23, 2024

Thanks @elenasamuylova for responding so quickly. Forgot to mention, I did try pd.to_numeric as well, something like:
ref_df = ref_df.apply(pd.to_numeric, errors='coerce')
However, the same error still occurred. There are also no null values in the dataset.

When I tested a single numerical column, I got the same error.

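from evidently.metrics import ColumnDriftMetric, ColumnValuePlot
from evidently.report import Report

# test on AGE column, which holds people's ages (e.g. 32, 40)
data_drift_column_report = Report(metrics=[
    ColumnDriftMetric('AGE'),
    ColumnValuePlot('AGE'),
])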

data_drift_column_report.run(reference_data=ref_df, current_data=curr_df)
data_drift_column_report

Error:
UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U14'), dtype('float64')) -> None
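
For context, this class of error can be reproduced outside Evidently: NumPy raises it when `multiply` is applied to a string array and a float array, and `<U14` denotes a Unicode string dtype of length 14. The sketch below only illustrates the error type. That something similar happens inside the library is an assumption, and the arrays are invented for the example.

```python
import numpy as np


def try_multiply(left, right):
    """Return the exception raised by np.multiply, or None if it succeeds."""
    try:
        np.multiply(left, right)
    except TypeError as exc:  # UFuncTypeError is a subclass of TypeError
        return exc
    return None


# Hypothetical operands: a 14-character string array and a float64 array.
labels = np.array(["fourteen_chars"])
values = np.array([1.5])
error = try_multiply(labels, values)
```

If this reading is right, a string-typed value is reaching an arithmetic step somewhere, even though the data columns themselves are numeric.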

Same error when I tried DataDriftTable:

from evidently.metrics import DataDriftTable

data_drift_dataset_report = Report(metrics=[
    DataDriftTable(num_stattest='wasserstein', cat_stattest='psi'),
])

data_drift_dataset_report.run(reference_data=ref_df, current_data=curr_df)
data_drift_dataset_report

When I limit DataDriftTable to just the categorical columns, it works fine and a report is generated.

@rezan21

rezan21 commented Oct 21, 2024

@jeric250

I found that the UFuncTypeError in Evidently is, oddly, related to the index of the DataFrames passed as reference_data or current_data. If a DataFrame has a named index, it triggers the error: "UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U20'), dtype('float64')) -> None"

Solution: reset (drop) the index of the DataFrame before running the report:

from evidently.metrics import ColumnDriftMetric
from evidently.report import Report

x = df.reset_index(drop=True)  # <- remove index (returns a copy; df is untouched)
report = Report(metrics=[ColumnDriftMetric(column_name="premium")])  # 'premium' is an arbitrary feature in my dataset
report.run(reference_data=x, current_data=x)  # <- note: set reference_data and current_data to your own frames
report

Hope this helps!
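
Building on the workaround above, a small helper can apply the same reset to both frames before they are handed to a report. This is a sketch under assumptions: `with_default_index` is a hypothetical name, the sample frame is invented, and whether a named index is the root cause in every case is not confirmed in this thread.

```python
import pandas as pd


def with_default_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a fresh, unnamed RangeIndex.

    drop=True discards the old index values; use drop=False to keep
    them as a regular column instead.
    """
    return df.reset_index(drop=True)


# Hypothetical frame with a named index, similar to the situation described.
raw = pd.DataFrame(
    {"premium": [10.0, 12.5]},
    index=pd.Index(["a", "b"], name="policy_id"),
)
clean = with_default_index(raw)
```

Calling the helper on both the reference and the current frame keeps the originals untouched, so the named index is still available for anything else that needs it.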
