Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX-#4593: Ensure Modin warns when setting columns via attributes #4621

Merged
merged 13 commits into from
Jul 2, 2022
2 changes: 2 additions & 0 deletions docs/release_notes/release_notes-0.16.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ Key Features and Updates
* FIX-#4577: Set attribute of Modin dataframe to updated value (#4588)
* FIX-#4411: Fix binary_op between datetime64 Series and pandas timedelta (#4592)
* FIX-#4582: Inherit custom log layer (#4583)
* FIX-#4593: Ensure Modin warns when setting columns via attributes (#4621)
* Performance enhancements
* PERF-#4182: Add cell-wise execution for binary ops, fix bin ops for empty dataframes (#4391)
* Benchmarking enhancements
Expand Down Expand Up @@ -47,3 +48,4 @@ Contributors
@pyrito
@suhailrehman
@RehanSD
@helmeleegy
2 changes: 1 addition & 1 deletion modin/pandas/dataframe.py
Original file line number Diff line number Diff line change
Expand Up @@ -2502,7 +2502,7 @@ def __setattr__(self, key, value):
# `__getattr__` will return the columns not present in `dir(self)`, so we do not need
# to manually track this state in the `dir`.
return
elif isinstance(value, pandas.Series):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add an extra elif case that checks to see if the value the user is trying to set is_list_like() here and throw the warning in this case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think replacement is sufficient rather than having both checks, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacement is sufficient since is_list_like() will always be True for pandas.Series.

elif is_list_like(value):
warnings.warn(
"Modin doesn't allow columns to be created via a new attribute name - see "
+ "https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access",
Expand Down
31 changes: 31 additions & 0 deletions modin/pandas/test/dataframe/test_iter.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import matplotlib
import modin.pandas as pd
import io
import warnings

from modin.pandas.test.utils import (
random_state,
Expand Down Expand Up @@ -289,6 +290,36 @@ def test___setattr__mutating_column():
df_equals(modin_df, pandas_df)
assert modin_df.col0.equals(modin_df["col0"])

# Check that attempting to add a new col via attributes raises warning
# and adds the provided list as a new attribute and not a column.
with pytest.warns(
UserWarning,
match="Modin doesn't allow columns to be created via a new attribute name - see "
+ "https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access",
):
modin_df.col1 = [4]

with warnings.catch_warnings():
warnings.filterwarnings(
Comment on lines +302 to +303
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we catching the warnings and filtering them out? Is this so ensure that only these warnings are thrown here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, only the warnings we care about will cause the test to fail if they were raised (they should not!). Other types of warnings we have no control over. They may or may not be raised (possibly depending on the environment), so we want to ignore them (filter them out).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, makes sense

action="error",
message="Modin doesn't allow columns to be created via a new attribute name - see "
+ "https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access",
)
modin_df.col1 = [5]
modin_df.new_attr = 6
modin_df.col0 = 7

assert "new_attr" in dir(
modin_df
), "Modin attribute was not correctly added to the df."
assert (
"new_attr" not in modin_df
), "New attribute was not correctly added to columns."
assert modin_df.new_attr == 6, "Modin attribute value was set incorrectly."
assert isinstance(
modin_df.col0, pd.Series
), "Scalar was not broadcasted properly to an existing column."


@pytest.mark.parametrize("data", test_data_values, ids=test_data_keys)
def test_isin(data):
Expand Down