FIX-#4593: Ensure Modin warns when setting columns via attributes #4621

helmeleegy · 2022-06-30T00:56:55Z

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

What do these changes do?

commit message follows format outlined here
passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
signed commit with git commit -s
Resolves Modin should match pandas UserWarning for assigning columns via attributes #4593
tests added and passing
module layout described at docs/development/architecture.rst is up-to-date
added (Issue Number: PR title (PR Number)) and github username to release notes for next major release

pyrito

Good start! Could you also check that this is working as mentioned in the issue by writing a small test? Also please don't forget to add your name to the release notes in docs/release_notes

modin/pandas/dataframe.py

jeffreykennethli

nit: add link to issue #4593 in "Resolves:" in PR description, and add note in release_notes

codecov · 2022-06-30T17:21:40Z

Codecov Report

Merging #4621 (5ea1a05) into master (2de5c67) will increase coverage by 4.58%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4621      +/-   ##
==========================================
+ Coverage   85.11%   89.70%   +4.58%     
==========================================
  Files         230      231       +1     
  Lines       18438    18733     +295     
==========================================
+ Hits        15694    16804    +1110     
+ Misses       2744     1929     -815

Impacted Files	Coverage Δ
modin/pandas/dataframe.py	`91.69% <100.00%> (+0.10%)`	⬆️
...odin/core/storage_formats/pandas/query_compiler.py	`96.16% <0.00%> (-0.03%)`	⬇️
modin/pandas/groupby.py	`93.68% <0.00%> (ø)`
modin/experimental/batch/test/test_pipeline.py	`100.00% <0.00%> (ø)`
modin/pandas/base.py	`95.30% <0.00%> (+0.08%)`	⬆️
...mentations/pandas_on_ray/partitioning/partition.py	`93.57% <0.00%> (+1.83%)`	⬆️
...tations/pandas_on_python/partitioning/partition.py	`93.75% <0.00%> (+2.08%)`	⬆️
...entations/pandas_on_dask/partitioning/partition.py	`91.46% <0.00%> (+2.43%)`	⬆️
modin/pandas/__init__.py	`69.69% <0.00%> (+3.03%)`	⬆️
... and 24 more

pyrito · 2022-07-01T02:34:59Z

modin/pandas/dataframe.py

+        elif key not in dir(self):
+            if key not in self:
+                warnings.warn(


I believe the logic here is causing the CI test failures. This is actually a bug we ran into before, when attributes were being set improperly. If the key is in self and not in dir(self), we need to call __setitem__ and return, so that we never reach the object.__setattr__ call at the bottom of the method. For example, the failing test case has this scenario, where we add a new column as an attribute:

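    import pandas
    import modin.pandas as pd
    from modin.pandas.test.utils import df_equals

    def test___setattr__not_column():
        pandas_df = pandas.DataFrame([1, 2, 3])
        modin_df = pd.DataFrame([1, 2, 3])
        # Assigning via an attribute name that is not a column
        pandas_df.new_col = [4, 5, 6]
        modin_df.new_col = [4, 5, 6]
        df_equals(modin_df, pandas_df)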

As written, the code checks whether key is not in dir(self), which is true, and also sees that key is not in self. It therefore prints the warning, but it also calls __setitem__, which is not what we want.
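The corrected control flow can be sketched in plain Python. This is a minimal sketch, assuming a hypothetical `ToyFrame` class that keeps its columns in a dict and uses `isinstance(value, (list, tuple))` as a stand-in for pandas' `is_list_like`; it is not Modin's actual `DataFrame.__setattr__`. The two details that matter are the early `return` after `__setitem__` for an existing column, and that a new list-valued name only warns and then falls through to `object.__setattr__`, so it never becomes a column.

```python
import warnings


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: columns live in a plain dict."""

    def __init__(self, columns):
        # Bypass our own __setattr__ while creating internal state.
        object.__setattr__(self, "_columns", dict(columns))

    def __contains__(self, key):
        return key in self._columns

    def __setitem__(self, key, value):
        self._columns[key] = value

    def __getattr__(self, key):
        # Called only when normal lookup fails: expose columns as attributes.
        try:
            return self._columns[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        if key in self and key not in dir(self):
            # Existing column: update it, then return so the key is NOT
            # also stored as an instance attribute by the call below.
            self.__setitem__(key, value)
            return
        if isinstance(value, (list, tuple)):
            # List-like value for a non-column name: warn, then fall through
            # and store it as a plain attribute rather than a new column.
            warnings.warn(
                "columns cannot be created via a new attribute name",  # placeholder text
                UserWarning,
            )
        object.__setattr__(self, key, value)
```

Assigning `df.col0 = [4, 5, 6]` on a frame that already has `col0` updates the column silently, while `df.new_col = [7, 8, 9]` warns and leaves the columns untouched.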

I think this is outdated now. See below.

pyrito · 2022-07-01T02:37:34Z

modin/pandas/dataframe.py

-        elif key in self and key not in dir(self):
+        elif key not in dir(self) and key not in self and not is_list_like(value):


This logic seems somewhat confusing to me. I think the way it was expressed in the comment above was fairly clear: if the attribute already exists (that is, the column already exists), we can go ahead and assign the new value. Otherwise, we should promptly let the user know that we can't set new columns as an attribute.

You were right. The current fix is as simple as replacing isinstance(value, pandas.Series) with is_list_like(...).

Initially we thought that assigning a list to a non-existent column should raise a warning but still add a new column. It then turned out that the correct behavior is to store the list as a new attribute rather than a new column, which simplifies the logic a lot.
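The difference between the initially proposed behavior and the final one can be summarized as a small decision table. The helper below is hypothetical (it is not part of Modin) and only encodes the outcomes described in this thread; `final=False` reproduces the initial idea for comparison.

```python
def attribute_assignment_outcome(is_existing_column, value_is_list_like, final=True):
    """Return (warns, stored_as) for ``df.<name> = value`` (hypothetical helper)."""
    if is_existing_column:
        # Assigning to an existing column just updates it, with no warning.
        return (False, "column")
    if value_is_list_like:
        # New name + list-like value: always warn. The initial idea still
        # created a column; the final behavior stores a plain attribute.
        return (True, "attribute" if final else "column")
    # New name + scalar value: a plain attribute, no warning.
    return (False, "attribute")
```

Only the "new name, list-like value" row changes between the two designs, which is why the final fix reduces to widening the existing check rather than adding a new branch.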

pyrito · 2022-07-01T02:41:28Z

modin/pandas/test/dataframe/test_iter.py

@@ -289,6 +290,28 @@ def test___setattr__mutating_column():
    df_equals(modin_df, pandas_df)
    assert modin_df.col0.equals(modin_df["col0"])

+    # Check taht adding a new col via attributes raises warning


Suggested change

# Check taht adding a new col via attributes raises warning

# Check that adding a new col via attributes raises warning

pyrito · 2022-07-01T02:42:43Z

modin/pandas/dataframe.py

            self.__setitem__(key, value)
            # Note: return immediately so we don't keep this `key` as dataframe state.
            # `__getattr__` will return the columns not present in `dir(self)`, so we do not need
            # to manually track this state in the `dir`.
            return
-        elif isinstance(value, pandas.Series):


We could add an extra elif case here that checks whether the value the user is trying to set is list-like (via is_list_like()) and throws the warning in that case.

I think replacement is sufficient rather than having both checks, right?

Replacement is sufficient since is_list_like() will always be True for pandas.Series.
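Why a single `is_list_like()` check subsumes the old `isinstance(value, pandas.Series)` check can be illustrated with a simplified approximation. This is a sketch, not pandas' implementation: the real `pandas.api.types.is_list_like` also rejects zero-dimensional NumPy arrays and accepts an `allow_sets` parameter, both ignored here, and `FakeSeries` is a hypothetical stand-in for `pandas.Series`.

```python
from collections.abc import Iterable


def is_list_like_approx(obj):
    # Simplified approximation of pandas.api.types.is_list_like:
    # iterable, but not a str/bytes scalar-like value.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


class FakeSeries:
    """Hypothetical stand-in for pandas.Series: an object that iterates over its values."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)
```

Any Series-like object passes the check, so keeping a separate Series branch alongside it would be redundant; plain lists and tuples now pass too, while scalars and strings do not.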

pyrito · 2022-07-01T15:26:34Z

modin/pandas/test/dataframe/test_iter.py

+    with warnings.catch_warnings():
+        warnings.simplefilter("error")
+        modin_df.col1 = [5]
+        modin_df.col0 = 6


What does this bit do? Looks like this test is failing here in CI

This was originally suggested by @RehanSD. I think the idea is to make sure that these statements do not raise warnings. The test passes locally using pytest, so I'm not sure why it fails in CI. I removed one of the checks in the last commit, but it's still failing, and honestly I'm not yet sure why.

Oh, I think I might know why. I think the default execution engine is Ray, but the failing test runs with the Python execution engine, so it fails because that engine emits a "defaulting to pandas" warning. You can try running this locally yourself: pytest -n 2 modin/pandas/test/dataframe/test_iter.py --execution=BaseOnPython

I think this is fixed now by matching on the warning message.
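The failure mode described above can be reproduced with the standard library alone. This is a minimal sketch: inside `warnings.catch_warnings()`, `warnings.simplefilter("error")` turns every warning into an exception, so an unrelated warning that only some executions emit is enough to fail the test. The helper names and the warning text are hypothetical.

```python
import warnings


def emit_default_to_pandas_warning():
    # Illustrative stand-in for the "defaulting to pandas" UserWarning that
    # the BaseOnPython execution can emit; the exact message text is assumed.
    warnings.warn("defaulting to pandas implementation.", UserWarning)


def fails_under_blanket_error_filter():
    """Mimic the original test body: every warning is escalated to an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            emit_default_to_pandas_warning()
        except UserWarning:
            return True  # the unrelated warning alone fails the test
    return False
```

Under the Ray engine the stand-in warning would simply never be emitted, which is consistent with the test passing locally but failing in the BaseOnPython CI job.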

pyrito · 2022-07-01T17:07:42Z

modin/pandas/test/dataframe/test_iter.py

+    assert (
+        "new_attr" not in modin_df
+    ), "New attribute was not correctly added to columns."
+    assert modin_df.new_attr == 7, "Modin attribute value was set incorrectly."


Shouldn't we be checking if it's equal to 6 not 7?

naren-ponder · 2022-07-01T22:10:34Z

modin/pandas/dataframe.py

            self.__setitem__(key, value)
            # Note: return immediately so we don't keep this `key` as dataframe state.
            # `__getattr__` will return the columns not present in `dir(self)`, so we do not need
            # to manually track this state in the `dir`.
            return
-        elif isinstance(value, pandas.Series):


Replacement is sufficient since is_list_like() will always be True for pandas.Series.

pyrito

LGTM! Just had one question

pyrito · 2022-07-01T22:18:47Z

modin/pandas/test/dataframe/test_iter.py

+    with warnings.catch_warnings():
+        warnings.filterwarnings(


Why are we catching the warnings and filtering them out? Is this to ensure that only these warnings are thrown here?

Yes. Only the warnings we care about will cause the test to fail if they are raised (and they should not be!). Other types of warnings are outside our control: they may or may not be raised (possibly depending on the environment), so we want to ignore them (filter them out).

Does that make sense?

Gotcha, makes sense
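The targeted filter used in the final test can be sketched with the standard library alone. This is a sketch around the hypothetical helper `escalates()`: all other warnings are ignored, and `warnings.filterwarnings(action="error", message=...)` escalates only warnings whose text starts with the given pattern (`message` is a regular expression matched case-insensitively against the start of the warning text). The `TARGET` string is the prefix the test matches on.

```python
import warnings

# Prefix of the warning the test cares about (taken from the test above).
TARGET = "Modin doesn't allow columns to be created via a new attribute name"


def escalates(message):
    """Return True if a UserWarning with `message` is turned into an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # other warnings are out of our control
        # filterwarnings() inserts at the front of the filter list by default,
        # so this targeted "error" filter is checked before the "ignore" one.
        warnings.filterwarnings(action="error", message=TARGET)
        try:
            warnings.warn(message, UserWarning)
        except UserWarning:
            return True
    return False
```

With this setup, `escalates(TARGET)` returns `True`, while an unrelated "defaulting to pandas" message returns `False`, so the check no longer depends on which execution engine runs the test.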

YarShev · 2022-07-01T22:21:26Z

modin/pandas/test/dataframe/test_iter.py

+    # and adds the provided list as a new attribute and not a column.
+    with pytest.warns(
+        UserWarning,
+        match="Modin doesn't allow columns to be created via a new attribute name - see",


This warning message should probably be extended: "see ..." what?

YarShev · 2022-07-01T22:21:37Z

modin/pandas/test/dataframe/test_iter.py

+    with warnings.catch_warnings():
+        warnings.filterwarnings(
+            action="error",
+            message="Modin doesn't allow columns to be created via a new attribute name - see",


pyrito

LGTM! @modin-project/modin-core please take a look and give a stamp of approval if all looks good!

YarShev

@helmeleegy, LGTM, thanks!

FIX-modin-project#4593: Ensure Modin warns when setting columns via attributes

e8b5550

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

helmeleegy requested a review from a team as a code owner June 30, 2022 00:56

pyrito reviewed Jun 30, 2022

modin/pandas/dataframe.py (outdated)

jeffreykennethli reviewed Jun 30, 2022

helmeleegy added 3 commits June 30, 2022 13:42

Add test and fix logic

2962950

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

Add release notes

c000e49

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

Fix typo

e5afc1d

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

pyrito reviewed Jul 1, 2022

helmeleegy added 2 commits June 30, 2022 19:48

Fix and simplify logic

c233318

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

Remove unused import

efb6fe7

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

pyrito reviewed Jul 1, 2022

Address test failure in CI

2d21543

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

pyrito reviewed Jul 1, 2022

helmeleegy added 4 commits July 1, 2022 10:21

More addressing of test failure in CI

ded947c

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

Only check that a specific warning is not raised when updating existing columns or creating a new scalar attribute

b4654dd

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

Fix formatting

0ed4d7b

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

Fix formatting

5166bd3

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

helmeleegy force-pushed the issue/4593 branch from 30712e8 to 5166bd3 July 1, 2022 21:04

naren-ponder previously approved these changes Jul 1, 2022

pyrito previously approved these changes Jul 1, 2022

YarShev reviewed Jul 1, 2022

Expand the warning message to match on

79f67c2

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

helmeleegy dismissed stale reviews from pyrito and naren-ponder via 79f67c2 July 1, 2022 22:58

Expand the warning message to match on

5ea1a05

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

pyrito approved these changes Jul 2, 2022

YarShev approved these changes Jul 2, 2022

YarShev merged commit 82746b9 into modin-project:master Jul 2, 2022

YarShev pushed a commit that referenced this pull request Sep 6, 2022

FIX-#4593: Ensure Modin warns when setting columns via attributes (#4621)

f7671a4

Signed-off-by: Hazem Elmeleegy <hazem@ponder.io>

RehanSD mentioned this pull request Oct 12, 2022

Calling dataframe column produces an ndarray not a series as in stock pandas #3302

Closed
