SNOW-1458137 Tests to verify set_index and reset_index work as expected #2138
# Conflicts: # tests/integ/modin/index/test_index_methods.py
snow_idx = pd.Index(native_idx)

# Test that df.index = new_index works with lazy index.
with SqlCounter(query_count=3):
Is it possible to reduce this to a single query?
Do we understand the reason for the three queries?
There are three queries because:
- 2 queries from set_index in the query compiler: one from self.get_axis_len(axis=0) and another from key.get_axis_len(0)
- 1 query to perform to_pandas for comparison
I'm going to see if I can get the number of queries down but right now I'm not sure how to reduce the query count without eliminating the length checks.
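For readers following the count, here is a toy model of where the three round trips come from. It is only a sketch: ToyLazyFrame and QUERY_LOG are hypothetical stand-ins written for this illustration, not Snowpark pandas classes, and the real query compiler is more involved.

```python
# Toy model only: ToyLazyFrame and QUERY_LOG are hypothetical stand-ins,
# not Snowpark pandas classes.
QUERY_LOG = []


class ToyLazyFrame:
    def __init__(self, rows):
        self._rows = list(rows)

    def get_axis_len(self, axis=0):
        # Computing a length costs one round trip (think COUNT(*)).
        QUERY_LOG.append("count")
        return len(self._rows)

    def to_pandas(self):
        # Materializing the data costs another round trip.
        QUERY_LOG.append("materialize")
        return list(self._rows)

    def set_index(self, key):
        # Eager length validation: two separate count queries.
        self_num_rows = self.get_axis_len(axis=0)
        if key.get_axis_len(0) != self_num_rows:
            raise ValueError("Length mismatch")
        return ToyLazyFrame(zip(key._rows, self._rows))


frame = ToyLazyFrame(["a", "b", "c"])
index = ToyLazyFrame([10, 20, 30])
result = frame.set_index(index).to_pandas()
print(len(QUERY_LOG), QUERY_LOG)  # 3 ['count', 'count', 'materialize']
```

In this model the only way to get below three is to merge or drop the two count calls, which matches the trade-off described above.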
Could we lazily fail if the lengths aren't correct?
if key.get_axis_len(0) != self_num_rows:
At least we can drop 1 query by checking the two lengths in a single query.
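A minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for the real backend. The table names frame_rows and index_rows are made up for the illustration; how the query compiler would actually issue such a statement is not shown in this thread.

```python
import sqlite3

# In-memory database standing in for the warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frame_rows (v TEXT)")
conn.execute("CREATE TABLE index_rows (k INTEGER)")
conn.executemany("INSERT INTO frame_rows VALUES (?)", [("a",), ("b",), ("c",)])
conn.executemany("INSERT INTO index_rows VALUES (?)", [(10,), (20,)])

# One statement returns both lengths, so only one round trip is needed.
self_num_rows, key_num_rows = conn.execute(
    "SELECT (SELECT COUNT(*) FROM frame_rows), (SELECT COUNT(*) FROM index_rows)"
).fetchone()

lengths_match = key_num_rows == self_num_rows
print(self_num_rows, key_num_rows, lengths_match)  # 3 2 False
```

The comparison itself stays on the client; only the two counts are fetched together.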
# Conflicts: # tests/integ/modin/index/test_index_methods.py
Discussed offline and on Slack: I am removing the checks that compare the index length with the Series/DataFrame length, since a join can be performed anyway.
The length checks are removed and the pandas tests pass. @sfc-gh-azhan @sfc-gh-evandenberg @sfc-gh-helmeleegy PTAL when you have a minute, thanks!
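To illustrate what "a join can be performed anyway" could mean, here is a sketch that attaches a shorter index to a frame by joining on row position, again with sqlite3 as a stand-in. The left join, the pos column, and the table names are assumptions made for this example; the thread does not say which join type is used or what the resulting behavior is. For comparison, native pandas raises a ValueError on a length mismatch in df.index = new_index.

```python
import sqlite3

# In-memory stand-in; the explicit pos column models row position (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frame_rows (pos INTEGER, v TEXT)")
conn.execute("CREATE TABLE index_rows (pos INTEGER, k INTEGER)")
conn.executemany(
    "INSERT INTO frame_rows VALUES (?, ?)", [(0, "a"), (1, "b"), (2, "c")]
)
conn.executemany("INSERT INTO index_rows VALUES (?, ?)", [(0, 10), (1, 20)])

# No length check up front: rows without a matching index entry get NULL.
rows = conn.execute(
    "SELECT i.k, f.v FROM frame_rows AS f "
    "LEFT JOIN index_rows AS i ON f.pos = i.pos "
    "ORDER BY f.pos"
).fetchall()
print(rows)  # [(10, 'a'), (20, 'b'), (None, 'c')]
```

Under these assumptions a mismatch shows up as missing index values instead of an error, and no count queries are needed.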
LGTM.
@@ -80,22 +80,6 @@ def test_set_index_multiindex_columns(snow_df):
    )


@sql_count_checker(query_count=2)
def test_set_index_negative(snow_df, native_df):
Do we have a test that shows the current behavior when the lengths do not match?
Added one!
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1458137
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Verifying that df.index = new_index is implemented correctly.