Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1458137 Tests to verify set_index and reset_index work as expected #2138

Merged
merged 10 commits into from
Aug 27, 2024

Conversation

sfc-gh-vbudati
Copy link
Contributor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1458137

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

Verifying that df.index = new_index is implemented correctly.

@sfc-gh-vbudati sfc-gh-vbudati added NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs labels Aug 21, 2024
@sfc-gh-vbudati sfc-gh-vbudati requested a review from a team as a code owner August 21, 2024 18:29
snow_idx = pd.Index(native_idx)

# Test that df.index = new_index works with lazy index.
with SqlCounter(query_count=3):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to reduct to single query?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we understand the reason for the three queries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's three queries because:

  • 2 queries from set_index in the query compiler - one from self.get_axis_len(axis=0) and another from key.get_axis_len(0)
  • 1 query to perform to_pandas for comparison

I'm going to see if I can get the number of queries down but right now I'm not sure how to reduce the query count without eliminating the length checks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we lazily fail if the lengths aren't correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if key.get_axis_len(0) != self_num_rows:
at least we can drop 1 query by check the two lengths into a single query.

# Conflicts:
#	tests/integ/modin/index/test_index_methods.py
snow_idx = pd.Index(native_idx)

# Test that df.index = new_index works with lazy index.
with SqlCounter(query_count=3):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we lazily fail if the lengths aren't correct?

tests/integ/modin/index/test_index_methods.py Outdated Show resolved Hide resolved
@sfc-gh-vbudati
Copy link
Contributor Author

Discussed offline and on slack: I am removing the length checks for the index length and series/df length since a join can be performed anyway.

@sfc-gh-vbudati
Copy link
Contributor Author

The length checks are removed and the pandas tests pass @sfc-gh-azhan @sfc-gh-evandenberg @sfc-gh-helmeleegy PTAL when you have a minute, thanks!

Copy link
Contributor

@sfc-gh-helmeleegy sfc-gh-helmeleegy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -80,22 +80,6 @@ def test_set_index_multiindex_columns(snow_df):
)


@sql_count_checker(query_count=2)
def test_set_index_negative(snow_df, native_df):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have test to show current behavior when the lengths do not match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one!

@sfc-gh-vbudati sfc-gh-vbudati merged commit 0a9bbc7 into main Aug 27, 2024
34 of 35 checks passed
@sfc-gh-vbudati sfc-gh-vbudati deleted the vbudati/SNOW-1458137-lazy-set-index-reset-index branch August 27, 2024 01:07
@github-actions github-actions bot locked and limited conversation to collaborators Aug 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants