Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] Reduce merge conflicts #2438

Draft
wants to merge 219 commits into
base: ls-SNOW-1491199-merge-phase0-server-side
Choose a base branch
from

Conversation

sfc-gh-lspiegelberg
Copy link
Contributor

@sfc-gh-lspiegelberg sfc-gh-lspiegelberg commented Oct 11, 2024

Branch to reduce merge conflicts.

sfc-gh-helmeleegy and others added 30 commits August 21, 2024 01:02
…2113)

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Naren Krishna <naren.krishna@snowflake.com>
…rations. (#2130)

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1418500

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

This PR refactors `Session._resolve_packages` to not have any
side-effects. In the old implementation, we relied on this function to
make updates to `Session._packages` variable. Now we return the final
resulting state after resolving packages and update `Session._packages`
only in `Session.add_packages` making `Session._resolve_packages` have
no side-effect.
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1458127

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Implemented `Index.min` and `Index.max` using the existing Series
implementation.

```py
>>> idx = pd.Index([3, 2, 1])
>>> idx.max()
3

>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.max()
'c'

>>> idx = pd.Index([3, 2, 1])
>>> idx.min()
1

>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.min()
'a'
```
…#2110)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1010216, #2019

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

   Adding support to specify the following:
- REFERSH_MODE
- INITIALIZE
- CLUSTER BY
- TRANSIENT
- DATA_RETENTION_TIME_IN_DAYS
- MAX_DATA_EXTENSION_TIME_IN_DAYS

---------

Co-authored-by: Jamison Rose <Jamison.Rose@snowflake.com>
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1635810

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1625536

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.
#2141)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   SNOW-1625468 Support indexing with Timedelta data columns

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

Support indexing with Timedelta data columns:

- Added sufficient tests cases for indexing including get and set values
with the same timedelta type or different types.
- To achieve this, I updated several util methods related to transpose,
join too.
- Also, I need to refactor `SnowparkPandasType` since there are some
confusing on `type` object and class there.

This pull request includes several changes to enhance support for
`Timedelta` and improve type safety in the Snowflake Snowpark Modin
plugin. The most important changes are summarized below:

### Enhancements to Timedelta Support:
* Added support for indexing with `Timedelta` data columns.
(`CHANGELOG.md`:
[CHANGELOG.mdR30-L37](diffhunk://#diff-06572a96a58dc510037d5efa622f9bec8519bc1beab13c9f251e97e657a9d4edR30-L37))

### Type Safety Improvements:
* Added assertions to check that `data_column_types` and
`index_column_types` are instances of `SnowparkPandasType` in
`_create_snowflake_quoted_identifier_to_snowpark_pandas_type`.
(`src/snowflake/snowpark/modin/plugin/_internal/frame.py`:
[src/snowflake/snowpark/modin/plugin/_internal/frame.pyR92-R106](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeR92-R106))
* Updated `project_columns` to include an optional `column_types`
parameter and ensure its length matches `pandas_labels`.
(`src/snowflake/snowpark/modin/plugin/_internal/frame.py`:
[[1]](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeR999)
[[2]](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeR1013-R1021)
[[3]](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeL1023-R1037)
* Modified `set_frame_2d_labels` and `set_frame_2d_positional` to handle
`SnowparkPandasColumn` and `SnowparkPandasType` correctly.
(`src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py`:
[[1]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL2439-R2499)
[[2]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL2502-R2555)
[[3]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL2677-R2709)
[[4]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eR2775)
[[5]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eR2818-R2837)
[[6]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL2800-R2848)
* Updated `get_item_series_as_single_row_frame` to include
`SnowparkPandasType`.
(`src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py`:
[src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.pyR3065](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eR3065))
* Enhanced `_create_internal_frame_with_join_or_align_result` to handle
`data_column_types` and `index_column_types` during joins.
(`src/snowflake/snowpark/modin/plugin/_internal/join_utils.py`:
[[1]](diffhunk://#diff-67e1df8ec1e45b14cf51e35c6f67ac04982e41f85580cdfff391e35e025546d0R239-R246)
[[2]](diffhunk://#diff-67e1df8ec1e45b14cf51e35c6f67ac04982e41f85580cdfff391e35e025546d0R265-R291)
[[3]](diffhunk://#diff-67e1df8ec1e45b14cf51e35c6f67ac04982e41f85580cdfff391e35e025546d0R316-R327)
Fixes SNOW-1618349

This PR adds initial support for `pd.merge_asof` -- parameters `by`,
`left_by`, `right_by`, `left_index`, `right_index`, `suffixes`, and
`tolerance` are not yet supported. Additionally `direction=nearest` is
not yet supported but this can be done in a follow-up PR.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
…atetimeIndex.normalize (#2143)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1632900, SNOW-1625232

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

   Add support for Series.dt.normalize and DatetimeIndex.normalize.
…, `is_floating`, and `is_object` (#2146)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1458123

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Implemented Index `is_numeric`, `is_integer`, `is_boolean`,
`is_floating`, and `is_object`.
SNOW-1620412

This PR adds support for `astype` by allowing conversion to Timedelta.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1620446

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Adding notebooks with timedelta usecases and negative integration tests

---------

Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1625378 Test write timedelta in I/O

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

Warn user the timedelta type may be lost during writing back to
Snowflake.
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
…hods (#2135)

Fixes SNOW-1558919

Added support for DatetimeIndex ceil, floor and round methods. Raise not
implemented error if ambiguous or nonexistent parameter is set.
Fixes SNOW-1636767 
Fixes SNOW-1635405

---------

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Andong Zhan <andong.zhan@snowflake.com>
SNOW-1638433 Add AssertionError message into Telemetry

Also make sure all assert errors from plugin code have error message
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1637932

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

The main purpose for this refactoring is to avoid calling schema when
cached type is available, e.g., avoid calling schema when we know the
type is Timedelta in binary ops.

This pull request includes several changes to the Snowflake Snowpark
Modin plugin, focusing on refactoring the way Snowflake types are
retrieved and used. The main changes involve replacing the
`quoted_identifier_to_snowflake_type` method with a new
`get_snowflake_type` method, which simplifies type retrieval by allowing
it to accept a single identifier or a list of identifiers.

### Refactoring Type Retrieval

*
[`src/snowflake/snowpark/modin/plugin/_internal/aggregation_utils.py`](diffhunk://#diff-036a8cce05771914c03d260ad7fb1ab74a1578b353ff5156a65fbe546788872cL1044-L1047):
Updated `generate_column_agg_info` to use `get_snowflake_type` instead
of `quoted_identifier_to_snowflake_type` for retrieving Snowflake types.
[[1]](diffhunk://#diff-036a8cce05771914c03d260ad7fb1ab74a1578b353ff5156a65fbe546788872cL1044-L1047)
[[2]](diffhunk://#diff-036a8cce05771914c03d260ad7fb1ab74a1578b353ff5156a65fbe546788872cR1067-R1070)
[[3]](diffhunk://#diff-036a8cce05771914c03d260ad7fb1ab74a1578b353ff5156a65fbe546788872cL1109-R1109)

*
[`src/snowflake/snowpark/modin/plugin/_internal/binary_op_utils.py`](diffhunk://#diff-dd6dcb779b1e636fa0bc9541f9c0f8f0e18227367ef40a779146c5a6108676ebL631-R631):
Modified `prepare_binop_pairs_between_dataframe_and_dataframe` to use
`get_snowflake_type` for type mapping.
[[1]](diffhunk://#diff-dd6dcb779b1e636fa0bc9541f9c0f8f0e18227367ef40a779146c5a6108676ebL631-R631)
[[2]](diffhunk://#diff-dd6dcb779b1e636fa0bc9541f9c0f8f0e18227367ef40a779146c5a6108676ebL649-R649)
[[3]](diffhunk://#diff-dd6dcb779b1e636fa0bc9541f9c0f8f0e18227367ef40a779146c5a6108676ebL671-R671)

*
[`src/snowflake/snowpark/modin/plugin/_internal/frame.py`](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeL360-R417):
Added `get_snowflake_type` method and updated existing methods to use
it.
[[1]](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeL360-R417)
[[2]](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeL518-R564)

### Updating Indexing Utilities

*
[`src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py`](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL386-R388):
Replaced `quoted_identifier_to_snowflake_type` with `get_snowflake_type`
in multiple functions for checking data types.
[[1]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL386-R388)
[[2]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL1241-R1243)
[[3]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL1655-R1655)
[[4]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL1742-R1743)
[[5]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL2682-R2682)

### Enhancing Join and Where Utilities

*
[`src/snowflake/snowpark/modin/plugin/_internal/join_utils.py`](diffhunk://#diff-67e1df8ec1e45b14cf51e35c6f67ac04982e41f85580cdfff391e35e025546d0L1070-R1071):
Updated `convert_incompatible_types_to_variant` to use
`get_snowflake_type` for type mapping.

*
[`src/snowflake/snowpark/modin/plugin/_internal/where_utils.py`](diffhunk://#diff-ddf55ef822ec0c6f0e5406f498175dce3ae3a2177c036884d7356961d5b15015L19-R26):
Refactored `validate_expected_boolean_data_columns` to use
`get_snowflake_type` for type validation.

### Compiler Adjustments

*
[`src/snowflake/snowpark/modin/plugin/compiler/snowflake_query_compiler.py`](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L473-R477):
Refactored several methods to use `get_snowflake_type` for type
retrieval and mapping.
[[1]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L473-R477)
[[2]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L497-R502)
[[3]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L1500)
[[4]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L1545-R1546)
[[5]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L1710-R1711)
[[6]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L1870-R1877)
[[7]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L1924)
…aysinmonth (#2151)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1636789, SNOW-1636790

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

   Add support for Series.dt.days_in_month/daysinmonth.
…ithout initializing Snowpark pandas (#2097)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1625830

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [x] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Calling `to_snowpark_pandas` on a Snowpark Python DataFrame without
first performing `import snowflake.snowpark.modin.plugin` currently
raises an error (see attached JIRA for exact reproduction). This was not
the case in previous releases, and this PR fixes internal imports such
that performing this operation implicitly initializes Snowpark pandas.

After this PR, when `to_snowpark_pandas` is called without explicitly
initializing Snowpark pandas, one of two things happens:
1. If modin is not installed, Snowpark pandas will surface the following
error:
```
ModuleNotFoundError: Modin is not installed. Run `pip install "snowflake-snowpark-python[modin]"` to resolve.
```
This is the same error as if the user had tried to `import
snowflake.snowpark.modin.plugin` without installing the modin
dependency.

2. If modin is installed, Snowpark pandas will implicitly initialize
Snowpark pandas by internally running `import
snowflake.snowpark.modin.plugin`. If the user performs `import
modin.pandas as pd` afterwards, the modin namespace will be set up with
Snowpark pandas behavior.
…pected (#2138)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1458137

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Verifying that `df.index = new_index` is implemented correctly.
sfc-gh-yzou and others added 29 commits October 3, 2024 10:11
… in query generator (#2387)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

SNOW-1706295

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.
The snowflake plan query generation is not used in actual generation,
but only used by the testing. This pr does the following:
1) remove the SnowflakePlan overwrite in the code and test the query
generator in a different way
2) add sql counter check for the test
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

SNOW-1678113]

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.
In the reported case, the unstack calls pivot table underneath, and
customer end with about 2000 columns after pivot. The pivot it's self
took about 1~2 seconds to finish, but the after pivot it start looping
over all columns and calling append_columns in each loop here
[snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py
at 272e4e1 ·
snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/272e4e1ee5da84f8ac0abfefda95aab3b0bf4d7e/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py#L704).
The append columns eventually calls select with all existing projected
columns, and the a check is performed on each column here
[snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py
at main ·
snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py#L616)

our profiling shows that each check too about 0.015s, and 2000 check is
about 30seconds, and the snowpark select took about 0.5 seconds since it
need to perform sql simplification. Since there is an outer loop of
2000, overall it could take about (30.5*2000)s , which is close to 16 h.

In order to handle the issue, we did the following:
1) use "*" for append_columns to avoid checking for each columns
2) instead of calling append_column in each loop, try to get all columns
to append and only call append_columns once.
with manual testing, the customer case now took about 8.002492904663086
s to finish the unstack reported

TODO:
add this to our performance benchmark
https://snowflakecomputing.atlassian.net/browse/SNOW-1706311
SNOW-1445732

This PR implements `Series.items()` by using
`SnowparkPandasRowPartitionIterator` in much the same manner as
`DataFrame.iterrows()`.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>
…ilation stage is applied (#2385)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1703599

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

We shouldn't call session._table_exists inside resolve, but we can call
it before resolve.
… Series.tz_localize (#2398)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1677892, SNOW-1677897

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

   Add support for DataFrame.tz_localize and Series.tz_localize.
…Series.tz_convert (#2399)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1677888, SNOW-1677890

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

   Add support for DataFrame.tz_convert and Series.tz_convert.
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   ADHOC: Fix a misspelling

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

   Just notice a misspelling of "ambiguous" , fixing it
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

SNOW-1566363

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

  1. add status for number of selectStatement with complexity merged
2.add status for number of cte created during repeated subquery
elimination
…plan node (#2407)

1. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

2. Please describe how your code solves the related issue.

1) fix a bug in test_query_generator that is was repeated test the query
generation on the same plan node
2) use a copy for the alias_map to to use during resolve to avoid
unexpected update of the alias map
…value (#2213)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1649172

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

When doing `df.loc[x] = series`, an error occurs because series does not
have the same number of columns as the dataframe being set. Instead, the
Series should be transposed and set, regardless of whether it has an
equal number of rows as the dataframe has columns.

---------

Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
…is already closed (#2409)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1727163

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

It's possible that the session is already closed before garbage
collection kicks in, where we should avoid sending drop table sql and
eliminate the warning
…read_snowflake` tests (#2408)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1726720

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Use `session.sql.to_pandas()` instead of
`native_pd.DataFrame(session.sql.collect)` to generate expected
DataFrames when testing `read_snowflake`, so that if the expected
DataFrame is empty, but has metadata, e.g. columns, that data is passed
on to the expected DataFrame.
…atement instead of SnowflakeTable (#2411)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1727512

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

It should be the same as session.table(...).select_statement. Otherwise,
we will cache metadata on wrong source_plan in the future.
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   SNOW-1690717

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

This is the first PR to support Snowpark Python functions in pandas
apply. It only introduce `sin` as the first example.
SNOW-1727534

This PR adds support for `Resampler.indices`.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
Co-authored-by: Hazem Elmeleegy <hazem.elmeleegy@snowflake.com>
…na and SeriesGroupBy.fillna (#2417)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1728471

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Fix changelog entry placement for DataFrameGroupBy.fillna and
SeriesGroupBy.fillna.
…2403)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

SNOW-1708573

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

1) Add sql_counter check for test points in large query breakdown
2) Refine the sql_counter check for test_cte, and allow it to run on env
without pandas
…gs (#2422)

Expands on the the earlier support for ufuncs by making the ufunc
mapping clearer, and adding telemetry support. Another CL will add log
support.

* Add support for subtract, multiply, divide, and true_divide ufuncs
* Fill out the list of possible ufuncs from
https://numpy.org/doc/stable/reference/ufuncs.html
* Unimplemented ufuncs are marked with NotImplemented for visibility
* Add __array_ufunc__ to telemetry to get some usage statistics

---------

Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>
…2428)

Fixes SNOW-1730923

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

---------

Co-authored-by: Adam Ling <adam.ling@snowflake.com>
@sfc-gh-lspiegelberg sfc-gh-lspiegelberg changed the title [DRAFT] Ls reduce merge conflicts [DRAFT] Reduce merge conflicts Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.