[DRAFT] Reduce merge conflicts #2438
Draft
sfc-gh-lspiegelberg
wants to merge
219
commits into
ls-SNOW-1491199-merge-phase0-server-side
from
ls-reduce-merge-conflicts
…eekday/time and DatetimeIndex.time (#2128)
…2113) Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com> Co-authored-by: Naren Krishna <naren.krishna@snowflake.com>
…rations. (#2130) Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1418500 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. This PR refactors `Session._resolve_packages` to not have any side-effects. In the old implementation, we relied on this function to make updates to `Session._packages` variable. Now we return the final resulting state after resolving packages and update `Session._packages` only in `Session.add_packages` making `Session._resolve_packages` have no side-effect.
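The refactoring described above can be illustrated with a minimal sketch. It is not the actual Snowpark code: the `Session` class below is a hypothetical stand-in, `resolve_packages` is a simplified free function, and the parsing of package specifiers is an assumption made only for illustration. The point is that resolution returns a new state and only `add_packages` assigns it.

```python
from typing import Dict, List


def resolve_packages(existing: Dict[str, str], requested: List[str]) -> Dict[str, str]:
    """Return a new name -> specifier mapping; `existing` is never modified."""
    result = dict(existing)  # copy so the caller's state stays untouched
    for spec in requested:
        # Simplified parsing (assumption): the name is everything before "==".
        name = spec.split("==")[0].strip().lower()
        result[name] = spec.strip()
    return result


class Session:
    """Hypothetical stand-in for the real Session class."""

    def __init__(self) -> None:
        self._packages: Dict[str, str] = {}

    def add_packages(self, *specs: str) -> None:
        # The only place where the session state is updated.
        self._packages = resolve_packages(self._packages, list(specs))


session = Session()
preview = resolve_packages(session._packages, ["numpy==1.26.4"])
assert session._packages == {}  # resolving alone changed nothing
session.add_packages("numpy==1.26.4", "pandas")
```

Because resolution has no side effects, it can be called for validation or previews without risking a partially updated package list.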
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1458127 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Implemented `Index.min` and `Index.max` using the existing Series implementation. ```py >>> idx = pd.Index([3, 2, 1]) >>> idx.max() 3 >>> idx = pd.Index(['c', 'b', 'a']) >>> idx.max() 'c' >>> idx = pd.Index([3, 2, 1]) >>> idx.min() 1 >>> idx = pd.Index(['c', 'b', 'a']) >>> idx.min() 'a' ```
…#2110) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1010216, #2019 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Adding support to specify the following: - REFRESH_MODE - INITIALIZE - CLUSTER BY - TRANSIENT - DATA_RETENTION_TIME_IN_DAYS - MAX_DATA_EXTENSION_TIME_IN_DAYS --------- Co-authored-by: Jamison Rose <Jamison.Rose@snowflake.com>
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1635810 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue.
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1625536 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue.
#2141) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1625468 Support indexing with Timedelta data columns 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. Support indexing with Timedelta data columns: - Added test cases for indexing, including getting and setting values with the same Timedelta type or with different types. - To achieve this, I also updated several utility methods related to transpose and join. - I also refactored `SnowparkPandasType`, because the distinction between a `type` object and a class was confusing there. This pull request includes several changes to enhance support for `Timedelta` and improve type safety in the Snowflake Snowpark Modin plugin. The most important changes are summarized below: ### Enhancements to Timedelta Support: * Added support for indexing with `Timedelta` data columns. 
(`CHANGELOG.md`: [CHANGELOG.mdR30-L37](diffhunk://#diff-06572a96a58dc510037d5efa622f9bec8519bc1beab13c9f251e97e657a9d4edR30-L37)) ### Type Safety Improvements: * Added assertions to check that `data_column_types` and `index_column_types` are instances of `SnowparkPandasType` in `_create_snowflake_quoted_identifier_to_snowpark_pandas_type`. (`src/snowflake/snowpark/modin/plugin/_internal/frame.py`: [src/snowflake/snowpark/modin/plugin/_internal/frame.pyR92-R106](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeR92-R106)) * Updated `project_columns` to include an optional `column_types` parameter and ensure its length matches `pandas_labels`. (`src/snowflake/snowpark/modin/plugin/_internal/frame.py`: [[1]](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeR999) [[2]](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeR1013-R1021) [[3]](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeL1023-R1037) * Modified `set_frame_2d_labels` and `set_frame_2d_positional` to handle `SnowparkPandasColumn` and `SnowparkPandasType` correctly. (`src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py`: [[1]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL2439-R2499) [[2]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL2502-R2555) [[3]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL2677-R2709) [[4]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eR2775) [[5]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eR2818-R2837) [[6]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL2800-R2848) * Updated `get_item_series_as_single_row_frame` to include `SnowparkPandasType`. 
(`src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py`: [src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.pyR3065](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eR3065)) * Enhanced `_create_internal_frame_with_join_or_align_result` to handle `data_column_types` and `index_column_types` during joins. (`src/snowflake/snowpark/modin/plugin/_internal/join_utils.py`: [[1]](diffhunk://#diff-67e1df8ec1e45b14cf51e35c6f67ac04982e41f85580cdfff391e35e025546d0R239-R246) [[2]](diffhunk://#diff-67e1df8ec1e45b14cf51e35c6f67ac04982e41f85580cdfff391e35e025546d0R265-R291) [[3]](diffhunk://#diff-67e1df8ec1e45b14cf51e35c6f67ac04982e41f85580cdfff391e35e025546d0R316-R327)
Fixes SNOW-1618349 This PR adds initial support for `pd.merge_asof` -- parameters `by`, `left_by`, `right_by`, `left_index`, `right_index`, `suffixes`, and `tolerance` are not yet supported. Additionally `direction=nearest` is not yet supported but this can be done in a follow-up PR. --------- Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
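For readers unfamiliar with as-of joins, the following is a minimal pure-Python sketch of the matching rule for the default `direction="backward"` case: each left key is paired with the last right row whose key is less than or equal to it. It only illustrates the semantics and is not how Snowpark pandas implements `pd.merge_asof`; the function name `asof_backward` and the sample data are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def asof_backward(
    left_keys: List[int], right: List[Tuple[int, str]]
) -> List[Tuple[int, Optional[str]]]:
    """Pair each left key with the value of the last right row whose key is <= it.

    Both inputs must already be sorted by key, as an as-of join requires.
    """
    right_keys = [key for key, _ in right]
    matched = []
    for key in left_keys:
        pos = bisect_right(right_keys, key)  # number of right keys <= key
        matched.append((key, right[pos - 1][1] if pos else None))
    return matched


quotes = [(1, "a"), (5, "b"), (10, "c")]
matched = asof_backward([0, 1, 6, 12], quotes)
```

A left key with no earlier or equal right key (here `0`) gets no match, which corresponds to a missing value in the joined result.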
…atetimeIndex.normalize (#2143) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1632900, SNOW-1625232 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Add support for Series.dt.normalize and DatetimeIndex.normalize.
…, `is_floating`, and `is_object` (#2146) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1458123 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Implemented Index `is_numeric`, `is_integer`, `is_boolean`, `is_floating`, and `is_object`.
SNOW-1620412 This PR adds support for `astype` by allowing conversion to Timedelta. --------- Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1620446 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Adding notebooks with timedelta use cases and negative integration tests --------- Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1625378 Test write timedelta in I/O 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. Warn the user that the timedelta type may be lost when writing back to Snowflake.
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
…hods (#2135) Fixes SNOW-1558919 Added support for the DatetimeIndex ceil, floor and round methods. A not-implemented error is raised if the ambiguous or nonexistent parameter is set.
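As a rough illustration of what flooring and ceiling a timestamp to a frequency means, here is a standard-library sketch for naive datetimes. It is not the Snowpark pandas implementation; the helper names `floor_dt` and `ceil_dt` are hypothetical, and `round` is left out because its tie-breaking behaviour is not described in the commit message.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def floor_dt(ts: datetime, freq: timedelta) -> datetime:
    """Round a naive timestamp down to a multiple of `freq` since the epoch."""
    steps = (ts - EPOCH) // freq  # timedelta // timedelta yields an int
    return EPOCH + steps * freq


def ceil_dt(ts: datetime, freq: timedelta) -> datetime:
    """Round a naive timestamp up to a multiple of `freq` since the epoch."""
    floored = floor_dt(ts, freq)
    return floored if floored == ts else floored + freq


ts = datetime(2024, 1, 1, 10, 37)
hour = timedelta(hours=1)
floored, ceiled = floor_dt(ts, hour), ceil_dt(ts, hour)
```

The `ambiguous` and `nonexistent` parameters only matter for time-zone-aware values around daylight-saving transitions, which this naive-datetime sketch does not model.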
Fixes SNOW-1636767 Fixes SNOW-1635405 --------- Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com> Co-authored-by: Andong Zhan <andong.zhan@snowflake.com>
SNOW-1638433 Add the AssertionError message to telemetry. Also make sure that all assert errors raised from plugin code carry an error message.
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1637932 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. The main purpose for this refactoring is to avoid calling schema when cached type is available, e.g., avoid calling schema when we know the type is Timedelta in binary ops. This pull request includes several changes to the Snowflake Snowpark Modin plugin, focusing on refactoring the way Snowflake types are retrieved and used. The main changes involve replacing the `quoted_identifier_to_snowflake_type` method with a new `get_snowflake_type` method, which simplifies type retrieval by allowing it to accept a single identifier or a list of identifiers. ### Refactoring Type Retrieval * [`src/snowflake/snowpark/modin/plugin/_internal/aggregation_utils.py`](diffhunk://#diff-036a8cce05771914c03d260ad7fb1ab74a1578b353ff5156a65fbe546788872cL1044-L1047): Updated `generate_column_agg_info` to use `get_snowflake_type` instead of `quoted_identifier_to_snowflake_type` for retrieving Snowflake types. 
[[1]](diffhunk://#diff-036a8cce05771914c03d260ad7fb1ab74a1578b353ff5156a65fbe546788872cL1044-L1047) [[2]](diffhunk://#diff-036a8cce05771914c03d260ad7fb1ab74a1578b353ff5156a65fbe546788872cR1067-R1070) [[3]](diffhunk://#diff-036a8cce05771914c03d260ad7fb1ab74a1578b353ff5156a65fbe546788872cL1109-R1109) * [`src/snowflake/snowpark/modin/plugin/_internal/binary_op_utils.py`](diffhunk://#diff-dd6dcb779b1e636fa0bc9541f9c0f8f0e18227367ef40a779146c5a6108676ebL631-R631): Modified `prepare_binop_pairs_between_dataframe_and_dataframe` to use `get_snowflake_type` for type mapping. [[1]](diffhunk://#diff-dd6dcb779b1e636fa0bc9541f9c0f8f0e18227367ef40a779146c5a6108676ebL631-R631) [[2]](diffhunk://#diff-dd6dcb779b1e636fa0bc9541f9c0f8f0e18227367ef40a779146c5a6108676ebL649-R649) [[3]](diffhunk://#diff-dd6dcb779b1e636fa0bc9541f9c0f8f0e18227367ef40a779146c5a6108676ebL671-R671) * [`src/snowflake/snowpark/modin/plugin/_internal/frame.py`](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeL360-R417): Added `get_snowflake_type` method and updated existing methods to use it. [[1]](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeL360-R417) [[2]](diffhunk://#diff-dc59d6fb5be73824e72c1e84ca671739e68c28f651a164e96af7f19a3f732edeL518-R564) ### Updating Indexing Utilities * [`src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py`](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL386-R388): Replaced `quoted_identifier_to_snowflake_type` with `get_snowflake_type` in multiple functions for checking data types. 
[[1]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL386-R388) [[2]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL1241-R1243) [[3]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL1655-R1655) [[4]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL1742-R1743) [[5]](diffhunk://#diff-524607b71f519819352dae1467474661e35164db39180ad106e30e7bf2e3265eL2682-R2682) ### Enhancing Join and Where Utilities * [`src/snowflake/snowpark/modin/plugin/_internal/join_utils.py`](diffhunk://#diff-67e1df8ec1e45b14cf51e35c6f67ac04982e41f85580cdfff391e35e025546d0L1070-R1071): Updated `convert_incompatible_types_to_variant` to use `get_snowflake_type` for type mapping. * [`src/snowflake/snowpark/modin/plugin/_internal/where_utils.py`](diffhunk://#diff-ddf55ef822ec0c6f0e5406f498175dce3ae3a2177c036884d7356961d5b15015L19-R26): Refactored `validate_expected_boolean_data_columns` to use `get_snowflake_type` for type validation. ### Compiler Adjustments * [`src/snowflake/snowpark/modin/plugin/compiler/snowflake_query_compiler.py`](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L473-R477): Refactored several methods to use `get_snowflake_type` for type retrieval and mapping. 
[[1]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L473-R477) [[2]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L497-R502) [[3]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L1500) [[4]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L1545-R1546) [[5]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L1710-R1711) [[6]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L1870-R1877) [[7]](diffhunk://#diff-834ee069919510e7e410c503a8afa455154c40e65389769c08d35b0ec3f8ec03L1924)
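The idea behind `get_snowflake_type` can be sketched as follows. This is a simplified, hypothetical model rather than the plugin's code: the `Frame` class, its attributes, the `_fetch_schema` helper, and the use of plain strings as type names are all assumptions. It shows the two properties named above: one method handles both a single identifier and a list, and the schema is consulted only when no cached type is available.

```python
from typing import Dict, List, Union


class Frame:
    """Hypothetical frame that knows some column types without a schema call."""

    def __init__(self, cached_types: Dict[str, str], schema: Dict[str, str]) -> None:
        self._cached_types = cached_types  # e.g. columns known to be Timedelta
        self._schema = schema              # stands in for an expensive schema lookup
        self.schema_calls = 0

    def _fetch_schema(self) -> Dict[str, str]:
        self.schema_calls += 1  # count how often the expensive path is taken
        return self._schema

    def get_snowflake_type(
        self, identifier: Union[str, List[str]]
    ) -> Union[str, List[str]]:
        if isinstance(identifier, list):
            return [self.get_snowflake_type(item) for item in identifier]
        if identifier in self._cached_types:
            return self._cached_types[identifier]  # no schema call needed
        return self._fetch_schema()[identifier]


frame = Frame({'"TD"': "TimedeltaType"}, {'"TD"': "LongType", '"A"': "StringType"})
single = frame.get_snowflake_type('"TD"')
calls_after_single = frame.schema_calls
several = frame.get_snowflake_type(['"TD"', '"A"'])
```

In this sketch every uncached identifier triggers its own schema lookup; a real implementation would presumably fetch the schema at most once per call.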
…aysinmonth (#2151) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1636789, SNOW-1636790 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Add support for Series.dt.days_in_month/daysinmonth.
…ithout initializing Snowpark pandas (#2097) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1625830 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [x] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Calling `to_snowpark_pandas` on a Snowpark Python DataFrame without first performing `import snowflake.snowpark.modin.plugin` currently raises an error (see attached JIRA for exact reproduction). This was not the case in previous releases, and this PR fixes internal imports such that performing this operation implicitly initializes Snowpark pandas. After this PR, when `to_snowpark_pandas` is called without explicitly initializing Snowpark pandas, one of two things happens: 1. If modin is not installed, Snowpark pandas will surface the following error: ``` ModuleNotFoundError: Modin is not installed. Run `pip install "snowflake-snowpark-python[modin]"` to resolve. ``` This is the same error as if the user had tried to `import snowflake.snowpark.modin.plugin` without installing the modin dependency. 2. If modin is installed, Snowpark pandas will implicitly initialize Snowpark pandas by internally running `import snowflake.snowpark.modin.plugin`. 
If the user performs `import modin.pandas as pd` afterwards, the modin namespace will be set up with Snowpark pandas behavior.
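The two outcomes described above follow a common "check the dependency, then import the plugin for its side effects" pattern. The sketch below shows that pattern with the standard library only; the function name `ensure_plugin_initialized` is hypothetical, and standard-library modules stand in for modin and the Snowpark pandas plugin so that the example runs anywhere.

```python
import importlib
import importlib.util
from types import ModuleType


def ensure_plugin_initialized(dependency: str, plugin: str) -> ModuleType:
    """Import `plugin` for its side effects, failing early if `dependency` is absent."""
    if importlib.util.find_spec(dependency) is None:
        # Error text quoted from the PR description.
        raise ModuleNotFoundError(
            "Modin is not installed. Run "
            '`pip install "snowflake-snowpark-python[modin]"` to resolve.'
        )
    # Importing an already-imported module is a cheap no-op.
    return importlib.import_module(plugin)


# Stand-ins: "json" plays the dependency, "json.decoder" plays the plugin module.
decoder = ensure_plugin_initialized("json", "json.decoder")

try:
    ensure_plugin_initialized("no_such_dependency_xyz", "json.decoder")
except ModuleNotFoundError as exc:
    message = str(exc)
```

Calling such a helper at the start of `to_snowpark_pandas` would give the behaviour described: a clear error without modin, and implicit initialization with it.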
…pected (#2138) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1458137 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Verifying that `df.index = new_index` is implemented correctly.
… in query generator (#2387) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1706295 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. The SnowflakePlan query generation is not used in actual query generation; it is used only by the tests. This PR does the following: 1) removes the SnowflakePlan overwrite from the code and tests the query generator in a different way; 2) adds a SQL counter check to the test.
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1678113 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. In the reported case, unstack calls pivot table underneath, and the customer ends up with about 2000 columns after the pivot. The pivot itself takes about 1 to 2 seconds, but after the pivot the code loops over all columns and calls append_columns in each iteration here: [snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py at 272e4e1 · snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/272e4e1ee5da84f8ac0abfefda95aab3b0bf4d7e/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py#L704). append_columns eventually calls select with all existing projected columns, and a check is performed on each column here: [snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py at main · snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py#L616). Our profiling shows that each check takes about 0.015 s, so 2000 checks take about 30 seconds, and the Snowpark select takes about 0.5 seconds because it needs to perform SQL simplification. Since there is an outer loop of 2000 iterations, the whole operation could take about (30.5*2000) s = 61,000 s, which is close to 17 hours. To address the issue, we did the following: 1) use "*" for append_columns to avoid checking each column; 2) instead of calling append_column in each loop iteration, collect all columns to append and call append_columns only once. With manual testing, the reported customer unstack case now takes about 8.002492904663086 s to finish. TODO: add this to our performance benchmark https://snowflakecomputing.atlassian.net/browse/SNOW-1706311
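The cost estimate and the batching fix can be sketched in a few lines. The per-check and per-select timings are the profiling figures quoted in the description; everything else (the `Frame` class, the function names, the column names) is a hypothetical model used only to show why one batched `append_columns` call is so much cheaper than one call per column.

```python
from typing import List

CHECK_COST_S = 0.015  # per-column check, from the profiling quoted above
SELECT_COST_S = 0.5   # per select call, from the profiling quoted above


def estimated_seconds(n_columns: int) -> float:
    """Cost of one append call per column, each re-checking ~n_columns columns."""
    per_call = n_columns * CHECK_COST_S + SELECT_COST_S
    return per_call * n_columns


class Frame:
    """Hypothetical frame that only counts how often append_columns is called."""

    def __init__(self) -> None:
        self.columns: List[str] = []
        self.append_calls = 0

    def append_columns(self, new_columns: List[str]) -> None:
        self.append_calls += 1
        self.columns.extend(new_columns)


def append_one_by_one(frame: Frame, new_columns: List[str]) -> None:
    for column in new_columns:
        frame.append_columns([column])  # old pattern: one call per column


def append_batched(frame: Frame, new_columns: List[str]) -> None:
    pending = list(new_columns)    # collect everything first
    frame.append_columns(pending)  # new pattern: a single call


names = [f"col_{i}" for i in range(2000)]
slow, fast = Frame(), Frame()
append_one_by_one(slow, names)
append_batched(fast, names)
total_hours = estimated_seconds(2000) / 3600
```

With 2000 columns the estimate comes to about 61,000 seconds, just under 17 hours, while the batched variant issues a single call and produces the same set of columns.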
SNOW-1445732 This PR implements `Series.items()` by using `SnowparkPandasRowPartitionIterator` in much the same manner as `DataFrame.iterrows()`. --------- Signed-off-by: Naren Krishna <naren.krishna@snowflake.com> Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>
…2396) SNOW-1659098 - Added support for errors='ignore' in pd.to_datetime
…ilation stage is applied (#2385) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1703599 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. We shouldn't call session._table_exists inside resolve, but we can call it before resolve.
… Series.tz_localize (#2398) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1677892, SNOW-1677897 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Add support for DataFrame.tz_localize and Series.tz_localize.
…Series.tz_convert (#2399) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1677888, SNOW-1677890 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Add support for DataFrame.tz_convert and Series.tz_convert.
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> ADHOC: Fix a misspelling 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Just noticed a misspelling of "ambiguous"; fixing it.
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1566363 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. 1. add status for number of selectStatement with complexity merged 2.add status for number of cte created during repeated subquery elimination
…plan node (#2407) 1. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 2. Please describe how your code solves the related issue. 1) fix a bug in test_query_generator where it repeatedly tested query generation on the same plan node 2) use a copy of the alias_map during resolve to avoid unexpected updates to the alias map
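The second point above (working on a copy of the alias map during resolve) can be sketched as follows. This is a minimal illustration, assuming the alias map behaves like a plain dict; `resolve_with_copy` and the alias values are hypothetical and are not the actual Snowpark resolver code.

```python
def resolve_with_copy(alias_map, discovered_aliases):
    """Toy resolve step: records new aliases without touching the caller's map."""
    working_map = dict(alias_map)  # shallow copy; the original stays unchanged
    working_map.update(discovered_aliases)
    return working_map


plan_alias_map = {"a": "A_1"}
resolved = resolve_with_copy(plan_alias_map, {"b": "B_2"})

print(plan_alias_map)  # the plan's own map is not modified by resolve
print(resolved)
```

A shallow copy is enough here because the values are immutable strings; a map holding mutable values would need a deeper copy to get the same isolation.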
…value (#2213) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1649172 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. When doing `df.loc[x] = series`, an error occurs because series does not have the same number of columns as the dataframe being set. Instead, the Series should be transposed and set, regardless of whether it has an equal number of rows as the dataframe has columns. --------- Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
…is already closed (#2409) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1727163 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. It's possible that the session is already closed before garbage collection kicks in. In that case we should avoid sending the drop table SQL, which also eliminates the warning.
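The guard described above can be sketched roughly like this. `ToySession` and `drop_temp_table` are hypothetical names used only for illustration; the real cleanup logic and session API in Snowpark may look different.

```python
class ToySession:
    """Hypothetical session that only records the SQL it is asked to run."""

    def __init__(self):
        self.closed = False
        self.executed = []

    def run(self, sql):
        if self.closed:
            raise RuntimeError("session is closed")
        self.executed.append(sql)

    def close(self):
        self.closed = True


def drop_temp_table(session, table_name):
    """Issue a DROP only while the session is open; report whether it was sent."""
    if session.closed:
        # Nothing to clean up through a closed session, so skip silently.
        return False
    session.run(f"DROP TABLE IF EXISTS {table_name}")
    return True


session = ToySession()
sent_while_open = drop_temp_table(session, "temp_table_1")
session.close()
sent_after_close = drop_temp_table(session, "temp_table_2")
print(sent_while_open, sent_after_close, session.executed)
```

Checking the closed flag before sending the statement means a late garbage-collection pass neither raises nor logs a warning for a session that no longer exists.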
…read_snowflake` tests (#2408) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1726720 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Use `session.sql.to_pandas()` instead of `native_pd.DataFrame(session.sql.collect)` to generate expected DataFrames when testing `read_snowflake`, so that if the expected DataFrame is empty, but has metadata, e.g. columns, that data is passed on to the expected DataFrame.
…atement instead of SnowflakeTable (#2411) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1727512 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. It should be the same as session.table(...).select_statement. Otherwise, we will cache metadata on wrong source_plan in the future.
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1690717 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. This is the first PR to support Snowpark Python functions in pandas apply. It only introduces `sin` as the first example.
SNOW-1727534 This PR adds support for `Resampler.indices`. --------- Signed-off-by: Naren Krishna <naren.krishna@snowflake.com> Co-authored-by: Hazem Elmeleegy <hazem.elmeleegy@snowflake.com>
…na and SeriesGroupBy.fillna (#2417) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1728471 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Fix changelog entry placement for DataFrameGroupBy.fillna and SeriesGroupBy.fillna.
…2403) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1708573 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. 1) Add sql_counter check for test points in large query breakdown 2) Refine the sql_counter check for test_cte, and allow it to run on env without pandas
…gs (#2422) Expands on the earlier support for ufuncs by making the ufunc mapping clearer, and adding telemetry support. Another CL will add log support. * Add support for subtract, multiply, divide, and true_divide ufuncs * Fill out the list of possible ufuncs from https://numpy.org/doc/stable/reference/ufuncs.html * Unimplemented ufuncs are marked with NotImplemented for visibility * Add __array_ufunc__ to telemetry to get some usage statistics --------- Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>
…2428) Fixes SNOW-1730923 Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. --------- Co-authored-by: Adam Ling <adam.ling@snowflake.com>
sfc-gh-lspiegelberg
changed the title
[DRAFT] Ls reduce merge conflicts
[DRAFT] Reduce merge conflicts
Oct 11, 2024
Branch to reduce merge conflicts.