
SNOW-1418533 handle dropping temp objects in post actions #2405

Open
wants to merge 32 commits into main

Conversation

sfc-gh-aalam (Contributor) commented Oct 7, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1418533

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe
  3. Please describe how your code solves the related issue.

    This PR updates the way we generate names for temp objects. This is done to protect against unexpected behavior when the queries generated by a Snowflake plan are as follows:
    i. Create temp object
    ii. Build sql query involving the temp object
    iii. Drop temp object in post_actions.

    When a dataframe whose queries create and then drop the same temp object is run using multiple threads, it may fail if one thread has not yet started working on the temp object while a different thread has already dropped it.

    In this PR, we update temp name creation for objects that are dropped in a subsequent post action. We add temp_name_placeholder to the Query class and generate a temp object name at the query submission stage, similar to query_id_placeholder.
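As an illustration of the approach described above, here is a minimal, self-contained sketch of the placeholder mechanism; the names below (QuerySketch, resolve_temp_name_placeholders) are illustrative stand-ins, not the actual Snowpark internals:

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class QuerySketch:
    """Stand-in for the internal Query class."""
    sql: str
    # (placeholder string, temp object type), filled in when the plan is built.
    # The real code uses a TempObjectType enum; a plain string keeps the sketch self-contained.
    temp_name_placeholder: Optional[Tuple[str, str]] = None


def resolve_temp_name_placeholders(
    queries: List[QuerySketch], post_actions: List[QuerySketch]
) -> None:
    """Generate a real temp object name per placeholder at submission time and
    substitute it into both the main queries and the post actions, so each
    thread drops exactly the temp object it created."""
    placeholders: Dict[str, str] = {}
    for q in queries:
        if q.temp_name_placeholder:
            placeholder, obj_type = q.temp_name_placeholder
            placeholders[placeholder] = f"SNOWPARK_TEMP_{obj_type}_{uuid.uuid4().hex.upper()}"
    for q in queries + post_actions:
        for placeholder, name in placeholders.items():
            q.sql = q.sql.replace(placeholder, name)
```

Because the substitution happens at submission time rather than at plan-build time, two threads executing the same plan end up with distinct temp object names, so one thread's DROP cannot affect the other thread's object.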

@sfc-gh-aalam sfc-gh-aalam changed the title SNOW-1418523 handle dropping temp objects in post actions SNOW-1418533 handle dropping temp objects in post actions Oct 7, 2024

github-actions bot commented Oct 8, 2024

Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing

@sfc-gh-aalam sfc-gh-aalam marked this pull request as ready for review October 8, 2024 19:56
@sfc-gh-aalam sfc-gh-aalam requested a review from a team as a code owner October 8, 2024 19:56
@sfc-gh-aalam sfc-gh-aalam requested review from sfc-gh-jdu, sfc-gh-yixie and sfc-gh-jrose and removed request for a team October 8, 2024 19:56
@@ -1673,6 +1678,7 @@ def __init__(
*,
query_id_place_holder: Optional[str] = None,
is_ddl_on_temp_object: bool = False,
temp_name_place_holder: Optional[Tuple[str, TempObjectType]] = None,
Collaborator

Is it possible that you can have two temp_name_place_holders, for example when joining two dataframes with temp tables?

Contributor Author

A join statement, when resolved, can contain multiple temp_name_place_holders, but currently, when we generate temp objects like temp tables or temp file formats, we create them one at a time.

CHANGELOG.md Outdated
@@ -55,6 +55,8 @@

### Snowpark Python API Updates

- Updated `Session` class to be thread-safe. This allows concurrent dataframe transformations, dataframe actions, UDF and store procedure registration, and concurrent file uploads.
Collaborator

This message is not clear about the change here. Please update the changelog to reflect the actual change in this PR and the reason this change is needed to make the Session class thread-safe.

@@ -731,7 +730,7 @@ def large_local_relation_plan(
source_plan: Optional[LogicalPlan],
schema_query: Optional[str],
) -> SnowflakePlan:
temp_table_name = random_name_for_temp_object(TempObjectType.TABLE)
temp_table_name = f"temp_name_placeholder_{generate_random_alphanumeric()}"
Collaborator

Isn't that going to impact df.queries? Not only will people see a different table, but will they also see a nonsensical temp table name like "temp_name_placeholder_xxx"?

Collaborator

Is large_local_relation_plan the only case where we have an internally created, query-scoped temp table?

Collaborator

A more efficient way for those is probably to make the temp table session-scoped and let @sfc-gh-jdu's temp table cleanup handle dropping the table. I guess we can do this as a future improvement.

@@ -645,10 +648,20 @@ def get_result_set(
final_queries = []
last_place_holder = None
for q in main_queries:
query = q.sql
if q.temp_name_place_holder:
Collaborator

Instead of doing it here, it might be better to do it in execute_queries when calling plan_compiler to get the final query, so that this is applied to all user-facing queries.

run=False,
)
def test_temp_name_placeholder_for_sync(threadsafe_session):
from snowflake.snowpark._internal.analyzer import analyzer
Collaborator

I just recalled: do we have any multi-threading tests with the new query compilation stage?

Contributor Author

we do not at the moment

Collaborator

@sfc-gh-aalam can you file a ticket to add the tests? We should make sure everything is tested.

@@ -405,8 +407,12 @@ def result(
else:
raise ValueError(f"{async_result_type} is not supported")
for action in self._post_actions:
query = action.sql
if self._placeholders:
Collaborator

Why do we need special handling for the async job?

Contributor Author

Because when we drop temp objects after the async job is done, we need to drop the correct temp object. This information needs to be propagated to the async job as well.
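To illustrate why the mapping has to travel with the async job, here is a rough sketch under assumed names (AsyncJobSketch is not the actual AsyncJob implementation):

```python
from typing import Dict, List


class AsyncJobSketch:
    """Stand-in for AsyncJob, showing why the placeholder mapping must be propagated."""

    def __init__(self, post_actions: List[str], placeholders: Dict[str, str]) -> None:
        # e.g. ["DROP TABLE IF EXISTS temp_name_placeholder_abc"]
        self._post_actions = post_actions
        # e.g. {"temp_name_placeholder_abc": "SNOWPARK_TEMP_TABLE_XYZ"}
        self._placeholders = placeholders

    def cleanup_queries(self) -> List[str]:
        """Resolve placeholders in the post actions once the async result is fetched,
        so the drop statements target the temp objects that were actually created."""
        resolved = []
        for action in self._post_actions:
            sql = action
            for placeholder, name in self._placeholders.items():
                sql = sql.replace(placeholder, name)
            resolved.append(sql)
        return resolved
```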


@@ -294,7 +294,30 @@ def execution_queries(self) -> Dict["PlanQueryType", List["Query"]]:
from snowflake.snowpark._internal.compiler.plan_compiler import PlanCompiler

compiler = PlanCompiler(self)
return compiler.compile()
compiled_queries = compiler.compile()
Collaborator

Actually, let's move this further into the compile() function. That keeps the code clear: compile() is responsible for compiling the plan into a set of executable queries.

You can do this post-processing at the end, right before we return the final queries, and extract the following as a separate function:

compiled_queries = compiler.compile()
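A rough sketch of the suggested shape, with illustrative names (this is not the actual PlanCompiler code):

```python
from typing import Dict, List


class PlanCompilerSketch:
    """Stand-in for PlanCompiler; compile() owns the placeholder post-processing."""

    def __init__(self, compiled_queries: Dict[str, List[str]],
                 placeholders: Dict[str, str]) -> None:
        # compiled_queries: what the existing compilation stages would produce.
        # placeholders: placeholder string -> generated temp object name.
        self._compiled_queries = compiled_queries
        self._placeholders = placeholders

    def compile(self) -> Dict[str, List[str]]:
        # ...existing compilation stages would run here...
        # Post-process right before returning the final, executable queries.
        return self._replace_temp_obj_placeholders(self._compiled_queries)

    def _replace_temp_obj_placeholders(
        self, compiled: Dict[str, List[str]]
    ) -> Dict[str, List[str]]:
        # Substitute generated temp object names into every query and post action.
        return {key: [self._substitute(sql) for sql in sqls]
                for key, sqls in compiled.items()}

    def _substitute(self, sql: str) -> str:
        for placeholder, name in self._placeholders.items():
            sql = sql.replace(placeholder, name)
        return sql
```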

if self.session._conn._thread_safe_session_enabled:
placeholders = {}
Collaborator

Please add a comment here about what we are doing here and why.

@@ -1719,6 +1765,13 @@ def __init__(
if query_id_place_holder
else f"query_id_place_holder_{generate_random_alphanumeric()}"
)
# This is a temporary workaround to handle the case when a snowflake plan is created
Collaborator

I am not sure this is a temporary workaround. The session-scoped temp table could be a long-term solution, but it may have other drawbacks as well; you can mention this as an alternative solution below.

# in the following way in a multi-threaded environment:
# 1. Create a temp object
# 2. Use the temp object in a query
# 3. Drop the temp object
Collaborator

This comment explains the scenario that needs the field, but not how the field is used to solve the problem. Can you add a more detailed comment here?

@@ -666,6 +669,7 @@ def get_result_set(
num_statements=len(main_queries),
params=params,
ignore_results=ignore_results,
async_post_actions=post_actions,
Collaborator

is that fixing a bug?

Contributor Author

yes indeed.

Contributor Author

Earlier it would pull post_actions from the Snowflake plan, which may not include the drop table commands coming from large query breakdown.

@@ -1711,6 +1756,7 @@ def __init__(
*,
query_id_place_holder: Optional[str] = None,
is_ddl_on_temp_object: bool = False,
temp_name_place_holder: Optional[Tuple[str, TempObjectType]] = None,
Collaborator

Let's call this temp_obj_name_placeholder to be more clear.

@@ -15,7 +15,7 @@

#### Improvements

- Disables sql simplification when sort is performed after limit.
- Disables sql simplification when sort is performed after limit.
Collaborator

@sfc-gh-aalam do we have a ticket to track the documentation for the multi-threaded Session object? I think you will want to mention this behavior in the docs on release.

@sfc-gh-aalam sfc-gh-aalam requested a review from a team as a code owner October 23, 2024 17:59