SNOW-1690923 Encode Intervals #2454

Open

wants to merge 5 commits into base: ls-SNOW-1491199-merge-phase0-server-side
Conversation

sfc-gh-vbudati
Contributor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1690923

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

This PR encodes the Interval object in the AST. The relevant test is interval.test.

@sfc-gh-vbudati sfc-gh-vbudati requested a review from a team October 15, 2024 00:16
@sfc-gh-vbudati sfc-gh-vbudati requested a review from a team as a code owner October 15, 2024 00:16
…NOW-1690923-encode-intervals

# Conflicts:
#	tests/ast/data/DataFrame.agg.test
#	tests/ast/data/DataFrame.collect.test
#	tests/ast/data/DataFrame.count.test
#	tests/ast/data/DataFrame.count2.test
#	tests/ast/data/DataFrame.create_or_replace.test
#	tests/ast/data/DataFrame.cross_join.lsuffix.test
#	tests/ast/data/DataFrame.cross_join.rsuffix.test
#	tests/ast/data/DataFrame.cross_join.suffix.test
#	tests/ast/data/DataFrame.describe.test
#	tests/ast/data/DataFrame.flatten.test
#	tests/ast/data/DataFrame.indexers.test
#	tests/ast/data/DataFrame.join.inner.column.test
#	tests/ast/data/DataFrame.join.inner.column_list.test
#	tests/ast/data/DataFrame.join.inner.column_list_predicate.test
#	tests/ast/data/DataFrame.join.inner.predicate.test
#	tests/ast/data/DataFrame.join.inner.predicate_rsuffix.test
#	tests/ast/data/DataFrame.join.left_outer.column.test
#	tests/ast/data/DataFrame.join.right_outer.predicate.test
#	tests/ast/data/DataFrame.natural_join.test
#	tests/ast/data/DataFrame.pivot.test
#	tests/ast/data/DataFrame.select_expr.test
#	tests/ast/data/DataFrame.stat.test
#	tests/ast/data/DataFrame.to_df.test
#	tests/ast/data/DataFrame.to_local_iterator.test
#	tests/ast/data/DataFrame.to_pandas.test
#	tests/ast/data/DataFrame.to_pandas_batch.test
#	tests/ast/data/DataFrame.unpivot.test
#	tests/ast/data/DataFrame.write.test
#	tests/ast/data/Dataframe.cube.test
#	tests/ast/data/Dataframe.distinct.test
#	tests/ast/data/Dataframe.drop_duplicates.test
#	tests/ast/data/Dataframe.filter.test
#	tests/ast/data/Dataframe.getitem.test
#	tests/ast/data/Dataframe.group_by.test
#	tests/ast/data/Dataframe.group_by_grouping_sets.test
#	tests/ast/data/Dataframe.join.asof.test
#	tests/ast/data/Dataframe.rollup.test
#	tests/ast/data/Dataframe.with_col_fns.test
#	tests/ast/data/DataframeNaFunctions.test
#	tests/ast/data/RelationalGroupedDataFrame.agg.test
#	tests/ast/data/RelationalGroupedDataFrame.test
#	tests/ast/data/Session.call.test
#	tests/ast/data/Session.create_dataframe.test
#	tests/ast/data/Session.flatten.test
#	tests/ast/data/Session.table_function.test
#	tests/ast/data/Table.init.test
#	tests/ast/data/case_when.test
#	tests/ast/data/col_alias.test
#	tests/ast/data/col_asc.test
#	tests/ast/data/col_between.test
#	tests/ast/data/col_binops.test
#	tests/ast/data/col_bitops.test
#	tests/ast/data/col_cast.test
#	tests/ast/data/col_cast_coll.test
#	tests/ast/data/col_desc.test
#	tests/ast/data/col_getitem.test
#	tests/ast/data/col_in_.test
#	tests/ast/data/col_literal.test
#	tests/ast/data/col_null_nan.test
#	tests/ast/data/col_rbinops.test
#	tests/ast/data/col_star.test
#	tests/ast/data/col_string.test
#	tests/ast/data/col_try_cast.test
#	tests/ast/data/col_udf.test
#	tests/ast/data/col_unary_ops.test
#	tests/ast/data/df_alias.test
#	tests/ast/data/df_analytics_functions.test
#	tests/ast/data/df_col.test
#	tests/ast/data/df_drop.test
#	tests/ast/data/df_except.test
#	tests/ast/data/df_first.test
#	tests/ast/data/df_intersect.test
#	tests/ast/data/df_limit.test
#	tests/ast/data/df_random_split.test
#	tests/ast/data/df_sample.test
#	tests/ast/data/df_sort.test
#	tests/ast/data/df_union.test
#	tests/ast/data/functions.test
#	tests/ast/data/select.test
#	tests/ast/data/session.read.test
#	tests/ast/data/session.sql.test
#	tests/ast/data/session_generator.test
#	tests/ast/data/session_range.test
#	tests/ast/data/session_table_dq_abs_l.test
#	tests/ast/data/session_table_dq_abs_s.test
#	tests/ast/data/session_table_dq_rs_l.test
#	tests/ast/data/session_table_dq_rs_s.test
#	tests/ast/data/session_table_dq_rt_l.test
#	tests/ast/data/session_table_dq_rt_s.test
#	tests/ast/data/session_table_uq_abs_l.test
#	tests/ast/data/session_table_uq_abs_s.test
#	tests/ast/data/session_table_uq_rs_l.test
#	tests/ast/data/session_table_uq_rs_s.test
#	tests/ast/data/session_table_uq_rt_l.test
#	tests/ast/data/session_table_uq_rt_s.test
#	tests/ast/data/session_write_pandas.test
#	tests/ast/data/shadowed_local_name.test
#	tests/ast/data/sproc.test
#	tests/ast/data/udaf.test
#	tests/ast/data/udtf.test
#	tests/ast/data/windows.test
) -> None:
from snowflake.snowpark._internal.ast_utils import with_src_position
Contributor Author

Had to import here to prevent circular imports issue.

Contributor

So far in the code, we have tried to avoid building expressions directly. The reasoning is that Expressions should eventually disappear, as they are not a first-class entity in Snowpark. Instead, we chose an approach where the caller (i.e., the function that creates the Interval) always creates the AST and assigns it to the expression if need be. I.e.,

def build_expr_from_snowpark_column_or_python_val(
    expr_builder: proto.Expr, value: ColumnOrLiteral
) -> None:
    """Copy from a Column object's AST, or copy a literal value into an AST expression.

    Args:
        expr_builder (proto.Expr): A previously created Expr() IR entity instance to be filled.
        value (ColumnOrLiteral): The value from which to populate the provided expression builder.

    Raises:
        TypeError: The Expr provided should only be populated from a Snowpark Column with a valid _ast field or a literal value
    """
    if isinstance(value, snowflake.snowpark.Column):
        build_expr_from_snowpark_column(expr_builder, value)
    elif isinstance(value, VALID_PYTHON_TYPES_FOR_LITERAL_VALUE):
        build_expr_from_python_val(expr_builder, value)
    elif isinstance(value, Expression):
        # Expressions must be handled by caller.
        pass
    else:
        raise TypeError(f"{type(value)} is not a valid type for Column or literal AST.")
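The caller-builds-the-AST dispatch described above can be sketched with a small stand-in. This is plain Python with toy `Expr`, `Column`, and `Interval` classes playing the roles of the real protobuf and Snowpark types; the names and field layout here are illustrative, not the actual Snowpark API:

```python
# Toy stand-ins: the caller creates an Expr builder up front, and the
# dispatch helper copies a Column's AST or encodes a literal into it.

class Expr:
    def __init__(self):
        self.kind = None
        self.payload = None

class Column:
    def __init__(self, ast):
        self._ast = ast  # the caller created this AST when building the Column

class Interval:
    def __init__(self, **parts):
        # mirrors the values_dict keyed by upper-cased unit names
        self.values_dict = {k.upper(): v for k, v in parts.items()}

VALID_PYTHON_TYPES_FOR_LITERAL_VALUE = (int, float, str, bool)

def build_expr_from_column_or_python_val(expr_builder, value):
    """Copy a Column's AST, or encode a literal value, into expr_builder."""
    if isinstance(value, Column):
        expr_builder.kind = value._ast.kind
        expr_builder.payload = value._ast.payload
    elif isinstance(value, VALID_PYTHON_TYPES_FOR_LITERAL_VALUE):
        expr_builder.kind = "literal"
        expr_builder.payload = value
    elif isinstance(value, Interval):
        # A dedicated Interval case, as suggested in the review below:
        # each unit from values_dict is encoded into the builder.
        expr_builder.kind = "interval"
        expr_builder.payload = dict(value.values_dict)
    else:
        raise TypeError(f"{type(value)} is not a valid type for Column or literal AST.")

e = Expr()
build_expr_from_column_or_python_val(e, Interval(day=2, hour=3))
print(e.kind, e.payload)  # interval {'DAY': 2, 'HOUR': 3}
```

The point of the pattern is that `Interval` itself never touches `proto.Expr`; only the dispatch helper (whose caller owns the builder) does.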

Contributor

We can probably move the AST logic into build_expr_from_snowpark_column_or_python_val and add an Interval case there. I don't think the original intention of Snowpark was to use Interval directly, but rather the make_interval function.

@sfc-gh-vbudati
Contributor Author

I ran the linter, which is why there are 100+ file changes; most of them are in the test data directory and I think are superficial. I'm not sure how to fix the pyright/pre-commit type-checking test.

@@ -405,6 +409,14 @@ def __init__(
if nanosecond is not None:
self.values_dict["NANOSECOND"] = nanosecond

if self._ast is None and _emit_ast:
expr = proto.Expr()
Collaborator

/home/runner/work/snowpark-python/snowpark-python/src/snowflake/snowpark/_internal/analyzer/expression.py:413:26 - error: "Expr" is not a known member of module "snowflake.snowpark._internal.proto.ast_pb2" (reportGeneralTypeIssues)

Collaborator

Can you repro locally?

python -m tox -e pyright

It is saying you may need a new version:
"Please install the new version or set PYRIGHT_PYTHON_FORCE_VERSION to latest"

Contributor Author

I get the same errors locally; I'm not sure what to do about the first error, though. Other parts of the code use proto.Expr() as well, so I'm not sure why it's erroring here.

# source: proto/ast.proto
# Protobuf Python Version: 5.28.2
# Protobuf Python Version: 4.25.3
Collaborator

Update protobuf to 5.28.2 to avoid so many conflicts?
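If the team agrees, one way to do that would be to pin the protobuf runtime alongside the regenerated file. The exact constraint file and bounds below are assumptions, not taken from this repository:

```
# Hypothetical requirements constraint: keep the protobuf runtime in step
# with the protoc release used to regenerate ast_pb2.py, so the
# "# Protobuf Python Version:" header stops churning between branches.
protobuf==5.28.2
```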

ast = with_src_position(expr.sp_interval)
# Set the AST values based on the values_dict.
for k, v in self.values_dict.items():
    getattr(ast, k.lower()).value = v
Contributor

let's try to avoid indirection via getattr and use the values explicitly. It's much easier to read and reason about :)
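The difference being pointed at can be shown with a plain-Python stand-in. Here `SimpleNamespace` plays the role of the protobuf interval message (each unit is a wrapper with a `.value` field); only `DAY` and `HOUR` are modeled to keep it short:

```python
from types import SimpleNamespace

# Stand-in for the protobuf sp_interval message: each unit field is a
# wrapper object with a .value attribute, as in the snippet above.
ast = SimpleNamespace(
    day=SimpleNamespace(value=0),
    hour=SimpleNamespace(value=0),
)
values_dict = {"DAY": 2, "HOUR": 3}

# Indirect version, as in the diff: the field name is computed at runtime,
# so the reader must mentally map keys to protobuf fields.
for k, v in values_dict.items():
    getattr(ast, k.lower()).value = v

# Explicit version, as suggested in review: each field is named directly,
# so the set of encoded units is visible at a glance and a mistyped key
# becomes an ordinary attribute error instead of silently targeting
# whatever getattr resolves.
if "DAY" in values_dict:
    ast.day.value = values_dict["DAY"]
if "HOUR" in values_dict:
    ast.hour.value = values_dict["HOUR"]

print(ast.day.value, ast.hour.value)  # 2 3
```

Both versions produce the same message; the explicit one trades a few more lines for readability.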

df2 = df1.select(
df1["a"]
+ Column(
Interval(
Contributor

let's use make_interval here to test. Not sure we should support the Column(Interval(...)) syntax at all.


df1 = session.create_dataframe(
    [
        [
            datetime.datetime(2010, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST")),
            datetime.datetime(2011, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST")),
        ],
        [
            datetime.datetime(2012, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST")),
            datetime.datetime(2013, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST")),
        ],
    ],
    schema=["a", "b"],
)

df2 = df1.select(
    df1["a"]
    + Interval(
        quarter=1,
        month=1,
        week=2,
        day=2,
        hour=2,
        minute=3,
        second=3,
        millisecond=3,
        microsecond=4,
        nanosecond=4,
    )
)
Contributor

does this work in snowpark?

Contributor Author

@sfc-gh-vbudati sfc-gh-vbudati Oct 15, 2024


I believe it does; I pulled this from test_interval:

def test_interval(session):
    df1 = session.create_dataframe(
        [
            [datetime.datetime(2010, 1, 1), datetime.datetime(2011, 1, 1)],
            [datetime.datetime(2012, 1, 1), datetime.datetime(2013, 1, 1)],
        ],
        schema=["a", "b"],
    )
    df2 = df1.select(
        df1["a"]
        + Column(
            Interval(
                quarters=1,
                months=1,
                weeks=2,
                days=2,
                hours=2,
                minutes=3,
                seconds=3,
                milliseconds=3,
                microseconds=4,
                nanoseconds=4,
            )
        )
    )
    verify_column_result(
        session,
        df2,
        [
            '"(""A"" + INTERVAL \'1 QUARTER,1 MONTH,2 WEEK,2 DAY,2 HOUR,3 MINUTE,3 SECOND,3 MILLISECOND,4 MICROSECOND,4 NANOSECOND\')"',
        ],
        [TimestampType(timezone=TimestampTimeZone.NTZ)],
        None,
    )

But I see that you added the make_interval changes; I'll update my tests accordingly.
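For reference, a make_interval-based version of that selection might look like the sketch below. This is untested here (it needs a live session and the df1 from the test above), and the keyword names simply follow the plural units used in test_interval; check the actual signature of snowflake.snowpark.functions.make_interval before relying on them:

```python
from snowflake.snowpark.functions import make_interval

# Same selection as in test_interval, but built via make_interval instead
# of the Column(Interval(...)) syntax questioned in review. Keyword names
# are assumed to mirror the plural unit names above.
df2 = df1.select(
    df1["a"]
    + make_interval(
        quarters=1,
        months=1,
        weeks=2,
        days=2,
        hours=2,
        minutes=3,
        seconds=3,
        milliseconds=3,
        microseconds=4,
        nanoseconds=4,
    )
)
```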
