Partial support for time windows #3074

Merged
merged 3 commits into from
Jul 29, 2021

Collaborator

@revans2 revans2 commented Jul 28, 2021

This fixes #2943

Spark supports time windows, and technically this change lets us support them too. But without support for grouping by a struct of (timestamp, timestamp), or partitioning by such a struct in window operations, it is not going to do much for actual customers.

Time windows are created with Spark's window function. It essentially does a series of math operations along with an expand to produce a struct column that places each timestamp into one or more time window buckets. You can then group by these buckets, or do window operations on them, to get either tumbling or sliding window aggregations. This PR adds some tests in the context of those more complete operations, because just creating the bucket, though interesting, is not that useful on its own. Once we support groupby and window operations on structs, this can be extended.
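
As a rough illustration of the bucketing described above, here is a minimal pure-Python sketch. It is not the Spark or plugin implementation: the names `time_window_buckets` and `sum_by_window` are hypothetical, timestamps are plain integers in one unit, and the window start offset is assumed to default to 0. A tumbling window is the case where the slide equals the window length, so each timestamp lands in exactly one bucket; with a shorter slide each timestamp lands in several overlapping buckets.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A window is a half-open interval [start, end), mirroring the
# (timestamp, timestamp) struct mentioned in the description.
Window = Tuple[int, int]


def time_window_buckets(ts: int, window: int, slide: int, start: int = 0) -> List[Window]:
    """Return every [start, end) window that contains ts (hypothetical helper)."""
    if window <= 0 or slide <= 0 or slide > window:
        raise ValueError("require 0 < slide <= window")
    # Upper bound on how many windows can overlap a single timestamp.
    max_overlapping = -(-window // slide)  # integer ceiling division
    # Start of the latest window that begins at or before ts.
    last_start = ts - ((ts - start) % slide)
    buckets = []
    for i in range(max_overlapping):
        w_start = last_start - i * slide
        w_end = w_start + window
        # Drop candidates that do not actually contain ts; this happens
        # when the window length is not a multiple of the slide.
        if w_start <= ts < w_end:
            buckets.append((w_start, w_end))
    return sorted(buckets)


def sum_by_window(rows: Iterable[Tuple[int, int]], window: int, slide: int) -> Dict[Window, int]:
    """Group (timestamp, value) rows by window bucket and sum the values."""
    totals: Dict[Window, int] = defaultdict(int)
    for ts, value in rows:
        for bucket in time_window_buckets(ts, window, slide):
            totals[bucket] += value
    return dict(totals)


rows = [(1, 10), (7, 20), (12, 30)]
tumbling = sum_by_window(rows, window=10, slide=10)  # one bucket per row
sliding = sum_by_window(rows, window=10, slide=5)    # two buckets per row
```

The group-by key here is the (start, end) pair itself, which is why grouping or partitioning by a struct matters for making the feature useful end to end.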

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2 revans2 added this to the July 19 - July 30 milestone Jul 28, 2021
@revans2 revans2 self-assigned this Jul 28, 2021
Collaborator Author

revans2 commented Jul 28, 2021

build

@sameerz sameerz added the feature request New feature or request label Jul 29, 2021
Collaborator Author

revans2 commented Jul 29, 2021

build

Collaborator Author

revans2 commented Jul 29, 2021

@jlowe I think I have addressed all of your review comments.

@@ -868,9 +868,9 @@ def gen_scalars_for_sql(data_gen, count, seed=0, force_no_nulls=False):

boolean_gens = [boolean_gen]

single_level_array_gens = [ArrayGen(sub_gen) for sub_gen in all_basic_gens + decimal_gens + [null_gen]]
Collaborator Author


FYI null_gen was already a part of all_basic_gens so we would double up a few tests.

Collaborator Author

revans2 commented Jul 29, 2021

build

Collaborator Author

revans2 commented Jul 29, 2021

The build got stuck, so I aborted it. I am not sure what happened. The logs didn't show any errors beyond "Could not connect to ... to send interrupt signal to process", which makes me think it was a CI system issue, but I am not 100% sure.

@revans2 revans2 merged commit 3220a11 into NVIDIA:branch-21.10 Jul 29, 2021
@revans2 revans2 deleted the timestamp_conversion branch July 29, 2021 19:07
@revans2 revans2 linked an issue Jul 29, 2021 that may be closed by this pull request
Successfully merging this pull request may close these issues.

[FEA] Support PreciseTimestampConversion when using windowing function
3 participants