Partial support for time windows #3074
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
build
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
build
@jlowe I think I have addressed all of your review comments.
@@ -868,9 +868,9 @@ def gen_scalars_for_sql(data_gen, count, seed=0, force_no_nulls=False):

boolean_gens = [boolean_gen]

single_level_array_gens = [ArrayGen(sub_gen) for sub_gen in all_basic_gens + decimal_gens + [null_gen]]
FYI null_gen was already a part of all_basic_gens so we would double up a few tests.
build
build got stuck and I aborted it. Not sure what happened. The logs didn't show any errors beyond
This fixes #2943
Spark supports time windows, and technically this change lets us support them too. However, without support for grouping by a struct of (timestamp, timestamp), or for partitioning by such a struct in window operations, it is not going to do a lot for actual customers.
Time windows are supported by the `window` function. It essentially performs a series of math operations along with an expand to produce a struct column in which each timestamp is placed into one or more time window buckets. You can then group by these buckets, or do window operations on them, to get either tumbling or sliding window aggregations. This PR adds some tests in the context of those more complete operations, because just creating the bucket, though interesting, is not that useful on its own. When we do support group-by and window operations on structs, these tests can be extended.
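To make the bucketing idea concrete, here is a minimal pure-Python sketch of how a timestamp can map to one bucket (tumbling) or several buckets (sliding), followed by a group-by on the resulting (start, end) pairs. This is only an illustration of the concept: the function names `time_window_buckets` and `count_per_window` are hypothetical, timestamps are plain integers in arbitrary units, and the arithmetic is not taken from Spark's or the plugin's actual implementation.

```python
from collections import Counter


def time_window_buckets(ts, window_duration, slide_duration=None, start_time=0):
    """Return every [start, end) window that contains ts.

    All arguments are integers in the same unit. If slide_duration is
    omitted the windows are tumbling (slide == window duration).
    Hypothetical helper for illustration only.
    """
    if slide_duration is None:
        slide_duration = window_duration
    # Latest window start that is <= ts (floor division also handles negatives).
    start = (ts - start_time) // slide_duration * slide_duration + start_time
    buckets = []
    # Walk backwards one slide at a time while the window still covers ts.
    while start + window_duration > ts:
        buckets.append((start, start + window_duration))
        start -= slide_duration
    return sorted(buckets)


def count_per_window(timestamps, window_duration, slide_duration=None):
    """Group by the (start, end) bucket and count rows, like a group-by aggregation."""
    counts = Counter()
    for ts in timestamps:
        # The "expand" step: one input row becomes one row per bucket.
        for bucket in time_window_buckets(ts, window_duration, slide_duration):
            counts[bucket] += 1
    return dict(counts)


events = [1, 7, 12]
tumbling = count_per_window(events, window_duration=10)
sliding = count_per_window(events, window_duration=10, slide_duration=5)
```

With tumbling windows each event lands in exactly one bucket; with a slide shorter than the window duration, each event is duplicated into every overlapping bucket, which is why the grouping key has to be the (start, end) struct.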