
[FEA] Hash Aggregate Cleanup to make closer to spark #3

Closed
revans2 opened this issue May 28, 2020 · 2 comments
Labels: feature request (New feature or request), SQL (part of the SQL/Dataframe plugin)

Comments

revans2 (Collaborator) commented May 28, 2020

Is your feature request related to a problem? Please describe.
A lot of the processing in the GPU aggregate operations is not as close to Spark as it could be. It would be nice to make it more similar, to help reduce the possibility of bugs and to increase the possibility of code reuse.

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify SQL part of the SQL/Dataframe plugin labels May 28, 2020
@revans2 revans2 changed the title [FEA] Hash Aggregate Cleanup to mke closer to spark [FEA] Hash Aggregate Cleanup to make closer to spark May 28, 2020
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Oct 20, 2020
wjxiz1992 pushed a commit to wjxiz1992/spark-rapids that referenced this issue Oct 29, 2020
Update README and remove useless file
gerashegalov referenced this issue in gerashegalov/spark-rapids Sep 1, 2021
abellina (Collaborator) commented:

It seems like this issue can be closed by #3910.

We have made ourselves closer (re: setupReferences), and our interface calls out the relationship between our work and aggBufferAttributes in Spark. I am not sure how much closer we could get to the CPU. I am adding this comment to the issue to see whether we are satisfied with #3910 or need more work to get us closer.

abellina (Collaborator) commented Feb 1, 2022

This issue should have been closed by the work in #3910 and #4272.

In a nutshell, the aggregate.scala code was made simpler and more consistent between group-by and reduction, in addition to removing complicated code in setupReferences that was not required. In AggregateFunctions.scala, the GpuAggregateFunction and CudfAggregate interfaces were better defined, documented, and made flexible enough for complicated operations like overflow checking for decimals.
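As a rough illustration only (this is NOT the actual spark-rapids code, just a hypothetical toy model of the interface split the comment describes), the relationship between a Spark-level aggregate function and the cuDF-level aggregates that operate on its buffer columns might be sketched like this:

```scala
// Hypothetical, simplified sketch. Names echo GpuAggregateFunction /
// CudfAggregate from the comment above, but the bodies are illustrative
// assumptions, not the real spark-rapids implementation.

// A cuDF-level aggregate: how one buffer column is reduced.
trait CudfAggregate {
  def reduce(values: Seq[Long]): Long
}

object CudfSum extends CudfAggregate {
  def reduce(values: Seq[Long]): Long = values.sum
}

object CudfMax extends CudfAggregate {
  def reduce(values: Seq[Long]): Long = values.max
}

// A Spark-level aggregate function, expressed as the cuDF aggregates that
// update and merge its aggregation buffer (loosely analogous to how Spark's
// aggBufferAttributes describe the buffer schema).
trait GpuAggregateFunction {
  def updateAggregates: Seq[CudfAggregate]
  def mergeAggregates: Seq[CudfAggregate]
}

// A toy Average keeps two buffer columns (sum, count); both the update and
// merge phases reduce each column with a sum in this simplified model.
object GpuAverage extends GpuAggregateFunction {
  val updateAggregates: Seq[CudfAggregate] = Seq(CudfSum, CudfSum)
  val mergeAggregates: Seq[CudfAggregate]  = Seq(CudfSum, CudfSum)
}
```

The point of such a split is that group-by and reduction paths can share one description of the buffer and its per-column aggregates, rather than duplicating that logic.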

@abellina abellina closed this as completed Feb 1, 2022
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Dec 6, 2022
1. Fixed indentation.
2. Hardcode for supported date format.
3. Added tests for timestamp strings read as dates.
4. Fixed behaviour for NVIDIA#3 above.
mythrocks added a commit that referenced this issue Dec 7, 2022
* Hive Text parsing of invalid date strings should not cause exceptions.

Fixes #7089. There were two problems:
  1. Strings between field delimiters should not be trimmed before casting to dates.
  2. Invalid date strings should not be causing exceptions. They should return null
     values, as is the convention in Hive's `LazySimpleSerDe`.

Signed-off-by: MithunR <mythrocks@gmail.com>

* Fixed verify errors.

* Fixed merge duplication.

* Review fixes:

1. Fixed indentation.
2. Hardcode for supported date format.
3. Added tests for timestamp strings read as dates.
4. Fixed behaviour for #3 above.

Signed-off-by: MithunR <mythrocks@gmail.com>
Co-authored-by: Robert (Bobby) Evans <bobby@apache.org>
3 participants