Support running CPU based UDF efficiently [databricks] #3897

Merged on Oct 29, 2021 (18 commits)

@firestarman (Collaborator) commented Oct 22, 2021

This PR supports running CPU-based UDFs efficiently: only the columns the UDF needs are copied back to the host and processed on the CPU, instead of falling the whole plan back to the CPU.
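To illustrate the idea described above, here is a minimal, plain-Python sketch. It is not the plugin's implementation: the `Batch` type, `run_cpu_udf`, and the dict-of-lists "device" batch are all hypothetical stand-ins, chosen only to show that the UDF's input columns are the only data copied to the host while the remaining columns stay where they are.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a columnar batch that lives on the GPU:
# column name -> list of values. A real batch would hold device buffers.
Batch = Dict[str, List]


def run_cpu_udf(
    batch: Batch,
    udf: Callable,
    input_cols: Sequence[str],
    output_col: str,
) -> Tuple[Batch, List[str]]:
    """Apply a row-based CPU UDF, copying only its input columns to the host.

    Returns the new batch and the names of the columns that were copied,
    so the caller can see how little data moved.
    """
    copied: List[str] = []
    host_cols: List[List] = []
    for name in input_cols:
        # Simulated device-to-host copy of a single column.
        host_cols.append(list(batch[name]))
        copied.append(name)

    # Row-by-row evaluation on the CPU, as a row-based UDF expects.
    results = [udf(*row) for row in zip(*host_cols)]

    # Simulated host-to-device copy of the result column. The other
    # columns are reused as-is and never leave the "device".
    out = dict(batch)
    out[output_col] = results
    return out, copied


batch = {"a": [1, 2, 3], "b": [10, 20, 30], "c": ["x", "y", "z"]}
result, copied = run_cpu_udf(batch, lambda a, b: a + b, ["a", "b"], "sum")
print(copied)         # ['a', 'b']
print(result["sum"])  # [11, 22, 33]
```

Column `c` is never touched by the UDF, so in this sketch it is carried through without a copy; that is the saving compared with moving the whole plan, and therefore every column, to the CPU.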

TODOs

  • More unit tests for decimal and nested types (list, struct, map).
  • More investigation on the errors when using input encoders. (Tracked by #3924)
  • Update Hive UDFs to support this CPU-based UDF. (Tracked by #3904)

This partially addresses #3855

Signed-off-by: Firestarman firestarmanllc@gmail.com

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman (Collaborator, Author)

Filed this draft PR for early review.

@firestarman (Collaborator, Author)

build

@sameerz sameerz added the feature request New feature or request label Oct 26, 2021
along with some small refactors.

@firestarman firestarman self-assigned this Oct 26, 2021
@firestarman firestarman marked this pull request as ready for review October 26, 2021 09:39
@firestarman firestarman requested review from jlowe and revans2 October 26, 2021 09:39
@pxLi pxLi changed the title Support running CPU based UDF efficiently Support running CPU based UDF efficiently [databricks] Oct 26, 2021
@pxLi (Collaborator) commented Oct 26, 2021

added [databricks] to make sure we pass db runtime tests

@firestarman (Collaborator, Author) commented Oct 27, 2021

> added [databricks] to make sure we pass db runtime tests

Is any action required for this? I verified it on Databricks with a dev job, and it passed.

@firestarman firestarman requested a review from jlowe October 27, 2021 06:05
…cala


Update the config doc

Co-authored-by: Jason Lowe <jlowe@nvidia.com>
@firestarman firestarman requested a review from jlowe October 28, 2021 06:10
@firestarman firestarman requested a review from revans2 October 29, 2021 03:02