[FEA] Support decimal type in ORC writer #3192

Closed
firestarman opened this issue Aug 11, 2021 · 4 comments · Fixed by #3831
Assignees
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf feature request New feature or request P0 Must have for release

Comments

@firestarman
Collaborator

firestarman commented Aug 11, 2021

cuDF has added decimal type support in ORC writer by the PR rapidsai/cudf#8198.

So we can now enable decimal support on the plugin side. Updating the type signature alone might seem to be enough, but this will likely need more work than just updating the type checks.

#3177 tracks the same support for the ORC reader.

@firestarman firestarman added feature request New feature or request ? - Needs Triage Need team to review and classify labels Aug 11, 2021
@jlowe
Member

jlowe commented Aug 11, 2021

This will need to be more involved than just updating type checks. Unlike reading where we can rely on Spark conveying to us the expected decimal precision in the read schema, we need to tell cudf the decimal precision to write. The scale can be automatically detected based on the cudf type, but cudf does not track precision. For example, we may want to write a type that's Decimal(10,2) but there's currently no way to specify to the cudf ORC writer that the decimal precision of the column in the ORC file needs to be 10.
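To illustrate the point above, here is a minimal sketch in plain Python (it does not use cudf or Spark; the helper names `to_unscaled` and `min_precision` are hypothetical). It assumes a fixed-point column is stored as unscaled integers plus a shared scale, and shows that the declared precision cannot be recovered from that data alone, so it would have to be passed to the writer explicitly.

```python
from decimal import Decimal

def to_unscaled(values, scale):
    """Store decimals as unscaled integers that share one scale (assumed layout)."""
    return [int(v.scaleb(scale)) for v in values]

def min_precision(unscaled):
    """Smallest precision that fits the data; depends on the values, not the declared type."""
    return max((len(str(abs(u))) for u in unscaled), default=1)

# The schema says Decimal(10, 2), but only the scale survives in the column itself.
declared_precision, scale = 10, 2
column = to_unscaled([Decimal("123.45"), Decimal("-0.07")], scale)

# Inferring from the data gives a smaller, data-dependent precision.
inferred = min_precision(column)
print(column, inferred, declared_precision)
```

Because the inferred value changes with the data, a file written this way could end up with a different column type than the one the Spark schema declares.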

@jlowe jlowe added the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Aug 11, 2021
@Salonijain27 Salonijain27 removed the ? - Needs Triage Need team to review and classify label Aug 17, 2021
@GaryShen2008
Collaborator

@Salonijain27 This issue depends on cuDF supporting a user-specified Decimal precision, which is not implemented yet. Can we move it to the 21.12 plan? We would of course need to confirm with the cuDF team.

@sameerz sameerz added the P0 Must have for release label Sep 14, 2021
@sameerz
Collaborator

sameerz commented Sep 14, 2021

Marking as P1 for 21.12.

@Salonijain27 Salonijain27 added this to the Sep 27 - Oct 1 milestone Sep 24, 2021
@sameerz sameerz modified the milestones: Sep 27 - Oct 1, Oct 4 - Oct 15 Oct 7, 2021
@GaryShen2008 GaryShen2008 assigned res-life and unassigned firestarman Oct 14, 2021
@GaryShen2008
Collaborator

Related issue: rapidsai/cudf#9319

6 participants