Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for decimal types in ORC writer #8198

Merged
merged 40 commits into from
May 18, 2021

Conversation

vuule
Copy link
Contributor

@vuule vuule commented May 10, 2021

Closes #8159, #7126

Current implementation uses an array to hold the exact size of each encoded element before the encode step. This allows us to simplify the encoding (each element encode is independent) and to allocate streams of exact size instead of the worst-case. The process is different from other types because decimal data streams do not use RLE encoding.

Will add benchmarks once data generator can produce decimal data.

vuule added 28 commits April 22, 2021 15:24
@vuule vuule self-assigned this May 10, 2021
@vuule vuule requested review from rgsl888prabhu and devavret and removed request for ttnghia and jrhemstad May 12, 2021 23:29
Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytest lgtm

cpp/src/io/orc/reader_impl.cu Outdated Show resolved Hide resolved
cpp/src/io/orc/writer_impl.cu Show resolved Hide resolved
cpp/src/io/orc/writer_impl.cu Show resolved Hide resolved
Copy link
Contributor

@devavret devavret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to say partial review but I know everything else is very orc and implementation specific.

cpp/src/io/orc/writer_impl.cu Outdated Show resolved Hide resolved
cpp/src/io/orc/writer_impl.cu Outdated Show resolved Hide resolved
cpp/src/io/orc/writer_impl.cu Outdated Show resolved Hide resolved
cpp/src/io/orc/writer_impl.cu Outdated Show resolved Hide resolved
@vuule vuule requested review from devavret and rgsl888prabhu May 14, 2021 22:37
Copy link
Contributor

@rgsl888prabhu rgsl888prabhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor comment, apart from that looks good.

cpp/src/io/orc/stripe_enc.cu Show resolved Hide resolved
@vuule
Copy link
Contributor Author

vuule commented May 17, 2021

rerun tests

@vuule vuule changed the title Add support for decimal types in ORC writer Add support for decimal types in ORC writer May 18, 2021
@vuule vuule changed the title Add support for decimal types in ORC writer Add support for decimal types in ORC writer May 18, 2021
@vuule
Copy link
Contributor Author

vuule commented May 18, 2021

rerun tests

@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 18, 2021
@vuule
Copy link
Contributor Author

vuule commented May 18, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 72e017b into rapidsai:branch-21.06 May 18, 2021
@vuule vuule deleted the fea-orc-writer-decimal branch May 18, 2021 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add support for writing dataframes containing decimal columns to orc writer
4 participants