Prototype ArrayView Types #4253
Comments
The benchmarks in #4378 show half the execution time being spent rewriting data to remove empty array slices. This likely could be optimised, and it is unclear how realistic the benchmark is, but I thought it was an interesting data point. Theoretically ArrayView types would remove the need for this, whilst also removing the memcpy when decoding byte arrays. I'd anticipate roughly a 2x return, with bigger returns for more heavily nested data.
My current feelings on this matter are summarized in https://lists.apache.org/thread/1j0hdbfd0q2636zs9z0x19fkcn87gjhf. TL;DR: I think improving the support for sparse dictionaries may be sufficient to support this use case.
Filed #5374 to track implementing what was added to the spec.
Here was a draft PR: #4585
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
There is ongoing discussion of introducing an ArrayView type to the format - https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb
We should explore the design space around this, in particular to gather some empirical data as to the impact of introducing such a type.
Describe the solution you'd like
I would like to prototype an implementation of StringView and explore integrating it into the parquet reader, where it ostensibly could yield some non-trivial performance improvements.
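For concreteness, here is a rough sketch of what a string view could look like. It assumes a 16-byte layout along the lines of the one that was later added to the format: a 4-byte length, followed either by up to 12 inline bytes or by a 4-byte prefix, a 4-byte buffer index and a 4-byte offset. The `View` type, `INLINE_CAP` and `read_u32` are hypothetical names for illustration only, not the arrow-rs API, and the little-endian packing is an assumption of this sketch.

```rust
/// Hypothetical 16-byte string view; NOT the arrow-rs API.
/// Assumed layout (little-endian):
///   bytes 0..4   length
///   short (len <= 12): bytes 4..16 hold the data inline
///   long  (len  > 12): bytes 4..8 prefix, 8..12 buffer index, 12..16 offset
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct View([u8; 16]);

/// Values up to this many bytes are stored inside the view itself.
const INLINE_CAP: usize = 12;

/// Read a little-endian u32 from the first four bytes of `bytes`.
fn read_u32(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

impl View {
    /// Build a view over `buffers[buffer_index][offset..offset + len]`.
    /// Long values are referenced in place; only short values are copied.
    pub fn new(buffers: &[Vec<u8>], buffer_index: u32, offset: u32, len: u32) -> Self {
        let start = offset as usize;
        let data = &buffers[buffer_index as usize][start..start + len as usize];
        let mut raw = [0u8; 16];
        raw[0..4].copy_from_slice(&len.to_le_bytes());
        if data.len() <= INLINE_CAP {
            raw[4..4 + data.len()].copy_from_slice(data);
        } else {
            raw[4..8].copy_from_slice(&data[..4]);
            raw[8..12].copy_from_slice(&buffer_index.to_le_bytes());
            raw[12..16].copy_from_slice(&offset.to_le_bytes());
        }
        View(raw)
    }

    /// Length of the referenced value in bytes.
    pub fn len(&self) -> usize {
        read_u32(&self.0[0..4]) as usize
    }

    /// True if the value is stored inside the view itself.
    pub fn is_inline(&self) -> bool {
        self.len() <= INLINE_CAP
    }

    /// Resolve the view to its bytes. Inline values never touch `buffers`.
    pub fn bytes<'a>(&'a self, buffers: &'a [Vec<u8>]) -> &'a [u8] {
        let len = self.len();
        if len <= INLINE_CAP {
            &self.0[4..4 + len]
        } else {
            let index = read_u32(&self.0[8..12]) as usize;
            let offset = read_u32(&self.0[12..16]) as usize;
            &buffers[index][offset..offset + len]
        }
    }
}
```

The relevance to the parquet reader is that a decoder could, in principle, keep a decoded page as one of the `buffers` and emit views pointing into it, rather than copying every value into a contiguous values buffer. Whether that pays off in practice is exactly what the prototype would need to measure.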
Describe alternatives you've considered
Additional context