Prototype ArrayView Types #4253

tustvold · 2023-05-22T10:39:55Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

There is ongoing discussion of introducing an ArrayView type to the format - https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb

We should explore the design space around this, in particular to gather some empirical data as to the impact of introducing such a type.

Describe the solution you'd like

I would like to prototype an implementation of StringView and explore integrating it into the parquet reader, where it ostensibly could yield some non-trivial performance improvements.

Describe alternatives you've considered

Additional context

tustvold · 2023-06-07T13:41:05Z

The benchmarks in #4378 show half the execution time being spent rewriting data to remove empty array slices. This likely could be optimised, and it is unclear how realistic the benchmark is, but I thought it was an interesting data point.

Theoretically, ArrayView types would remove the need for this rewrite, whilst also removing the memcpy when decoding byte arrays. I'd anticipate roughly a 2x return, with bigger returns for more heavily nested data.
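To make the idea concrete, here is a minimal sketch of how a view-based layout avoids copying string bytes. It follows the layout described in the Arrow format discussion (values of at most 12 bytes are stored inline; longer values keep a 4-byte prefix plus a buffer index and offset), but the `View` enum and the `make_view`, `resolve` and `select` helpers are hypothetical names for illustration only, not the arrow-rs API. The real format packs each view into 16 bytes, which this enum does not attempt to reproduce.

```rust
/// Hypothetical illustration of a string view; not the arrow-rs API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum View {
    /// Values of at most 12 bytes are stored inline in the view itself.
    Inline { len: u32, data: [u8; 12] },
    /// Longer values keep a 4-byte prefix and point into a shared buffer.
    Ref { len: u32, prefix: [u8; 4], buffer: u32, offset: u32 },
}

/// Build a view over `len` bytes starting at `offset` in `buffers[buffer]`.
/// For long values nothing but the 4-byte prefix is copied.
fn make_view(buffers: &[Vec<u8>], buffer: u32, offset: u32, len: u32) -> View {
    let start = offset as usize;
    let bytes = &buffers[buffer as usize][start..start + len as usize];
    if len <= 12 {
        let mut data = [0u8; 12];
        data[..bytes.len()].copy_from_slice(bytes);
        View::Inline { len, data }
    } else {
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&bytes[..4]);
        View::Ref { len, prefix, buffer, offset }
    }
}

/// Resolve a view back to the bytes it describes.
fn resolve<'a>(view: &'a View, buffers: &'a [Vec<u8>]) -> &'a [u8] {
    match view {
        View::Inline { len, data } => &data[..*len as usize],
        View::Ref { len, buffer, offset, .. } => {
            let start = *offset as usize;
            &buffers[*buffer as usize][start..start + *len as usize]
        }
    }
}

/// Select a subset of values by copying only the fixed-size views.
/// The underlying data buffers are left untouched, so no rewrite is needed.
fn select(views: &[View], indices: &[usize]) -> Vec<View> {
    indices.iter().map(|&i| views[i]).collect()
}
```

In this sketch a decoder could hand over the decoded page as one entry in `buffers` and emit views into it, and selecting or slicing values only copies views. Whether this translates into the anticipated speed-up in the parquet reader is exactly what the prototype is meant to measure.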

tustvold · 2023-06-14T09:06:09Z

My current feelings on this matter are summarized in https://lists.apache.org/thread/1j0hdbfd0q2636zs9z0x19fkcn87gjhf

TL;DR: I think improving the support for sparse dictionaries may be sufficient to support this use case.

alamb · 2024-02-08T01:43:18Z

Filed #5374 to track implementing what was added to the spec

alamb · 2024-02-12T16:12:31Z

Here was a draft PR: #4585

alamb · 2024-02-12T16:13:49Z

From my perspective, the prototype was completed in #4585 and follow-on work is tracked in #5374, so I am closing this ticket.

tustvold added the enhancement Any new improvement worthy of a entry in the changelog label May 22, 2023

tustvold self-assigned this May 22, 2023

alamb added the arrow Changes to the arrow crate label Jun 7, 2023

tustvold closed this as not planned Jul 3, 2023

tustvold added the development-process Related to development process of arrow-rs label Jul 14, 2023

tustvold reopened this Jul 29, 2023

tustvold added a commit to tustvold/arrow-rs that referenced this issue Jul 29, 2023

Add StringViewArray and BinaryViewArray (apache#4253)

f90dfb0

tustvold added a commit to tustvold/arrow-rs that referenced this issue Aug 30, 2023

Add StringViewArray and BinaryViewArray (apache#4253)

51da8c2

tustvold added a commit to tustvold/arrow-rs that referenced this issue Aug 30, 2023

Add StringViewArray and BinaryViewArray (apache#4253)

0411e3e

tustvold mentioned this issue Jan 24, 2024

Support Arrow columnar format v1.4 #5326

Closed

3 tasks

alamb mentioned this issue Feb 8, 2024

[EPIC] Implement StringViewArray and BinaryViewArray #5374

Closed

31 tasks

alamb closed this as completed Feb 12, 2024

tustvold added a commit to tustvold/arrow-rs that referenced this issue Mar 11, 2024

Add StringViewArray and BinaryViewArray (apache#4253)

cc1fdc7

tustvold added a commit to tustvold/arrow-rs that referenced this issue Mar 11, 2024

Add StringViewArray and BinaryViewArray (apache#4253)

6ae547b
