Add DataType::Utf8View and DataType::BinaryView #5468

Closed
Tracked by #5374
alamb opened this issue Mar 4, 2024 · 4 comments · Fixed by #5470
Labels: arrow (Changes to the arrow crate), parquet (Changes to the parquet crate)

Comments

alamb commented Mar 4, 2024

This is part of the larger project to implement StringViewArray (see #5374).

The first thing we will need is new variants in DataType to support these types.

So the basic task is to

  1. Add the appropriate variants in DataType
  2. Update the rest of arrow-rs to handle the new variants (this would largely be an exercise in returning NotYetImplemented errors)
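
As a rough illustration of the two steps above, here is a minimal, self-contained sketch. It does not use the real arrow-rs types: the `DataType` and `ArrowError` enums below are trimmed-down stand-ins, and `offset_width_bytes` is a hypothetical helper that represents any existing code path that matches on `DataType`. The point is only the shape of the change: add the variants, then have each `match` return a NotYetImplemented error for them until real support lands.

```rust
use std::fmt;

/// Trimmed-down stand-in for arrow's `DataType` (assumption: the real enum
/// has many more variants; only the string/binary ones are shown here).
#[derive(Debug, Clone, PartialEq, Eq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
    // Step 1: the new variants.
    Utf8View,
    BinaryView,
}

/// Stand-in for arrow's error type, reduced to the one variant used here.
#[derive(Debug, PartialEq, Eq)]
enum ArrowError {
    NotYetImplemented(String),
}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::NotYetImplemented(msg) => write!(f, "Not yet implemented: {}", msg),
        }
    }
}

/// Hypothetical helper: width in bytes of the offsets used by an
/// offset-based string/binary type (i32 offsets vs. i64 offsets).
fn offset_width_bytes(data_type: &DataType) -> Result<usize, ArrowError> {
    match data_type {
        DataType::Utf8 | DataType::Binary => Ok(4),
        DataType::LargeUtf8 | DataType::LargeBinary => Ok(8),
        // Step 2: existing code paths acknowledge the new variants but
        // report that they are not supported yet.
        DataType::Utf8View | DataType::BinaryView => Err(ArrowError::NotYetImplemented(
            format!("{:?} is not supported here yet", data_type),
        )),
    }
}

fn main() {
    for dt in [DataType::Utf8, DataType::LargeBinary, DataType::Utf8View] {
        match offset_width_bytes(&dt) {
            Ok(width) => println!("{:?}: {}-byte offsets", dt, width),
            Err(e) => println!("{:?}: {}", dt, e),
        }
    }
}
```

Because Rust `match` expressions must be exhaustive, adding the variants makes the compiler flag every place that needs one of these arms, which is what makes step 2 mostly mechanical.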

For inspiration I think you can look at #4585 (specifically types.rs https://github.com/apache/arrow-rs/pull/4585/files#diff-ff91e9fd06b025009cc1d0f9360ecdb8c3d9ea972e8f87b4419eab01e1e8fb7c)

I reviewed the approved Arrow spec, and there does not appear to be any equivalent of LargeUtf8 (i.e., there is no LargeUtf8View):

https://github.com/apache/arrow/blob/c4e088a1d6227868e020c71d596970f35bb9e4c9/format/Schema.fbs#L187-L205

alamb changed the title from "DataType" to "Add DataType::Utf8View and DataType::BinaryView" on Mar 4, 2024
alamb commented Mar 4, 2024

FYI @ariesdevil

@XiangpengHao

I can work on this if no one else is working on it.

@tustvold

label_issue.py automatically added labels {'parquet'} from #5470

tustvold added the arrow label on Mar 15, 2024
@tustvold

label_issue.py automatically added labels {'arrow'} from #5470
