Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-3070: Add Variant logical type annotation to parquet-java #3072

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

aihuaxu
Copy link

@aihuaxu aihuaxu commented Nov 22, 2024

Rationale for this change

This is to add Variant logical type in parquet-java to be used by dependent projects.

What changes are included in this PR?

Variant logical type is added to LogicalTypeAnnotation.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. Variant logical type is available.

Closes #3070

@aihuaxu
Copy link
Author

aihuaxu commented Nov 22, 2024

@Fokko, @wgtmac Can you help check if this is right implementation? Thanks.

@wgtmac
Copy link
Member

wgtmac commented Nov 22, 2024

Do we need to support writing and reading variant data?

@aihuaxu
Copy link
Author

aihuaxu commented Nov 22, 2024

The Variant reading and writing are getting implemented in Iceberg and/or the engines themselves. I think later we can think of pulling the implementation to Parquet if needed.

@emkornfield
Copy link
Contributor

The Variant reading and writing are getting implemented in Iceberg and/or the engines themselves. I think later we can think of pulling the implementation to Parquet if needed.

I think this is problematic if the spec lives in parquet and doesn't have a complete implementation per previously agreed upon guidelines for new parquet features. This probably warrants a discussion on the mailing list. CC @julienledem @rdblue @RussellSpitzer

@Fokko
Copy link
Contributor

Fokko commented Nov 26, 2024

@aihuaxu I agree with @emkornfield that the iceberg-java implementation should be able to read and write the variant type.

It would also be great to drop some example parquet files in https://github.com/apache/parquet-testing, this will also help the adoption of other implementations, see apache/parquet-format#456 (comment)

@wgtmac
Copy link
Member

wgtmac commented Nov 26, 2024

Usually we need two reference implementations for spec changes like this. I'm not sure if there is any chance to have another implementation ready in a timely manner. IMO, at least parquet-java should support basic roundtrip read and write.

@aihuaxu
Copy link
Author

aihuaxu commented Nov 26, 2024

I see. Per guideline, we need to have the implementation in parquet-java and then another one. Do we usually include the implementation with this annotation change or should be separate?

Completeness: The goal of this phase is to ensure the feature is viable, there is no ambiguity in its specification by demonstrating compatibility between implementations. Once a change has lazy consensus, two implementations of the feature demonstrating interopability must also be provided. One implementation MUST be parquet-java. It is preferred that the second implementation be parquet-cpp or parquet-rs,

@wgtmac
Copy link
Member

wgtmac commented Nov 27, 2024

I think it should be in one change. The parquet-format cannot be released without concrete PoC implementation in parquet-java. Without that release, separate changes may break CI and thus cannot be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Variant Logical Type
4 participants