ColumnMetaData should no longer be written inline with data #6115

Closed
etseidl opened this issue Jul 25, 2024 · 2 comments · Fixed by #6117
Labels
enhancement (Any new improvement worthy of an entry in the changelog), parquet (Changes to the parquet crate)

Comments

@etseidl
Contributor

etseidl commented Jul 25, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Writing the Thrift ColumnMetaData outside of the Parquet file footer was recently deprecated (apache/parquet-format#440), as was setting the ColumnChunk::file_offset field. Also, the ColumnMetaData currently written has incorrect values for dictionary_page_offset and data_page_offset: they are relative to the start of the chunk rather than being offsets to their locations in the file.

Describe the solution you'd like
The current Parquet spec indicates the file_offset field should be set to 0, and ColumnMetaData should no longer be written inline with the data.

Describe alternatives you've considered
If not removed, the offsets mentioned above should be set to correct values.
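
To illustrate what "correct values" would mean for the alternative above, here is a minimal sketch, not the parquet crate's actual code. `ChunkOffsets`, `to_file_offsets`, and `chunk_start` are hypothetical names made up for this example; only the field names come from the Thrift definitions mentioned in the issue. It assumes the writer knows the byte position in the file at which the column chunk begins.

```rust
/// Hypothetical stand-in for the offset fields discussed in this issue.
/// This is NOT a type from the parquet crate.
#[derive(Debug, Clone, PartialEq)]
struct ChunkOffsets {
    file_offset: i64,
    data_page_offset: i64,
    dictionary_page_offset: Option<i64>,
}

/// Rebase chunk-relative page offsets to absolute positions in the file.
///
/// `chunk_start` (an assumption of this sketch) is the byte position in the
/// file at which the column chunk begins.
fn to_file_offsets(relative: &ChunkOffsets, chunk_start: i64) -> ChunkOffsets {
    ChunkOffsets {
        // Deprecated field: per the issue, the spec says to write 0 here.
        file_offset: 0,
        // Page offsets should locate the pages within the file, so shift
        // each chunk-relative value by the start of the chunk.
        data_page_offset: relative.data_page_offset + chunk_start,
        dictionary_page_offset: relative
            .dictionary_page_offset
            .map(|offset| offset + chunk_start),
    }
}
```

For example, with a chunk starting at byte 4 of the file, a chunk-relative `data_page_offset` of 120 would become 124, and a dictionary page at relative offset 0 would become 4.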

etseidl added the enhancement label Jul 25, 2024
@alamb
Contributor

alamb commented Jul 25, 2024

Thanks @etseidl (I plan to review your next PRs later this afternoon, BTW)

@alamb
Contributor

alamb commented Aug 31, 2024

label_issue.py automatically added labels {'parquet'} from #6117
