Improved handling of writing NULLs in parquet #4191
The Python changes look good.
Change makes sense to me, good find. I suggested a bit of a refactoring for how we implement it, and I think we need to extend it to vectors and arrays.
The Python changes LGTM
Slight suggestions
Bug fix, closes #1548
Original State:
Before this change, nulls in primitive columns, as well as in arrays and vectors, were written to the parquet file as Deephaven's own null representations. For example, a null in an integer column was written as QueryConstants.NULL_INT. As a result, any reader other than Deephaven's parquet reader could not interpret these values as nulls.
Change description:
This PR fixes several small bugs in both the Java and Python code so that all nulls are handled in a standard manner: instead of writing Deephaven null representations to the parquet file, we encode nulls by storing the offsets of the null values. #4186 was discovered along the way and is fixed in this PR, so that floats are represented properly in the parquet file.
We have also refactored the ParquetFileWriter class to make the code more readable and added more documentation.
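To illustrate the difference between the original state and the change described above, here is a minimal sketch. It is not the code from this PR. The class NullEncodingSketch, the Encoded holder, and the encode method are hypothetical names used only for illustration, and the local NULL_INT constant stands in for QueryConstants.NULL_INT, assumed here to be Integer.MIN_VALUE. The sketch splits a sentinel-encoded int column into per-row definition levels (the standard Parquet mechanism for marking nulls) and a packed array holding only the non-null values.

```java
import java.util.Arrays;

public class NullEncodingSketch {
    // Assumption: stand-in for QueryConstants.NULL_INT (taken to be Integer.MIN_VALUE).
    static final int NULL_INT = Integer.MIN_VALUE;

    /** Hypothetical holder for the two streams a non-Deephaven reader would expect. */
    static final class Encoded {
        final int[] definitionLevels; // 0 = null, 1 = value present
        final int[] values;           // non-null values only, in row order

        Encoded(int[] definitionLevels, int[] values) {
            this.definitionLevels = definitionLevels;
            this.values = values;
        }
    }

    /** Splits a sentinel-encoded column into definition levels and packed values. */
    static Encoded encode(int[] column) {
        int[] levels = new int[column.length];
        int[] packed = new int[column.length];
        int count = 0;
        for (int i = 0; i < column.length; i++) {
            if (column[i] == NULL_INT) {
                levels[i] = 0; // record the null by position, do not emit the sentinel
            } else {
                levels[i] = 1;
                packed[count++] = column[i];
            }
        }
        // Trim the packed buffer to the number of non-null values actually seen.
        return new Encoded(levels, Arrays.copyOf(packed, count));
    }

    public static void main(String[] args) {
        Encoded e = encode(new int[] {7, NULL_INT, 42});
        System.out.println(Arrays.toString(e.definitionLevels)); // [1, 0, 1]
        System.out.println(Arrays.toString(e.values));           // [7, 42]
    }
}
```

With the old behavior, the values stream would have contained 7, Integer.MIN_VALUE, 42, and another reader would have treated the middle entry as a real integer rather than a null.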