[BUG] parquet_test.py pytests FAILED on Databricks-9.1-ML-spark-3.1.2 #4069
Comments
@wbo4958 The parquet tests fail only on DB9.1. They pass on DB7.3, DB8.2, and other non-Databricks environments.
It looks like the 9.1 runtime has changed the API "ParquetReadSupport.clipParquetSchema", which now returns a different clipped schema:
clippedSchemaTmp:
message spark_schema {
  optional group STRUCT {
    optional int64 case_INSENsitive;
  }
}
clippedSchemaTmp:
message spark_schema {
  optional group struct {
    optional int64 case_insensitive;
  }
}
I will continue checking tomorrow.
On DB9.1, "ParquetReadSupport.clipParquetSchema" returns column names that match readDataSchema instead of the names in the Parquet file schema. clipBlocks then finds no matching columns and returns empty ColumnChunkMetaData, which is why the tests fail.
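The mismatch can be illustrated with a minimal sketch in Python. This is not the plugin's code: the helper names are hypothetical, column chunks are reduced to dotted path strings, and treating the mixed-case names as the file schema and the lowercase names as the read schema is an assumption made for illustration.

```python
from typing import List


def clip_case_sensitive(file_columns: List[str], clipped_names: List[str]) -> List[str]:
    """Keep file columns whose path matches a clipped-schema name exactly."""
    wanted = set(clipped_names)
    return [col for col in file_columns if col in wanted]


def clip_case_insensitive(file_columns: List[str], clipped_names: List[str]) -> List[str]:
    """Keep file columns whose path matches a clipped-schema name, ignoring case."""
    wanted = {name.lower() for name in clipped_names}
    return [col for col in file_columns if col.lower() in wanted]


# Assumption: column path as stored in the Parquet file footer.
file_columns = ["STRUCT.case_INSENsitive"]

# Assumption: names in the clipped schema when it follows the read schema.
clipped_names = ["struct.case_insensitive"]

# Exact matching finds nothing, so no column chunks survive the clip.
sensitive_result = clip_case_sensitive(file_columns, clipped_names)

# Matching on lowercased names recovers the file's own column path.
insensitive_result = clip_case_insensitive(file_columns, clipped_names)

print(sensitive_result)
print(insensitive_result)
```

When the clipped schema carries the file's own names, the exact match succeeds, which would be consistent with the tests passing on the other runtimes.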
…turns the readSchema-same-name schema when case insensitive, which will cause clipBlocks to return incorrect results since clipBlocks only handles case-sensitive matching. Signed-off-by: Bobby Wang <wbo4958@gmail.com> To fix NVIDIA#4069
Describe the bug
Steps/Code to reproduce bug
Build rapids-4-spark and run the integration tests (IT) on Databricks 9.1 ML spark-3.1.2
Environment details (please complete the following information)