Remove all use of parquet's validate_schema #110
Merged
Fixes #109.
Test suite passes with the latest `pyarrow == 11.0.0`.

The fix isn't quite as simple as removing the final use of the `validate_schema` keyword argument. When identifying which columns to read from the parquet file, it was also necessary to check which are classified as columns rather than indexes. I have also simplified the code a bit, as it no longer needs a separate load of the `metadata` before creating the `ParquetDataset`.

This fix works for `pyarrow >= 5` (July 2021). I will try out another PR to support earlier `pyarrow`, but the changes will be wider-ranging, as a number of places in the code already do not support `pyarrow < 5`, independent of this PR.
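
For reviewers unfamiliar with the column-versus-index distinction mentioned above, here is a minimal sketch of the idea; it is not the code from this PR. It assumes the file carries pandas metadata in the usual layout (an `index_columns` list plus a `columns` list of field descriptors), which `pyarrow` exposes as `schema.pandas_metadata`. The function name `split_index_and_data_columns` and the sample dictionary are hypothetical.

```python
def split_index_and_data_columns(pandas_metadata):
    """Return (index_fields, data_fields) from a pandas-metadata dict."""
    # Index entries are either field names (str) or descriptor dicts
    # (e.g. a RangeIndex that is not stored as a column); keep only names.
    index_fields = [
        entry for entry in pandas_metadata.get("index_columns", [])
        if isinstance(entry, str)
    ]
    data_fields = []
    for col in pandas_metadata.get("columns", []):
        # Assumption: older metadata may lack "field_name"; fall back to "name".
        field = col.get("field_name", col.get("name"))
        if field is not None and field not in index_fields:
            data_fields.append(field)
    return index_fields, data_fields


# Hypothetical metadata for a frame with columns "a", "b" and a stored index.
example_metadata = {
    "index_columns": ["__index_level_0__"],
    "columns": [
        {"name": "a", "field_name": "a"},
        {"name": "b", "field_name": "b"},
        {"name": None, "field_name": "__index_level_0__"},
    ],
}

index_fields, data_fields = split_index_and_data_columns(example_metadata)
print(index_fields)
print(data_fields)
```

With `pyarrow`, such a dictionary could come from `pyarrow.parquet.read_schema(path).pandas_metadata`, which is `None` when the file was not written from pandas, so a caller would need to handle that case separately.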