Fix pyarrow and dask parquet issues #92
CI is passing on GitHub Actions on Ubuntu with Python 3.8. Expanding the test matrix to cover Python 3.7 to 3.10 and all three major platforms.
Thanks!
Left a comment.
PKG_TEST_PYTHON: "--test-python=py37"
PYTHON_VERSION: "3.7"
PKG_TEST_PYTHON: "--test-python=py39"
PYTHON_VERSION: "3.9"
There's a known issue on pyctdev that occurs when building a package with a base Python version greater than 3.7. You might hit that :/
Thanks, I will revert to 3.7 for the time being then.
This is the final set of fixes identified in issue #86. CI works locally for me now.
The changes all relate to API changes and deprecations in pyarrow between 5.0.0 and 8.0.0. Some changes in our code deal with these directly; others follow from modifications that dask made to handle the same pyarrow changes.
I have created a new script `_create_testdata.py` to create test parquet files that are stored in the new `tests/test_data` directory, and these are checked as part of `pytest`. The last time the CI definitely worked was July 2021 with `pyarrow==5.0.0` and `dask==2021.7.2` (and the same for `distributed`). These files are successfully read with up-to-date `pyarrow==8.0.0` and `dask==2022.7.1`. Similarly, test parquet files created with the up-to-date `pyarrow` and `dask` are successfully read with `pyarrow==5.0.0` and `dask==2021.7.2`.