deps: require pyarrow for pandas support #314
tests/unit/test_client.py
@@ -7839,7 +7835,6 @@ def test_load_table_from_dataframe_unknown_table(self):
        )

    @unittest.skipIf(pandas is None, "Requires `pandas`")
    @unittest.skipIf(fastparquet is None, "Requires `fastparquet`")
    def test_load_table_from_dataframe_no_pyarrow_warning(self):
I'm a bit surprised to see this test passing. I guess we still have some code that falls back to the default pandas parquet rendering?
Can you look into whether we can remove that code path?
Related: We should be able to simplify this docstring now:
python-bigquery/google/cloud/bigquery/client.py
Lines 2134 to 2147 in cbcb4b8
parquet_compression (Optional[str]):
    [Beta] The compression method to use if intermittently
    serializing ``dataframe`` to a parquet file.

    If ``pyarrow`` and job config schema are used, the argument
    is directly passed as the ``compression`` argument to the
    underlying ``pyarrow.parquet.write_table()`` method (the
    default value "snappy" gets converted to uppercase).
    https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow-parquet-write-table

    If either ``pyarrow`` or job config schema are missing, the
    argument is directly passed as the ``compression`` argument
    to the underlying ``DataFrame.to_parquet()`` method.
    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet
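To make the two branches in the quoted docstring concrete, here is a minimal sketch of the selection logic it describes. `pick_parquet_writer` is a hypothetical helper, not a function in the library, and it only returns the name of the writer that would receive the argument rather than calling pyarrow or pandas.

```python
from typing import Tuple


def pick_parquet_writer(
    parquet_compression: str, has_pyarrow: bool, has_schema: bool
) -> Tuple[str, str]:
    """Hypothetical helper mirroring the docstring's two cases.

    Returns (writer name, compression value passed to that writer).
    """
    if has_pyarrow and has_schema:
        # Per the docstring, only the default value "snappy" is uppercased
        # before it is handed to pyarrow.parquet.write_table().
        if parquet_compression == "snappy":
            parquet_compression = parquet_compression.upper()
        return "pyarrow.parquet.write_table", parquet_compression
    # Fallback described by the docstring: pass the value through unchanged
    # to DataFrame.to_parquet().
    return "DataFrame.to_parquet", parquet_compression
```

If pyarrow becomes a hard requirement for pandas support, the `has_pyarrow` flag in this sketch would always be true, which is why the second paragraph of the docstring could be simplified.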
@@ -53,7 +53,6 @@
        "pyarrow >= 1.0.0, < 2.0dev",
    ],
    "tqdm": ["tqdm >= 4.7.4, <5.0.0dev"],
    "fastparquet": ["fastparquet", "python-snappy", "llvmlite>=0.34.0"],
I'd like to see us add "pyarrow" to the "pandas" extras now, since it's needed for both uploads and downloads to dataframe.
We can maybe refactor the `pyarrow >=1.0.0,<2.0dev` string into a variable, since it's going to appear 3 times in setup.py now too.
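A rough sketch of that refactoring, assuming the extras are kept in a plain dict: the pyarrow specifier and the "tqdm" entry are copied from the diff above, while the variable names `pyarrow_dep` and `extras`, the unpinned "pandas" requirement, and the exact set of extras shown are assumptions for illustration only.

```python
# Shared version specifier, defined once so every extra stays in sync.
# (Variable name is hypothetical; the specifier string is from the diff.)
pyarrow_dep = "pyarrow >= 1.0.0, < 2.0dev"

# Hypothetical extras table; the real setup.py may pin pandas and
# list more extras than shown here.
extras = {
    "pandas": ["pandas", pyarrow_dep],  # pyarrow now ships with pandas support
    "pyarrow": [pyarrow_dep],
    "tqdm": ["tqdm >= 4.7.4, <5.0.0dev"],
}
```

With a single variable, a later bump of the pyarrow range only needs to be made in one place.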
@tswast OK, this was a bit more involved than I expected at the beginning. Here goes my second attempt.
Thanks!
🤖 I have created a release \*beep\* \*boop\*

---

## [2.2.0](https://www.github.com/googleapis/python-bigquery/compare/v2.1.0...v2.2.0) (2020-10-19)

### Features

* add method api_repr for table list item ([#299](https://www.github.com/googleapis/python-bigquery/issues/299)) ([07c70f0](https://www.github.com/googleapis/python-bigquery/commit/07c70f0292f9212f0c968cd5c9206e8b0409c0da))
* add support for listing arima, automl, boosted tree, DNN, and matrix factorization models ([#328](https://www.github.com/googleapis/python-bigquery/issues/328)) ([502a092](https://www.github.com/googleapis/python-bigquery/commit/502a0926018abf058cb84bd18043c25eba15a2cc))
* add timeout paramter to load_table_from_file and it dependent methods ([#327](https://www.github.com/googleapis/python-bigquery/issues/327)) ([b0dd892](https://www.github.com/googleapis/python-bigquery/commit/b0dd892176e31ac25fddd15554b5bfa054299d4d))
* add to_api_repr method to Model ([#326](https://www.github.com/googleapis/python-bigquery/issues/326)) ([fb401bd](https://www.github.com/googleapis/python-bigquery/commit/fb401bd94477323bba68cf252dd88166495daf54))
* allow client options to be set in magics context ([#322](https://www.github.com/googleapis/python-bigquery/issues/322)) ([5178b55](https://www.github.com/googleapis/python-bigquery/commit/5178b55682f5e264bfc082cde26acb1fdc953a18))

### Bug Fixes

* make TimePartitioning repr evaluable ([#110](https://www.github.com/googleapis/python-bigquery/issues/110)) ([20f473b](https://www.github.com/googleapis/python-bigquery/commit/20f473bfff5ae98377f5d9cdf18bfe5554d86ff4)), closes [#109](https://www.github.com/googleapis/python-bigquery/issues/109)
* use version.py instead of pkg_resources.get_distribution ([#307](https://www.github.com/googleapis/python-bigquery/issues/307)) ([b8f502b](https://www.github.com/googleapis/python-bigquery/commit/b8f502b14f21d1815697e4d57cf1225dfb4a7c5e))

### Performance Improvements

* add size parameter for load table from dataframe and json methods ([#280](https://www.github.com/googleapis/python-bigquery/issues/280)) ([3be78b7](https://www.github.com/googleapis/python-bigquery/commit/3be78b737add7111e24e912cd02fc6df75a07de6))

### Documentation

* update clustering field docstrings ([#286](https://www.github.com/googleapis/python-bigquery/issues/286)) ([5ea1ece](https://www.github.com/googleapis/python-bigquery/commit/5ea1ece2d911cdd1f3d9549ee01559ce8ed8269a)), closes [#285](https://www.github.com/googleapis/python-bigquery/issues/285)
* update snippets samples to support version 2.0 ([#309](https://www.github.com/googleapis/python-bigquery/issues/309)) ([61634be](https://www.github.com/googleapis/python-bigquery/commit/61634be9bf9e3df7589fc1bfdbda87288859bb13))

### Dependencies

* add protobuf dependency ([#306](https://www.github.com/googleapis/python-bigquery/issues/306)) ([cebb5e0](https://www.github.com/googleapis/python-bigquery/commit/cebb5e0e911e8c9059bc8c9e7fce4440e518bff3)), closes [#305](https://www.github.com/googleapis/python-bigquery/issues/305)
* require pyarrow for pandas support ([#314](https://www.github.com/googleapis/python-bigquery/issues/314)) ([801e4c0](https://www.github.com/googleapis/python-bigquery/commit/801e4c0574b7e421aa3a28cafec6fd6bcce940dd)), closes [#265](https://www.github.com/googleapis/python-bigquery/issues/265)

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please).
Fixes #265 🦕