BUG: extraneous copy of extension arrays in v1.3.0 #42501
Comments
AFAICT it's caused by (or related to) #38939
take
I'm still learning how to infer work owner -- @simonjayhawkins if you're working on this please take it back from me. :)
It seems that #38939 fixed the inverse test case, where copy is explicitly True, but broke this test case. My current hypothesis is that this is a straightforward argument-propagation issue. I have a draft code change that fixes the test case here. I'll continue exploring the code.
@jmcomie my commit that referenced this issue was just confirming the commit that caused the regression. Carry on!
Looking through the stack, astype calls concat with copy=False, but in the is_series axis=1 block of the get_result method in reshape/concat.py, a DataFrame constructor is called without this copy value being passed along. Instrumenting the code with a few print statements shows that copy is set to True in this path, in the code added in #38939. An apparent fix is to update get_result so that it passes the provided copy value on to this DataFrame constructor. This does resolve the test case provided -- code:
However, this change breaks the test case tests/extension/decimal/test_decimal.py::test_astype_dispatches[True]. I paused after an initial review of the test_astype_dispatches behavior and plan to pick that investigation up this week. Let me know if I'm missing something or if there's a better angle to pursue here.
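The kind of argument-propagation slip described above can be sketched in plain Python. This is only an illustration, not the pandas code: Frame, get_result_dropping_copy and get_result_forwarding_copy are hypothetical names, and the stand-in constructor is simply assumed to copy by default.

```python
class Frame:
    """Hypothetical stand-in for a container whose constructor copies by default."""

    def __init__(self, data, copy=True):
        # Copy the input unless the caller explicitly opts out.
        self.data = list(data) if copy else data


def get_result_dropping_copy(data, copy=False):
    # The caller's `copy` is accepted but never forwarded,
    # so the constructor default (copy=True) silently wins.
    return Frame(data)


def get_result_forwarding_copy(data, copy=False):
    # Forwarding the argument preserves the caller's intent.
    return Frame(data, copy=copy)


values = [1, 2, 3]
dropped = get_result_dropping_copy(values, copy=False)
forwarded = get_result_forwarding_copy(values, copy=False)

print(dropped.data is values)    # False: an extraneous copy was made
print(forwarded.data is values)  # True: the original object is reused
```

In the sketch the fix is a one-argument change; as the comment above notes, the equivalent change in pandas interacts with other tests, so the real fix may need more than this.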
Could removing the special-casing for EAs here (https://github.com/pandas-dev/pandas/pull/38939/files#diff-47a4d23478486a3722569045c05137aec72bc6030df19443c454872a7cdf90d4R445) be helpful?
Removing the special-casing fixes the test case of this issue but causes other failures. Separately, I wonder if it's worth refactoring that block into a lower-level call, since it seems to rely on undocumented knowledge of the lower-level calls.
The test_astype_dispatches failure occurs because a parametrized case marked as an expected failure now passes after the change, so it might work to update the test to expect both parametrized inputs to pass.
The expected failure is
Code Sample, a copy-pastable example
Problem description
In 1.3.0, astype attempts to create an extension array copy even when explicitly passed copy=False:
Expected Output
1.2.5 works as expected (at least for this example):
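As a reference point for what copy=False is expected to mean, here is a minimal sketch that is not the reporter's example. It assumes pandas' built-in nullable Int64 array as the extension array and only checks the array-level behavior: converting to the same dtype with copy=False should hand back the same object, while copy=True should produce a new one.

```python
import pandas as pd

# Assumption: the built-in nullable Int64 array stands in for the
# extension array used in the original report.
arr = pd.array([1, 2, 3], dtype="Int64")

# Same dtype, copy=False: no conversion is needed, so no copy should be made.
same = arr.astype(arr.dtype, copy=False)

# Same dtype, copy=True: a fresh array is requested explicitly.
copied = arr.astype(arr.dtype, copy=True)

array_reused = same is arr
array_copied = copied is not arr

print(array_reused)  # True
print(array_copied)  # True
```

The report is about DataFrame.astype not preserving this behavior in 1.3.0; the sketch deliberately stops at the array level, where the result does not depend on the regression.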
Output of pd.show_versions()
INSTALLED VERSIONS
commit : f00ed8f
python : 3.7.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-1029-oem
Version : #30-Ubuntu SMP Fri May 28 23:53:50 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 49.6.0.post20210108
Cython : 0.29.22
pytest : 6.2.2
hypothesis : 6.7.0
jinja2 : 2.11.3
IPython : 7.20.0
fsspec : 2021.04.0
fastparquet : 0.5.0
matplotlib : 3.4.1
pyarrow : 4.0.1
s3fs : 2021.04.0
scipy : 1.6.0
xarray : 0.17.0
numba : 0.53.1