BUG: bugfix 26390 assigning PandasArray to DataFrame error #26417
Not quite sure this is the right place. I think putting the reshape into make_block
is a bit closer to what we want?
diff --git a/pandas/core/internals/blocks.py b/pandas/core/internals/blocks.py
index 0c49ebb55..54d295cf0 100644
--- a/pandas/core/internals/blocks.py
+++ b/pandas/core/internals/blocks.py
@@ -3035,6 +3035,8 @@ def make_block(values, placement, klass=None, ndim=None, dtype=None,
     # For now, blocks should be backed by ndarrays when possible.
     if isinstance(values, ABCPandasArray):
         values = values.to_numpy()
+        if ndim and ndim > 1:
+            values = np.atleast_2d(values)
 
     if isinstance(dtype, PandasDtype):
         dtype = dtype.numpy_dtype
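To make the effect of the suggested reshape concrete, here is a minimal NumPy-only sketch. The helper name `as_block_values` is hypothetical and is not part of pandas; it only mirrors the `ndim` check from the diff above, assuming the values have already been converted to an ndarray.

```python
import numpy as np


def as_block_values(values, ndim=None):
    """Hypothetical stand-in for the reshape step; not a pandas function."""
    values = np.asarray(values)
    # Only promote to 2D when the caller asks for a 2D (DataFrame) block.
    if ndim and ndim > 1:
        values = np.atleast_2d(values)
    return values


one_d = np.array([1, 2, None, 3], dtype=object)

unchanged = as_block_values(one_d)          # ndim not given: stays 1D
reshaped = as_block_values(one_d, ndim=2)   # promoted to a single row

print(unchanged.shape)
print(reshaped.shape)
```

With a four-element input, the 1D array keeps shape `(4,)` and the promoted array has shape `(1, 4)`, which is the one-row layout a single column occupies inside a 2D block.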
pandas/core/frame.py
Outdated
        # convert pandas array to numpy array
        if isinstance(value, ABCPandasArray):
            value = value.to_numpy()
            return np.atleast_2d(np.asarray(value))
Why the np.asarray?
And do you even need this early return? Can you not just do value = value.to_numpy()?
Why the np.asarray?
The reason for np.asarray was that the same is done when a NumPy array is passed to the column.
And do you even need this early return? Can you not just do value = value.to_numpy()?
You're right, though: I shouldn't be returning this early. I had a misconception that the next check would return the wrong value:
    # return internal types directly
    if is_extension_type(value) or is_extension_array_dtype(value):
        return value
Will fix this.
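On the np.asarray question, a small sketch may help. np.atleast_2d converts non-array inputs to arrays on its own, so for these inputs wrapping the argument in np.asarray first gives the same result. The variable names below are illustrative only and do not come from the PR.

```python
import numpy as np

values_list = [1, 2, None, 3]
values_array = np.array(values_list, dtype=object)

# Reshape with and without the extra np.asarray call.
direct = np.atleast_2d(values_array)
wrapped = np.atleast_2d(np.asarray(values_array))

# np.atleast_2d also accepts a plain list and converts it itself.
from_list = np.atleast_2d(values_list)

print(direct.shape, wrapped.shape, from_list.shape)
```

All three results are single-row 2D arrays holding the same elements, which is why the explicit np.asarray looked redundant to the reviewer.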
Not quite sure this is the right place. I think putting the reshape into make_block is a bit closer to what we want?
I guess putting the conversion in make_block is a better idea. Doing it in frame.py was the first thing that came to mind, since I thought of it as a sanitization step, just like the conversion of a NumPy array to a 2D format.
Anyway, I'll make the requisite changes.
    df['c'] = pd.array([1, 2, None, 3])
    df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'],
                        'c': pd.array([1, 2, None, 3])})
    assert_frame_equal(df, df2)
Can you assert that df2['c']._data.blocks[0] is an ObjectBlock (not an extension block)?
Will add these checks
Codecov Report
@@            Coverage Diff             @@
##           master   #26417      +/-   ##
==========================================
- Coverage   91.69%   91.68%   -0.01%
==========================================
  Files         174      174
  Lines       50739    50742       +3
==========================================
- Hits        46524    46523       -1
- Misses       4215     4219       +4
Codecov Report
@@            Coverage Diff             @@
##           master   #26417      +/-   ##
==========================================
- Coverage   91.73%   91.72%   -0.01%
==========================================
  Files         174      174
  Lines       50741    50756      +15
==========================================
+ Hits        46548    46558      +10
- Misses       4193     4198       +5
    df['c'] = pd.array([1, 2, None, 3])
    df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'],
                        'c': pd.array([1, 2, None, 3])})
    assert(df2['c']._data.blocks[0].__class__ == ObjectBlock)
We want assert statements:
-    assert(df2['c']._data.blocks[0].__class__ == ObjectBlock)
+    assert type(df2['c']._data.blocks[0]) == ObjectBlock
Same for the line below.
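The reviewer does not spell out the reasoning, so the following is only a sketch of two commonly cited points behind this kind of suggestion. The classes are hypothetical stand-ins defined here for illustration; they are not the pandas block classes, and the hierarchy shown is an assumption.

```python
# Hypothetical stand-in classes; NOT the real pandas internals.
class Block:
    pass


class ObjectBlock(Block):
    pass


class ExtensionBlock(Block):
    pass


block = ObjectBlock()

# An exact type comparison pins down the concrete class...
exact_match = type(block) == ObjectBlock
# ...whereas isinstance is also true for any parent class.
parent_match = isinstance(block, Block)
not_extension = type(block) != ExtensionBlock

# assert is a statement, not a function. Parentheses are harmless around a
# single expression, but a parenthesized (condition, message) pair is a
# non-empty tuple, which is always truthy.
tuple_is_truthy = bool((False, "this message would never be shown"))

assert type(block) == ObjectBlock
print(exact_match, parent_match, not_extension, tuple_is_truthy)
```

Writing `assert condition` without parentheses avoids the tuple pitfall entirely and matches the statement form requested in the review.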
Can you remove these parens?
Sure, will do so in the next commit.
The release note can go in the "Other" section.
@@ -1310,3 +1311,14 @@ def test_make_block_no_pandas_array():
     result = make_block(arr.to_numpy(), slice(len(arr)), dtype=arr.dtype)
     assert result.is_integer is True
     assert result.is_extension is False
I don't think you actually need this test; rather, in the test right above, check that result.values is the correct type.
This test was already present. Was it valid only prior to the decision to convert PandasArray to a NumPy array?
I don't think that would quite hit the issue. I think you would need to add a new result = make_block(arr, ..., ndim=2) with the right placement. It wouldn't hurt to add that, but I think we should keep the test below.
thanks @shantanu-gontia
git diff upstream/master -u -- "*.py" | flake8 --diff
Which section should I add the whatsnew entry to for this particular case?