Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: bugfix 26390 assigning PandasArray to DataFrame error #26417

Merged
merged 8 commits into from
May 19, 2019
Merged

BUG: bugfix 26390 assigning PandasArray to DataFrame error #26417

merged 8 commits into from
May 19, 2019

Conversation

shantanu-gontia
Copy link
Contributor

@shantanu-gontia shantanu-gontia commented May 15, 2019


Which section should i add the whatsnew entry in for this particular case?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure this is the right place. I think putting the reshape into make_block is a bit closer to what we want?

diff --git a/pandas/core/internals/blocks.py b/pandas/core/internals/blocks.py
index 0c49ebb55..54d295cf0 100644
--- a/pandas/core/internals/blocks.py
+++ b/pandas/core/internals/blocks.py
@@ -3035,6 +3035,8 @@ def make_block(values, placement, klass=None, ndim=None, dtype=None,
     # For now, blocks should be backed by ndarrays when possible.
     if isinstance(values, ABCPandasArray):
         values = values.to_numpy()
+        if ndim and ndim > 1:
+            values = np.atleast_2d(values)
     if isinstance(dtype, PandasDtype):
         dtype = dtype.numpy_dtype

# convert pandas array to numpy array
if isinstance(value, ABCPandasArray):
value = value.to_numpy()
return np.atleast_2d(np.asarray(value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the np.asarray?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And do you even need this early return? Can you not just do value = value.to_numpy()?

Copy link
Contributor Author

@shantanu-gontia shantanu-gontia May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the np.asarray?

The reason for np.asarray was because the same was done if a Numpy array was passed to the column.

And do you even need this early return? Can you not just do value = value.to_numpy()?

You're right though, I shouldn't be returning this early, I had a misconception that the next check would return the wrong value

# return internal types directly
        if is_extension_type(value) or is_extension_array_dtype(value):
            return value

Will fix this.

Copy link
Contributor Author

@shantanu-gontia shantanu-gontia May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure this is the right place. I think putting the reshape into make_block is a bit closer to what we want?

diff --git a/pandas/core/internals/blocks.py b/pandas/core/internals/blocks.py
index 0c49ebb55..54d295cf0 100644
--- a/pandas/core/internals/blocks.py
+++ b/pandas/core/internals/blocks.py
@@ -3035,6 +3035,8 @@ def make_block(values, placement, klass=None, ndim=None, dtype=None,
     # For now, blocks should be backed by ndarrays when possible.
     if isinstance(values, ABCPandasArray):
         values = values.to_numpy()
+        if ndim and ndim > 1:
+            values = np.atleast_2d(values)
     if isinstance(dtype, PandasDtype):
         dtype = dtype.numpy_dtype

I guess putting the conversion in make_blocks is a better idea, it was the first thing that came to mind, but I thought it to be a sanitary process, just like the conversion of NumpyArray to a 2D Format.

Anyway I'll make the requisite changes

df['c'] = pd.array([1, 2, None, 3])
df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'],
'c': pd.array([1, 2, None, 3])})
assert_frame_equal(df, df2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you assert that df2['c']._data.blocks[0] is an ObjectBlock (not an extension block).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add these checks

@TomAugspurger TomAugspurger added the Internals Related to non-user accessible pandas implementation label May 15, 2019
@TomAugspurger TomAugspurger added this to the 0.25.0 milestone May 15, 2019
@codecov
Copy link

codecov bot commented May 15, 2019

Codecov Report

Merging #26417 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26417      +/-   ##
==========================================
- Coverage   91.69%   91.68%   -0.01%     
==========================================
  Files         174      174              
  Lines       50739    50742       +3     
==========================================
- Hits        46524    46523       -1     
- Misses       4215     4219       +4
Flag Coverage Δ
#multiple 90.19% <100%> (ø) ⬆️
#single 41.16% <33.33%> (-0.18%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 94.07% <ø> (ø) ⬆️
pandas/core/frame.py 97.02% <100%> (-0.12%) ⬇️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ff4437e...fcd29ed. Read the comment docs.

@codecov
Copy link

codecov bot commented May 15, 2019

Codecov Report

Merging #26417 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26417      +/-   ##
==========================================
- Coverage   91.73%   91.72%   -0.01%     
==========================================
  Files         174      174              
  Lines       50741    50756      +15     
==========================================
+ Hits        46548    46558      +10     
- Misses       4193     4198       +5
Flag Coverage Δ
#multiple 90.23% <100%> (ø) ⬆️
#single 41.69% <100%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 94.08% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97.02% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.6% <0%> (-0.11%) ⬇️
pandas/core/base.py 97.99% <0%> (-0.01%) ⬇️
pandas/tseries/offsets.py 96.69% <0%> (-0.01%) ⬇️
pandas/core/indexes/datetimes.py 96.85% <0%> (ø) ⬆️
pandas/core/reshape/reshape.py 99.56% <0%> (ø) ⬆️
pandas/io/pytables.py 90.22% <0%> (ø) ⬆️
pandas/core/window.py 96.41% <0%> (ø) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1263e1a...997c0a5. Read the comment docs.

df['c'] = pd.array([1, 2, None, 3])
df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'],
'c': pd.array([1, 2, None, 3])})
assert(df2['c']._data.blocks[0].__class__ == ObjectBlock)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want assert statements

Suggested change
assert(df2['c']._data.blocks[0].__class__ == ObjectBlock)
assert type(df2['c']._data.blocks[0]) == ObjectBlock

Same for the line below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove these parens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will do so in the next commit

@TomAugspurger
Copy link
Contributor

The release note can go in the "Other" section.

@@ -1310,3 +1311,14 @@ def test_make_block_no_pandas_array():
result = make_block(arr.to_numpy(), slice(len(arr)), dtype=arr.dtype)
assert result.is_integer is True
assert result.is_extension is False


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you actually need this test; rather in the test right above, check the result.values is is the correct type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test was already present, was this valid only prior to the decision of converting PandasArray to numpy array?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that wouuld quite hit the issue. I think you would need to add a new result = make_black(arr, ..., ndim=2) with the right placement. It wouldn't hurt to add that, but I think keep the test below.

@jreback jreback removed this from the 0.25.0 milestone May 15, 2019
doc/source/whatsnew/v0.25.0.rst Outdated Show resolved Hide resolved
@@ -1310,3 +1311,14 @@ def test_make_block_no_pandas_array():
result = make_block(arr.to_numpy(), slice(len(arr)), dtype=arr.dtype)
assert result.is_integer is True
assert result.is_extension is False


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that wouuld quite hit the issue. I think you would need to add a new result = make_black(arr, ..., ndim=2) with the right placement. It wouldn't hurt to add that, but I think keep the test below.

df['c'] = pd.array([1, 2, None, 3])
df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'],
'c': pd.array([1, 2, None, 3])})
assert(df2['c']._data.blocks[0].__class__ == ObjectBlock)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove these parens?

doc/source/whatsnew/v0.25.0.rst Outdated Show resolved Hide resolved
pandas/core/internals/blocks.py Show resolved Hide resolved
pandas/tests/internals/test_internals.py Outdated Show resolved Hide resolved
pandas/tests/internals/test_internals.py Outdated Show resolved Hide resolved
@jreback jreback added this to the 0.25.0 milestone May 19, 2019
@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label May 19, 2019
@jreback jreback merged commit f44ac06 into pandas-dev:master May 19, 2019
@jreback
Copy link
Contributor

jreback commented May 19, 2019

thanks @shantanu-gontia

@shantanu-gontia shantanu-gontia deleted the bug26390 branch May 19, 2019 19:10
@shantanu-gontia shantanu-gontia restored the bug26390 branch May 19, 2019 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: assigning Series.array / PandasArray to column fails
3 participants