BUG: bugfix 26390 assigning PandasArray to DataFrame error #26417

shantanu-gontia · 2019-05-15T18:39:03Z

closes BUG: assigning Series.array / PandasArray to column fails #26390
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Which section should i add the whatsnew entry in for this particular case?

TomAugspurger

Not quite sure this is the right place. I think putting the reshape into make_block is a bit closer to what we want?

diff --git a/pandas/core/internals/blocks.py b/pandas/core/internals/blocks.py
index 0c49ebb55..54d295cf0 100644
--- a/pandas/core/internals/blocks.py
+++ b/pandas/core/internals/blocks.py
@@ -3035,6 +3035,8 @@ def make_block(values, placement, klass=None, ndim=None, dtype=None,
     # For now, blocks should be backed by ndarrays when possible.
     if isinstance(values, ABCPandasArray):
         values = values.to_numpy()
+        if ndim and ndim > 1:
+            values = np.atleast_2d(values)
     if isinstance(dtype, PandasDtype):
         dtype = dtype.numpy_dtype

TomAugspurger · 2019-05-15T18:43:32Z

pandas/core/frame.py

+        # convert pandas array to numpy array
+        if isinstance(value, ABCPandasArray):
+            value = value.to_numpy()
+            return np.atleast_2d(np.asarray(value))


Why the np.asarray?

And do you even need this early return? Can you not just do value = value.to_numpy()?

Why the np.asarray?

The reason for np.asarray was because the same was done if a Numpy array was passed to the column.

And do you even need this early return? Can you not just do value = value.to_numpy()?

You're right though, I shouldn't be returning this early, I had a misconception that the next check would return the wrong value

# return internal types directly if is_extension_type(value) or is_extension_array_dtype(value): return value

Will fix this.

Not quite sure this is the right place. I think putting the reshape into make_block is a bit closer to what we want?

diff --git a/pandas/core/internals/blocks.py b/pandas/core/internals/blocks.py index 0c49ebb55..54d295cf0 100644 --- a/pandas/core/internals/blocks.py +++ b/pandas/core/internals/blocks.py @@ -3035,6 +3035,8 @@ def make_block(values, placement, klass=None, ndim=None, dtype=None, # For now, blocks should be backed by ndarrays when possible. if isinstance(values, ABCPandasArray): values = values.to_numpy() + if ndim and ndim > 1: + values = np.atleast_2d(values) if isinstance(dtype, PandasDtype): dtype = dtype.numpy_dtype

I guess putting the conversion in make_blocks is a better idea, it was the first thing that came to mind, but I thought it to be a sanitary process, just like the conversion of NumpyArray to a 2D Format.

Anyway I'll make the requisite changes

TomAugspurger · 2019-05-15T18:46:40Z

pandas/tests/internals/test_internals.py

+    df['c'] = pd.array([1, 2, None, 3])
+    df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'],
+                        'c': pd.array([1, 2, None, 3])})
+    assert_frame_equal(df, df2)


Can you assert that df2['c']._data.blocks[0] is an ObjectBlock (not an extension block).

Will add these checks

codecov · 2019-05-15T19:16:07Z

Codecov Report

Merging #26417 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26417      +/-   ##
==========================================
- Coverage   91.69%   91.68%   -0.01%     
==========================================
  Files         174      174              
  Lines       50739    50742       +3     
==========================================
- Hits        46524    46523       -1     
- Misses       4215     4219       +4

Flag	Coverage Δ
#multiple	`90.19% <100%> (ø)`	⬆️
#single	`41.16% <33.33%> (-0.18%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/internals/blocks.py	`94.07% <ø> (ø)`	⬆️
pandas/core/frame.py	`97.02% <100%> (-0.12%)`	⬇️
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ff4437e...fcd29ed. Read the comment docs.

codecov · 2019-05-15T19:16:07Z

Codecov Report

Merging #26417 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26417      +/-   ##
==========================================
- Coverage   91.73%   91.72%   -0.01%     
==========================================
  Files         174      174              
  Lines       50741    50756      +15     
==========================================
+ Hits        46548    46558      +10     
- Misses       4193     4198       +5

Flag	Coverage Δ
#multiple	`90.23% <100%> (ø)`	⬆️
#single	`41.69% <100%> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/internals/blocks.py	`94.08% <100%> (+0.01%)`	⬆️
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`97.02% <0%> (-0.12%)`	⬇️
pandas/util/testing.py	`90.6% <0%> (-0.11%)`	⬇️
pandas/core/base.py	`97.99% <0%> (-0.01%)`	⬇️
pandas/tseries/offsets.py	`96.69% <0%> (-0.01%)`	⬇️
pandas/core/indexes/datetimes.py	`96.85% <0%> (ø)`	⬆️
pandas/core/reshape/reshape.py	`99.56% <0%> (ø)`	⬆️
pandas/io/pytables.py	`90.22% <0%> (ø)`	⬆️
pandas/core/window.py	`96.41% <0%> (ø)`	⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1263e1a...997c0a5. Read the comment docs.

TomAugspurger · 2019-05-15T21:54:17Z

pandas/tests/internals/test_internals.py

+    df['c'] = pd.array([1, 2, None, 3])
+    df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'],
+                        'c': pd.array([1, 2, None, 3])})
+    assert(df2['c']._data.blocks[0].__class__ == ObjectBlock)


We want assert statements

Suggested change

assert(df2['c']._data.blocks[0].__class__ == ObjectBlock)

assert type(df2['c']._data.blocks[0]) == ObjectBlock

Same for the line below.

Can you remove these parens?

sure, will do so in the next commit

TomAugspurger · 2019-05-15T21:54:50Z

The release note can go in the "Other" section.

jreback · 2019-05-15T23:35:06Z

pandas/tests/internals/test_internals.py

@@ -1310,3 +1311,14 @@ def test_make_block_no_pandas_array():
    result = make_block(arr.to_numpy(), slice(len(arr)), dtype=arr.dtype)
    assert result.is_integer is True
    assert result.is_extension is False
+
+


I don't think you actually need this test; rather in the test right above, check the result.values is is the correct type

this test was already present, was this valid only prior to the decision of converting PandasArray to numpy array?

I don't think that wouuld quite hit the issue. I think you would need to add a new result = make_black(arr, ..., ndim=2) with the right placement. It wouldn't hurt to add that, but I think keep the test below.

doc/source/whatsnew/v0.25.0.rst

TomAugspurger · 2019-05-17T16:54:06Z

pandas/tests/internals/test_internals.py

@@ -1310,3 +1311,14 @@ def test_make_block_no_pandas_array():
    result = make_block(arr.to_numpy(), slice(len(arr)), dtype=arr.dtype)
    assert result.is_integer is True
    assert result.is_extension is False
+
+


I don't think that wouuld quite hit the issue. I think you would need to add a new result = make_black(arr, ..., ndim=2) with the right placement. It wouldn't hurt to add that, but I think keep the test below.

TomAugspurger · 2019-05-17T16:54:23Z

pandas/tests/internals/test_internals.py

+    df['c'] = pd.array([1, 2, None, 3])
+    df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'],
+                        'c': pd.array([1, 2, None, 3])})
+    assert(df2['c']._data.blocks[0].__class__ == ObjectBlock)


Can you remove these parens?

doc/source/whatsnew/v0.25.0.rst

pandas/core/internals/blocks.py

pandas/tests/internals/test_internals.py

jreback · 2019-05-19T18:11:33Z

thanks @shantanu-gontia

bugfix 26390 assigning pandas array to column works

fcd29ed

TomAugspurger reviewed May 15, 2019

View reviewed changes

TomAugspurger added the Internals Related to non-user accessible pandas implementation label May 15, 2019

TomAugspurger added this to the 0.25.0 milestone May 15, 2019

shantanu-gontia added 2 commits May 16, 2019 00:50

put conversion to ndarray in make_block

a4f6e89

fixed imports

b3da29f

TomAugspurger reviewed May 15, 2019

View reviewed changes

jreback requested changes May 15, 2019

View reviewed changes

jreback removed this from the 0.25.0 milestone May 15, 2019

shantanu-gontia added 2 commits May 16, 2019 21:50

Merge remote-tracking branch 'upstream/master' into bug26390

c44c6e3

added release note message

8a4c87f

TomAugspurger reviewed May 17, 2019

View reviewed changes

jreback requested changes May 18, 2019

View reviewed changes

doc/source/whatsnew/v0.25.0.rst Outdated Show resolved Hide resolved

pandas/core/internals/blocks.py Show resolved Hide resolved

pandas/tests/internals/test_internals.py Outdated Show resolved Hide resolved

pandas/tests/internals/test_internals.py Outdated Show resolved Hide resolved

shantanu-gontia added 3 commits May 19, 2019 07:17

changed test location to block_internals

cd1d38b

changed section of release note

f1e3098

fixed import sort order

997c0a5

jreback added this to the 0.25.0 milestone May 19, 2019

jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label May 19, 2019

jreback approved these changes May 19, 2019

View reviewed changes

jreback merged commit f44ac06 into pandas-dev:master May 19, 2019

shantanu-gontia deleted the bug26390 branch May 19, 2019 19:10

shantanu-gontia restored the bug26390 branch May 19, 2019 19:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: bugfix 26390 assigning PandasArray to DataFrame error #26417

BUG: bugfix 26390 assigning PandasArray to DataFrame error #26417

shantanu-gontia commented May 15, 2019 •

edited

Loading

TomAugspurger left a comment

TomAugspurger May 15, 2019

TomAugspurger May 15, 2019

shantanu-gontia May 15, 2019 •

edited

Loading

shantanu-gontia May 15, 2019 •

edited

Loading

TomAugspurger May 15, 2019

shantanu-gontia May 15, 2019

codecov bot commented May 15, 2019

codecov bot commented May 15, 2019 •

edited

Loading

TomAugspurger May 15, 2019

TomAugspurger May 17, 2019

shantanu-gontia May 19, 2019

TomAugspurger commented May 15, 2019

jreback May 15, 2019

shantanu-gontia May 16, 2019

TomAugspurger May 17, 2019

TomAugspurger May 17, 2019

TomAugspurger May 17, 2019

jreback commented May 19, 2019

	assert(df2['c']._data.blocks[0].__class__ == ObjectBlock)
	assert type(df2['c']._data.blocks[0]) == ObjectBlock

BUG: bugfix 26390 assigning PandasArray to DataFrame error #26417

BUG: bugfix 26390 assigning PandasArray to DataFrame error #26417

Conversation

shantanu-gontia commented May 15, 2019 • edited Loading

TomAugspurger left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shantanu-gontia May 15, 2019 • edited Loading

Choose a reason for hiding this comment

shantanu-gontia May 15, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented May 15, 2019

Codecov Report

codecov bot commented May 15, 2019 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger commented May 15, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented May 19, 2019

shantanu-gontia commented May 15, 2019 •

edited

Loading

shantanu-gontia May 15, 2019 •

edited

Loading

shantanu-gontia May 15, 2019 •

edited

Loading

codecov bot commented May 15, 2019 •

edited

Loading