EHN: Improve from_items error message (#17312) #17881

reidy-p · 2017-10-15T17:34:54Z

closes ENH: Improve error message when using DataFrame.from_items instead of DataFrame.from_records #17312
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2017-10-15T18:12:44Z

Codecov Report

Merging #17881 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17881      +/-   ##
==========================================
+ Coverage   91.23%   91.24%   +<.01%     
==========================================
  Files         163      163              
  Lines       50102    50106       +4     
==========================================
+ Hits        45712    45719       +7     
+ Misses       4390     4387       -3

Flag	Coverage Δ
#multiple	`89.05% <100%> (+0.02%)`	⬆️
#single	`40.31% <0%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.75% <100%> (-0.1%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`65.2% <0%> (+1.81%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update aed9b92...a440e50.

codecov · 2017-10-15T18:13:13Z

Codecov Report

Merging #17881 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17881      +/-   ##
==========================================
+ Coverage    91.3%   91.32%   +0.02%     
==========================================
  Files         163      163              
  Lines       49781    49789       +8     
==========================================
+ Hits        45451    45471      +20     
+ Misses       4330     4318      -12

Flag	Coverage Δ
#multiple	`89.12% <100%> (+0.02%)`	⬆️
#single	`40.71% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.81% <100%> (ø)`	⬆️
pandas/plotting/_converter.py	`65.25% <0%> (+1.81%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 38f41e6...fc7fd26.

gfyoung · 2017-10-16T00:36:23Z

pandas/core/frame.py

@@ -1258,6 +1258,13 @@ def from_items(cls, items, columns=None, orient='columns'):
        """
        keys, values = lzip(*items)

+        import array


@jreback: I feel like we must have some function checking if an object is array-like?

jorisvandenbossche · 2017-10-16

Isn't is_list_like enough in this case?

reidy-p · 2017-10-16

is_list_like seems to work fine. I'll update the PR.
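For readers unfamiliar with the helper discussed above, here is a minimal sketch of how is_list_like classifies the kinds of values that from_items might receive. It uses the public pandas.api.types.is_list_like; the candidates dictionary is made up for illustration and is not taken from the PR.

```python
import pandas as pd
from pandas.api.types import is_list_like

# Illustrative inputs only: the sort of objects that could appear as the
# value in a (key, value) pair.
candidates = {
    "int scalar": 1,
    "string": "abc",
    "list": [1, 2, 3],
    "dict": {"x": 1},
    "Series": pd.Series([1, 2]),
}

# Scalars and strings are not list-like; lists, dicts and Series are.
results = {name: is_list_like(value) for name, value in candidates.items()}

for name, flag in results.items():
    print(f"{name}: {flag}")
```

Strings are iterable but are deliberately treated as not list-like, which is the behaviour wanted for a scalar check.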

jreback · 2017-10-16T11:26:35Z

pandas/core/frame.py

@@ -1258,6 +1258,11 @@ def from_items(cls, items, columns=None, orient='columns'):
        """
        keys, values = lzip(*items)

+        for val in values:


No, this is completely non-performant.

You need to catch the error first and only then do the check.
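The pattern being requested can be sketched in plain Python, without pandas. The function build_columns below is a hypothetical stand-in for the construction step, not the code in frame.py; it only illustrates running the fast path unconditionally and inspecting the values after a failure, so valid input never pays for a per-value check.

```python
def build_columns(items):
    """Hypothetical stand-in for the from_items construction step."""
    keys, values = zip(*items)
    try:
        # Fast path: assume every value is a sequence and convert it
        # directly, with no validation loop up front.
        return {key: list(value) for key, value in zip(keys, values)}
    except TypeError:
        # Slow path, reached only after a failure: work out whether a
        # scalar value was the cause and, if so, raise a clearer error.
        if any(not hasattr(value, "__iter__") for value in values):
            raise ValueError(
                "The value in each (key, value) pair must be an array, "
                "Series, or dict"
            )
        # Some other TypeError: re-raise it unchanged.
        raise


print(build_columns([("a", [1, 2]), ("b", [3, 4])]))
```

The validation cost is paid only on the error path, which is the point of the review comment.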

jreback · 2017-10-16T18:08:02Z

pandas/core/frame.py

+            try:
+                return cls._from_arrays(arrays, columns, None)
+
+            except ValueError:


Why not simply do this inside ._from_arrays? You would only need it once.

reidy-p · 2017-10-16

In the case where orient == 'index' we don't get to ._from_arrays because the previous line

data = [lib.maybe_convert_objects(v) for v in arr]

throws the error:

TypeError: Argument 'objects' has incorrect type (expected numpy.ndarray, got int)

jreback · 2017-10-16T18:08:50Z

pandas/tests/frame/test_constructors.py

@@ -1205,6 +1205,18 @@ def test_constructor_from_items(self):
                       columns=['one', 'two', 'three'])
        tm.assert_frame_equal(rs, xp)

+        # GH 17312


Make this a new test, and also test DataFrame(dict(...)).

reidy-p · 2017-10-16

I'm not sure what you mean by the second point here. From what I can tell, from_items only works with a list of (key, value) tuples and not a dict, so how do I write a test with a dict?
Thanks.

jreback

inline

jreback · 2017-10-28T00:09:00Z

pandas/core/frame.py

+            try:
+                return cls._from_arrays(arrays, columns, None)
+
+            except ValueError:


OK, I see what you are doing. Can you add a comment to each section explaining what you are guarding against here?
We normally don't like to try/except around multiple statements, but this is really a 'more informative message' guard.

jreback · 2017-10-28T00:09:24Z

pandas/core/frame.py

        elif orient == 'index':
            if columns is None:
                raise TypeError("Must pass columns with orient='index'")

-            keys = _ensure_index(keys)
+            try:
+                keys = _ensure_index(keys)


I don't think _ensure_index can raise here

jreback · 2017-10-28T00:12:08Z

pandas/tests/frame/test_constructors.py

@@ -1205,6 +1205,18 @@ def test_constructor_from_items(self):
                       columns=['one', 'two', 'three'])
        tm.assert_frame_equal(rs, xp)

+        # GH 17312


I was saying: split off these added cases into a new test.

Also test DataFrame(dict(...)) with the same input; it's a dict built from the tuples (it will yield the same error messages).

E.g. on current master:

In [1]: DataFrame.from_items([('A', 1), ('B', 4)])
ValueError: If using all scalar values, you must pass an index

In [3]: DataFrame(dict({'A': 1, 'B': 4}))
ValueError: If using all scalar values, you must pass an index
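A rough sketch of the dict-based check requested here, written with a plain try/except so it needs no test framework. The helper scalar_error_message and the constant EXPECTED are illustrative names, not part of the pandas test suite; the message text is the one quoted in the comment above.

```python
import pandas as pd

# Message quoted in the review comment for all-scalar input.
EXPECTED = "If using all scalar values, you must pass an index"


def scalar_error_message(build):
    """Return the ValueError message raised by build(), or None if none is raised."""
    try:
        build()
    except ValueError as exc:
        return str(exc)
    return None


scalars = {"A": 1, "B": 4}

# Both constructors are given the same all-scalar dict.
msg_constructor = scalar_error_message(lambda: pd.DataFrame(scalars))
msg_from_dict = scalar_error_message(lambda: pd.DataFrame.from_dict(scalars))

print(msg_constructor)
print(msg_from_dict)
```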

jreback · 2017-10-28T00:12:57Z

can you add a note on 0.22

reidy-p · 2017-10-31T22:01:56Z

The current behaviour of from_items on master when passed a (key, value) pair with a scalar value is:

In [1]: pd.DataFrame.from_items([('a', 1), ('b', 2)])
Out[1]: ValueError: If using all scalar values, you must pass an index 

In [2]: pd.DataFrame.from_items([('a', 1), ('b', 2)], columns=['col1'], orient='index')
Out[2]: TypeError: Argument 'objects' has incorrect type (expected numpy.ndarray, got int)

These error messages are not very helpful (from_items doesn't have an index parameter, for example). So I have tried to provide more informative error messages:

In [3]: pd.DataFrame.from_items([('a', 1), ('b', 2)])
Out[3]: TypeError: The value in each (key, value) pair must be an array, Series, or dict

In [4]: pd.DataFrame.from_items([('a', 1), ('b', 2)], columns=['col1'], orient='index')
Out[4]: TypeError: The value in each (key, value) pair must be an array, Series, or dict

On the current master pd.DataFrame(dict(..)) and pd.DataFrame.from_dict(dict(..)) raise the same error message as from_items when passed scalar values:

In [5]: pd.DataFrame({'A': 1, 'B': 2})
Out[5]: ValueError: If using all scalar values, you must pass an index 

In [6]: pd.DataFrame.from_dict({'A': 1, 'B': 2})
Out[6]: ValueError: If using all scalar values, you must pass an index

However, trying to change these error messages in a similar way to from_items caused problems and seems to affect other functions such as df.agg({}):

In [7]: pd.DataFrame({'A': 1, 'B': 2})
Out[7]: TypeError: The value in each key:value pair must be an array, Series, or dict

In [8]: pd.DataFrame.from_dict({'A': 1, 'B': 2})
Out[8]: TypeError: The value in each key:value pair must be an array, Series, or dict

In [9]: df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)})
In [10]: df.agg({'A': 'mean'})
Out[10]: RecursionError: maximum recursion depth exceeded

So in the updated pull request I have only changed the error message for from_items. I haven't changed the error message for pd.DataFrame(dict(..)) or from_dict but have modified the pd.DataFrame(dict(..)) tests to check for the current error message. Is this solution acceptable or, if not, does anyone have any suggestions on how to proceed? Or maybe pd.DataFrame(dict(..)) or from_dict could be discussed in a separate issue if needed?

jreback · 2017-11-25T16:21:01Z

can you rebase / update

jreback · 2017-11-25T20:59:48Z

one more rebase and should be good

jreback · 2017-11-25T22:15:55Z

pandas/core/frame.py

+
+            except ValueError:
+                if not is_nested_list_like(values):
+                    raise TypeError('The value in each (key, value) pair must '


make these ValueErrors to be consistent with the scalar error

jreback · 2017-11-25T22:16:10Z

pandas/tests/frame/test_constructors.py

@@ -1204,6 +1205,19 @@ def test_constructor_from_items(self):
                       columns=['one', 'two', 'three'])
        tm.assert_frame_equal(rs, xp)

+    def test_constructor_from_items_scalars(self):
+        # GH 17312
+        with tm.assert_raises_regex(TypeError,


e.g. ValueError here

jreback · 2017-11-26T00:08:06Z

lgtm. ping on green.

reidy-p · 2017-11-26T14:00:25Z

@jreback thanks. It's green now.

jreback · 2017-11-26T15:09:29Z

thanks!

jreback · 2017-11-26T15:10:19Z

As an aside, I think it's OK to deprecate .from_items (e.g. #18262), as it's trivially replaced by dict(...).
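To illustrate the dict(...) replacement mentioned above, here is a small sketch with made-up data. It assumes unique keys, since a dict cannot hold duplicates, and a pandas version where DataFrame.from_dict accepts columns together with orient='index'. from_items itself is not called.

```python
import pandas as pd

# Illustrative (key, value) pairs of the kind from_items accepted.
items = [("A", [1, 2, 3]), ("B", [4, 5, 6])]

# Equivalent of orient='columns': each key becomes a column.
by_columns = pd.DataFrame(dict(items))

# Equivalent of orient='index': each key becomes a row label, and the
# column names must be supplied explicitly.
by_index = pd.DataFrame.from_dict(
    dict(items), orient="index", columns=["one", "two", "three"]
)

print(by_columns)
print(by_index)
```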

gfyoung added the "Error Reporting" label (Incorrect or improved errors from pandas) Oct 16, 2017

gfyoung reviewed Oct 16, 2017

jreback requested changes Oct 16, 2017

toobaz mentioned this pull request Oct 16, 2017

API: add "level=" argument to MultiIndex.unique() #17897

Merged

4 tasks

jreback reviewed Oct 16, 2017

jreback requested changes Oct 16, 2017

jreback requested changes Oct 28, 2017

reidy-p force-pushed the from_items_error branch 2 times, most recently from 0d4ff3f to e57b142 Compare October 31, 2017 21:59

reidy-p force-pushed the from_items_error branch from e57b142 to 1aa4a56 Compare November 25, 2017 17:03

jreback added this to the 0.22.0 milestone Nov 25, 2017

reidy-p force-pushed the from_items_error branch from 1aa4a56 to d299ecd Compare November 25, 2017 21:04

jreback requested changes Nov 25, 2017

reidy-p force-pushed the from_items_error branch from 306be68 to 6e6a5e0 Compare November 26, 2017 00:01

jreback approved these changes Nov 26, 2017

reidy-p added 6 commits November 26, 2017 11:17

EHN: Improve from_items error message (pandas-dev#17312)

a499d32

Using is_list_like to check value

1465999

Move check and add another test

aec9b33

add comments and tests

7e63564

Change TypeErrors to ValueErrors

6d7b4cb

fix line lengths

fc7fd26

reidy-p force-pushed the from_items_error branch from 36a31ec to fc7fd26 Compare November 26, 2017 11:17

jreback merged commit f6fe089 into pandas-dev:master Nov 26, 2017

reidy-p deleted the from_items_error branch December 10, 2017 16:14
