-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EHN: Improve from_items error message (#17312) #17881
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17881 +/- ##
==========================================
+ Coverage 91.23% 91.24% +<.01%
==========================================
Files 163 163
Lines 50102 50106 +4
==========================================
+ Hits 45712 45719 +7
+ Misses 4390 4387 -3
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17881 +/- ##
==========================================
+ Coverage 91.3% 91.32% +0.02%
==========================================
Files 163 163
Lines 49781 49789 +8
==========================================
+ Hits 45451 45471 +20
+ Misses 4330 4318 -12
Continue to review full report at Codecov.
|
pandas/core/frame.py
Outdated
@@ -1258,6 +1258,13 @@ def from_items(cls, items, columns=None, orient='columns'): | |||
""" | |||
keys, values = lzip(*items) | |||
|
|||
import array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback : I feel like we must have some function checking if an object is array-like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't is_list_like
enough in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_list_like
seems to work fine. I'll update the PR.
pandas/core/frame.py
Outdated
@@ -1258,6 +1258,11 @@ def from_items(cls, items, columns=None, orient='columns'): | |||
""" | |||
keys, values = lzip(*items) | |||
|
|||
for val in values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NO, this is completely non-performant.
you need to catch the error and then do the check.
pandas/core/frame.py
Outdated
try: | ||
return cls._from_arrays(arrays, columns, None) | ||
|
||
except ValueError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you not simply doing this inside ._from_arrays
? you only would need this once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case where orient == 'index'
we don't get to ._from_arrays
because the previous line
data = [lib.maybe_convert_objects(v) for v in arr]
throws the error:
TypeError: Argument 'objects' has incorrect type (expected numpy.ndarray, got int)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok i see what you are doing. can you add a comment to each section about what you are guarding against here.
we normally don't like to try/except around multiple statements but this is really a 'more informative message' guard.
@@ -1205,6 +1205,18 @@ def test_constructor_from_items(self): | |||
columns=['one', 'two', 'three']) | |||
tm.assert_frame_equal(rs, xp) | |||
|
|||
# GH 17312 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this a new tests, also tests DataFrame(dict(...))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean by the second point here. From what I can tell from_items
only works with a list of (key, value) tuples and not a dict? So how do I do a test with a dict?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was saying, split off these added cases into a new tests.
also test DataFrame(dict(....))
with the same input, its a dict from the tuples (it will yield the same error messages)
e.g. on current master
In [1]: DataFrame.from_items([('A', 1), ('B', 4)])
ValueError: If using all scalar values, you must pass an index
In [3]: DataFrame(dict({'A': 1, 'B': 4}))
ValueError: If using all scalar values, you must pass an index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline
pandas/core/frame.py
Outdated
try: | ||
return cls._from_arrays(arrays, columns, None) | ||
|
||
except ValueError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok i see what you are doing. can you add a comment to each section about what you are guarding against here.
we normally don't like to try/except around multiple statements but this is really a 'more informative message' guard.
pandas/core/frame.py
Outdated
elif orient == 'index': | ||
if columns is None: | ||
raise TypeError("Must pass columns with orient='index'") | ||
|
||
keys = _ensure_index(keys) | ||
try: | ||
keys = _ensure_index(keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think _ensure_index can raise here
@@ -1205,6 +1205,18 @@ def test_constructor_from_items(self): | |||
columns=['one', 'two', 'three']) | |||
tm.assert_frame_equal(rs, xp) | |||
|
|||
# GH 17312 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was saying, split off these added cases into a new tests.
also test DataFrame(dict(....))
with the same input, its a dict from the tuples (it will yield the same error messages)
e.g. on current master
In [1]: DataFrame.from_items([('A', 1), ('B', 4)])
ValueError: If using all scalar values, you must pass an index
In [3]: DataFrame(dict({'A': 1, 'B': 4}))
ValueError: If using all scalar values, you must pass an index
can you add a note on 0.22 |
0d4ff3f
to
e57b142
Compare
The current behaviour of
These error messages are not very helpful (
On the current master
However, trying to change these error messages in a similar way to
So in the updated pull request I have only changed the error message for |
can you rebase / update |
e57b142
to
1aa4a56
Compare
one more rebase and should be good |
1aa4a56
to
d299ecd
Compare
pandas/core/frame.py
Outdated
|
||
except ValueError: | ||
if not is_nested_list_like(values): | ||
raise TypeError('The value in each (key, value) pair must ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make these ValueErrors to be consistent with the scalar error
@@ -1204,6 +1205,19 @@ def test_constructor_from_items(self): | |||
columns=['one', 'two', 'three']) | |||
tm.assert_frame_equal(rs, xp) | |||
|
|||
def test_constructor_from_items_scalars(self): | |||
# GH 17312 | |||
with tm.assert_raises_regex(TypeError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. ValueError here
306be68
to
6e6a5e0
Compare
lgtm. ping on green. |
36a31ec
to
fc7fd26
Compare
@jreback thanks. It's green now. |
thanks! |
as an aside I think its ok to deprecate |
git diff upstream/master -u -- "*.py" | flake8 --diff