Equality comparison raises exception #7830

Closed
dhirschfeld opened this issue Jul 24, 2014 · 8 comments · Fixed by #22074
Labels
Error Reporting Incorrect or improved errors from pandas
Milestone

Comments

@dhirschfeld
Contributor

Test case:

def test_datetimeindex__eq__():
    """Equality comparisons should never raise an exception"""
    pd.DatetimeIndex(['01-Jan-2015']) == ()

In [115]: test_datetimeindex__eq__()
Traceback (most recent call last):

  File "<ipython-input-115-9b7967ca9de2>", line 1, in <module>
    test_datetimeindex__eq__()

  File "<ipython-input-114-7ae5783187b8>", line 3, in test_datetimeindex__eq__
    pd.DatetimeIndex(['01-Jan-2015']) == ()

  File "C:\dev\bin\Anaconda\lib\site-packages\pandas\tseries\index.py", line 90, in wrapper
    other = _ensure_datetime64(other)

  File "C:\dev\bin\Anaconda\lib\site-packages\pandas\tseries\index.py", line 112, in _ensure_datetime64
    raise TypeError('%s type object %s' % (type(other), str(other)))

TypeError: <type 'tuple'> type object ()
@jreback
Contributor

jreback commented Jul 24, 2014

most ops with a rhs of a non-Index/ndarray don't make any sense.

e.g.

what should

DatetimeIndex(['1-Jan-2015']) == 1 do?

even against a list/tuple/ndarray what are they actually checking? that ALL the values match, that some match, that they are the same object?

I don't think these are actually used anywhere

what are you doing ?

@shoyer
Member

shoyer commented Jul 25, 2014

I agree that these comparisons don't make sense, but in that case, the standard behavior would be to return False (or perhaps better yet, NotImplemented), not to raise an exception.

e.g., look at what numpy does for your example:

In [19]: DatetimeIndex(['1-Jan-2015']).values == 1
Out[19]: False

I agree there is ambiguity over whether such methods should return a single value or an array.

@dhirschfeld
Contributor Author

I had a generic method which could take a scalar argument or list of arguments and I wanted to test for the case where no arguments were passed. The type of the arguments could be anything and the method broke when a DatetimeIndex was passed.

I know that under normal circumstances array equality is checked elementwise, but in this case numpy simply returns False:

In [17]: randn(10) == ()
Out[17]: False

It turns out it was a badly designed method, so I've improved it and no longer have the problem. However, perhaps it would be appropriate for any equality comparison that would otherwise raise an error to return False, i.e.:

def __eq__(self, other):
    try:
        ...
    except Exception:
        return False
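
A minimal runnable sketch of that proposal follows. The `Stamp` class and its `_strict_eq` helper are hypothetical stand-ins for illustration, not pandas code; the strict comparison simply mimics the `TypeError` from the traceback above.

```python
class Stamp:
    """Hypothetical stand-in for an object whose comparison can raise."""

    def __init__(self, value):
        self.value = value

    def _strict_eq(self, other):
        # Mimics a comparison that raises on unsupported operand types.
        if not isinstance(other, Stamp):
            raise TypeError("%s type object %s" % (type(other), other))
        return self.value == other.value

    def __eq__(self, other):
        try:
            return self._strict_eq(other)
        except Exception:
            # Proposed fallback: anything that cannot be compared is unequal.
            return False


print(Stamp(1) == ())        # False, no exception
print(Stamp(1) == Stamp(1))  # True
```

Note that the fallback hides every error raised inside the comparison, including genuine bugs, which is the trade-off discussed later in the thread.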

@jreback jreback added Excel and removed Excel labels Jul 25, 2014
@jreback jreback added this to the 0.15.0 milestone Jul 25, 2014
@jreback
Contributor

jreback commented Jul 25, 2014

@dhirschfeld care to submit a PR for this?

@dhirschfeld
Contributor Author

Can do, but it will have to wait a couple of weeks as I'll be on holiday 😄

@jtratner
Contributor

jtratner commented Sep 7, 2014

@dhirschfeld not 100% sure going the 'catch all exceptions' route is the best bet. If we return a naked bool from any equality method, it leads to a different class of hidden errors (where the expectation is that all pandas ops return a PandasObject that you can keep working with) and pushes into all the ambiguities with the truthiness of an array.

E.g., here's some edge-case numpy behavior:

In [3]: arr1 = np.array([1, 2, 3, 4, 5, 6])

In [10]: arr2 = np.array([1, 2])

In [11]: arr1 == arr1
Out[11]: array([ True,  True,  True,  True,  True,  True], dtype=bool)

In [12]: arr1 == arr2
Out[12]: False

which leads down the road of:

In [13]: bool(arr1 == arr1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-dcde701319c0> in <module>()
----> 1 bool(arr1 == arr1)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [14]: bool(arr1 == arr2)
Out[14]: False

so maybe we should shoot for "Let's give a better exception" vs. falling back on to False and swallowing a different class of errors. NotImplemented hits the same ambiguity, because it ends up as False if both sides return NotImplemented:

In [18]: class MyObject(object):
   ....:     def __eq__(self, *args, **kwargs):
   ....:         return NotImplemented
   ....:

In [19]: o1 = MyObject()

In [20]: o2 = MyObject()

In [21]: o1 == o2
Out[21]: False

In [22]: o1 == 1
Out[22]: False
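
The session above can be extended into a small Python 3 sketch of how `NotImplemented` is resolved. The `Left` and `Right` classes are hypothetical names used only for illustration. When the left operand declines, Python tries the right operand's `__eq__`; if both decline, `==` falls back to an identity check.

```python
class Left:
    def __eq__(self, other):
        # Decline the comparison and let Python try the other operand.
        return NotImplemented


class Right:
    def __eq__(self, other):
        # Reached as the reflected call when the left operand declines.
        return isinstance(other, Left)


o1, o2 = Left(), Left()
print(o1 == o2)       # False: both sides decline, identity check fails
print(o1 == o1)       # True: both sides decline, identity check succeeds
print(o1 == Right())  # True: Right.__eq__ answers after Left declines
```

So a `NotImplemented` return never surfaces to the caller of `==`; it always collapses to a plain bool, which is the ambiguity described above.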

@chris-b1
Contributor

xref to some discussion on the numpy approach to this issue
numpy/numpy#6784

@jreback
Contributor

jreback commented Jul 6, 2018

as of:

In [15]: pd.__version__
Out[15]: '0.24.0.dev0+243.g30eb48cc4'

In [9]: pd.DatetimeIndex(['01-Jan-2015']) == ()
Out[9]: array([False])

In [12]: pd.DatetimeIndex(['01-Jan-2015']) == 'foo'
ValueError: could not convert string to Timestamp

In [13]: pd.DatetimeIndex(['01-Jan-2015']) == []
ValueError: Lengths must match to compare

so this still needs some work
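
One way the three cases above could be made consistent is sketched below in plain Python. This is an illustration only: `elementwise_eq` and `_safe_eq` are hypothetical helpers, and the choice to return all-False for a length mismatch or an incomparable operand is an assumption, not a description of what pandas does or of the eventual fix.

```python
from datetime import date


def _safe_eq(a, b):
    """Compare two scalars, treating a failed comparison as unequal."""
    try:
        return bool(a == b)
    except (TypeError, ValueError):
        return False


def elementwise_eq(values, other):
    """Hypothetical helper: elementwise equality that never raises."""
    if isinstance(other, (list, tuple)):
        if len(other) != len(values):
            # Assumption: a length mismatch means nothing matches,
            # instead of raising "Lengths must match to compare".
            return [False] * len(values)
        return [_safe_eq(v, o) for v, o in zip(values, other)]
    # Scalar operand: compare it against every element.
    return [_safe_eq(v, other) for v in values]


idx = [date(2015, 1, 1)]
print(elementwise_eq(idx, ()))                  # [False]
print(elementwise_eq(idx, "foo"))               # [False]
print(elementwise_eq(idx, []))                  # [False]
print(elementwise_eq(idx, [date(2015, 1, 1)]))  # [True]
```

The result always has one entry per element, so callers can keep working with it instead of receiving a bare bool.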


5 participants