BUG: comparing multicolumn dataframe with datetime64 values to series gives TypeError #9006

Open
jorisvandenbossche opened this issue Dec 4, 2014 · 6 comments
Labels
Bug Closing Candidate May be closeable, needs more eyeballs Error Reporting Incorrect or improved errors from pandas Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@jorisvandenbossche
Member

jorisvandenbossche commented Dec 4, 2014

When comparing a DataFrame to a column/Series (I know the comparison below is not useful, because the Series is aligned with the columns of the DataFrame and not the rows, but it is something typical users will try), I get the correct results if the DataFrame and Series contain strings, but a TypeError when the DataFrame contains datetime values:

In [1]: from io import StringIO

In [2]: s = """id       date birth_date_1 birth_date_2
   ...: 1 2000-01-01   2000-01-03   2000-01-05
   ...: 1 2000-01-07   2000-01-03   2000-01-05
   ...: 2 2000-01-02   2000-01-10   2000-01-01
   ...: 2 2000-01-05   2000-01-10   2000-01-01"""

In [3]: df = pd.read_csv(StringIO(s), sep='\s+')

In [5]: df[['birth_date_1','birth_date_2']] > df['date']
Out[5]:
       0      1      2      3 birth_date_1 birth_date_2
0  False  False  False  False         True         True
1  False  False  False  False         True         True
2  False  False  False  False         True         True
3  False  False  False  False         True         True

In [7]: df = pd.read_csv(StringIO(s), sep='\s+', parse_dates=[1,2,3])

In [8]: df[['birth_date_1','birth_date_2']] > df['date']
...
c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in handle_error()

    954             if raise_on_error:
    955                 raise TypeError('Could not operate %s with block values %s'
--> 956                                 % (repr(other), str(detail)))
    957             else:
    958                 # return the values

TypeError: Could not operate array(['2000-01-01T01:00:00.000000000+0100',
       '2000-01-07T01:00:00.000000000+0100',
       '2000-01-02T01:00:00.000000000+0100',
       '2000-01-05T01:00:00.000000000+0100'], dtype='datetime64[ns]') with block values invalid type promotion
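
For reference, a minimal sketch of the row-wise comparison that the snippet above was presumably after. It assumes a reasonably recent pandas and uses the flex method `DataFrame.gt` with `axis=0`, which matches the Series index against the DataFrame's row index instead of its column labels. The names `date_cols` and `row_wise` are only illustrative and do not come from the report.

```python
from io import StringIO

import pandas as pd

s = """id       date birth_date_1 birth_date_2
1 2000-01-01   2000-01-03   2000-01-05
1 2000-01-07   2000-01-03   2000-01-05
2 2000-01-02   2000-01-10   2000-01-01
2 2000-01-05   2000-01-10   2000-01-01"""

# Parse the three date columns by name (illustrative; the report used positions).
date_cols = ["date", "birth_date_1", "birth_date_2"]
df = pd.read_csv(StringIO(s), sep=r"\s+", parse_dates=date_cols)

# axis=0 aligns the Series with the row index, so each birth date is
# compared with the value of 'date' in the same row.
row_wise = df[["birth_date_1", "birth_date_2"]].gt(df["date"], axis=0)
print(row_wise)
```

With `axis=0` the result keeps only the two original columns, because no column labels are merged with the Series index.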
@jorisvandenbossche jorisvandenbossche changed the title BUG: BUG: comparing dataframe with datetime64 values to series gives TypeError Dec 4, 2014
@jorisvandenbossche
Member Author

Although I am not sure this is the correct result:

In [5]: df[['birth_date_1','birth_date_2']] > df['date']
Out[5]:
       0      1      2      3 birth_date_1 birth_date_2
0  False  False  False  False         True         True
1  False  False  False  False         True         True
2  False  False  False  False         True         True
3  False  False  False  False         True         True

There are no overlapping elements between the DataFrame and the Series, so why is the result then sometimes True and sometimes False?

@jreback
Contributor

jreback commented Dec 4, 2014

This is quite tricky; datetimes are not handled correctly in a multi-column vectorized way.

xref #8554. I think I can fix this, but it's a bit tricky.

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Datetime Datetime data dtype labels Dec 4, 2014
@jreback jreback added this to the 0.16.0 milestone Dec 4, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jbrockmendel
Member

@jorisvandenbossche I'm not entirely clear on what the issue is here. Is it about broadcasting? Maybe it has been resolved in the interim?

@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Jul 23, 2019
@mroeschke
Member

I think the first case (dates not parsed) raises a sensible error now:

TypeError: '>' not supported between instances of 'numpy.ndarray' and 'str'

The second case (dates parsed) doesn't seem to raise a sensible error, as there is no float column being compared:

TypeError: '<' not supported between instances of 'Timestamp' and 'float'
In [60]: pd.__version__
Out[60]: '1.1.0.dev0+1027.g767335719'

@mroeschke mroeschke changed the title BUG: comparing dataframe with datetime64 values to series gives TypeError BUG: comparing multicolumn dataframe with datetime64 values to series gives TypeError Mar 31, 2020
@mroeschke mroeschke added Bug Error Reporting Incorrect or improved errors from pandas and removed Dtype Conversions Unexpected or buggy dtype conversions Datetime Datetime data dtype labels Apr 11, 2021
@jbrockmendel
Member

IIUC reindexing is introducing float (all-NaN) columns, which then raise on comparison. That automatic reindexing was deprecated in #36795. We could try to get something in for 1.4 to give a better exception message, but I don't think it's worth the trouble.
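
A rough sketch of the reindexing described above, reproduced by hand rather than through pandas internals. It assumes that aligning on columns amounts to reindexing both operands to the union of the frame's column labels and the Series index; `frame`, `series`, and `labels` are illustrative names, and the explicit `reindex` calls stand in for whatever pandas did automatically at the time.

```python
import pandas as pd

frame = pd.DataFrame({
    "birth_date_1": pd.to_datetime(["2000-01-03", "2000-01-03", "2000-01-10", "2000-01-10"]),
    "birth_date_2": pd.to_datetime(["2000-01-05", "2000-01-05", "2000-01-01", "2000-01-01"]),
})
series = pd.Series(
    pd.to_datetime(["2000-01-01", "2000-01-07", "2000-01-02", "2000-01-05"]),
    name="date",
)

# Column-wise alignment pairs the frame's column labels with the Series
# index (0..3); sort=False avoids ordering mixed str/int labels.
labels = frame.columns.union(series.index, sort=False)

# Reindexing the frame adds columns 0..3, which contain only missing values.
aligned_frame = frame.reindex(columns=labels)
# Reindexing the Series adds the two column names as labels with no data.
aligned_series = series.reindex(labels)

print(aligned_frame.dtypes)
print(aligned_series)
```

The added columns carry no datetime information, which is consistent with the comparison ending up between a Timestamp and a float as in the traceback quoted earlier in the thread.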

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel
Member

This now correctly raises because the deprecation of automatic alignment has been enforced. Is there another bug that surfaces after that if we manually align before the comparison?

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Mar 28, 2023