[BUG] Can not assign a datetime index into a dataframe #1135

Closed
mrocklin opened this issue Mar 7, 2019 · 2 comments
Labels: bug (Something isn't working), dask (Dask issue), Python (Affects Python cuDF API.)

Comments

@mrocklin (Collaborator) commented on Mar 7, 2019

Assigning a datetime index to a column of a dataframe fails with GDFError: GDF_UNSUPPORTED_DTYPE.

This comes up when I try to do indexed joins. Reproducer:

import pandas as pd
df = pd.date_range('2000', '2001', freq='1M').to_frame()

import cudf
gdf = cudf.from_pandas(df)
gdf['foo'] = gdf.index
---------------------------------------------------------------------------
GDFError                                  Traceback (most recent call last)
<ipython-input-6-d57fa9fc0872> in <module>
----> 1 gdf['foo'] = gdf.index

~/cudf/python/cudf/dataframe/dataframe.py in __setitem__(self, name, col)
    268             self._cols[name] = self._prepare_series_for_add(col)
    269         else:
--> 270             self.add_column(name, col)
    271
    272     def __delitem__(self, name):

~/cudf/python/cudf/dataframe/dataframe.py in add_column(self, name, data, forceindex)
    794         if isinstance(data, GeneratorType):
    795             data = Series(data)
--> 796         series = self._prepare_series_for_add(data, forceindex=forceindex)
    797         series.name = name
    798         self._cols[name] = series

~/cudf/python/cudf/dataframe/dataframe.py in _prepare_series_for_add(self, col, forceindex)
    770         empty_index = len(self._index) == 0
    771         series = Series(col)
--> 772         if forceindex or empty_index or self._index.equals(series.index):
    773             if empty_index:
    774                 self._index = series.index

~/cudf/python/cudf/dataframe/index.py in equals(self, other)
    184         if len(self) != len(other):
    185             return False
--> 186         return (self == other)._values.all()
    187
    188     def join(self, other, method, how='left', return_indexers=False):

~/cudf/python/cudf/dataframe/index.py in __eq__(self, other)
    164
    165     def __eq__(self, other):
--> 166         return self._apply_op('__eq__', other)
    167
    168     def __ne__(self, other):

~/cudf/python/cudf/dataframe/index.py in _apply_op(self, fn, other)
    124         op = getattr(idx_series, fn)
    125         if other is not None:
--> 126             return as_index(op(other))
    127         else:
    128             return as_index(op())

~/cudf/python/cudf/dataframe/series.py in __eq__(self, other)
    459
    460     def __eq__(self, other):
--> 461         return self._unordered_compare(other, 'eq')
    462
    463     def __ne__(self, other):

~/cudf/python/cudf/dataframe/series.py in _unordered_compare(self, other, cmpops)
    443         nvtx_range_push("CUDF_UNORDERED_COMP", "orange")
    444         other = self._normalize_binop_value(other)
--> 445         outcol = self._column.unordered_compare(cmpops, other._column)
    446         result = self._copy_construct(data=outcol)
    447         result.name = None

~/cudf/python/cudf/dataframe/datetime.py in unordered_compare(self, cmpop, rhs)
    148             lhs, rhs,
    149             op=_unordered_impl[cmpop],
--> 150             out_dtype=np.bool
    151         )
    152

~/cudf/python/cudf/dataframe/datetime.py in binop(lhs, rhs, op, out_dtype)
    214     masked = lhs.has_null_mask or rhs.has_null_mask
    215     out = columnops.column_empty_like(lhs, dtype=out_dtype, masked=masked)
--> 216     null_count = _gdf.apply_binaryop(op, lhs, rhs, out)
    217     out = out.replace(null_count=null_count)
    218     nvtx_range_pop()

~/cudf/python/cudf/_gdf.py in apply_binaryop(binop, lhs, rhs, out)
    100     args = (lhs.cffi_view, rhs.cffi_view, out.cffi_view)
    101     # apply binary operator
--> 102     binop(*args)
    103     # validity mask
    104     if out.has_null_mask:

~/miniconda/envs/cudf/lib/python3.7/site-packages/libgdf_cffi/wrapper.py in wrap(*args)
     25                     if errcode != self._api.GDF_SUCCESS:
     26                         errname, msg = self._get_error_msg(errcode)
---> 27                         raise GDFError(errname, msg)
     28
     29                 wrap.__name__ = fn.__name__

GDFError: GDF_UNSUPPORTED_DTYPE
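The frames above trace the failure path: add_column calls _prepare_series_for_add, which evaluates self._index.equals(series.index); Index.equals checks the lengths and then computes (self == other)._values.all(), so the exception is raised by the elementwise datetime comparison (the binop in datetime.py calling _gdf.apply_binaryop), not by storing the column itself. The NumPy sketch below mirrors that two-step equals logic. The helper name index_equals and its dtype short-circuit are hypothetical, added only for illustration; they are not cuDF's code and not the fix that eventually landed.

```python
import numpy as np


def index_equals(left: np.ndarray, right: np.ndarray) -> bool:
    """Hypothetical NumPy mirror of the Index.equals logic in the traceback."""
    # Step 1 (as in index.py): indexes of different lengths are never equal.
    if len(left) != len(right):
        return False
    # Hypothetical guard, NOT cuDF code: avoid running an elementwise
    # comparison on mismatched dtypes (e.g. datetime64 vs int64) and
    # report "not equal" instead of raising.
    if left.dtype != right.dtype:
        return False
    # Step 2 (as in index.py): elementwise ==, reduced with all().
    return bool((left == right).all())
```

Treating any dtype mismatch as "not equal" is deliberately blunt; a real implementation would probably want to normalize compatible dtypes (for example datetime64[ms] vs datetime64[ns]) before comparing rather than reject them.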
@mrocklin added the bug and Needs Triage labels on Mar 7, 2019
@mrocklin added the dask label on Mar 7, 2019
@kkraus14 added the Python label and removed the Needs Triage label on Mar 7, 2019
@kkraus14 (Collaborator) commented on Mar 7, 2019

I believe this is isolated to DatetimeIndex types, and I'm hesitant to try to get this working for 0.6. This may be fixed by #892; if not, we should push it to 0.7.

@thomcom (Contributor) commented on May 3, 2019

I'm pretty sure this is fixed. @mrocklin's reproducer now works.

>>> import pandas as pd
... df = pd.date_range('2000', '2001', freq='1M').to_frame()
...
... import cudf
... gdf = cudf.from_pandas(df)
... gdf['foo'] = gdf.index

>>> print(gdf['foo'])
2000-01-31T00:00:00.000   2000-01-31T00:00:00.000
2000-02-29T00:00:00.000   2000-02-29T00:00:00.000
2000-03-31T00:00:00.000   2000-03-31T00:00:00.000
2000-04-30T00:00:00.000   2000-04-30T00:00:00.000
2000-05-31T00:00:00.000   2000-05-31T00:00:00.000
2000-06-30T00:00:00.000   2000-06-30T00:00:00.000
2000-07-31T00:00:00.000   2000-07-31T00:00:00.000
2000-08-31T00:00:00.000   2000-08-31T00:00:00.000
2000-09-30T00:00:00.000   2000-09-30T00:00:00.000
2000-10-31T00:00:00.000   2000-10-31T00:00:00.000
[2 more rows]
Name: foo, dtype: datetime64[ms]
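One way to pin down the expected behaviour is to run the same assignment in pandas, which cuDF aims to mirror, and check that the new column holds exactly the index values. The sketch below uses only pandas; the daily freq='D' range is an arbitrary stand-in for the original freq='1M' range. With cuDF installed, the same checks could presumably be applied to gdf after converting it back with .to_pandas(), but that step is an assumption and is not exercised here.

```python
import pandas as pd

# Same shape as the reproducer: a frame whose index is a DatetimeIndex.
# freq='D' is an arbitrary choice for this sketch.
df = pd.date_range('2000-01-01', periods=12, freq='D').to_frame()

# The operation from the bug report: copy the index into a new column.
df['foo'] = df.index

# The new column should be datetime-typed and match the index elementwise.
# (Index.equals compares values only, so the differing names do not matter.)
assert pd.api.types.is_datetime64_any_dtype(df['foo'])
assert pd.Index(df['foo']).equals(df.index)
```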

@thomcom closed this as completed on May 3, 2019