[FEA] Support logical operators such as & and | for cudf.dataframe.series.Series #1071

Closed
paulhendricks opened this issue Mar 1, 2019 · 1 comment
Labels
feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code), Python (Affects Python cuDF API)

Comments

paulhendricks commented Mar 1, 2019

Is your feature request related to a problem? Please describe.

cudf.dataframe.series.Series does not support the logical operators & and |, so two boolean masks cannot be combined directly.

For example:

mask_and = mask_1 & mask_2
mask_or = mask_1 | mask_2

The full reproducible example is below:

import cudf

# Create a test csv file
filename = 'foo.csv'
lines = [
  "num1,datetime,text",
  "123,2018-11-13T12:00:00,abc",
  "456,2018-11-14T12:35:01,def",
  "789,2018-11-15T18:02:59,ghi"
]
with open(filename, 'w') as fp:
    fp.write('\n'.join(lines)+'\n')

# Read the file with cudf using only certain columns (omitting strings)
names = ['num1', 'datetime', 'text']
dtypes = ['int', 'date', 'str']
gdf = cudf.io.csv.read_csv(filename, delimiter=',',
                        names=names, dtype=dtypes,
                        skiprows=1, usecols=['num1', 'datetime'])
print(gdf.head())

mask_1 = gdf['num1'] == 456
print(mask_1)
print(type(mask_1))

specific_date = gdf['datetime'][2]
mask_2 = gdf['datetime'] == specific_date
print(mask_2)
print(type(mask_2))

mask_and = mask_1 & mask_2  # TypeError: unsupported operand type(s) for &: 'Series' and 'Series'
mask_or = mask_1 | mask_2  # TypeError: unsupported operand type(s) for |: 'Series' and 'Series'

# The same workflow in pandas, where both operators are supported
import pandas as pd

df = pd.read_csv(filename)

mask_1 = df['num1'] == 456
print(mask_1)
print(type(mask_1))

specific_date = df['datetime'][2]
mask_2 = df['datetime'] == specific_date
print(mask_2)
print(type(mask_2))

mask_and = mask_1 & mask_2  # Works!
mask_or = mask_1 | mask_2  # Works!

Describe the solution you'd like

mask_and = mask_1 & mask_2  # Works!
mask_or = mask_1 | mask_2  # Works!
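
To make the requested behavior concrete, here is a minimal pure-Python sketch of how a Series-like class can opt in to & and | through Python's `__and__` and `__or__` hooks. `BoolColumn` is a hypothetical stand-in invented for illustration; it is not cuDF's class and says nothing about how cuDF would implement this on the GPU.

```python
class BoolColumn:
    """Hypothetical stand-in for a boolean Series; not cuDF's actual class."""

    def __init__(self, values):
        self.values = [bool(v) for v in values]

    def _binary_op(self, other, op):
        # Returning NotImplemented makes Python raise the same
        # "unsupported operand type(s)" TypeError for unknown operands.
        if not isinstance(other, BoolColumn):
            return NotImplemented
        if len(self.values) != len(other.values):
            raise ValueError("masks must have the same length")
        return BoolColumn(op(a, b) for a, b in zip(self.values, other.values))

    def __and__(self, other):
        # Called for `self & other`
        return self._binary_op(other, lambda a, b: a and b)

    def __or__(self, other):
        # Called for `self | other`
        return self._binary_op(other, lambda a, b: a or b)

    def __repr__(self):
        return f"BoolColumn({self.values})"


mask_1 = BoolColumn([False, True, False])
mask_2 = BoolColumn([False, False, True])

mask_and = mask_1 & mask_2
mask_or = mask_1 | mask_2

print(mask_and)
print(mask_or)
```

Without `__and__` and `__or__` defined, the two operator expressions would fail with the TypeError shown in the reproducer above.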

Describe alternatives you've considered

The following uses NumPy and ought to work:

import numpy as np

mask_and = np.logical_and(mask_1, mask_2)
mask_or = np.logical_or(mask_1, mask_2)

print(gdf[mask_and])
print(gdf[mask_or])
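
For reference, the element-wise results these masks should produce on the sample data can be worked out in plain Python, with no GPU or NumPy required. The helper names `elementwise_and` and `elementwise_or` are hypothetical, and lists stand in for the Series; this is only a sketch of the expected values, not a tested cuDF workaround.

```python
def elementwise_and(left, right):
    """Element-wise logical AND of two equal-length boolean sequences."""
    if len(left) != len(right):
        raise ValueError("masks must have the same length")
    return [bool(a) and bool(b) for a, b in zip(left, right)]


def elementwise_or(left, right):
    """Element-wise logical OR of two equal-length boolean sequences."""
    if len(left) != len(right):
        raise ValueError("masks must have the same length")
    return [bool(a) or bool(b) for a, b in zip(left, right)]


# Same values as the num1 and datetime columns of foo.csv
num1 = [123, 456, 789]
datetimes = [
    "2018-11-13T12:00:00",
    "2018-11-14T12:35:01",
    "2018-11-15T18:02:59",
]

mask_1 = [n == 456 for n in num1]
mask_2 = [d == datetimes[2] for d in datetimes]

mask_and = elementwise_and(mask_1, mask_2)
mask_or = elementwise_or(mask_1, mask_2)

# Boolean indexing: keep the rows where the mask is True
rows = list(zip(num1, datetimes))
selected_and = [row for row, keep in zip(rows, mask_and) if keep]
selected_or = [row for row, keep in zip(rows, mask_or) if keep]

print(selected_and)
print(selected_or)
```

On this data no row satisfies both conditions, so the AND selection is empty, while the OR selection keeps the second and third rows.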

Additional context

Pandas documentation on boolean indexing here: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing

@paulhendricks added the Needs Triage and feature request labels on Mar 1, 2019
@kkraus14 added the libcudf and Python labels and removed the Needs Triage label on Mar 4, 2019
mrocklin (Collaborator) commented Mar 5, 2019

Looks like a duplicate of #49. Closing this to consolidate the discussion there.
