[BUG] Negating a boolean Series changes its dtype to int8 #12397

Closed
shwina opened this issue Dec 15, 2022 · 4 comments
Labels
0 - Backlog (In queue waiting for assignment), bug (Something isn't working), Python (Affects Python cuDF API.)

Comments

Contributor

shwina commented Dec 15, 2022

In Pandas, negating a bool Series behaves as follows:

>>> -pd.Series([True, False])
0    False
1     True
dtype: bool

Whereas in cuDF:

>>> -cudf.Series([True, False]) 
0   -1
1    0
dtype: int8

shwina added the bug (Something isn't working) and Needs Triage (Need team to review and classify) labels on Dec 15, 2022
Contributor Author

shwina commented Dec 15, 2022

Should we match Pandas behaviour here?

Contributor

wence- commented Dec 16, 2022

Pandas doesn't promote types in unary ops, so unsigned types stay unsigned, whereas cudf does promote:

import pandas as pd
import numpy as np
for dtype in ["bool", "uint8", "int8", "uint64", "int64"]:
    if dtype == "bool":
        dtype_max = True
        dtype_min = False
    else:
        dtype = np.dtype(dtype)
        dtype_max = np.iinfo(dtype).max
        dtype_min = np.iinfo(dtype).min
    s = pd.Series([1], dtype=dtype)
    print(s.dtype, (-s).dtype)
    s = pd.Series([dtype_max], dtype=dtype)
    print(s.dtype, (-s).dtype)
    s = pd.Series([dtype_min], dtype=dtype)
    print(s.dtype, (-s).dtype)
# bool bool
# bool bool
# bool bool
# uint8 uint8
# uint8 uint8
# uint8 uint8
# int8 int8
# int8 int8
# int8 int8
# uint64 uint64
# uint64 uint64
# uint64 uint64
# int64 int64
# int64 int64
# int64 int64
import cudf
import numpy as np
for dtype in ["bool", "uint8", "int8", "uint64", "int64"]:
    if dtype == "bool":
        dtype_max = True
        dtype_min = False
    else:
        dtype = np.dtype(dtype)
        dtype_max = np.iinfo(dtype).max
        dtype_min = np.iinfo(dtype).min
    s = cudf.Series([1], dtype=dtype)
    print(s.dtype, (-s).dtype)
    s = cudf.Series([dtype_max], dtype=dtype)
    print(s.dtype, (-s).dtype)
    s = cudf.Series([dtype_min], dtype=dtype)
    print(s.dtype, (-s).dtype)

# bool int8
# bool int8
# bool int8
# uint8 int16
# uint8 int16
# uint8 int16
# int8 int8
# int8 int8
# int8 int8
# uint64 float64
# uint64 float64
# uint64 float64
# int64 int64
# int64 int64
# int64 int64

As we can see, cudf promotes to the next signed type that is wide enough to hold the range of values that the unsigned type has (including promoting uint64 to float64, and losing precision).
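
To make the precision loss concrete, here is a minimal sketch in plain Python (no cuDF involved) of what happens when the largest uint64 value is stored as a float64. It relies only on the fact that float64 carries a 53-bit significand; the variable names are illustrative and not taken from cuDF.

```python
# Largest value representable in uint64.
UINT64_MAX = 2**64 - 1

# Python floats are IEEE 754 float64, the same format uint64 was promoted to.
as_float = float(UINT64_MAX)

# Converting back shows the value was rounded to the nearest representable float.
round_trip = int(as_float)
lost_precision = round_trip != UINT64_MAX

# Integers stop being exactly representable just above 2**53.
first_inexact = 2**53 + 1
collapses = float(first_inexact) == float(2**53)

print(lost_precision, round_trip - UINT64_MAX, collapses)
```

Any uint64 value above 2**53 is therefore at risk of silently changing once it has been promoted to float64.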

Finally, numpy raises TypeError on unary minus of booleans, suggesting that one use ~ instead, and otherwise does not promote types:

import numpy as np
for dtype in ["uint8", "int8", "uint64", "int64"]:
    dtype = np.dtype(dtype)
    dtype_max = np.iinfo(dtype).max
    dtype_min = np.iinfo(dtype).min
    s = np.asarray([1], dtype=dtype)
    print(s.dtype, (-s).dtype)
    s = np.asarray([dtype_max], dtype=dtype)
    print(s.dtype, (-s).dtype)
    s = np.asarray([dtype_min], dtype=dtype)
    print(s.dtype, (-s).dtype)

# uint8 uint8
# uint8 uint8
# uint8 uint8
# int8 int8
# int8 int8
# int8 int8
# uint64 uint64
# uint64 uint64
# uint64 uint64
# int64 int64
# int64 int64
# int64 int64
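
The boolean case is left out of the loop above, so here is a small sketch of it, assuming a reasonably recent NumPy. It also shows what "does not promote" means for the values of an unsigned array: the result wraps around modulo 2**8 rather than becoming negative. The variable names are illustrative only.

```python
import numpy as np

flags = np.array([True, False])

# Unary minus on a boolean array is rejected by NumPy with a TypeError.
try:
    -flags
    minus_raised = False
except TypeError:
    minus_raised = True

# The suggested alternative: ~ performs logical negation and keeps dtype bool.
inverted = ~flags

# Unsigned arrays keep their dtype under negation, so the value wraps around.
small = np.array([1], dtype=np.uint8)
negated = -small

print(minus_raised, inverted.dtype, negated.dtype, negated.tolist())
```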

I think we should match pandas here.

Contributor

wence- commented Dec 16, 2022

Note that, in general, type promotion involving booleans probably needs special-case handling, because I am not sure we do the right thing for binops now either.

GregoryKimball added the 0 - Backlog (In queue waiting for assignment) and Python (Affects Python cuDF API.) labels and removed the Needs Triage (Need team to review and classify) label on Jun 6, 2023
Contributor

vyasr commented May 17, 2024

It looks like we resolved the original issue at some point here:

In [80]: -cudf.Series([True, False])
Out[80]:
0    False
1     True
dtype: bool
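
For anyone who wants to guard against a regression, a check along these lines could be used. This is only a sketch: `check_bool_negation` is a hypothetical helper, not part of pandas or cuDF, and it is exercised here with `pd.Series` only. Passing `cudf.Series` instead is an assumption that relies on cuDF offering the same `dtype` and `to_numpy()` interface.

```python
import pandas as pd


def check_bool_negation(series_cls):
    """Return True if negating a bool Series keeps dtype bool and flips values.

    Hypothetical helper: series_cls is any Series constructor with a
    pandas-like interface (pd.Series here; cudf.Series is an assumption).
    """
    s = series_cls([True, False])
    result = -s
    keeps_dtype = str(result.dtype) == "bool"
    flips_values = result.to_numpy().tolist() == [False, True]
    return keeps_dtype and flips_values


pandas_ok = check_bool_negation(pd.Series)
print(pandas_ok)
```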
