[BUG] Problems converting String dtype Series with "nan" to Float (ValueError) #7488
Comments
@dmitra79 this is a design decision of cudf. In your above example:
Pandas by default creates this Series as an Unfortunately, this isn't possible for us to handle on the GPU, so when someone does:
We create this Series as a I would also recommend upgrading to a newer cuDF, as v0.18 was recently released and many of these behaviors and error messages have been improved over the last couple of releases.
@kkraus14 The error is thrown not when the values are set to NaN, but when the Series is cast to float (`x.astype('float64')`), which should be pretty unambiguous. Also, this pandas-like code worked in previous versions of cudf without error (I ran into this problem trying to use an older piece of code).
Apologies, I may have misinterpreted. What is the result of
For this code:
The output is:
The error is:
PS. I just checked with cudf 0.18 (installed from scratch in a new environment) - same issue
Thanks for the reproducer. We'll look into this. In the meantime, I would suggest using
Thanks for the suggestion of using "None"! That really helps!
@davidwendt do you happen to know if there's a way to convert a string to a
Yes, if a string is "NaN" then
Looks like Pandas allows case-insensitive conversion for
This issue has been labeled
…ing to `float` (#9613)
Fixes: #7488
This PR adds support for the strings `nan`, `inf` & `-inf` and their case-sensitive variations when type-casting a string column to a `float` dtype.
Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- David Wendt (https://github.com/davidwendt)
- https://github.com/brandon-b-miller
URL: #9613
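The conversion rule discussed in this thread can be illustrated with a small plain-Python sketch. It is not cuDF's implementation; the helper name `to_float_or_none` and the lookup table are hypothetical and exist only to show the intended behavior: nulls stay null, the strings `nan`, `inf` and `-inf` are accepted in any letter case, and anything else that is not a number raises a `ValueError`.

```python
import math

# Hypothetical lookup table for this sketch; not part of cuDF or pandas.
SPECIAL_VALUES = {"nan": math.nan, "inf": math.inf, "-inf": -math.inf}


def to_float_or_none(text):
    """Convert one string to a float, keeping None (a null entry) as None."""
    if text is None:
        return None
    key = text.strip().lower()  # makes "NaN", "nan" and "NAN" equivalent
    if key in SPECIAL_VALUES:
        return SPECIAL_VALUES[key]
    return float(key)  # raises ValueError for non-numeric input such as ""


values = ["1.1", "2.3", None, "NaN", "-INF", "3"]
converted = [to_float_or_none(v) for v in values]
```

In this sketch an empty string still fails to convert, which is why replacing empty entries with a null before casting, as suggested earlier in the thread, avoids the error.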
A Series of strings should be convertible to a Series of floats, even if some entries are NaN. Instead, a ValueError is thrown.
The following works fine in pandas:
'''
import numpy as np
import pandas as pd
x = pd.Series(['1.1', '2.3', '', '3'])
x[x == ''] = np.nan  # replace empty strings with NaN
x.astype('float64')
'''
but throws an error in cudf:
'''
import cudf
import numpy as np
x = cudf.Series(['1.1', '2.3', '', '3'])
x[x == ''] = np.nan  # replace empty strings with NaN
x.astype('float64')  # raises the ValueError below
'''
ValueError: Could not convert strings to float type due to presence of non-floating values.
Environment overview (please complete the following information)
cudf.__version__ = '0.15.0'
installed with conda