COMPAT: create UInt64Block #15145

Closed
jreback opened this issue Jan 17, 2017 · 8 comments
Labels: Bug; Compat (pandas objects compatibility with NumPy or Python functions); Dtype Conversions (unexpected or buggy dtype conversions)

@jreback
Contributor

jreback commented Jan 17, 2017

xref #14937 (comment)

A number of indexing and conversion issues arise because we treat uint as a plain int rather than as a sub-class. If we make UIntBlock a sub-class of IntBlock, I think we can easily handle this with some small overrides, for instance to check for negative values when indexing.
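
A minimal sketch of the subclassing idea, using hypothetical stand-in classes. `IntBlockSketch`, `UIntBlockSketch`, and the `can_hold` method are illustrative names, not pandas internals; the point is only that the unsigned variant can inherit everything and override the range check.

```python
class IntBlockSketch:
    """Hypothetical stand-in for a signed 64-bit integer block."""

    min_value = -(2**63)
    max_value = 2**63 - 1

    def can_hold(self, value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        return self.min_value <= value <= self.max_value


class UIntBlockSketch(IntBlockSketch):
    """Hypothetical unsigned variant: only the bounds differ."""

    min_value = 0
    max_value = 2**64 - 1


signed = IntBlockSketch()
unsigned = UIntBlockSketch()
print(signed.can_hold(-1))           # True
print(unsigned.can_hold(-1))         # False: negative values are rejected
print(unsigned.can_hold(2**64 - 1))  # True
```

With a check like this in place, a setter could decide to coerce or raise before any value is written, instead of letting the value wrap around.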

In [1]: df = pd.DataFrame({'A' : np.array([1,2,3],dtype='uint64'), 'B': range(3)})

In [2]: df
Out[2]: 
   A  B
0  1  0
1  2  1
2  3  2

In [4]: df.dtypes
Out[4]: 
A    uint64
B     int64
dtype: object

Buggy

In [5]: df.iloc[1] = -1

In [6]: df
Out[6]: 
                      A  B
0                     1  0
1  18446744073709551615 -1
2                     3  2

In [7]: df.iloc[1] = np.nan

In [8]: df
Out[8]: 
     A    B
0  1.0  0.0
1  NaN  NaN
2  3.0  2.0

This is correct

In [9]: df.A.astype('uint64')
---------------------------------------------------------------------------
ValueError: Cannot convert non-finite values (NA or inf) to integer

However, this is not

In [10]: df.iloc[1] = -1

In [11]: df
Out[11]: 
     A    B
0  1.0  0.0
1 -1.0 -1.0
2  3.0  2.0

In [12]: df.dtypes
Out[12]: 
A    float64
B    float64
dtype: object

In [13]: df.A.astype('uint64')
Out[13]: 
0                       1
1    18446744073709551615
2                       3
Name: A, dtype: uint64

Construction with invalid values

In [1]: Series([-1], dtype='uint64')
Out[1]: 
0    18446744073709551615
dtype: uint64
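
The value 18446744073709551615 in the outputs above is what -1 becomes when it is reinterpreted as an unsigned 64-bit integer. A small standard-library sketch of that arithmetic (no pandas or NumPy involved):

```python
import ctypes

UINT64_MODULUS = 2**64

# Reinterpreting a two's-complement value as unsigned is the same as
# reducing it modulo 2**64.
wrapped = (-1) % UINT64_MODULUS
print(wrapped)  # 18446744073709551615

# ctypes performs the same unchecked C-style conversion.
print(ctypes.c_uint64(-1).value)  # 18446744073709551615
```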

jreback added the Bug, Compat, Difficulty Advanced, and Dtype Conversions labels on Jan 17, 2017
jreback added this to the 0.20.0 milestone on Jan 17, 2017

@jreback
Contributor Author

jreback commented Jan 17, 2017

cc @gfyoung

@sinhrks
Member

sinhrks commented Jan 21, 2017

Am I right that UIntBlock would behave as below?

df = pd.DataFrame({'A' : np.array([1,2,3],dtype='uint64'), 'B': range(3)})
df.iloc[1] = -1
df
#      A  B
# 0  1.0  0
# 1 -1.0 -1
# 2  3.0  2

# cast to float64 to cover both the int64 and uint64 ranges
df.dtypes
# A    float64
# B      int64
# dtype: object

df.astype('uint64')
# ValueError: Cannot convert negative values to unsigned integer
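
The conversion rule proposed above can be sketched in plain Python. `to_uint64_checked` is a hypothetical helper, not a pandas function; it only illustrates raising instead of wrapping, reusing the error messages quoted in this thread.

```python
import math

UINT64_MAX = 2**64 - 1


def to_uint64_checked(values):
    """Hypothetical helper: convert floats to unsigned ints or raise."""
    result = []
    for v in values:
        if math.isnan(v) or math.isinf(v):
            raise ValueError(
                "Cannot convert non-finite values (NA or inf) to integer"
            )
        if v < 0:
            raise ValueError(
                "Cannot convert negative values to unsigned integer"
            )
        if v > UINT64_MAX:
            raise ValueError("Value is too large for uint64")
        # int() truncates any fractional part, as a C-style cast would.
        result.append(int(v))
    return result


print(to_uint64_checked([1.0, 2.0, 3.0]))  # [1, 2, 3]
```

Calling `to_uint64_checked([1.0, -1.0, 3.0])` raises `ValueError` rather than returning 18446744073709551615 for the middle element.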

@gfyoung
Member

gfyoung commented Jan 21, 2017

@sinhrks : That casting to float64 suggests that it's a numpy thing. However, can we agree on what the correct response to that should be (and also to the example by @jreback)? It seems that we should raise an error saying that we can't set negative values in an unsigned integer column.

@jreback
Contributor Author

jreback commented Jan 21, 2017

The .astype should raise (well, it should respect the errors setting).

However, things like indexing would coerce (this is actually a pretty simple thing to do): just have a core.internals.UInt64Block._try_coerce_args which coerces.
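
A rough sketch of what coercing on assignment could look like, written as a standalone function rather than the actual `_try_coerce_args` hook. The name `coerce_dtype_for_setitem` is hypothetical, and the choice to upcast to float64 follows the example earlier in this thread; it is an assumption, not confirmed pandas behavior.

```python
UINT64_MAX = 2**64 - 1


def coerce_dtype_for_setitem(value):
    """Hypothetical: dtype a uint64 column takes after `value` is set into it."""
    if isinstance(value, bool):
        return "object"       # bool is an int subclass; keep it out of the int path
    if isinstance(value, int):
        if 0 <= value <= UINT64_MAX:
            return "uint64"   # fits, so no coercion is needed
        return "float64"      # out of range: upcast instead of wrapping
    if isinstance(value, float):
        return "float64"      # covers NaN as well
    return "object"


print(coerce_dtype_for_setitem(5))             # uint64
print(coerce_dtype_for_setitem(-1))            # float64
print(coerce_dtype_for_setitem(float("nan")))  # float64
```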

@shoyer
Member

shoyer commented Jan 21, 2017

I'm really happy to add more native support for more NumPy dtypes in pandas, but I think I'm missing some context. I understand why dtypes like uint8 and uint16 that save significant memory are useful (they are widely used for images), but what are the use cases for uint64? The numbers that uint64 represents that int64 cannot handle are absurdly large -- between 9223372036854775807 and 18446744073709551615!
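
For reference, the two bounds quoted above follow directly from the bit widths. A quick check in plain Python:

```python
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

print(INT64_MAX)   # 9223372036854775807
print(UINT64_MAX)  # 18446744073709551615

# Number of values that uint64 can hold but int64 cannot.
print(UINT64_MAX - INT64_MAX == 2**63)  # True
```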

@jreback
Contributor Author

jreback commented Jan 21, 2017

These are commonly used as GUIDs and hash values.
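
As an illustration of the hash use case, here is a sketch that truncates a SHA-256 digest to 64 bits. `hash64` is a hypothetical helper, and the choice of SHA-256 is only an example; any 64-bit hash has the same property that roughly half of its outputs exceed the int64 maximum.

```python
import hashlib

INT64_MAX = 2**63 - 1


def hash64(data: bytes) -> int:
    """Hypothetical helper: first 8 bytes of SHA-256 as an unsigned integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big")


value = hash64(b"pandas")
print(0 <= value < 2**64)  # True: always fits in uint64
print(value > INT64_MAX)   # depends on the input; True for about half of all inputs
```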

@jbrockmendel
Member

@jreback is this still a pain point? I think the direction we're moving in is toward fewer Block subclasses, not more.

@jreback
Contributor Author

jreback commented Jul 23, 2019

yes i think this is covered

@jreback jreback closed this as completed Jul 23, 2019
5 participants