cannot get cut() to display desired bins label #15357

Open
qAp opened this issue Feb 9, 2017 · 5 comments
Labels
cut (cut, qcut); Docs; Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@qAp
qAp commented Feb 9, 2017

srs1 holds 1000 random numbers to be binned using the boundaries in absths:

import numpy as np
import pandas as pd
srs1 = pd.Series(np.random.uniform(low=0, high=3e-17, size=(1000,)))
absths = np.array([0., 1.e-22, 1.e-18, 1.e-16])

Bin them, print out the boundaries, and the results for the first 5 numbers in srs1:

ncut1 = pd.cut(srs1, 
              bins=absths, 
              include_lowest=True, 
              precision=16,
              retbins=True)

print(ncut1[1])
print(ncut1[0][:5])

gives

[  0.00000000e+00   1.00000000e-22   1.00000000e-18   1.00000000e-16]
0    (1.0000000000000001e-18, 9.9999999999999998e-17]
1    (1.0000000000000001e-18, 9.9999999999999998e-17]
2    (1.0000000000000001e-18, 9.9999999999999998e-17]
3    (1.0000000000000001e-18, 9.9999999999999998e-17]
4    (1.0000000000000001e-18, 9.9999999999999998e-17]
dtype: category
Categories (3, object): [[0, 1] < (1, 1.0000000000000001e-18] < (1.0000000000000001e-18, 9.9999999999999998e-17]]

The boundary that is meant to be 1e-22 is displayed as 1 in Categories. The keyword argument precision is already set to 16 to display many decimals. Is this a bug or am I not using the function correctly?
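One way to control what is displayed, independent of how pandas formats the edges, is to pass explicit labels to cut(). The sketch below is a possible workaround rather than a fix for the formatting itself. It assumes labels built with the `g` format are acceptable; the names `values`, `edges`, and `labels`, and the fixed seed, are illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

# Illustrative data: seeded so the example is repeatable (the seed is an assumption).
rng = np.random.default_rng(0)
values = pd.Series(rng.uniform(low=0, high=3e-17, size=1000))
edges = np.array([0., 1.e-22, 1.e-18, 1.e-16])

# Build one label per bin from the edges themselves.
# The first bin is written as closed on the left because include_lowest=True.
labels = []
for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
    left_bracket = "[" if i == 0 else "("
    labels.append(f"{left_bracket}{lo:g}, {hi:g}]")

# Explicit labels bypass the automatic edge-to-label formatting.
binned = pd.cut(values, bins=edges, include_lowest=True, labels=labels)

print(labels)
print(binned.head())
```

The categories of `binned` are then exactly the strings in `labels`, so the 1e-22 edge appears as written regardless of the `precision` argument.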

Thanks

@jorisvandenbossche
Member

Something seems to go wrong in the conversion of the bin edges to strings, I think.
We are reworking this to be based on IntervalIndex, so that may fix it. @jreback, is this maybe a case to test there in the PR (#15309)?
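To illustrate why an interval-based result may sidestep the string conversion, here is a minimal sketch using the IntervalIndex API as it exists in current pandas. It is not taken from #15309; the variable names are illustrative.

```python
import pandas as pd

# Same edges as in the report; closed="right" mirrors cut()'s default.
edges = [0.0, 1e-22, 1e-18, 1e-16]
intervals = pd.IntervalIndex.from_breaks(edges, closed="right")

# The edges are kept as float64 values, independent of how they are printed.
print(intervals.left.tolist())
print(intervals.right.tolist())

# Membership is decided numerically, not by parsing a label.
print(intervals.contains(5e-20).tolist())
```

Because the edges stay numeric, the 1e-22 boundary is stored exactly as the float that was passed in, whatever the printed label looks like.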

@jreback
Contributor

jreback commented Feb 9, 2017

If you try a higher precision, e.g. 22, this works, though I suspect you are actually hitting machine precision limits anyhow. Comparing numbers beyond 1e-15 (that is, past roughly 15 significant digits) can be somewhat arbitrary: sometimes it will work, and if there are too many significant digits it might not. I would either add a doc note or raise if precision is too large here (in other words, > 15). Yes, this might work in #15309 because we are not stringifying but using actual values, but I think the same caveats apply.

So I'll mark this as a doc issue.
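The precision limit mentioned above can be seen directly with plain Python and NumPy. This is a small sketch, not pandas' own formatting code: it shows that float64 carries about 15 reliable significant decimal digits, and that forcing 17 digits produces the 9.9999999999999998e-17 seen in the original output for the edge 1e-16.

```python
import numpy as np

# float64 guarantees about 15 significant decimal digits.
print(np.finfo(np.float64).precision)

x = 1e-16
# The shortest string that round-trips, versus a forced 17 significant digits.
print(repr(x))
print(format(x, ".17g"))

# Both strings denote the same float64 value.
assert float(format(x, ".17g")) == x
```

Digits requested beyond that limit describe the binary representation of the float rather than the number the user typed, which is why a very large precision yields labels such as 9.9999999999999998e-17.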

@jreback added the Docs, Numeric Operations (Arithmetic, Comparison, and Logical operations), and Difficulty Novice labels Feb 9, 2017
@jreback added this to the Next Major Release milestone Feb 9, 2017
@anujloomba

I was wondering if there was any resolution on this? I am facing a similar issue with labels.

@jorisvandenbossche
Member

@anujloomba can you provide a reproducible example that shows your problem?
The problem illustrated above is in fact mostly fixed on master (although the 0 edge is now not represented correctly).

@jorisvandenbossche
Member

@qAp on master I now get:

In [8]: print(ncut1[1])
[  0.00000000e+00   1.00000000e-22   1.00000000e-18   1.00000000e-16]

In [9]: print(ncut1[0][:5])
0    (1e-18, 1.0000000000000001e-16]
1    (1e-18, 1.0000000000000001e-16]
2    (1e-18, 1.0000000000000001e-16]
3    (1e-18, 1.0000000000000001e-16]
4    (1e-18, 1.0000000000000001e-16]
dtype: category
Categories (3, interval[float64]): [(-1e-16, 1.0000000000000002e-22] < (1.0000000000000002e-22, 1e-18] <
                                    (1e-18, 1.0000000000000001e-16]]

This seems better, as it displays the 1e-22 edge correctly, although the left bound of 0 is now represented as -1e-16.
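The negative left bound looks consistent with how cut() treats include_lowest=True for right-closed bins: the first interval's left edge is moved slightly below the lowest bin edge so that the edge value itself is included. The size of the shift appears to depend on precision, which would explain -1e-16 at precision=16, but that reading is an assumption based on observed behaviour. A minimal sketch with simple bins and the default precision:

```python
import pandas as pd

data = [0.0, 0.5, 1.5]
bins = [0.0, 1.0, 2.0]

with_lowest = pd.cut(data, bins=bins, include_lowest=True)
without_lowest = pd.cut(data, bins=bins)

# With include_lowest=True the first interval's left edge moves just below 0
# so that a right-closed interval can still contain the value 0.
print(with_lowest.categories[0])
print(without_lowest.categories[0])

# Without include_lowest, the value 0.0 falls outside (0.0, 1.0] and becomes NaN.
print(without_lowest.isna().tolist())
```

So the displayed left bound is an adjusted edge rather than the original 0; the bins returned via retbins=True still show the edges as passed in.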

@jbrockmendel added the cut (cut, qcut) label Jan 11, 2022
@mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022