BUG: Prevent 3D-ndarray for nested tuple labels (#24687) #24732
@summonholmes: Thanks for the contribution! Given the limited use case, as you point out, I wonder what the trade-off might be between degree of use and the work to maintain it. Also, we're going to need at least one test and a |
Codecov Report
@@ Coverage Diff @@
## master #24732 +/- ##
===========================================
- Coverage 92.39% 43.07% -49.32%
===========================================
Files 166 166
Lines 52358 52362 +4
===========================================
- Hits 48374 22555 -25819
- Misses 3984 29807 +25823
Codecov Report
@@ Coverage Diff @@
## master #24732 +/- ##
==========================================
- Coverage 92.38% 92.38% -0.01%
==========================================
Files 166 166
Lines 52358 52363 +5
==========================================
+ Hits 48373 48377 +4
- Misses 3985 3986 +1
not really sure what this PR is trying to do
I admit that this specific use case for nested tuple index labels is very esoteric, but in the function
Yes, there is a trade-off, and moving a check such as this somewhere else might be better. Eventually, these efforts will have to make their way back to the Cython file, lib.pyx. |
@summonholmes you are missing my point |
It's in the bug report that I created, but I'll post a more concise demo here:
from pandas._libs import lib
from pandas import DataFrame
from seaborn import light_palette
# Using the nested tuple cluster
broken_cluster = {
(("Turtle", "Chicken"), (("Man", "Monkey"), "Dog")): (0, 28.375, 31.875),
"Tuna": (28.375, 0, 41),
"Moth": (31.875, 41, 0)
}
broken_cluster = DataFrame(broken_cluster, index=broken_cluster.keys())
broken_cluster.style.background_gradient(
cmap=light_palette("indigo", as_cmap=True))
# Without using the nested tuple cluster
working_cluster = {
"S": (0, 28.375, 31.875),
"Tuna": (28.375, 0, 41),
"Moth": (31.875, 41, 0)
}
working_cluster = DataFrame(working_cluster, index=working_cluster.keys())
working_cluster.style.background_gradient(
cmap=light_palette("indigo", as_cmap=True))
# Highlight mins
# The culprit:
lib.clean_index_list([(('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog'))])[0]
Output of
I welcome any feedback, and any indication of what I might be doing wrong. I used tuples in this program for the sake of Pythonic automation, nothing else. You might also wish to see #24688 |
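For readers following along, the shape problem can be reproduced with plain NumPy, without calling lib.clean_index_list at all. This is only a sketch under the assumption that the label list ends up going through a generic array conversion; it does not claim to show what the Cython function does internally.

```python
import numpy as np

# The nested tuple label from the demo above.
nested_label = (("Turtle", "Chicken"), (("Man", "Monkey"), "Dog"))

# A generic conversion treats each tuple level as an array dimension
# until the nesting becomes ragged, so one label becomes a 3-D array.
converted = np.array([nested_label], dtype=object)
print(converted.shape)  # (1, 2, 2)

# A plain string label stays a single element of a 1-D array.
plain = np.array(["Tuna"], dtype=object)
print(plain.shape)  # (1,)
```

The innermost ("Man", "Monkey") tuple survives as a single object element, because NumPy stops expanding once sibling elements no longer have matching lengths.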
this has NO support in pandas whatsoever. If you can raise an error in a performant way, great, we would take it |
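One way to read the suggestion above is an up-front check that rejects nested tuple labels before any conversion happens. The sketch below is hypothetical: `has_nested_tuple` and `validate_labels` are illustrative names, not pandas functions, and the choice of `TypeError` is an assumption.

```python
def has_nested_tuple(label):
    """Return True if label is a tuple that directly contains another tuple."""
    return isinstance(label, tuple) and any(isinstance(x, tuple) for x in label)


def validate_labels(labels):
    """Hypothetical guard: raise on the first nested tuple label found."""
    labels = list(labels)
    for label in labels:
        if has_nested_tuple(label):
            raise TypeError(f"nested tuple labels are not supported: {label!r}")
    return labels


# Flat tuples and strings pass through unchanged.
ok = validate_labels([("Turtle", "Chicken"), "Tuna", "Moth"])
```

The check only inspects one level of each label, so its cost grows with the number of labels and their length rather than with the depth of nesting.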
What's the issue number? It's not really clear to me what the changes here are doing. e.g. why check specifically against a shape of |
The original issue I was addressing with this PR was #24687, and after further testing I've determined that nested tuples can generate even more anomalous shapes than
Please correct me if I'm wrong, I'd be happy to work on a fix. You're saying that an error should be raised, as soon as a tuple is assigned to a column or index label? I'm assuming that the approach required is closely related to #24688 and #24702. |
looks like this was completely reverted, closing. |
Nested tuples as column and index labels are a very rare case, but here is the fix I've managed to come up with for the time being. While clean_index_list() in pandas/_libs/lib.pyx is responsible for returning an invalid result (a 3D ndarray where the inner dimensions should be nested tuples), debugging Cython is very challenging for me. And yes, use of tuples in this way is extremely uncommon. So far, the code has run successfully on two distance matrices.
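As an illustration of the kind of result the description expects (one element per label, with the tuples kept intact), a 1-D object array can be filled element by element. This is a sketch in plain NumPy, not the change made in this PR; `labels_to_1d_object_array` is a hypothetical helper name.

```python
import numpy as np


def labels_to_1d_object_array(labels):
    """Hypothetical helper: store each label as one element of a 1-D array."""
    labels = list(labels)
    out = np.empty(len(labels), dtype=object)
    # Assigning by integer index stores the tuple itself instead of
    # letting NumPy expand it into extra dimensions.
    for i, label in enumerate(labels):
        out[i] = label
    return out


labels = [(("Turtle", "Chicken"), (("Man", "Monkey"), "Dog")), "Tuna", "Moth"]
arr = labels_to_1d_object_array(labels)
print(arr.shape)  # (3,)
```

The loop trades some speed for predictability, which matters for the performance concern raised in the review; whether that cost is acceptable inside pandas is not something this sketch settles.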