Styler background_gradient cmap with tuple indices sometimes resulting in ValueError #24687

Closed
summonholmes opened this issue Jan 9, 2019 · 4 comments
@summonholmes

summonholmes commented Jan 9, 2019

Full demo of the issue (the breakage occurs before any cells are highlighted, as shown in the pictures):

from pandas import DataFrame
from seaborn import light_palette

# Initial Distance Matrix
cytochrome_c = {
    "Turtle": (0, 19, 27, 8, 33, 18, 13),
    "Man": (19, 0, 31, 18, 36, 1, 13),
    "Tuna": (27, 31, 0, 26, 41, 32, 29),
    "Chicken": (8, 18, 26, 0, 31, 17, 14),
    "Moth": (33, 36, 41, 31, 0, 35, 28),
    "Monkey": (18, 1, 32, 17, 35, 0, 12),
    "Dog": (13, 13, 29, 14, 28, 12, 0),
}
cytochrome_c = DataFrame(cytochrome_c, index=cytochrome_c.keys())
cytochrome_c.style.background_gradient(
    cmap=light_palette("indigo", as_cmap=True))
# Highlight mins

picture1

# Intermediate Distance Matrix - Cluster name changed to 'S'
working_cluster = {
    "S": (0, 28.375, 31.875),
    "Tuna": (28.375, 0, 41),
    "Moth": (31.875, 41, 0)
}
working_cluster = DataFrame(working_cluster, index=working_cluster.keys())
working_cluster.style.background_gradient(
    cmap=light_palette("indigo", as_cmap=True))
# Highlight mins

picture4

# Final Distance Matrix - Cluster name left alone
final_cluster = {
    ((("Turtle", "Chicken"), (("Man", "Monkey"), "Dog")), "Tuna"): (0,
                                                                    36.4375),
    "Moth": (36.4375, 0)
}
final_cluster = DataFrame(final_cluster, index=final_cluster.keys())
final_cluster.style.background_gradient(
    cmap=light_palette("indigo", as_cmap=True))
# Highlight mins

picture3

# Intermediate Distance Matrix - Cluster name causes ValueError
broken_cluster = {
    (("Turtle", "Chicken"), (("Man", "Monkey"), "Dog")): (0, 28.375, 31.875),
    "Tuna": (28.375, 0, 41),
    "Moth": (31.875, 41, 0)
}
broken_cluster = DataFrame(broken_cluster, index=broken_cluster.keys())
broken_cluster.style.background_gradient(
    cmap=light_palette("indigo", as_cmap=True))
# Highlight mins

Running the 'broken_cluster' example produces the following output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/.local/miniconda3/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/.local/miniconda3/lib/python3.7/site-packages/pandas/io/formats/style.py in _repr_html_(self)
    149     def _repr_html_(self):
    150         """Hooks into Jupyter notebook rich display system."""
--> 151         return self.render()
    152 
    153     @Appender(_shared_docs['to_excel'] % dict(

~/.local/miniconda3/lib/python3.7/site-packages/pandas/io/formats/style.py in render(self, **kwargs)
    444         * table_attributes
    445         """
--> 446         self._compute()
    447         # TODO: namespace all the pandas keys
    448         d = self._translate()

~/.local/miniconda3/lib/python3.7/site-packages/pandas/io/formats/style.py in _compute(self)
    512         r = self
    513         for func, args, kwargs in self._todo:
--> 514             r = func(self)(*args, **kwargs)
    515         return r
    516 

~/.local/miniconda3/lib/python3.7/site-packages/pandas/io/formats/style.py in _apply(self, func, axis, subset, **kwargs)
    545                                                        expect=expected_shape))
    546             raise ValueError(msg)
--> 547         self._update_ctx(result)
    548         return self
    549 

~/.local/miniconda3/lib/python3.7/site-packages/pandas/io/formats/style.py in _update_ctx(self, attrs)
    468         for row_label, v in attrs.iterrows():
    469             for col_label, col in v.iteritems():
--> 470                 i = self.index.get_indexer([row_label])[0]
    471                 j = self.columns.get_indexer([col_label])[0]
    472                 for pair in col.rstrip(";").split(";"):

~/.local/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   3257                                  'backfill or nearest reindexing')
   3258 
-> 3259             indexer = self._engine.get_indexer(target._ndarray_values)
   3260 
   3261         return _ensure_platform_int(indexer)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_indexer()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.lookup()

ValueError: Buffer has wrong number of dimensions (expected 1, got 3)
<pandas.io.formats.style.Styler at 0x1a18aa92b0>

Problem description

The pandas styler will sometimes fail with a ValueError when the column and index labels are tuples. This issue is reproducible with the data provided above.
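One possible workaround, in line with the 'S' example above that styles without error, is to turn the nested tuple labels into plain strings before building the DataFrame. The sketch below is only an illustration: label_to_str is a hypothetical helper (not part of pandas), and the Newick-like output format is an arbitrary choice.

```python
def label_to_str(label):
    """Hypothetical helper: render a (possibly nested) tuple label as a
    Newick-like string, e.g. (("A", "B"), "C") -> "((A,B),C)"."""
    if isinstance(label, tuple):
        return "(" + ",".join(label_to_str(part) for part in label) + ")"
    return str(label)


# Same data as broken_cluster in the report above.
broken_cluster = {
    (("Turtle", "Chicken"), (("Man", "Monkey"), "Dog")): (0, 28.375, 31.875),
    "Tuna": (28.375, 0, 41),
    "Moth": (31.875, 41, 0),
}

# Re-key the dict so every label is a flat string.
string_keyed = {label_to_str(key): value for key, value in broken_cluster.items()}

print(list(string_keyed))
# ['((Turtle,Chicken),((Man,Monkey),Dog))', 'Tuna', 'Moth']
```

The re-keyed dict can then be passed to DataFrame(string_keyed, index=string_keyed.keys()) exactly as in the snippets above, so the Styler only ever sees string labels.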

This problem is fully demonstrated in my UPGMA repository, here.

Software Versions:

conda 4.5.12
pandas                    0.23.4           py37h6440ff4_0
python                    3.7.0                hc167b69_0
seaborn                   0.9.0                    py37_0

Expected Output

picture2

@WillAyd
Member

WillAyd commented Jan 9, 2019

Using a tuple as a label is generally not supported, but if you want to take a look, PRs are always welcome.

@gfyoung gfyoung added the Visualization plotting label Jan 9, 2019
@gfyoung
Member

gfyoung commented Jan 9, 2019

cc @TomAugspurger

@WillAyd WillAyd added this to the Contributions Welcome milestone Jan 9, 2019
@summonholmes
Author

summonholmes commented Jan 10, 2019

With conda pandas 0.23.4, I added print(target._ndarray_values) at line 3259, before the return statement of get_indexer(self, target, method=None, limit=None, tolerance=None), in the file 'miniconda3/envs/pandas_stable/lib/python3.7/site-packages/pandas/core/indexes/base.py'. On a successful run, the printed entries look like this:

[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna') 'Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna') 'Moth']
[]
['Moth']
[(((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna'), 'Moth')]
['Moth']
[(((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna'), 'Moth')]
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna') 'Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna') 'Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
['Moth']
['Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
['Moth']
['Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna') 'Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
['Moth']
['Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
['Moth']
['Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna') 'Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
['Moth']
['Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
['Moth']
['Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna') 'Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna') 'Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
['Moth']
['Moth']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')]
['Moth']
['Moth']

picture5

On the failing run, notice the last entry (it spans two lines):

[(('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')) 'Tuna']
[(('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')) 'Tuna']
[(('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')) 'Tuna']
['Moth']
['Tuna']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna') 'Moth']
['Tuna']
[((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna') 'Moth']
[(('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')) 'Tuna' 'Moth']
[(('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')) 'Tuna' 'Moth']
[[['Turtle' 'Chicken']
  [('Man', 'Monkey') 'Dog']]]

Note the nested list.

The original index prints as Index([[['Turtle', 'Chicken'], [('Man', 'Monkey'), 'Dog']]], dtype='object').

I'm taking a look at def _ensure_index(index_like, copy=False): to see if I can find out why it's failing. So far, it looks like the Cython function clean_index_list() is encountering an edge case. You can reproduce this edge case by running the following code:

from pandas._libs import lib

lib.clean_index_list([(('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog'))])
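The "expected 1, got 3" in the traceback is consistent with how NumPy infers dimensions when a list of nested tuples is converted to an object array. The sketch below uses plain NumPy only and is an assumption about the mechanism, not a trace of what clean_index_list actually does internally. It also shows one way to keep the tuple as a single element of a 1-D array.

```python
import numpy as np

# The single label that triggers the failure in the report above.
label = (("Turtle", "Chicken"), (("Man", "Monkey"), "Dog"))

# Converting a list that contains this tuple lets NumPy unpack the nesting
# into array dimensions. Strings count as scalars, so unpacking stops at
# the level where a string and a tuple sit side by side.
nested = np.array([label], dtype=object)
print(nested.ndim, nested.shape)  # 3 (1, 2, 2)

# Preallocating a 1-D object array and assigning by position stores the
# whole tuple as one element instead.
flat = np.empty(1, dtype=object)
flat[0] = label
print(flat.ndim, flat.shape)  # 1 (1,)
```

The 3-D result has the same layout as the last printed entry in the failing run, which suggests the label is being unpacked somewhere on the way into the index lookup.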

@summonholmes
Author

While this may have been the first issue opened regarding the tuple labeling problem, discussion is more relevant in #24688, #24702, and #24783. Closing now.
