Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does CategoricalIndex.get_value call _convert_scalar_indexer? #31683

Closed
jbrockmendel opened this issue Feb 5, 2020 · 5 comments
Closed

Comments

@jbrockmendel
Copy link
Member

cc @jreback @jorisvandenbossche @TomAugspurger the meat of CategoricalIndex.get_value reads:

def get_value(self, series, key):
        k = key
        try:
            k = self._convert_scalar_indexer(k, kind="getitem")
            indexer = self.get_loc(k)
            return series.take([indexer])[0]
        except (KeyError, TypeError):
            pass

        # we might be a positional inexer
        return Index.get_value(self, series, key)

The thing is, if we actually track down that _convert_scalar_indexer call, with kind="getitem" it always passes through to the base class method, which ends up checking:

            if kind == "getitem" and is_float(key):
                if not self.is_floating():
                    self._invalid_indexer("label", key)

But AFAICT CategoricalIndex.is_floating() is always False, so it looks like the call in get_value is a no-op, which I expect was not the original intention.

Also possibly undesired:

idx = pd.Index(range(4)).astype("f8") / 2
cidx = pd.CategoricalIndex(idx)

ser1 = pd.Series(range(4), index=idx)
ser2 = pd.Series(range(4), index=cidx)

>>> ser1[1.0] 
2
>>> ser2[1.0]
[...] TypeError [...]
>>> ser1[2]
[...] KeyError [...]
>>> ser2[2]
2

Can anyone confirm if the no-op call and/or different __getitem__ behavior is intentional?

@jorisvandenbossche
Copy link
Member

In general, indexing on CategoricalIndex with numeric categories is a thorny topic .. See also #15470 and #14865

But AFAICT CategoricalIndex.is_floating() is always False, so it looks like the call in get_value is a no-op, which I expect was not the original intention.

If is_floating() is always False, this is not a no-op? (it is if not self.is_floating():)
But, about the snippet itself, I think that is to avoid falling back to integer positional indexing if the key is a float?
And that is being used for categorical index (you see it come up in the traceback):

In [15]: s = pd.Series(range(5), index=pd.Categorical(['a', 'b', 'c', 'd', 'e']))   

In [16]: s[0.0]    
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-004bb52c0037> in <module>
----> 1 s[0.0]

~/scipy/pandas/pandas/core/series.py in __getitem__(self, key)
    853         key_is_scalar = is_scalar(key)
    854         if key_is_scalar:
--> 855             key = self.index._convert_scalar_indexer(key, kind="getitem")
    856 
    857         if key_is_scalar or isinstance(self.index, MultiIndex):

~/scipy/pandas/pandas/core/indexes/category.py in _convert_scalar_indexer(self, key, kind)
    665             except TypeError:
    666                 self._invalid_indexer("label", key)
--> 667         return super()._convert_scalar_indexer(key, kind=kind)
    668 
    669     @Appender(Index._convert_list_indexer.__doc__)

~/scipy/pandas/pandas/core/indexes/base.py in _convert_scalar_indexer(self, key, kind)
   3125             if kind == "getitem" and is_float(key):
   3126                 if not self.is_floating():
-> 3127                     self._invalid_indexer("label", key)
   3128 
   3129             elif kind == "loc" and is_float(key):

~/scipy/pandas/pandas/core/indexes/base.py in _invalid_indexer(self, form, key)
   3290         """
   3291         raise TypeError(
-> 3292             f"cannot do {form} indexing on {type(self)} with these "
   3293             f"indexers [{key}] of {type(key)}"
   3294         )

TypeError: cannot do label indexing on <class 'pandas.core.indexes.category.CategoricalIndex'> with these indexers [0.0] of <class 'float'>

@jbrockmendel
Copy link
Member Author

If is_floating() is always False, this is not a no-op?

Yep, good catch. A hopefully-more-accurate statement is that when i remove the _convert_scalar_indexer call from CategoricalIndex.get_value, it doesnt cause any test failures

@jbrockmendel
Copy link
Member Author

OK, it looks like get_value is called from two places, one of which is inside Series.__getitem__ which now calls _convert_scalar_indexer before we get here, the other from Series._get_value, which is called from loc/iloc/at/iat.__getitem__. Will check to see if we are just missing tests for the latter case

@jbrockmendel
Copy link
Member Author

Yikes, the deeper i dig into this, the worse it gets. _AtIndexer does its own validation instead of deferring to the index objects, so raises slightly different exceptions

@jbrockmendel
Copy link
Member Author

Closed by #31724.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants