Indexing categories #387

henryiii · 2020-07-02T13:05:49Z

When filling, an int or str category axis handles a value that is not on the axis in one of three ways: it puts it in a new bin (growth), puts it in the flow bin, or ignores it if overflow is off. However, Python indexing does the same thing. For example:

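import boost_histogram as bh

h = bh.Histogram(bh.axis.StrCategory(["a", "b"]))
h.fill(["c"])  # "c" is not on the axis, so it lands in the flow bin
print(h[bh.loc("d")])  # Returns 1.0 (the flow bin), although "d" was never filled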

Also, if growth is on, indexing will not grow the axis, though at least one user expected that in the past.
Should Python indexing into non-existent string or int categories throw a TypeError or KeyError? I think this would be less error-prone. You could still use bh.overflow to access the flow bin explicitly.

Unlike axes with both an underflow and overflow, the categories represent a set of items, and Python tools for this sort of thing (like a dict) generally throw errors when you try to access a non-existent item (but not when setting it, so fill is still consistent).
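To make the proposed behavior concrete, here is a minimal pure-Python sketch. The `CategoryCounts` class is a hypothetical toy stand-in, not boost-histogram's implementation or API: filling an unknown category is accepted and goes to the flow bin, while reading an unknown category raises, much like a dict.

```python
class CategoryCounts:
    """Toy model of a 1D histogram with a string category axis (hypothetical)."""

    def __init__(self, categories):
        self.categories = list(categories)
        self.counts = [0.0] * len(self.categories)
        self.overflow = 0.0  # flow bin for values that are not on the axis

    def fill(self, values):
        # Filling never raises: unknown values are counted in the flow bin.
        for value in values:
            if value in self.categories:
                self.counts[self.categories.index(value)] += 1.0
            else:
                self.overflow += 1.0

    def __getitem__(self, category):
        # Reading a category that is not on the axis raises instead of
        # silently returning the flow bin.
        if category not in self.categories:
            raise KeyError(category)
        return self.counts[self.categories.index(category)]


h = CategoryCounts(["a", "b"])
h.fill(["a", "c"])
print(h["a"])      # 1.0
print(h.overflow)  # 1.0, the "c" entry; the flow bin is still reachable explicitly

try:
    h["d"]
except KeyError as err:
    print("missing category:", err)
```

The flow bin stays available through an explicit attribute here, which plays the role that bh.overflow would play in the real library.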

@HDembinski ?

HDembinski · 2020-07-03T06:11:39Z

Throwing a ValueError or KeyError may help.

HDembinski · 2020-07-03T06:13:10Z

I am not sure which one. A histogram with a category axis is not a dict, and we should not stress that analogy, so I am rather leaning toward ValueError.
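For reference, plain Python has precedent for both choices, depending on whether the container is treated as a sequence or as a mapping. The sketch below only uses built-in types; the `categories` and `counts` names and the `lookup_error_name` helper are made up for illustration.

```python
categories = ["a", "b"]        # sequence view: an ordered list of categories
counts = {"a": 1.0, "b": 0.0}  # mapping view: category -> bin content


def lookup_error_name(action):
    """Run a lookup and return the name of the exception it raises, if any."""
    try:
        action()
    except (KeyError, ValueError) as err:
        return type(err).__name__
    return None


sequence_error = lookup_error_name(lambda: categories.index("d"))
mapping_error = lookup_error_name(lambda: counts["d"])

print(sequence_error)  # ValueError
print(mapping_error)   # KeyError
```

So the choice between the two largely comes down to whether indexing a category axis is meant to feel like searching a list or like looking up a key.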

henryiii · 2020-07-05T03:37:55Z

As you correctly inferred, I meant ValueError above, not TypeError. :)

Let me check a couple of other packages, specifically Pandas, to see what they throw (when I'm not on iOS, Pythonista doesn't include Pandas, sadly).

HDembinski · 2020-07-05T07:29:44Z

Good idea.

henryiii · 2020-07-07T17:07:22Z

Pandas does seem to throw a KeyError, both for a DataFrame:

import pandas as pd

d = pd.DataFrame({"col_one": [1]})
d["col_two"]  # raises KeyError

Or for a Series (which is arguably closer to what we have):

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
s["d"]  # raises KeyError

henryiii · 2020-07-09T15:43:12Z

Xarray (from the SciPy tutorial today) also uses KeyError (though it's using Pandas Indexers in the backend, so not shocking that it matches Pandas). Even for floating point coordinates. Xarray is very close to what we do. I would vote for KeyError.

HDembinski · 2020-07-11T08:27:42Z

Ok.

nsmith- · 2021-02-24T23:58:36Z

Just leaving a comment to say I ran into this (unexpected to me) behavior and look forward to the KeyError.

henryiii added this to the 1.0.0 milestone Jul 11, 2020

henryiii modified the milestones: 1.0.0, 0.13.0, 1.1.0 Feb 9, 2021

henryiii modified the milestones: 1.1.0, 1.2.0 Jul 7, 2021

henryiii mentioned this issue Jul 17, 2021

[BUG] Error message when category slicing info isn't present in the axies scikit-hep/hist#260

Closed

henryiii mentioned this issue Oct 11, 2021

[FEATURE] More informative indexing errors (for category indexing) scikit-hep/hist#330

Closed

henryiii mentioned this issue Jan 17, 2022

fix: protect Cat indexing from missing values #689

Merged

henryiii closed this as completed in #689 Jan 24, 2022
