-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: allow storing ExtensionArrays in the Index #34159
Conversation
This is the idea behind ExtensionIndex |
I agree with @jbrockmendel here, this is the exactly the point of ExtensionIndex. trying to modify Index doesn't make much sense. |
Sorry, I should have been more explicit regarding ExtensionIndex in my reasoning above:
|
@jbrockmendel to make it concrete: are there (technical) reasons for you that are specific to Index and that make it more important to use a custom ExtensionIndex subclass instead of handling this in the base Index class? For example, both have a Other example: the (a pre-cursor PR to this one could be to remove the duplicated methods between Index and ExtensionIndex) |
Nothing that I have a particularly strong opinion about. If the implementation in this PR turns out to be elegant (unless asked, ill wait to review in detail until green), i wouldn't have a problem with it. That said, ATM whenever we have an Index object we know it is object-dtype. I expect there are places in the code where we are relying on that. (might be worth making an ObjectIndex? orthogonal discussion) |
@jorisvandenbossche yeah this is almost certainly not going to 100% overlap with an object dtyped Index base, because that serves 2 master right now, the object dtyped and a base index. If we would create a ObjectIndex then maybe what you are doing would work, but I find it easier / more compatible to simply have an ExtensionArrayIndex that can hold any type of EA. |
pandas/core/indexes/base.py
Outdated
arr = self._values | ||
idx = arr._concat_same_type([arr[:loc], item, arr[loc:]]) | ||
else: | ||
arr = np.asarray(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arr = self._values could go outside the if/elif
pandas/core/indexes/base.py
Outdated
if isinstance(self._values, np.ndarray): | ||
return self._values | ||
else: | ||
return np.asarray(self._values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesnt np.asarray(self._values)
work unconditionally?
pandas/core/indexes/base.py
Outdated
@@ -346,11 +346,13 @@ def __new__( | |||
ea_cls = dtype.construct_array_type() | |||
data = ea_cls._from_sequence(data, dtype=dtype, copy=False) | |||
else: | |||
data = np.asarray(data, dtype=object) | |||
# TODO clean-up with extract_array ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
@jorisvandenbossche im working on getting this feature out the door; i think we can close this to clear the queue |
closing as duplicate of #37869 can move discussion there. |
@jorisvandenbossche I suggested closing this not based on the Index[EA] vs ExtensionIndex difference, but bc you don't seem to be working on it. If we get the other one working well, we can revisit the possibility of getting Index[EA] working. (FWIW as I see more and more of the DTI/TDI/PI code getting shared, I've found myself thinking about the possibility those may not need to be subclasses at all [in this daydream, the subclass-specific stuff would be accessed by a .dt accessor xref #17134], which bears a resemblance to the Index[EA] design) |
@jorisvandenbossche pls update or close |
We first need to agree on how to do this (see discussion above, and also the overview issue I created at #39133) |
Appears this PR draft has been dormant for a while so closing. Feel free to reopen when you have time to continue |
Note for future reference: this work is now actually done in #43930 |
I started experimenting with allowing EAs to be used in the Index (so we can put nullable int/bool, string dtype etc in the Index).
My approach is to use the base
Index
class for this (and not a custom subclass):For now, it just integrates it, so an EA can be put in an index and tested operations work. This PR does not (yet) handle ensuring that operations relying on the index' Engine are efficient (currently it simply uses
np.asarray(EA)
as the engine values, which eg will be object dtype for Int64Dtype)cc @jbrockmendel @TomAugspurger @jreback