CLN: remove methods of ExtensionIndex that duplicate base Index #34163

jorisvandenbossche · 2020-05-13T21:05:59Z

jorisvandenbossche · 2020-05-13T21:06:50Z

pandas/core/indexes/extension.py

@@ -223,29 +219,14 @@ def __getitem__(self, key):
        deprecate_ndim_indexing(result)
        return result

-    def __iter__(self):
-        return self._data.__iter__()


Should be equivalent to

pandas/pandas/core/base.py

Lines 1034 to 1051 in 9c08fe1

    def __iter__(self):
        """
        Return an iterator of the values.

        These are each a scalar type, which is a Python scalar
        (for str, int, float) or a pandas scalar
        (for Timestamp/Timedelta/Interval/Period)

        Returns
        -------
        iterator
        """
        # We are explicitly making element iterators.
        if not isinstance(self._values, np.ndarray):
            # Check type instead of dtype to catch DTA/TDA
            return iter(self._values)
        else:
            return map(self._values.item, range(self._values.size))

Sure (alternatively, we could remove L1047-1049 from the base class implementation).

> alternatively could remove L1047-1049 from the base class implementation

No, because Series also uses that
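As an illustration of the two branches discussed above, here is a minimal sketch of the same dispatch outside pandas. `iter_values` and `ToyExtensionArray` are hypothetical names made up for this example, not pandas APIs. The sketch only assumes that an extension-array-like container supplies its own `__iter__`.

```python
import numpy as np


def iter_values(values):
    """Mirror the branch structure of the base-class __iter__ quoted above."""
    if not isinstance(values, np.ndarray):
        # Non-ndarray containers: defer to the container's own iterator.
        return iter(values)
    # ndarray: .item(i) converts each element to a Python scalar.
    return map(values.item, range(values.size))


class ToyExtensionArray:
    """Hypothetical stand-in for an extension array (not a pandas class)."""

    def __init__(self, data):
        self._data = list(data)

    def __iter__(self):
        return iter(self._data)


nd_items = list(iter_values(np.array([1, 2, 3])))
ea_items = list(iter_values(ToyExtensionArray(["a", "b"])))
```

In this sketch the first branch already covers any container that is not an `ndarray`, which is why a subclass whose values always take that branch gains nothing from its own `__iter__`. The branch still has to stay in the shared implementation for other callers, as the reply about Series points out.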

jorisvandenbossche · 2020-05-13T21:07:02Z

pandas/core/indexes/extension.py

    # ---------------------------------------------------------------------

-    def __array__(self, dtype=None) -> np.ndarray:
-        return np.asarray(self._data, dtype=dtype)


This is identical with the Index one

jorisvandenbossche · 2020-05-13T21:07:43Z

pandas/core/indexes/extension.py

-
-        if self.hasnans:
-            return self._shallow_copy(self._data[~self._isnan])
-        return self._shallow_copy()


The only difference here with the Index one is the use of self._data vs self._values, which as far as I know shouldn't matter?

Not sure if it matters for this method, but the distinction would matter for MultiIndex, which does not have _data.

In this case, MultiIndex actually overrides dropna, so that shouldn't even matter here.

But indeed, in general it's only for MI that using _values vs _data matters, for all the others it's the same (I think?), which I suppose is the reason in the base class there is more usage of _values.
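To make the point concrete, here is a toy sketch (not pandas code) of a base class whose `dropna` reads `_values` and a subclass that simply inherits it. `ToyIndex` and `ToyExtensionIndex` are hypothetical, and the assumption that `_values` returns the same object as `_data` follows the thread's "I think?" rather than a checked guarantee.

```python
import numpy as np


class ToyIndex:
    """Hypothetical, simplified stand-in for a base index class."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype="float64")

    @property
    def _values(self):
        # Assumption for this sketch: both names refer to the same array.
        return self._data

    def dropna(self):
        mask = np.isnan(self._values)
        if mask.any():
            return type(self)(self._values[~mask])
        return type(self)(self._values)


class ToyExtensionIndex(ToyIndex):
    # No dropna override: the inherited method already does the same work.
    pass


idx = ToyExtensionIndex([1.0, np.nan, 3.0])
result = idx.dropna()
```

When the two attributes are the same object, an override that differs only in which attribute it reads is redundant, which is the reasoning behind removing it.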

jorisvandenbossche · 2020-05-13T21:09:28Z

pandas/core/indexes/extension.py

-            fill_value=fill_value,
-            na_value=self._na_value,
-        )
-        return type(self)(taken, name=self.name)


Same here (self._data vs self._values), and the base class just has an extra branch for the `not self._can_hold_na` case:

pandas/pandas/core/indexes/base.py

Lines 688 to 703 in 9c08fe1

        if self._can_hold_na:
            taken = self._assert_take_fillable(
                self._values,
                indices,
                allow_fill=allow_fill,
                fill_value=fill_value,
                na_value=self._na_value,
            )
        else:
            if allow_fill and fill_value is not None:
                cls_name = type(self).__name__
                raise ValueError(
                    f"Unable to fill values because {cls_name} cannot contain NA"
                )
            taken = self._values.take(indices)
        return self._shallow_copy(taken)
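The branch structure quoted above can be modelled with NumPy alone. This is a simplified sketch, not the pandas implementation: `take_sketch` is a hypothetical function, and the fill semantics (a `-1` index becomes NaN when a non-None `fill_value` is passed) are an assumption made for the example.

```python
import numpy as np


def take_sketch(values, indices, can_hold_na, allow_fill=True, fill_value=None):
    """Simplified, hypothetical model of the two branches quoted above."""
    indices = np.asarray(indices)
    if can_hold_na:
        if allow_fill and fill_value is not None:
            # Assumed fill semantics: -1 marks a slot to fill with NaN.
            taken = values.take(indices).astype("float64")
            taken[indices == -1] = np.nan
            return taken
        return values.take(indices)
    if allow_fill and fill_value is not None:
        # Same guard as the extra base-class branch: filling is impossible here.
        raise ValueError("Unable to fill values because this index cannot contain NA")
    return values.take(indices)


floats = np.array([10.0, 20.0, 30.0])
filled = take_sketch(floats, [0, -1], can_hold_na=True, fill_value=np.nan)
plain = take_sketch(floats, [0, -1], can_hold_na=True)
```

A class that can always hold NA only ever exercises the first branch, so inheriting the two-branch version does not change its behaviour in this model.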

jorisvandenbossche · 2020-05-13T21:10:53Z

pandas/core/indexes/extension.py

-            self._validate_index_level(level)
-
-        result = self._data.unique()
-        return self._shallow_copy(result)


The base class ends up dispatching to IndexOpsMixin unique:

pandas/pandas/core/base.py

Lines 1257 to 1270 in 9c08fe1

    def unique(self):
        values = self._values

        if hasattr(values, "unique"):

            result = values.unique()
            if self.dtype.kind in ["m", "M"] and isinstance(self, ABCSeries):
                # GH#31182 Series._values returns EA, unpack for backward-compat
                if getattr(self.dtype, "tz", None) is None:
                    result = np.asarray(result)
        else:
            result = unique1d(values)

        return result

The hasattr(values, "unique") could probably be made more explicit / cleaner to check for EA, but basically this should also be the same

> could probably be made more explicit / cleaner

sounds worthwhile, yah

jorisvandenbossche · 2020-05-14T20:28:43Z

@jbrockmendel all good?

jbrockmendel · 2020-05-14T21:36:36Z

pandas/core/base.py

@@ -1257,8 +1257,7 @@ def value_counts(
    def unique(self):
        values = self._values

-        if hasattr(values, "unique"):
-
+        if not isinstance(self._values, np.ndarray):


can re-use values here
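A small sketch of what the two review points amount to together: test the type of the already-bound local `values` rather than using `hasattr` or reading the attribute again. `unique_sketch` and `ToyArray` are hypothetical names, and the NumPy-based fallback is only a stand-in for `unique1d`, written here to keep first-appearance order.

```python
import numpy as np


class ToyArray:
    """Hypothetical extension-array-like container with its own unique()."""

    def __init__(self, data):
        self.data = list(data)

    def unique(self):
        # dict.fromkeys drops duplicates while preserving insertion order.
        return ToyArray(dict.fromkeys(self.data))


def unique_sketch(values):
    # Explicit type check on the local name, instead of hasattr(values, "unique").
    if not isinstance(values, np.ndarray):
        return values.unique()
    # Stand-in for unique1d: keep the first occurrence of each element, in order.
    _, first_seen = np.unique(values, return_index=True)
    return values[np.sort(first_seen)]


nd_unique = unique_sketch(np.array([3, 1, 3, 2, 1]))
ea_unique = unique_sketch(ToyArray(["b", "a", "b"]))
```

The explicit check states the intent (ndarray versus everything else) directly, which is the "more explicit / cleaner" form suggested earlier in the thread.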

jbrockmendel · 2020-05-14T21:37:24Z

> all good?

yep

CLN: remove methods of ExtensionIndex that duplicate base Index

4d897f8

jorisvandenbossche commented May 13, 2020

jreback added the Clean and Index labels May 13, 2020

check the values instead of hasattr

244ddc7

jbrockmendel reviewed May 14, 2020

reuse values

cb05175

jorisvandenbossche merged commit c10020f into pandas-dev:master May 15, 2020

jorisvandenbossche deleted the dedup-index-extindex branch May 15, 2020 07:58

jorisvandenbossche added this to the 1.1 milestone May 15, 2020
