Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Prune unnecessary indexing code #27576

Merged
merged 12 commits into from
Jul 25, 2019
Merged

Conversation

jbrockmendel
Copy link
Member

No description provided.

@@ -2077,6 +2072,8 @@ def _getitem_tuple(self, tup: Tuple):

# if the dim was reduced, then pass a lower-dim the next time
if retval.ndim < self.ndim:
# TODO: this is never reached in tests; can we confirm that
# it is impossible?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz any thoughts on showing this is unreachable? If we can confirm this is the case, then this chunk of the code can be shared with the other _getitem_tuple method above

(pinging you since you wrote the wiki page on simplifying this code)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this also wasnt hit in 0.24.2

Copy link

@ghost ghost Jul 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for context, example case which triggers on that commit but not on master

Code
import numpy as np
from pandas import *


mi_int = DataFrame(
    np.random.randn(3, 3),
    columns=[[2, 2, 4], [6, 8, 10]],
    index=[[4, 4, 8], [8, 10, 12]],
)

# this triggers the code path when testing against 9f0dc3befb
rs = mi_int.iloc[:, 2]  

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that path is ever reached. It should be reached by something like df.loc[1, [0, 1]], which however gets captured in the call above to self._getitem_lowerdim(tup) (which doesn't make sense, because it is named "lowerdim", but then it resorts to _is_nested_tuple_indexer). So to sum up: I don't know exactly what the aim of that axis -= 1 is, I don't think it is ever reached, but I think it should be reached in principle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(but no major objections to removing for now)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pilkibun thanks for tracking that down

@toobaz how about we fix getitem_lowerdim to make sense before removing this check. simplifying this code is going to take a few passes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pilkibun there is another un-hit block on lines 440-450. any idea what that was intended to catch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was related to Panel indexing, can probably go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz how about we fix getitem_lowerdim to make sense before removing this check. simplifying this code is going to take a few passes

No objection

@@ -2152,7 +2149,7 @@ def _convert_to_indexer(
)


class _ScalarAccessIndexer(_NDFrameIndexer):
class _ScalarAccessIndexer(_NDFrameIndexerBase):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only _NDFrameIndexer method these use is _tuplify, which is why this PR takes that out and makes it a function instead of a method. Much easier to reason about these classes now

pass
elif is_timedelta64_dtype(self.dtype):
# reassign a null value to iNaT
if is_valid_nat_for_dtype(value, self.dtype):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no longer reachable since we fixed a bug in the _set_with_engine call above in #27536.

@@ -3464,3 +3462,22 @@ def _sparsify(label_list, start=0, sentinel=""):

def _get_na_rep(dtype):
return {np.datetime64: "NaT", np.timedelta64: "NaT"}.get(dtype, "NaN")


def maybe_droplevels(index, key):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -2077,6 +2072,8 @@ def _getitem_tuple(self, tup: Tuple):

# if the dim was reduced, then pass a lower-dim the next time
if retval.ndim < self.ndim:
# TODO: this is never reached in tests; can we confirm that
# it is impossible?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was related to Panel indexing, can probably go.

@@ -2320,6 +2314,12 @@ def _convert_key(self, key, is_setter: bool = False):
return key


def _tuplify(ndim: int, loc) -> tuple:
tup = [slice(None, None) for _ in range(ndim)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstrings added. will take another look at removing the discussed lines in the next pass

@jreback jreback added Clean Indexing Related to indexing on series/frames, not to indexes themselves labels Jul 25, 2019
@jreback jreback added this to the 1.0 milestone Jul 25, 2019
@@ -2750,7 +2747,8 @@ def get_loc_level(self, key, level=0, drop_level=True):
(1, None)
"""

def maybe_droplevels(indexer, levels, drop_level):
# different name to distinguish from maybe_droplevels
def maybe_mi_droplevels(indexer, levels, drop_level: bool):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in future maybe pull this out to a module level function

@jreback jreback merged commit ebcfee4 into pandas-dev:master Jul 25, 2019
@jreback
Copy link
Contributor

jreback commented Jul 25, 2019

thanks

@jbrockmendel jbrockmendel deleted the mindex branch July 25, 2019 22:36
quintusdias pushed a commit to quintusdias/pandas_dev that referenced this pull request Aug 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants