Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Add examples for MultiIndex.get_locs + cleanups #17675

Merged
merged 3 commits into from
Sep 30, 2017

Conversation

topper-123
Copy link
Contributor

  • [x ] passes git diff upstream/master -u -- "*.py" | flake8 --diff

Adds examples to MultiIndex.get_locs and also some changes to .get_loc and .get_loc_values.

boolean mask array, otherwise it is always a slice or int.
Get location for a label or a tuple of labels as an integer, slice or
boolean mask.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this Note in Notes.

@codecov
Copy link

codecov bot commented Sep 26, 2017

Codecov Report

Merging #17675 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17675      +/-   ##
==========================================
- Coverage   91.27%   91.25%   -0.02%     
==========================================
  Files         163      163              
  Lines       49765    49765              
==========================================
- Hits        45421    45412       -9     
- Misses       4344     4353       +9
Flag Coverage Δ
#multiple 89.04% <100%> (ø) ⬆️
#single 40.33% <33.33%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/categorical.py 95.7% <ø> (ø) ⬆️
pandas/core/indexes/multi.py 96.9% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2a0251...070e8f3. Read the comment docs.

@topper-123 topper-123 force-pushed the multiindex.get_locs branch 5 times, most recently from 2a0d882 to 20ed680 Compare September 26, 2017 08:57
@topper-123
Copy link
Contributor Author

I've made some changes and put them in a new commit as it contains some (very minor) corrections to previous commits also.

#17655 had the reference to api.types.CategoricalDType wrong (pandas. was prepended which made the link unusable). Given @TomAugspurger excellent new commit on CategoricalDType, I've changed the location + added a small text description.

#17653 had too many back-ticks in the links in section "Examples". I've removed those.

@@ -223,7 +223,7 @@ class Categorical(PandasObject):

See also
--------
pandas.api.types.CategoricalDtype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this property xref here? (CategoricalDtype is NOT in the main namespace)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've changed it to api.types.CategoricalDtype. I had misunderstood the location.

Copy link
Contributor Author

@topper-123 topper-123 Sep 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking more into it, I actually think CategoricalDtype isn't shown at all in the generated API docs, so the reference will be not work in any case. Is this correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the problem is not that 'pandas' is there (that is perfectly OK, and in this case I even prefer to show the full path for clarity, as api is not a commonly used submodule of pandas), the issue is that this page does not exist, so it cannot link to it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll put pandas. back in then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. You can also add it to the api.rst pages so the link will work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change uploaded.

There seems to be something in the API.rst (line 649), but it mustn't work. @TomAugspurger has started looking into it, I'll wait till he figures it out.

If I get to know what I should add to API.rst, I can add it to this PR.

Copy link
Contributor

@TomAugspurger TomAugspurger Sep 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In api.rst it should be pandas.api.type.CategoricalDtype (missing the pandas).

But the autoclass fit the best with the rest of the autosummary's, so I may change it. For now, I'd just leave your reference as pandas.api.type.CategoricalDtype, and I'll make sure they're valid before the release.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 26, 2017 via email

get_locs : Given a tuple of slices/lists/labels/boolean indexer to a
level-wise spec, produce an indexer to extract those
locations.
get_locs : Get location for a label/slice/list/mask or a sequence of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_locs -> MultiIndex.get_locs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok


Parameters
----------
key : tuple of (slices/list/labels)
seq : label/slice/list/mask or a sequence of such.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to explain how the 'sequence of such' should be interpreted

(I should check the code myself to actually know it, but are the different elements of the list mapped to the different levels?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add here "(one for each level)" as you did in get_loc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a new proposal. This is very difficult to explain in short sentences, I find.


Returns
-------
locs : integer list of locations or boolean indexer suitable
locs : integer, slice or boolean mask suitable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct: you changed 'integer list of locations' to 'integer', but from the examples it seems to return an array of integers, so the 'list' was more correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, tup indicates that this method takes a tuple, but it also takes other sequences like lists. In an example I tried it actually choked on tuple, because it thought the tuple was the label of a level. My experience from the last days it therefore is that it's probaby best to use lists rather than tuples (but tuple may work, depending on context).

On the other hand get_loc (without s) demands a tuple and chokes on lists, so it's not so easy to remember what is accepted where.

I can say "list or such", to discourage using tuples?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that my comment was about the return value, not the input (and there indeed tuples vs lists sometimes are slightly different but not always, which can be confusing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, I actually couldn't manage to get it to return a boolean mask.

Could you confirm that? I haven't changed that part, but if it's not true, it'd be better it didn't say that...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looking at the code, I actually think it can only return an integer array. That would mean that the explanation is just wrong.

But I am not too familiar with this. (@jreback ?)

@topper-123
Copy link
Contributor Author

Is there anything I should do here wrt. the CircleCI tests failing? The test failure doesn't seem to related to my PR.

@jorisvandenbossche
Copy link
Member

No, you don't need to care, there are some problems in general: #17702

@topper-123 topper-123 force-pushed the multiindex.get_locs branch 2 times, most recently from ddd9eab to 9d3df60 Compare September 28, 2017 10:16
>>> mi = pd.MultiIndex.from_arrays([list('abb'), list('def')])

>>> mi.get_locs('b')
slice(1, 3, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I run this locally, I actually get an array as result:

In [109]: mi = pd.MultiIndex.from_arrays([list('abb'), list('def')])

In [110]: mi.get_locs('b')
Out[110]: array([1, 2])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, me too, actually. It seems the func then only return arrays?

@topper-123 topper-123 force-pushed the multiindex.get_locs branch 3 times, most recently from cba6753 to 1b0ab88 Compare September 29, 2017 23:18
@topper-123
Copy link
Contributor Author

I've looked some more at the return value. The return is always index._values and index is always numerical. So the method does indeed only return integer arrays. I've changed the PR correspondingly.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some final minor comments!

Notes
------
The key cannot be a slice, list of same-level labels, a boolean mask,
or a sequence of such. If you want to use those, use :meth:`get_locs`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_locs -> MultiIndex.get_locs (I think, it would actually be interesting to know whether it works like this if it is a method of the same class. Did you build it locally and checked? But I think this will not work, because in the generated rst docstring pages, a .. currentmodule:: pandas is included)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've Changed it.

It it possible to to build just the API docs or maybe a subgroup thereof? Working with doc is a but cumbersome, when building it takes so much time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you pass the arg —�single api

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can indeed be very cumbersome. It's a very long time ago that I tried the --single flag. It was also meant to eg run only a single .rst file. But I can remember it also didn't work smoothly.

If you think of other ways to make this easier, very welcome! (I once had a script to build only a single docstring, I could try to look that up, but not sure that such links then would work anyhow)

key : tuple of (slices/list/labels)
seq : label/slice/list/mask or a sequence of such (one for each used
level. If a level should not be used, set it to
``slice(None)``).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-line type specifications don't really work (does not render correctly in sphinx, unless you use \ to indicate line continuation). But you can also spit this up:

seq : label/slice/list/mask or a sequence of such (one for each level)
    More explanation ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@jorisvandenbossche jorisvandenbossche merged commit b8467c0 into pandas-dev:master Sep 30, 2017
@jorisvandenbossche
Copy link
Member

@topper-123 Thanks a lot!

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 2, 2017
* 'master' of github.com:pandas-dev/pandas: (188 commits)
  Separate out _convert_datetime_to_tsobject (pandas-dev#17715)
  DOC: remove whatsnew note for xref pandas-dev#17131
  BUG: Regression in .loc accepting a boolean Index as an indexer (pandas-dev#17738)
  DEPR: Deprecate cdate_range and merge into bdate_range (pandas-dev#17691)
  CLN: replace %s syntax with .format in pandas.core: categorical, common, config, config_init (pandas-dev#17735)
  Fixed the memory usage explanation of categorical in gotchas from O(nm) to O(n+m) (pandas-dev#17736)
  TST: add backward compat for offset testing for pickles (pandas-dev#17733)
  remove unused time conversion funcs (pandas-dev#17711)
  DEPR: Deprecate convert parameter in take (pandas-dev#17352)
  BUG:Time Grouper bug fix when applied for list groupers (pandas-dev#17587)
  BUG: Fix some PeriodIndex resampling issues (pandas-dev#16153)
  BUG: Fix unexpected sort in groupby (pandas-dev#17621)
  DOC: Fixed typo in documentation for 'pandas.DataFrame.replace' (pandas-dev#17731)
  BUG: Fix series rename called with str altering name rather index (GH17407) (pandas-dev#17654)
  DOC: Add examples for MultiIndex.get_locs + cleanups (pandas-dev#17675)
  Doc improvements for IntervalIndex and Interval (pandas-dev#17714)
  BUG: DataFrame sort_values and multiple "by" columns fails to order NaT correctly
  Last of the timezones funcs (pandas-dev#17669)
  Add missing file to _pyxfiles, delete commented-out (pandas-dev#17712)
  update imports of DateParseError, remove unused imports from tslib (pandas-dev#17713)
  ...
@topper-123 topper-123 deleted the multiindex.get_locs branch October 9, 2017 21:00
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants