-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: fix DataFrame.isin docstring and doctests #22767
Conversation
Hello @Moisan! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #22767 +/- ##
=======================================
Coverage 92.18% 92.18%
=======================================
Files 169 169
Lines 50820 50820
=======================================
Hits 46850 46850
Misses 3970 3970
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good changes, added some ideas to improve the docstring even further.
pandas/core/frame.py
Outdated
@@ -7458,23 +7458,24 @@ def isin(self, values): | |||
1 False False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you start the example before the "when values is a list" with the constructor, and displaying the content of the original Dataframe (i.e. >>> df
)
Also, I think the docstring would be clearer if we use a more real world example. For example, if the index is an animal name, and the columns are num_legs
and num_wings
, a df.isin([2])
having a True
in falcon/wings
is very easy to see what it's doing. And not so much in a foo
, A
, 1, 2, 3
kind of example.
pandas/core/frame.py
Outdated
DataFrame of booleans | ||
DataFrame | ||
DataFrame of boolean showing whether each element in the DataFrame | ||
is contained in values. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a See Also
section? I think DataFrame.eq
and Series.str.isin
are worth referencing.
pandas/core/frame.py
Outdated
@@ -7458,23 +7458,24 @@ def isin(self, values): | |||
1 False False | |||
2 True False | |||
|
|||
When ``values`` is a dict: | |||
When ``values`` is a dict. Note that B doesn't match the 1 here. | |||
|
|||
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better if we can use the same DataFrame for all the examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvements! Added some more comments
pandas/core/frame.py
Outdated
which must match. If `values` is a DataFrame, | ||
then both the index and column labels must match. | ||
|
||
Returns | ||
------- | ||
DataFrame | ||
DataFrame of boolean showing whether each element in the DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean -> booleans
pandas/core/frame.py
Outdated
|
||
When ``values`` is a dict: | ||
When ``values`` is a dict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add something to this sentence to explain what is happening when values is a dict. Something like "When values is a dict, we can pass values to check for each column separately:"
(similar for the example below as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Happy to merge it as it is, but added a couple of ideas of things that could avoid confusion. Just in case you think they are good ideas.
pandas/core/frame.py
Outdated
>>> df.isin([2]) | ||
num_legs num_wings | ||
falcon True True | ||
dog False False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about using df.isin([0, 2])
? I like the example, but I don't like showing just the case with one value in values
, as it'd be better in that case to simply do df == 2
.
Also, I think adding a bit more description would help (e.g. When values is a list, check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings)
)
pandas/core/frame.py
Outdated
2 True True | ||
num_legs num_wings | ||
falcon False True | ||
dog False False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the example, but I don't like that we have to use wrong information (a falcon without legs) ;)
Not sure if better of worse, but what do you think about using a DataFrame with just the num_wings
column instead? And may be we can get rid of the parrot, and show df2
content, so it's clearer to see what's going on?
Also, small detail, but I think we used other
instead of df2
in other docstrings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see :). No problem I agree with the changes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks for another great docstring @Moisan, keep them coming :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@Moisan Thanks a lot! |
git diff upstream/master -u -- "*.py" | flake8 --diff
Based on #22459. Fix the docstring for DataFrame.isin. I also updated
ci/doctests.sh
.