Integer NA docs #23617
Codecov Report
@@ Coverage Diff @@
## master #23617 +/- ##
=======================================
Coverage 31.88% 31.88%
=======================================
Files 166 166
Lines 52425 52425
=======================================
Hits 16717 16717
Misses 35708 35708
Continue to review full report at Codecov.
Maybe
@jreback any objections to having this in its own document? I think our docs would benefit from breaking up some of the extremely long pages :)
no this is ok
Sure, then we can check the doc build.
doc/source/integer_na.rst
Outdated
In :ref:`missing_data`, we saw that pandas primarily uses ``NaN`` to represent
missing data. Because ``NaN`` is a float, this forces an array of integers with
any missing values to become floating point. In some cases, this may not matter
much. But if your integer column is, say, and identifier, casting to float can
much. But if your integer column is, say, and identifier, casting to float can
much. But if your integer column is, say, an identifier, casting to float can
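To illustrate the point the paragraph under review makes, here is a minimal sketch of an integer column turning into floats once a missing value appears. The identifier values and the use of `reindex` to introduce the gap are assumptions chosen for illustration; they are not taken from the PR.

```python
import pandas as pd

# A column of integer identifiers (illustrative values).
ids = pd.Series([1001, 1002, 1003])
print(ids.dtype)

# Reindexing to a label that does not exist introduces a missing value.
# With the default dtypes that value is stored as NaN, so the whole
# column is cast to floating point.
with_missing = ids.reindex([0, 1, 2, 3])
print(with_missing.dtype)
print(with_missing)
```

The identifiers now print as `1001.0`, `1002.0`, and so on, which is the kind of surprise the text is warning about.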
doc/source/integer_na.rst
Outdated
missing data. Because ``NaN`` is a float, this forces an array of integers with
any missing values to become floating point. In some cases, this may not matter
much. But if your integer column is, say, and identifier, casting to float can
be problematic.
Pretty niche case, but could also make reference to integers not representable in float64 space, e.g.
https://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e
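A short sketch of that niche case, assuming the usual IEEE 754 double behind `float64`: `2**53 + 1` is the first positive integer a double cannot hold exactly, so an `int64` column containing it silently changes value when a missing entry forces the cast to float. The variable names are illustrative only.

```python
import pandas as pd

# First positive integer that float64 cannot represent exactly.
big = 2**53 + 1

exact = pd.Series([big])
print(exact.dtype, exact.iloc[0])

# Introducing a missing value casts the column to float64,
# and the large integer is rounded to a neighbouring value.
lossy = exact.reindex([0, 1])
print(lossy.dtype, int(lossy.iloc[0]))
print(int(lossy.iloc[0]) == big)
```

The final comparison is `False`: the stored value is no longer the original identifier.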
Just a couple of comments, and resolving conflicts is required, but I think it can be merged. @TomAugspurger
Fixed the merge conflicts.
lgtm
@@ -0,0 +1,84 @@
.. currentmodule:: pandas
not important, but the `.. currentmodule:: pandas` is also included in the `{{ header }}` (we only kept it in the `api.rst` because the autosummaries need it before the header is rendered)
be problematic. Some integers cannot even be represented as floating point
numbers.

Pandas can represent integer data with possibly missing values using
You'll know better, but I had the impression that we use lowercase pandas even at the beginning of sentences.
@@ -751,3 +742,19 @@ However, these can be filled in using :meth:`~DataFrame.fillna` and it will work

reindexed[crit.fillna(False)]
reindexed[crit.fillna(True)]

Pandas provides a nullable integer dtype, but you must explicitly request it
same, if correct
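For readers of this thread, a minimal sketch of what "explicitly request it" looks like in practice. It assumes a pandas release that already ships `pd.array` and the `"Int64"` string alias (0.24 or later); the sample values are illustrative.

```python
import pandas as pd

# Without an explicit dtype, a missing value makes the data float64.
default = pd.Series([1, 2, None])
print(default.dtype)

# Requesting the nullable integer dtype keeps the values as integers
# and stores the missing entry as a proper missing value.
arr = pd.array([1, 2, None], dtype="Int64")
nullable = pd.Series(arr)
print(nullable.dtype)
print(nullable.isna().tolist())
```

Note the capital "I" in `"Int64"`; the lowercase `"int64"` alias refers to the ordinary NumPy dtype, which cannot hold missing values.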
@TomAugspurger can you merge master
Will do. Waiting for pandas.array to be merged first, since I use it here.
@TomAugspurger right hahah I approved so went to dependent PR. ok then, this looks ok.
https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=6029 some pesky warnings
this is passing now. @TomAugspurger did you have any other changes here?
Thanks for fixing it up.
* upstream/master:
  BUG: output formatting with to_html(), index=False and/or index_names=False (pandas-dev#22579, pandas-dev#22747) (pandas-dev#22655)
  MAINT: Port _timelex in codebase (pandas-dev#24520)
  Implement unique+array parts of 24024 (pandas-dev#24527)
  Integer NA docs (pandas-dev#23617)
* wip
* DOC: Integer NA Closes pandas-dev#22003
* subsection
* update
* fixup
* add back construction for docs
Closes #22003
In the issue @chris-b1 said
Do people have thoughts on that?
Earlier @jreback suggested putting all this in `missing_data.rst`, rather than a new page. But that page is already quite long. I think it's OK to spread out a little.
I tried to track down references to missing data and integers in the docs to add links to integer-NA. Let me know if you see any I missed.
Note: this uses `pd.array` and some types added to the API in #23581, so not all the examples will run.