Integer NA docs #23617

TomAugspurger · 2018-11-10T21:14:15Z

Closes #22003

In the issue, @chris-b1 said:

More clear separation from numpy dtype - worry that 'int64' vs 'Int64' will be especially confusing for new people? Consider a different name altogether? (NullableInt64?)

Do people have thoughts on that?

Earlier @jreback suggested putting all this in missing_data.rst, rather than a new page. But that page is already quite long. I think it's OK to spread out a little.

I tried to track down references to missing data and integers in the docs to add links to integer-NA. Let me know if you see any I missed.

Note: this uses pd.array and some types added to the API in #23581, so not all the examples will run.


codecov · 2018-11-10T22:45:14Z

Codecov Report

Merging #23617 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #23617   +/-   ##
=======================================
  Coverage   31.88%   31.88%           
=======================================
  Files         166      166           
  Lines       52425    52425           
=======================================
  Hits        16717    16717           
  Misses      35708    35708

Flag	Coverage Δ
#multiple	`30.29% <ø> (ø)`	⬆️
#single	`31.88% <ø> (ø)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e4cd00b...0c9995f.

WillAyd · 2018-11-12T06:11:51Z

Maybe IntNA64? Perhaps not terribly readable but aligns with how we've referenced it

TomAugspurger · 2018-11-13T16:43:56Z

@jreback any objections to having this in its own document? I think our docs would benefit from breaking up some of the extremely long pages :)

jreback · 2018-11-13T16:57:53Z

no this is ok
though we should merge the pd.array change first no?

TomAugspurger · 2018-11-13T16:59:10Z

Sure, then we can check the doc build.

chris-b1 · 2018-11-18T15:40:59Z

doc/source/integer_na.rst

+In :ref:`missing_data`, we saw that pandas primarily uses ``NaN`` to represent
+missing data. Because ``NaN`` is a float, this forces an array of integers with
+any missing values to become floating point. In some cases, this may not matter
+much. But if your integer column is, say, and identifier, casting to float can


Suggested change

much. But if your integer column is, say, and identifier, casting to float can

much. But if your integer column is, say, an identifier, casting to float can
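
To illustrate the upcast that the quoted paragraph describes, here is a minimal sketch. It assumes a pandas installation is available; the variable name `floats` is made up for the illustration and is not taken from the docs under review.

```python
import pandas as pd

# Default inference: None is stored as NaN, and because NaN is a float
# the two integers are upcast to floating point as well.
floats = pd.Series([1, 2, None])

print(floats.dtype)     # float64
print(floats.tolist())  # the integers now appear as 1.0 and 2.0, followed by nan
```

The integers are still recoverable here, but the column no longer has an integer dtype, which is the situation the new page is meant to address.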

chris-b1 · 2018-11-18T15:44:10Z

doc/source/integer_na.rst

+missing data. Because ``NaN`` is a float, this forces an array of integers with
+any missing values to become floating point. In some cases, this may not matter
+much. But if your integer column is, say, and identifier, casting to float can
+be problematic.


Pretty niche case, but could also make reference to integers not representable in float64 space, e.g.
https://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e
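
A small sketch of the niche case mentioned above, in plain Python with no pandas required. Python's `float` is an IEEE 754 double, so every integer up to 2**53 round-trips exactly and 2**53 + 1 is the first that does not. The helper name and the `user_id` value are hypothetical, chosen only for illustration.

```python
# float64 has a 53-bit significand, so integers are exact only up to 2**53.
LIMIT = 2 ** 53


def survives_float_round_trip(n: int) -> bool:
    """Return True if converting n to float and back yields n again."""
    return int(float(n)) == n


# A hypothetical identifier just past the limit (2**53 + 1).
user_id = 9007199254740993
as_float = float(user_id)

print(survives_float_round_trip(LIMIT))      # True
print(survives_float_round_trip(LIMIT + 1))  # False
print(int(as_float))                         # 9007199254740992, off by one
```

For an identifier column, this means two distinct IDs can collapse to the same value after a cast to float.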

datapythonista

Just a couple of comments, and the conflicts need to be resolved, but I think it can be merged. @TomAugspurger

doc/source/integer_na.rst

doc/source/whatsnew/v0.24.0.rst

TomAugspurger · 2018-12-20T18:16:19Z

Fixed the merge conflicts.

datapythonista

lgtm

datapythonista · 2018-12-20T18:37:34Z

doc/source/integer_na.rst

@@ -0,0 +1,84 @@
+.. currentmodule:: pandas


not important, but the .. currentmodule:: pandas is also included in the {{ header }} (we only kept it in the api.rst because the autosummaries need it before the header is rendered)

datapythonista · 2018-12-20T18:38:16Z

doc/source/integer_na.rst

+be problematic. Some integers cannot even be represented as floating point
+numbers.
+
+Pandas can represent integer data with possibly missing values using


You'll know better, but I had the impression that we use lowercase pandas even at the beginning of sentences.

datapythonista · 2018-12-20T18:38:54Z

doc/source/missing_data.rst

@@ -751,3 +742,19 @@ However, these can be filled in using :meth:`~DataFrame.fillna` and it will work

   reindexed[crit.fillna(False)]
   reindexed[crit.fillna(True)]
+
+Pandas provides a nullable integer dtype, but you must explicitly request it


same, if correct
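
As a rough sketch of what "explicitly request it" means in the quoted line, assuming pandas 0.24 or later: the dtype is selected with the capitalised string `"Int64"`. The series name `ids` and its values are invented for this example.

```python
import pandas as pd

# The capitalised "Int64" selects the nullable integer extension dtype;
# lowercase "int64" would select the NumPy dtype instead.
ids = pd.Series([101, 102, None], dtype="Int64")

print(ids.dtype)            # Int64
print(ids.isna().tolist())  # [False, False, True]
```

Without the `dtype` argument, the same data would be inferred as float64, which is why the request has to be explicit.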

jreback · 2018-12-28T20:35:17Z

@TomAugspurger can you merge master

TomAugspurger · 2018-12-28T20:36:47Z

Will do. Waiting for pandas.array to be merged first, since I use it here.


jreback · 2018-12-28T21:13:22Z

@TomAugspurger right hahah I approved so went to dependent PR. ok then, this looks ok.

jreback · 2018-12-29T14:34:24Z

https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=6029 some pesky warnings

jreback · 2019-01-01T01:11:46Z

this is passing now. @TomAugspurger did you have any other changes here?

TomAugspurger · 2019-01-01T14:22:29Z

Thanks for fixing it up.

* upstream/master:
  BUG: output formatting with to_html(), index=False and/or index_names=False (pandas-dev#22579, pandas-dev#22747) (pandas-dev#22655)
  MAINT: Port _timelex in codebase (pandas-dev#24520)
  Implement unique+array parts of 24024 (pandas-dev#24527)
  Integer NA docs (pandas-dev#23617)

* wip
* DOC: Integer NA Closes pandas-dev#22003
* subsection
* update
* fixup
* add back construction for docs

TomAugspurger added 3 commits November 10, 2018 07:26

wip

06f8568

DOC: Integer NA

9e505a8

Closes pandas-dev#22003

subsection

4ae4f8d

TomAugspurger added the Docs label Nov 10, 2018

TomAugspurger added this to the 0.24.0 milestone Nov 10, 2018

TomAugspurger mentioned this pull request Nov 12, 2018

General repr format for our internal ExtensionArrays #22846

Closed

chris-b1 reviewed Nov 18, 2018


TomAugspurger mentioned this pull request Nov 28, 2018

API: Public data for Series and Index: .array and .to_numpy() #23623

Merged

2 tasks

TomAugspurger added 2 commits December 8, 2018 07:41

Merge remote-tracking branch 'upstream/master' into integer-na-docs

a6a7ba7

update

8d1d026

TomAugspurger mentioned this pull request Dec 13, 2018

RLS: 0.24.0 #24060

Closed

datapythonista reviewed Dec 16, 2018


doc/source/integer_na.rst (outdated, resolved)

doc/source/integer_na.rst (outdated, resolved)

doc/source/whatsnew/v0.24.0.rst (resolved)

Merge remote-tracking branch 'upstream/master' into integer-na-docs

15a7b65

datapythonista approved these changes Dec 20, 2018


fixup

51c4353

Merge remote-tracking branch 'upstream/master' into integer-na-docs

0ef696d

jreback added 2 commits December 31, 2018 18:56

Merge branch 'master' into PR_TOOL_MERGE_PR_23617

d2a624d

add back construction for docs

0c9995f

TomAugspurger merged commit 4b6be69 into pandas-dev:master Jan 1, 2019

TomAugspurger deleted the integer-na-docs branch January 1, 2019 14:22

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Integer NA docs (pandas-dev#23617)

b6b343d

* wip
* DOC: Integer NA Closes pandas-dev#22003
* subsection
* update
* fixup
* add back construction for docs

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Integer NA docs (pandas-dev#23617)

b02a547

* wip
* DOC: Integer NA Closes pandas-dev#22003
* subsection
* update
* fixup
* add back construction for docs
