Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer NA docs #23617

Merged
merged 10 commits into from
Jan 1, 2019
Merged

Conversation

TomAugspurger
Copy link
Contributor

Closes #22003

In the issue @chris-b1 said

More clear separation from numpy dtype - worry that 'int64' vs 'Int64' will be especially confusing for new people? Consider a different name altogether? (NullableInt64?)

Do people have thoughts on that?

Earlier @jreback suggested putting all this in missing_data.rst, rather than a new page. But that page is already quite long. I think it's OK to spread out a little.

I tried to track down references to missing data and integers in the docs to add links to integer-NA. Let me know if you see any I missed.

Note: this uses pd.array and some types added to the API in #23581, so not all the examples will run.

@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Nov 10, 2018
@codecov
Copy link

codecov bot commented Nov 10, 2018

Codecov Report

Merging #23617 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #23617   +/-   ##
=======================================
  Coverage   31.88%   31.88%           
=======================================
  Files         166      166           
  Lines       52425    52425           
=======================================
  Hits        16717    16717           
  Misses      35708    35708
Flag Coverage Δ
#multiple 30.29% <ø> (ø) ⬆️
#single 31.88% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e4cd00b...0c9995f. Read the comment docs.

@WillAyd
Copy link
Member

WillAyd commented Nov 12, 2018

Maybe IntNA64? Perhaps not terribly readable but aligns with how we've referenced it

@TomAugspurger
Copy link
Contributor Author

@jreback any objections to having this in it's own document? I think our docs would benefit from breaking up some of the extremely long pages :)

@jreback
Copy link
Contributor

jreback commented Nov 13, 2018

no this is ok
though we should merge the pd.array change first no?

@TomAugspurger
Copy link
Contributor Author

Sure, then we can check the doc build.

In :ref:`missing_data`, we saw that pandas primarily uses ``NaN`` to represent
missing data. Because ``NaN`` is a float, this forces an array of integers with
any missing values to become floating point. In some cases, this may not matter
much. But if your integer column is, say, and identifier, casting to float can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
much. But if your integer column is, say, and identifier, casting to float can
much. But if your integer column is, say, an identifier, casting to float can

missing data. Because ``NaN`` is a float, this forces an array of integers with
any missing values to become floating point. In some cases, this may not matter
much. But if your integer column is, say, and identifier, casting to float can
be problematic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty niche case, but could also make reference to integers not representable in float64 space, e.g.
https://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e

@TomAugspurger TomAugspurger mentioned this pull request Dec 13, 2018
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just couple of comments, and resolving conflicts is required, but I think it can be merged. @TomAugspurger

doc/source/integer_na.rst Outdated Show resolved Hide resolved
doc/source/integer_na.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v0.24.0.rst Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor Author

Fixed the merge conflicts.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -0,0 +1,84 @@
.. currentmodule:: pandas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not important, but the .. currentmodule:: pandas is also included in the {{ header }} (we only kept it in the api.rst because the autosummaries need it before the header is rendered)

be problematic. Some integers cannot even be represented as floating point
numbers.

Pandas can represent integer data with possibly missing values using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll know better, but I had the impression that we use lowercase pandas even at the beginning of sentences.

@@ -751,3 +742,19 @@ However, these can be filled in using :meth:`~DataFrame.fillna` and it will work

reindexed[crit.fillna(False)]
reindexed[crit.fillna(True)]

Pandas provides a nullable integer dtype, but you must explicitly request it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, if correct

@jreback
Copy link
Contributor

jreback commented Dec 28, 2018

@TomAugspurger can you merge master

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Dec 28, 2018 via email

@jreback
Copy link
Contributor

jreback commented Dec 28, 2018

@TomAugspurger right hahah I approved so went to dependent PR. ok then, this looks ok.

@jreback
Copy link
Contributor

jreback commented Dec 29, 2018

@jreback
Copy link
Contributor

jreback commented Jan 1, 2019

this is passing now. @TomAugspurger did you have any other changes here?

@TomAugspurger
Copy link
Contributor Author

Thanks for fixing it up.

@TomAugspurger TomAugspurger merged commit 4b6be69 into pandas-dev:master Jan 1, 2019
@TomAugspurger TomAugspurger deleted the integer-na-docs branch January 1, 2019 14:22
thoo added a commit to thoo/pandas that referenced this pull request Jan 1, 2019
* upstream/master:
  BUG: output formatting with to_html(), index=False and/or index_names=False (pandas-dev#22579, pandas-dev#22747) (pandas-dev#22655)
  MAINT: Port _timelex in codebase (pandas-dev#24520)
  Implement unique+array parts of 24024 (pandas-dev#24527)
  Integer NA docs (pandas-dev#23617)
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
* wip

* DOC: Integer NA

Closes pandas-dev#22003

* subsection

* update

* fixup

* add back construction for docs
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
* wip

* DOC: Integer NA

Closes pandas-dev#22003

* subsection

* update

* fixup

* add back construction for docs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants