API: added array #23581

TomAugspurger · 2018-11-08T21:50:33Z

Adds

- a new top-level pd.array method for creating arrays
- all our extension dtypes to the top-level API
- pd.arrays for all our EAs
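
As a rough illustration of the first item, the sketch below calls pd.array with a few dtype string aliases. The aliases ("category", "period[M]", "Int16") and the sample data are chosen for illustration only, and the printed class names may differ between pandas versions.

```python
import pandas as pd

# Each call names the target extension dtype by its string alias.
categories = pd.array(["a", "b", "a"], dtype="category")
periods = pd.array(["2018-01", "2018-02"], dtype="period[M]")
nullable_ints = pd.array([1, None], dtype="Int16")

# Show which array class backs each result.
for arr in (categories, periods, nullable_ints):
    print(type(arr).__name__, arr.dtype)
```

The concrete array classes are also reachable through the pandas.arrays namespace (for example pd.arrays.PeriodArray), which is what the third item refers to.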

TODO

- Add the actual array classes somewhere to the public API (pandas.arrays?)
- API docs for the rest of the arrays and dtypes.

Closes #22860

supersedes #23532.

pep8speaks · 2018-11-08T21:50:39Z

Hello @TomAugspurger! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 28, 2018 at 22:13 Hours UTC

TomAugspurger · 2018-11-09T16:07:56Z

I'd like to restructure the documentation a bit. I want to collect similar things together, but we have two kinds of similarity:

First, we could group by "topic" (the type of data).

Categorical
- dtype
- Categorical
- CategoricalIndex?
Period
- dtype
- scalar
- array
- PeriodIndex(?) or leave that under indexes probably
Integer
- ...
Interval
- ...
Sparse
- ...

and then eventually DatetimeArray & TimedeltaArray.

Alternatively, we could group by "kind" first. So we'd have

Dtypes
- CategoricalDtype
- PeriodDtype
- ...
Scalars
- Period
- ...
Arrays
- Categorical
- PeriodArray
- ...

Do people have a preference between "by topic" and "by kind" (cc. @jorisvandenbossche @jreback @jbrockmendel @datapythonista)?

jorisvandenbossche · 2018-11-09T16:13:08Z

> I'd like to restructure the documentation a bit.

I suppose you are only talking about the api.rst page?

TomAugspurger · 2018-11-09T16:14:50Z

Yes, sorry.

jorisvandenbossche · 2018-11-09T16:18:42Z

Not a strong opinion, but for adding new docs I would go by kind.
But that is on the assumption that the above is only for listing the arrays / dtypes, and that there are still separate sections about each topic anyway? (e.g. with all Interval-specific or datetime-like methods/attributes)

TomAugspurger · 2018-11-09T16:34:09Z

The nice thing about "by topic" is that we can give a high-level summary of what, e.g. "Period" is for. If we have things grouped "by kind" (dtypes, scalars, arrays), then we'd need to repeat that description, or just have it once.

jorisvandenbossche · 2018-11-09T16:58:47Z

> The nice thing about "by topic" is that we can give a high-level summary of what, e.g. "Period" is for

But in those places you would also put all the custom attributes and methods?

I actually think we can duplicate somewhat. Even if there are topical sections, I think it is still nice to have a short table of all dtypes and one for all array types.

TomAugspurger · 2018-11-09T22:41:08Z

Did a bit of inference in fe06de4. There are a few TODOs.

I think we don't handle intervals yet. #23553

pandas/core/arrays/array_.py

setup.cfg

codecov · 2018-11-10T13:16:56Z

Codecov Report

Merging #23581 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23581      +/-   ##
==========================================
+ Coverage   92.31%   92.32%   +0.01%     
==========================================
  Files         165      166       +1     
  Lines       52194    52240      +46     
==========================================
+ Hits        48182    48231      +49     
+ Misses       4012     4009       -3

Flag	Coverage Δ
#multiple	`90.74% <100%> (+0.01%)`	⬆️
#single	`43.07% <28.94%> (+0.09%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/arrays/period.py	`98.42% <ø> (ø)`	⬆️
pandas/core/arrays/interval.py	`93.04% <ø> (ø)`	⬆️
pandas/core/dtypes/dtypes.py	`95.33% <100%> (ø)`	⬆️
pandas/core/arrays/__init__.py	`100% <100%> (ø)`	⬆️
pandas/core/api.py	`100% <100%> (ø)`	⬆️
pandas/core/arrays/array_.py	`100% <100%> (ø)`
pandas/core/arrays/base.py	`98.23% <0%> (+0.03%)`	⬆️
pandas/core/arrays/sparse.py	`92.17% <0%> (+0.06%)`	⬆️
pandas/core/arrays/numpy_.py	`93.51% <0%> (+0.46%)`	⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c1af4f5...1b9e251. Read the comment docs.

doc/source/api.rst

doc/source/whatsnew/v0.24.0.txt

pandas/arrays/__init__.py

pandas/core/arrays/array_.py

pandas/tests/arrays/test_array.py

jreback · 2018-11-11T14:39:43Z

pandas/tests/arrays/test_array.py

+     pd.IntervalArray.from_tuples([(1, 2), (3, 4)])),
+    ([0, 1], 'Sparse[int64]', pd.SparseArray([0, 1], dtype='int64')),
+    ([1, None], 'Int16', integer_array([1, None], dtype='Int16')),
+


Can you also send in pd.Series / pd.Index of these types (or is this tested below)?

jreback · 2018-11-11

Do you test pass-through to ndarray? (e.g. maybe use the any_numpy_dtype fixture with the dtype specified)

TomAugspurger · 2018-11-12

It is tested below. Added a couple here as well though.

Not sure how any_numpy_dtype would be used here. We would need data to go with it.

jreback · 2018-11-12

@TomAugspurger right, you would need to construct a dummy array and test that it is passed through.

TomAugspurger · 2018-11-13

Still not sure what this would look like, or what the test is for. I'm not especially worried about specific numpy types not getting through, and we do test that path.
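
The Series / Index case raised at the start of this thread can be sketched as follows. This is not the test added in the PR; the sample data and variable names are assumptions made for illustration.

```python
import pandas as pd

# A Series and an Index that already carry extension dtypes.
ser = pd.Series(["a", "b", "a"], dtype="category")
idx = pd.period_range("2018-01-01", periods=2, freq="D")

# With no dtype given, pd.array takes the dtype from the input object.
from_series = pd.array(ser)
from_index = pd.array(idx)

print(type(from_series).__name__, from_series.dtype)
print(type(from_index).__name__, from_index.dtype)
```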

jreback · 2018-11-11T14:41:32Z

pandas/tests/arrays/test_period.py


+
+def test_registered():


should this be an extension test? (or maybe in test_dtypes below)

TomAugspurger · 2018-11-12

I'm not really sure how to write a base test for that...

jreback · 2018-11-11T14:42:45Z

Looks good @TomAugspurger. Mostly some docs & clarification comments; generally looks good though.

TomAugspurger · 2018-11-12T12:05:55Z

From #23581 (comment)


No, we don't because I hadn't really considered what to do about that. Do people have thoughts on how to handle 2-D (or n-d) input inside pd.array? In general, it shouldn't be up to pd.array what is and isn't valid input. That's up to the individual array constructors. But is 2-D special since NumPy handles it but EAs don't?

TomAugspurger · 2018-12-11T14:15:59Z

Added a doc note about just doing 1-d arrays.

Holding off on NumPy / 2D changes for now, in case we decide we like Series.array always being an EA (will have a PR in an hour or so).

TomAugspurger · 2018-12-11T15:26:57Z

#24227 for those following along with the "always return an ExtensionArray" discussion. Let's keep that discussion over there.

But, specific to the pd.array function: if we go with #24227, would we return a NumPyBackedExtensionArray, or raise, for cases like

pd.array([1, 2, 3])

?

jreback · 2018-12-11T15:46:19Z

> But, specific to the pd.array function: if we go with #24227, would we return a NumPyBackedExtensionArray, or raise, for cases like
>
> pd.array([1, 2, 3])

I guess you have to raise on this until we have numpy backed EA (which I don't think we should try to put into 0.24.0).

Though I am ok with just returning a numpy array for now.

TomAugspurger · 2018-12-28T19:11:51Z

932e119 has the changes for PandasArray. API-wise, this means that pd.array always returns an ExtensionArray.

This implies that pd.array raises for non-1d input (scalars or 2+ dimensions).
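
A minimal sketch of the two statements above, assuming a pandas version that ships pd.array. The concrete class returned for a plain list of integers has changed across releases, so only the ExtensionArray base class is checked here; the 2-D case is left out because the exception raised for it is not pinned down in this thread.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray

# A plain list still comes back as some ExtensionArray subclass.
arr = pd.array([1, 2, 3])
is_extension_array = isinstance(arr, ExtensionArray)
print(type(arr).__name__, is_extension_array)

# A scalar is not 1-dimensional input, so pd.array refuses it.
try:
    pd.array(1)
    scalar_error = None
except ValueError as exc:
    scalar_error = str(exc)
print(scalar_error)
```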

jreback

small comments

jreback · 2018-12-28T19:27:45Z

doc/source/whatsnew/v0.24.0.rst

+
+A new top-level method :func:`array` has been added for creating 1-dimensional arrays (:issue:`22860`).
+This can be used to create any :ref:`extension array <extending.extension-types>`, including
+extension arrays registered by :ref:`3rd party libraries <ecosystem.extensions>`.


don't you now have a ref in basics where this should point?

pandas/core/arrays/array_.py

- doc ref
- use extract_array
- use PandasArray._from_sequence

jreback · 2018-12-28T19:52:18Z

@TomAugspurger I am ok with merging this on green. We can follow up on the prior @jorisvandenbossche comment, which I think was about how we are handling string dtypes. Can you create an issue for that discussion?

TomAugspurger · 2018-12-28T19:58:57Z

Thanks for the review.

I think that https://github.com/pandas-dev/pandas/pull/23581/files#diff-69ac57923b848af43df327c311b79db4R90 handles @jorisvandenbossche's comments regarding string aliases for dtypes. In a world where we have a StringArray backed by Apache Arrow

pd.array(['a', 'b'], dtype=None/str)

would return a StringArray. But

pd.array(['a', 'b'], dtype=np.dtype(str))

would continue to return a PandasArray backed by an ndarray with dtype np.dtype("<U").

We should maybe emphasize more that if the underlying memory layout really matters to you, then you shouldn't be using pd.array. But I think this is a relatively rare case, and don't want to bog down users with low-level details...
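
To see what a given installation does with the two spellings discussed above, something like the following can be run. It is only a probe: the class names and dtypes it prints depend on the pandas version, so no expected output is given.

```python
import numpy as np
import pandas as pd

# No dtype: pandas is free to pick whichever string container it prefers.
inferred = pd.array(["a", "b"])

# Explicit NumPy dtype: the caller asks for NumPy's string dtype.
explicit = pd.array(["a", "b"], dtype=np.dtype(str))

print(type(inferred).__name__, inferred.dtype)
print(type(explicit).__name__, explicit.dtype)
```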

jreback · 2018-12-28T20:07:19Z

#23581 (comment) sounds ok to me.

TomAugspurger · 2018-12-28T22:47:28Z

All green now.

jreback · 2018-12-28T22:58:54Z

awesome as always @TomAugspurger

jbrockmendel · 2023-04-04T19:43:06Z

pandas/core/arrays/array_.py

+
+    # this returns None for not-found dtypes.
+    if isinstance(dtype, compat.string_types):
+        dtype = registry.find(dtype) or dtype


@TomAugspurger do you remember if there was any particular reason for using this pattern instead of dtype = pandas_dtype(dtype)?

TomAugspurger · 2023-04-05

I don't recall. I wonder if this predates pandas_dtype handling extension dtypes.
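
The lookup-with-fallback pattern in the quoted diff can be sketched in plain Python. ToyRegistry, resolve, and the placeholder dtype object below are hypothetical stand-ins, not pandas internals; the only behavior taken from the diff is that find returns None for names it does not know.

```python
class ToyRegistry:
    """Hypothetical stand-in for an extension-dtype registry."""

    def __init__(self):
        self._by_name = {}

    def register(self, name, dtype_obj):
        self._by_name[name] = dtype_obj

    def find(self, name):
        # Returns None for not-found names, as the diff comment says.
        return self._by_name.get(name)


registry = ToyRegistry()
registry.register("category", "CategoricalDtype()")  # placeholder object


def resolve(dtype):
    # Only strings are looked up; unknown strings pass through unchanged
    # so that a later step (e.g. NumPy) can still interpret them.
    if isinstance(dtype, str):
        dtype = registry.find(dtype) or dtype
    return dtype


print(resolve("category"))
print(resolve("int64"))
```

One consequence of this shape is that an unrecognized string is never rejected at this step; it is simply handed on as-is.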

added array

bfefc96

TomAugspurger added the API Design and ExtensionArray (Extending pandas with custom dtypes or arrays) labels Nov 8, 2018

TomAugspurger mentioned this pull request Nov 9, 2018

Public API for extension arrays / types. #23532

Closed

TomAugspurger added 3 commits November 9, 2018 09:43

Merge remote-tracking branch 'upstream/master' into pd.array

51480a3

update registry test

dcb7931

update doc examples

a635649

wip

fb0d8bc

TomAugspurger added 3 commits November 9, 2018 16:11

Merge remote-tracking branch 'upstream/master' into pd.array

d58a320

inference

fe06de4

ia updates

72f7f06

TomAugspurger added 2 commits November 9, 2018 21:17

test fixup

c02e183

isort

a2d3146

jreback requested changes Nov 10, 2018

fixups

37901b0

TomAugspurger changed the title from "[WIP]API: added array" to "API: added array" Nov 10, 2018

TomAugspurger mentioned this pull request Nov 10, 2018

Integer NA docs #23617

Merged

jreback added this to the 0.24.0 milestone Nov 11, 2018

jreback requested changes Nov 11, 2018

docs on raising

faf114d

TomAugspurger mentioned this pull request Dec 11, 2018

NumPyBackedExtensionArray #24227

Merged

3 tasks

TomAugspurger added 4 commits December 12, 2018 09:56

Merge remote-tracking branch 'upstream/master' into pd.array

3186ded

Merge remote-tracking branch 'upstream/master' into pd.array

1c4da0e

Merge remote-tracking branch 'upstream/master' into pd.array

36c6f00

Updates for PandasArray

932e119

update docstring

45d07eb

jreback requested changes Dec 28, 2018

Updates

d1aba73

- doc ref
- use extract_array
- use PandasArray._from_sequence

jreback approved these changes Dec 28, 2018

TomAugspurger added 2 commits December 28, 2018 14:49

Merge remote-tracking branch 'upstream/master' into pd.array

981f735

fixed test expected

1f3bb50

jreback mentioned this pull request Dec 28, 2018

REF: DatetimeLikeArray #24024

Merged

12 tasks

TomAugspurger added 2 commits December 28, 2018 16:13

doc lint

c8d3960

Merge remote-tracking branch 'upstream/master' into pd.array

1b9e251

jreback merged commit 77f4b0f into pandas-dev:master Dec 28, 2018

jschendel mentioned this pull request Dec 28, 2018

DOC: Use top-level pd.IntervalArray in doc examples #24475

Merged

TomAugspurger deleted the pd.array branch December 29, 2018 02:40

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

API: added array (pandas-dev#23581)

0553b8b

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

API: added array (pandas-dev#23581)

89cd33c

jbrockmendel reviewed Apr 4, 2023

