REF: Simplify Datetimelike constructor dispatching #23140

jbrockmendel · 2018-10-13T22:13:14Z

Implement several missing tests, particularly for TimedeltaArray

Move several things to DatetimeLikeArrayMixin that will need to be there eventually.

Misc cleanups.

pep8speaks · 2018-10-13T22:13:20Z

Hello @jbrockmendel! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/arrays/datetimelike.py !
There are no PEP8 issues in the file pandas/core/arrays/period.py !
There are no PEP8 issues in the file pandas/core/arrays/timedeltas.py !
There are no PEP8 issues in the file pandas/core/indexes/datetimelike.py !
There are no PEP8 issues in the file pandas/core/indexes/datetimes.py !
There are no PEP8 issues in the file pandas/core/indexes/period.py !
There are no PEP8 issues in the file pandas/core/indexes/timedeltas.py !
There are no PEP8 issues in the file pandas/io/pytables.py !
There are no PEP8 issues in the file pandas/tests/arrays/test_datetimelike.py !
There are no PEP8 issues in the file pandas/tests/indexes/datetimelike.py !

codecov · 2018-10-14T01:17:48Z

Codecov Report

Merging #23140 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23140      +/-   ##
==========================================
+ Coverage   92.19%   92.19%   +<.01%     
==========================================
  Files         169      169              
  Lines       50959    50986      +27     
==========================================
+ Hits        46980    47009      +29     
+ Misses       3979     3977       -2

Flag	Coverage Δ
#multiple	`90.62% <100%> (ø)`	⬆️
#single	`42.28% <49.18%> (-0.02%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/period.py	`93.22% <ø> (-0.02%)`	⬇️
pandas/core/arrays/datetimelike.py	`94.97% <100%> (+0.07%)`	⬆️
pandas/core/indexes/datetimelike.py	`98.25% <100%> (+0.02%)`	⬆️
pandas/core/arrays/timedeltas.py	`94.47% <100%> (+0.5%)`	⬆️
pandas/compat/numpy/function.py	`87.97% <100%> (+1.31%)`	⬆️
pandas/core/indexes/timedeltas.py	`90.65% <100%> (-0.12%)`	⬇️
pandas/core/arrays/period.py	`95.97% <100%> (+0.4%)`	⬆️
pandas/io/pytables.py	`92.44% <100%> (ø)`	⬆️
pandas/core/indexes/datetimes.py	`96.47% <100%> (-0.03%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 913f71f...b5827c7.

jreback · 2018-10-14T13:13:43Z

pandas/core/arrays/datetimelike.py

@@ -332,6 +344,9 @@ def _validate_frequency(cls, index, freq, **kwargs):
            # Frequency validation is not meaningful for Period Array/Index
            return None

+        # DatetimeArray may pass `ambiguous`, nothing else allowed


why is this? can you comment

Will clarify this comment. kwargs gets passed below to cls._generate_range, and the only kwarg that is valid there is "ambiguous", and that is only for DatetimeArray.
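A toy sketch of the pattern this reply describes. It assumes two things: `_validate_frequency` checks a frequency by regenerating a range from the first value and comparing, and it forwards `**kwargs` unchanged to `cls._generate_range`, where only the DatetimeArray variant accepts `ambiguous`. The classes are hypothetical stand-ins that work on plain integers. They are not pandas code.

```python
class ToyDatetimeLikeArray:
    """Hypothetical stand-in: values are integer ticks, freq is a tick step."""

    def __init__(self, values, freq=None):
        self.values = list(values)
        self.freq = freq

    @classmethod
    def _generate_range(cls, start, periods, freq):
        # Base generator: accepts no extra keyword arguments.
        return cls([start + i * freq for i in range(periods)], freq=freq)

    @classmethod
    def _validate_frequency(cls, index, freq, **kwargs):
        if not index.values:
            return None
        # Regenerate a range from the first value and compare. kwargs are
        # forwarded as-is, so the subclass's generator decides what is valid.
        on_freq = cls._generate_range(index.values[0], len(index.values),
                                      freq, **kwargs)
        if on_freq.values != index.values:
            raise ValueError("values do not conform to freq={}".format(freq))
        return None


class ToyTimedeltaArray(ToyDatetimeLikeArray):
    pass  # inherits the generator without `ambiguous`


class ToyDatetimeArray(ToyDatetimeLikeArray):
    @classmethod
    def _generate_range(cls, start, periods, freq, ambiguous="raise"):
        # Only this subclass accepts `ambiguous`; the toy ignores its value.
        return cls([start + i * freq for i in range(periods)], freq=freq)
```

Passing `ambiguous` through the timedelta toy fails with a `TypeError` raised by the generator itself. This is the behaviour that the comment "nothing else will be accepted" relies on.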

jreback · 2018-10-14T13:14:48Z

pandas/core/arrays/period.py

            values = dt64arr_to_periodarr(values, freq)

+        elif is_object_dtype(values) or isinstance(values, (list, tuple)):


shouldn't this be is_list_like? (for the isinstance check)

This is specifically for object dtype (actually, I need to add dtype=object to the np.array call below) since we're calling libperiod.extract_ordinals, which expects object dtype.

specifically, what happens if other non-ndarray list-likes hit this path? do they need handling?

They do need handling, but we're not there yet. The thought process for implementing these constructors piece-by-piece is:

a) The DatetimeIndex/TimedeltaIndex/PeriodIndex constructors are overgrown; let's avoid that in the Array subclasses.
b) Avoid letting the implementations get too far ahead of the tests

Other question: where was this handled previously?

It's hard for me to say what's better in the abstract.

From the WIP PeriodArray PR, I found that having to think carefully about what type of data I had forced some clarity in the code. I liked having to explicitly reach for that _from_periods constructor.

Regardless, I think our two goals with the array constructors should be:

Maximizing developer happiness (i.e. not users at the moment)

Making it easier to reuse code between Index & Array subclasses

If you think we're likely to end up in a situation where being able to pass an array of objects to the main __init__ will make things easier, then by all means.

I am a bit puzzled why you would handle lists and ndarrays differently (Tom and Joris); these are clearly doing the same thing, and we have very similar handling for list-likes throughout pandas.

Separating these is a non-starter; even having a separate constructor is not very friendly. pandas does inference on construction, which is one of its big selling points. Trying to change this, especially at the micro level, is a huge mental disconnect.

If you want to propose something like that, please do it in a separate issue.

I am a bit puzzled why you would handle lists and ndarrays differently (Tom and Joris)

I don't think we are.

But, my only argument was

From the WIP PeriodArray PR, I found that having to think carefully about what type of data I had forced some clarity in the code. I liked having to explicitly reach for that _from_periods constructor.

If that's not persuasive then I'm not going to argue against handling them in the init.

having to think carefully

+1

Maximizing developer happiness

+1

Making it easier to reuse code

+1

If you think we're likely to end up in a situation where being able to pass an array of objects to the main

Yes, I think we should be pretty forgiving about what gets accepted into __init__ (for all three of Period/Datetime/Timedelta Arrays). Definitely don't want the start, end, periods currently in the Index subclass constructors. I think by excluding those we'll keep these constructors fairly straightforward.

I am a bit puzzled why you would handle lists and ndarrays differently

It's not about lists vs arrays, it's about arrays of Period objects vs arrays of ordinal integers, which is something very different.

I think we should be pretty forgiving about what gets accepted into __init__

Being forgiving is exactly what led to the complex Period/DatetimeIndex constructors. I think we should not make the same choice for our Array classes.
Of course it doesn't need to be that complex. I think two main use cases are being discussed here: an array of scalar objects (e.g. Periods or Timestamps), or an array of the underlying storage type (e.g. datetime64 or ordinal integers).

I personally also think it makes the code clearer to even separate those two concepts (basically what we also did with IntegerArray). But maybe let's open an issue to discuss that further, instead of doing it here in a hidden review comment thread? (I can only open one later today.)
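Here is a minimal sketch of the split Joris describes. It uses a toy period type. `__init__` accepts only the underlying storage (integer ordinals). A separate classmethod handles an array of scalar objects; it is named `_from_periods` after the constructor mentioned earlier in this thread. `ToyPeriod`, `ToyPeriodArray`, and the `NAT_ORDINAL` sentinel are hypothetical and are not pandas API.

```python
from collections import namedtuple

ToyPeriod = namedtuple("ToyPeriod", ["ordinal", "freq"])
NAT_ORDINAL = -2**63  # hypothetical missing-value sentinel


class ToyPeriodArray:
    def __init__(self, ordinals, freq):
        # Storage-level constructor: integer ordinals only, no inference.
        ordinals = list(ordinals)
        if any(type(x) is not int for x in ordinals):
            raise TypeError("__init__ expects integer ordinals; "
                            "use _from_periods for Period objects")
        self._ordinals = ordinals
        self.freq = freq

    @classmethod
    def _from_periods(cls, scalars, freq):
        # Scalar-level constructor: Period-like objects, None for missing.
        ordinals = []
        for p in scalars:
            if p is None:
                ordinals.append(NAT_ORDINAL)
            elif isinstance(p, ToyPeriod) and p.freq == freq:
                ordinals.append(p.ordinal)
            else:
                raise TypeError("cannot convert {!r} to freq {!r}".format(p, freq))
        return cls(ordinals, freq)
```

The more forgiving alternative argued for above would fold the `_from_periods` branch into `__init__` through dtype inference. This sketch shows only the explicit split.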

jreback · 2018-10-14T13:15:23Z

pandas/core/arrays/period.py

            values = dt64arr_to_periodarr(values, freq)

+        elif is_object_dtype(values) or isinstance(values, (list, tuple)):
+            # e.g. array([Period(...), Period(...), NaT])
+            values = np.array(values)


what if this is an int array? or is that prohibited? (except via _from_ordinals)

Then it gets passed through simple_new unchanged.
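Here is a pure-Python sketch of the dispatch discussed in this thread, built on a toy period type. Object-like input (Period-like scalars plus missing values) is converted to ordinals, in place of `libperiod.extract_ordinals`. Integer input passes through unchanged, as the reply describes for the simple_new path. To address the list-like question, any non-string iterable is accepted. `ToyPeriod`, `NAT_ORDINAL`, `_is_list_like_sketch`, `_extract_ordinals_sketch`, and `coerce_period_values` are all hypothetical names.

```python
from collections import namedtuple
from collections.abc import Iterable

ToyPeriod = namedtuple("ToyPeriod", ["ordinal", "freq"])
NAT_ORDINAL = -2**63  # hypothetical missing-value sentinel


def _is_list_like_sketch(obj):
    # Loosely mirrors pandas' is_list_like: iterable, but not a string.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def _extract_ordinals_sketch(values, freq):
    # Stand-in for libperiod.extract_ordinals on object-dtype input.
    ordinals = []
    for obj in values:
        if obj is None:
            ordinals.append(NAT_ORDINAL)
        elif isinstance(obj, ToyPeriod) and obj.freq == freq:
            ordinals.append(obj.ordinal)
        else:
            raise TypeError("cannot convert {!r} to freq {!r}".format(obj, freq))
    return ordinals


def coerce_period_values(values, freq):
    if not _is_list_like_sketch(values):
        raise TypeError("expected list-like input, got {!r}".format(values))
    values = list(values)  # materialise generators, tuples, sets, ...
    if all(type(v) is int for v in values):
        # Integer ordinals: passed through unchanged (the simple_new path).
        return values
    # Object input such as [Period(...), Period(...), NaT]; None stands in for NaT.
    return _extract_ordinals_sketch(values, freq)
```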

jreback · 2018-10-14T13:16:21Z

pandas/core/indexes/datetimelike.py

@@ -430,6 +430,10 @@ def min(self, axis=None, *args, **kwargs):
        --------
        numpy.ndarray.min
        """
+        if axis is not None and axis >= self.ndim:
+            raise ValueError("`axis` must be fewer than the number of "


don't do this here, rather this should be in validate_* functions (if you think this is really necessary and you have a test for it)

jreback · 2018-10-14T13:16:45Z

pandas/core/indexes/datetimelike.py

@@ -458,6 +462,10 @@ def argmin(self, axis=None, *args, **kwargs):
        --------
        numpy.ndarray.argmin
        """
+        if axis is not None and axis >= self.ndim:
+            raise ValueError("`axis` must be fewer than the number of "


same for all of these

jreback · 2018-10-14T13:17:06Z

pandas/core/indexes/datetimes.py

-            return cls._generate_range(start, end, periods, name, freq,
-                                       tz=tz, normalize=normalize,
-                                       closed=closed, ambiguous=ambiguous)
+            out = cls._generate_range(start, end, periods,


out -> result

Will update.

jreback · 2018-10-14T13:17:37Z

pandas/tests/arrays/test_datetimelike.py

@@ -45,6 +45,19 @@ def datetime_index(request):
    return pi


+@pytest.fixture
+def timedelta_index(request):


eventually promote these to conftest

Agreed. For now this is a pretty bare-bones version to get the ball rolling.


jreback · 2018-10-14T16:58:44Z

pandas/core/arrays/datetimelike.py

@@ -344,7 +344,8 @@ def _validate_frequency(cls, index, freq, **kwargs):
            # Frequency validation is not meaningful for Period Array/Index
            return None

-        # DatetimeArray may pass `ambiguous`, nothing else allowed
+        # DatetimeArray may pass `ambiguous`, nothing else will be accepted
+        # by cls._generate_range below


why wouldn't you just pop the kwarg by key and pass it directly?

Hmm, actually that ends up being appreciably more verbose. We would have to make separate cls._generate_range calls for TimedeltaArray vs DatetimeArray.
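This hypothetical sketch illustrates the trade-off in the reply above. If the shared code pops `ambiguous` by key, it has to know which array type it is dealing with and make a separate generator call for each. Forwarding `**kwargs` keeps a single call site and lets the generator reject keywords it does not accept. All four functions are illustrative toys, not pandas code.

```python
def generate_datetime_range(start, periods, step, ambiguous="raise"):
    # Hypothetical DatetimeArray-style generator; `ambiguous` is accepted
    # but unused in this toy.
    return [start + i * step for i in range(periods)]


def generate_timedelta_range(start, periods, step):
    # Hypothetical TimedeltaArray-style generator: no `ambiguous` keyword.
    return [start + i * step for i in range(periods)]


def regenerate_forwarding(generate, start, periods, step, **kwargs):
    # Forwarding: one call site; the generator rejects bad keywords itself.
    return generate(start, periods, step, **kwargs)


def regenerate_popping(is_datetime, start, periods, step, **kwargs):
    # Popping by key: the caller must branch on the array type itself.
    ambiguous = kwargs.pop("ambiguous", None)
    if kwargs:
        raise TypeError("unexpected keywords: {}".format(sorted(kwargs)))
    if is_datetime:
        if ambiguous is None:
            return generate_datetime_range(start, periods, step)
        return generate_datetime_range(start, periods, step, ambiguous=ambiguous)
    if ambiguous is not None:
        raise TypeError("`ambiguous` is only valid for datetime ranges")
    return generate_timedelta_range(start, periods, step)
```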

jreback · 2018-10-14T16:59:19Z

pandas/core/arrays/period.py

            values = dt64arr_to_periodarr(values, freq)

+        elif is_object_dtype(values) or isinstance(values, (list, tuple)):


specifically, what happens if other non-ndarray list-likes hit this path? do they need handling?

jreback · 2018-10-14T17:00:13Z

pandas/core/indexes/datetimelike.py

-            raise ValueError("`axis` must be fewer than the number of "
-                             "dimensions ({ndim})".format(ndim=self.ndim))
-
+        _validate_minmax_axis(axis)


not what I mean.
add this specifically to nv.validate_*; there are mechanisms for this already

I see, done.

jreback · 2018-10-14T17:00:27Z

pandas/core/indexes/datetimelike.py

+    Raises
+    ------
+    ValueError
+    """


see my comment above

jorisvandenbossche · 2018-10-14T05:52:12Z

pandas/core/indexes/datetimes.py

-        # TODO: Remove this when we have a DatetimeTZArray
-        # Necessary to avoid recursion error since DTI._values is a DTI
-        # for TZ-aware
-        return self._ndarray_values.size


Why are you removing those? Those will need to be added back once we do the actual index/array split anyway, as they will be calling into the underlying array?

Why are you removing those? Those will need to be added back

Because I am OK with needing to add them back in a few days (hopefully)

But can you then try to explain to me what the advantage is of moving it now?

To make it clear what still needs to be moved/implemented at the Array level. e.g. Tom's PeriodArray PR implements some things in PeriodArray that should instead be in DatetimeLikeArrayMixin. Moving these prevents this kind of mixup.

Because there are already a bunch of things that are going to need to be inherited from self.values, it's better to get them all in one place and do that all at once.

Because in the next pass I'll be implementing a decorator to do something like:

# TODO: enable this decorator once Datetime/Timedelta/PeriodIndex .values
# points to a pandas ExtensionArray
# @inherit_from_values(["ndim", "shape", "size", "nbytes",
#                       "asi8", "freq", "freqstr"])
class DatetimeIndexOpsMixin(DatetimeLikeArrayMixin):
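The TODO names the decorator but does not define it. Below is a minimal sketch of how such a class decorator could work. It assumes the index stores its array on `.values` and that each listed name should become a read-only property delegating to that array. `inherit_from_values` is the hypothetical name from the comment; `ToyArray` and `ToyIndex` are illustrative.

```python
def inherit_from_values(names):
    """Class decorator: add read-only properties that delegate to self.values."""
    def decorator(cls):
        for name in names:
            # Bind `name` via a default argument so each property keeps its own
            # attribute name instead of the loop's last value.
            def fget(self, _name=name):
                return getattr(self.values, _name)
            setattr(cls, name, property(fget, doc="Delegated to .values." + name))
        return cls
    return decorator


class ToyArray:
    def __init__(self, data):
        self._data = list(data)
        self.ndim = 1
        self.shape = (len(self._data),)
        self.size = len(self._data)


@inherit_from_values(["ndim", "shape", "size"])
class ToyIndex:
    def __init__(self, values):
        self.values = values
```

As Joris notes in the next reply, this only works once `.values` actually returns an object that has these attributes.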

Moving these prevents this kind of mixup.

As long as one of the index classes is still inheriting from the ArrayMixin, there will be wrong/strange mixups that need to be cleaned up.

Because in the next pass I'll be implementing a decorator to do something like:

But how would you do that if the underlying values don't yet have those attributes, because it is not yet our internal array class?

And why not move them when implementing such a decorator? Then you actually have overview of the full changes.

You have sufficiently frustrated me into reverting this so we can move this down the field.

@jorisvandenbossche if you're still up, can you take a look at the newest push and verify that the parts you have a problem with have been removed?


jorisvandenbossche · 2018-10-15T07:37:52Z

pandas/core/arrays/datetimelike.py

@@ -211,6 +219,10 @@ def astype(self, dtype, copy=True):
    # ------------------------------------------------------------------
    # Null Handling

+    def isna(self):
+        # EA Interface
+        return self._isnan


Is it necessary to have the _isnan concept on the arrays? We use it in some internal methods on the Index class, but for Arrays it seems to me like additional complexity compared to simply defining isna appropriately on each Array?

Discussed elsewhere; can we mark as resolved?
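For context on the `_isnan` question above, here is a hedged sketch of the mixin approach from the diff. It assumes datetimelike values are stored as int64 with a sentinel for missing entries; pandas uses the minimum int64 value, `iNaT`, for this. The mixin computes `_isnan` in one place and `isna` simply returns it, so index-level internals and the EA interface share one definition. The class names are hypothetical.

```python
I_NAT = -2**63  # minimum int64, the value pandas uses as iNaT


class ToyDatetimeLikeMixin:
    """Hypothetical mixin: subclasses provide `asi8`, a list of int64-like values."""

    @property
    def _isnan(self):
        # Boolean mask of missing entries, shared by internal methods.
        return [v == I_NAT for v in self.asi8]

    def isna(self):
        # EA interface: reuse the shared mask.
        return self._isnan


class ToyTimedeltaArray(ToyDatetimeLikeMixin):
    def __init__(self, asi8):
        self.asi8 = list(asi8)
```

Joris's alternative would drop `_isnan` and have each array define `isna` directly. The mask computation would be the same; only the place where it lives would change.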

jorisvandenbossche · 2018-10-15T07:43:33Z

pandas/core/indexes/datetimelike.py

@@ -430,6 +430,7 @@ def min(self, axis=None, *args, **kwargs):
        --------
        numpy.ndarray.min
        """
+        nv.validate_minmax_axis(axis)
        nv.validate_min(args, kwargs)


Is there reason not to add the axis validation to the existing validate_min ?

Exactly, I don't want another function; rather, you can simply check this inside the function which is already there.

Done. I'm not wild about the fact that the nv.validate_(min|max|argmin|argmax) functions now implicitly assume they are only being called on 1-dim objects, but at least the assumption is correct for now.

Hmm, yeah, that makes sense.
And adding them in a single validation is actually also mixing two kinds of validation: validation of arguments that exist purely for numpy compat (things like out), as opposed to validation of valid arguments for pandas (axis in the Series and Index methods is there more for consistency with DataFrame than for compat with numpy).
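Here is a sketch of the combined validator discussed above. It assumes numpy-compat arguments arrive as `args`/`kwargs` and that only their NumPy default is allowed; here the only one is `out=None`. The axis check is folded into the same entry point and hard-codes the 1-dim assumption mentioned in the reply. Following the "catch too-negative values" commit, it also rejects axes below `-ndim`. The two concerns stay as separate helpers, which reflects Joris's point. `validate_min_sketch` and its helpers are hypothetical and do not reproduce the signature of pandas' `nv.validate_min`.

```python
def _validate_numpy_compat(fname, args, kwargs, allowed=("out",)):
    # NumPy-compat arguments are accepted only at their default value (None).
    if args:
        raise TypeError("{}() got unexpected positional arguments".format(fname))
    for key, value in kwargs.items():
        if key not in allowed:
            raise TypeError("{}() got an unexpected keyword '{}'".format(fname, key))
        if value is not None:
            raise ValueError("the '{}' parameter is not supported in the pandas "
                             "implementation of {}()".format(key, fname))


def _validate_axis(axis, ndim=1):
    # pandas-level argument; valid axes run from -ndim to ndim - 1.
    if axis is not None and not -ndim <= axis < ndim:
        raise ValueError("`axis` must be fewer than the number of "
                         "dimensions ({})".format(ndim))


def validate_min_sketch(axis, args, kwargs):
    # One entry point, two distinct kinds of validation kept as separate steps.
    _validate_numpy_compat("min", args, kwargs)
    _validate_axis(axis)
```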

jorisvandenbossche · 2018-10-15T07:43:38Z

pandas/core/arrays/period.py

            values = dt64arr_to_periodarr(values, freq)

+        elif is_object_dtype(values) or isinstance(values, (list, tuple)):


Other question: where was this handled previously?

jreback · 2018-10-15T12:38:38Z

pandas/core/indexes/datetimelike.py

@@ -430,6 +430,7 @@ def min(self, axis=None, *args, **kwargs):
        --------
        numpy.ndarray.min
        """
+        nv.validate_minmax_axis(axis)
        nv.validate_min(args, kwargs)


Exactly, I don't want another function; rather, you can simply check this inside the function which is already there.

jbrockmendel · 2018-10-15T15:49:36Z

can you just make the validation for axis generic?

See joris's comment above.


jbrockmendel · 2018-10-18T04:39:35Z

The non-controversial parts of this have been ported to separate PRs. Closing.

jbrockmendel added 8 commits October 13, 2018 13:16

Avoid non-public constructors

f13cc58

simplify and de-duplicate _generate_range

4188ec7

Check for invalid axis kwarg

7804f1b

Move some EA properties up to mixins

a4775f4

implement basic TimedeltaArray tests

8ee34fa

clean up PeriodArray constructor, with tests

78943c1

make PeriodArray.__new__ more grown-up

aa71383

Remove unused kwargs from TimedeltaArray.__new__

eae8389

jbrockmendel added 2 commits October 13, 2018 16:35

revert change that broke tests

e871733

Fixup whitespace

7840f91

jbrockmendel mentioned this pull request Oct 14, 2018

REF: Simplify Period/Datetime Array/Index constructors #23093

Merged

jreback requested changes Oct 14, 2018


jreback added the Datetime (Datetime data dtype), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Period (Period data type) labels Oct 14, 2018

jbrockmendel added 4 commits October 14, 2018 08:16

helper function for axis validation

ec50b0b

suggested clarifications

eb7a6b6

Merge branch 'dlike8' of https://github.com/jbrockmendel/pandas into …

32c6391

…dlike8

Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…

c903917

…ike8

jreback requested changes Oct 14, 2018


move axis validation to nv

b97ec96

jorisvandenbossche requested changes Oct 14, 2018


jbrockmendel added 3 commits October 14, 2018 12:48

Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…

11db555

…ike8

revert some removals

147de57

Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…

7c4d281

…ike8

jorisvandenbossche reviewed Oct 15, 2018


jreback requested changes Oct 15, 2018


catch too-negative values

b90f421

Roll validate_minmax_axis into existing validate functions

dc4f474

fixup typo

46d5e64

TomAugspurger mentioned this pull request Oct 16, 2018

Datetimelike Array Refactor #23185

Closed

Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…

b5827c7

…ike8

This was referenced Oct 17, 2018

validate min/max axis #23206

Merged

TST: bare-bones fixture for timedelta array tests #23207

Merged

jorisvandenbossche mentioned this pull request Oct 17, 2018

API: Index and Array constructors design #23212

Closed

jbrockmendel mentioned this pull request Oct 18, 2018

CLN: de-duplicate generate_range #23218

Merged

jbrockmendel closed this Oct 18, 2018

jbrockmendel deleted the dlike8 branch October 18, 2018 04:39


jreback · Oct 16, 2018 (edited)
