Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: 2D compat for DTA tz_localize, to_period #37950

Merged
merged 5 commits into from
Dec 17, 2020

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@@ -80,6 +81,26 @@
DatetimeLikeArrayT = TypeVar("DatetimeLikeArrayT", bound="DatetimeLikeArrayMixin")


def ravel_compat(meth):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this actually could be in a more general location, maybe pandas/core/array_alogs/ somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe core.arraylike? For now im just excited about getting __array_function__ working on NDArrayBackedExtensionArray.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, wait, the implementation here is specific to NDArrayBackedExtensionArray, will move it there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved + green

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype labels Nov 19, 2020
@jbrockmendel
Copy link
Member Author

btw who should i bug about resample/BinGrouper?

@jreback
Copy link
Contributor

jreback commented Nov 19, 2020

btw who should i bug about resample/BinGrouper?

me or @mroeschke

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the rationale?

@@ -681,12 +681,13 @@ def value_counts(self, dropna=False):

cls = type(self)

result = value_counts(values, sort=False, dropna=dropna)
result = value_counts(values.ravel("K"), sort=False, dropna=dropna)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for the ravel here? (value_counts is always done on a 1D array AFAIK)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this one cannot be ravel_compat (instead assert values.ndim == 1 here)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated+green

@jbrockmendel
Copy link
Member Author

Can you explain the rationale?

Two short-term motivations. First, a number of places in DatetimeLikeBlock would be simpler if we just stored the EA as self.values, which requires more robust 2D support. Second, I've got a branch implementing __array_function__ and got some failures (dont remember which off the top of my head) that pushed this up the priority list.

@jorisvandenbossche
Copy link
Member

a number of places in DatetimeLikeBlock would be simpler if we just stored the EA as self.values, which requires more robust 2D support

Can you give a bit more details about which places that would be?
I am personally -1 on storing the EA on 2D blocks. The 2D EA was a compromise for the block-wise ops, but expanding its usage first requires discussion IMO.

Second, I've got a branch implementing __array_function__ and got some failures

How is to_period or tz_localize related to __array_function__?

Also, IMO we should only add this for those functions for which we actually start using it. None of the changes here are actually already used somewhere?

@jbrockmendel
Copy link
Member Author

a number of places in DatetimeLikeBlock would be simpler if we just stored the EA as self.values, which requires more robust 2D support

Can you give a bit more details about which places that would be?

Most of these have # TODO(EA2D): comments: iget, diff, shift, set_inplace, get_values, quantile, fillna

@jorisvandenbossche
Copy link
Member

Most of these have # TODO(EA2D): comments: iget, diff, shift, set_inplace, get_values, quantile, fillna

I know you have been adding a lot of those comments, but having those comments doesn't mean there is consensus on further expanding the 2D capability of the datetimelike arrays ..

We could also make the DatetimeBlock a 1D non-consolidatable block, and that would also solve the datetime-specific TODO(EA2D) comments

@jbrockmendel
Copy link
Member Author

We could also make the DatetimeBlock a 1D non-consolidatable block

We're going to do essentially that in ArrayManager, which we're implementing so we can try it out and see what the performance/complexity impact is. Fleshing out the 2D support for datetimelike arrays is part of making it possible to also try out the EA2D approach.

@jreback
Copy link
Contributor

jreback commented Nov 26, 2020

We could also make the DatetimeBlock a 1D non-consolidatable block

We're going to do essentially that in ArrayManager, which we're implementing so we can try it out and see what the performance/complexity impact is. Fleshing out the 2D support for datetimelike arrays is part of making it possible to also try out the EA2D approach.

I agree here. the array manager perf implications will determine our ultimate direction. e.g. if we get good performance then we should certainly move everything as it makes things code-wise way simpler. if not we have to re-think and potentially support 2d EA.

@@ -681,12 +681,13 @@ def value_counts(self, dropna=False):

cls = type(self)

result = value_counts(values, sort=False, dropna=dropna)
result = value_counts(values.ravel("K"), sort=False, dropna=dropna)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this one cannot be ravel_compat (instead assert values.ndim == 1 here)

@jreback jreback added this to the 1.3 milestone Dec 17, 2020
@jreback jreback merged commit f197ca5 into pandas-dev:master Dec 17, 2020
@jreback
Copy link
Contributor

jreback commented Dec 17, 2020

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the dta-tda-2d branch December 17, 2020 17:47
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants