REF: make _aggregate_series_pure_python extraction behave like the cython version #29641

jbrockmendel · 2019-11-15T17:58:00Z

After some investigation it turns out that the DTA casting in _try_cast is made necessary because we incorrectly pass datetime64tz to _aggregate_series_fast, which calls the cython libreduction.SeriesGrouper, which casts the input to ndarray, losing the timezone. By not going through the cython path for datetime64tz, we avoid the need to re-cast.
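To make the timezone loss described above concrete, here is a minimal sketch using only public pandas API. It does not call libreduction.SeriesGrouper; it only assumes that `Series.values` stands in for "casting the input to ndarray", and the zone name is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# A small tz-aware Series; the zone name is an arbitrary choice for illustration.
ser = pd.Series(pd.date_range("2019-11-15", periods=3, tz="Europe/Brussels"))

# Series.values hands back a plain NumPy datetime64 array: the instants are
# kept (as UTC) but the timezone itself has nowhere to live.
as_ndarray = ser.values

# Series.array keeps the pandas extension array, and with it the tz-aware dtype.
as_extension = ser.array

print(ser.dtype)           # tz-aware dtype
print(as_ndarray.dtype)    # naive datetime64 dtype
print(as_extension.dtype)  # tz-aware dtype again
```

Anything computed from `as_ndarray` would have to be re-cast to get the timezone back, which is the kind of work the _try_cast call was doing.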

That in turn surfaces a new problem, which is that _aggregate_series_pure_python checks the first group's result slightly differently than the cython version does (see libreduction._extract_result). This changes the way the pure-python version does it to more closely match the cython version. I intend to make these match more precisely in an upcoming pass.

If merged, makes #29589 unnecessary.

cc @jreback @jorisvandenbossche @WillAyd


pep8speaks · 2019-11-15T17:58:03Z

Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-11-17 16:02:37 UTC

jorisvandenbossche · 2019-11-15T19:44:52Z

Thanks for the explanation!
What is the performance impact of not using the cython path for datetimetz?

Can you add the examples I gave in the other PR as test cases?

jbrockmendel · 2019-11-15T22:02:43Z

> Can you add the examples I gave in the other PR as test cases?

Good idea, will do.

> What is the impact on performance of not using the cython path for datetimetz?

No idea.

pandas/tests/groupby/aggregate/test_other.py


jreback · 2019-11-16T21:02:33Z

pandas/core/groupby/ops.py

+                    if len(res) == 1:
+                        # e.g. test_agg_lambda_with_timezone lambda e: e.head(1)
+                        # FIXME: are we potentially losing import res.index info?
+                        res = getattr(res, "_values", res)


can you just do
res = np.array(res)[0]? or
res = res.item() (though I think we have deprecated .item(), but are bringing it back).

np.array won't work because it'll lose the timezone. I'm thinking next(iter(res)) with a comment saying it's often equivalent to res[0]

I think .item() would be good here as well

.item() will be good if/when we un-deprecate it

is the current next(iter(res)) acceptable for the time being? extract_array(res)[0] would also work.

I'm eager to see this go in because I think we can get rid of a bunch more _try_cast calls

yep, just mark it with the issue (or a FIXME) so it's clear
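The trade-offs in this thread can be sketched with public pandas API. This is an illustration under assumptions, not the code in ops.py: `res.values` is used to force the raw-ndarray path (what `np.array(res)` returns for tz-aware data has varied between pandas versions), and the index label 5 is a hypothetical choice that shows why `res[0]` is only "often" equivalent.

```python
import numpy as np
import pandas as pd

# A length-1, tz-aware result such as a UDF like `lambda e: e.head(1)` could return.
res = pd.Series(
    pd.date_range("2019-11-15", periods=1, tz="Europe/Brussels"),
    index=[5],  # deliberately not 0, so label-based and positional access differ
)

# Going through the raw ndarray yields a numpy.datetime64 with no timezone.
via_ndarray = res.values[0]

# Iterating the Series yields pandas Timestamps, so the timezone survives.
via_iter = next(iter(res))

# res[0] is a label lookup on an integer index, so here it fails.
try:
    res[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(type(via_ndarray).__name__, type(via_iter).__name__, label_lookup_failed)
```

`next(iter(res))` works regardless of the index labels and of whether `res` is a Series, an Index, or an ndarray, which is presumably why it was acceptable as a stopgap.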


WillAyd

lgtm

jreback · 2019-11-18T00:27:32Z

thanks.

are these incorrect on 0.25.3? if so, need a release note, pls add in a followup.

jbrockmendel · 2019-11-18T01:35:15Z

> are these incorrect on 0.25.3? if so, need a release note, pls add in a followup

will do

jorisvandenbossche · 2019-11-18T07:43:40Z

pandas/core/groupby/ops.py

+
+                        # TODO: use `.item()` if/when we un-deprecate it.
+                        # For non-Series we could just do `res[0]`
+                        res = next(iter(res))


Why is this actually needed? For any later iteration, the len-1 res is just assigned below with result[label] = res, so why does it need to be unpacked for the first group?

We might be able to get rid of this, but at this stage the goal is just to make the behavior match libreduction._extract_result

But in the cython version that calls _extract_result, this is done for each group, not just the first (so in that sense it still doesn't match that)

> But in the cython version that calls _extract_result, this is done for each group, not just the first (so in that sense it still doesn't match that)

This is correct. The more closely matching behavior is that only the first group is checked for array-likes. There is also a discrepancy in which types of array-likes are checked, and another in that Reducer.get_result does a res = res.values check that is similar to _extract_result but not quite the same.
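The asymmetry being discussed can be sketched in a few lines. `aggregate_sketch` below is a hypothetical, heavily simplified stand-in for the pure-Python loop, not the actual pandas implementation: it unpacks a length-1 array-like only for the first group and stores later results as they come.

```python
import numpy as np
import pandas as pd


def aggregate_sketch(groups, func):
    """Hypothetical stand-in for a pure-Python aggregation loop."""
    results = {}
    checked_first = False
    for label, group in groups:
        res = func(group)
        if not checked_first:
            # Only the first group's result is inspected for array-likes.
            if isinstance(res, (pd.Series, pd.Index, np.ndarray)) and len(res) == 1:
                res = next(iter(res))
            checked_first = True
        # Later groups are stored without any unpacking.
        results[label] = res
    return results


groups = [("a", pd.Series([1, 2])), ("b", pd.Series([3, 4]))]
out = aggregate_sketch(groups, lambda g: g.head(1))
print(type(out["a"]).__name__, type(out["b"]).__name__)
```

With a function like `lambda g: g.head(1)`, the first entry ends up as a scalar while the second stays a length-1 Series, which is the mismatch with a per-group extraction that the reviewer points out.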


jorisvandenbossche · 2019-11-18T15:32:43Z

> > What is the impact on performance of not using the cython path for datetimetz?
>
> No idea.

Could you then check it?
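One way such a check might be set up is sketched below. The frame size, number of groups, and the aggregation function are all arbitrary assumptions, and no timings are claimed here; running the same script on the commits before and after the change would give the comparison.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # arbitrary size, chosen only to keep the run short
df = pd.DataFrame(
    {
        "key": np.arange(n) % 100,  # 100 groups
        "ts": pd.date_range("2019-01-01", periods=n, freq="min", tz="Europe/Brussels"),
    }
)


def run():
    # A user-defined reduction on a tz-aware column.
    return df.groupby("key")["ts"].agg(lambda s: s.iloc[-1])


result = run()
elapsed = timeit.timeit(run, number=3)
print(f"{elapsed / 3:.4f} s per call")
```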


REF: make _aggregate_series_pure_python extraction behave like the cython version

daaddad

jbrockmendel mentioned this pull request Nov 15, 2019

REF: remove unnecessary _try_cast calls #29642

Merged

blackify

18bb6c9

jbrockmendel mentioned this pull request Nov 15, 2019

REF: Check before calling DTA._from_sequence instead of catching #29589

Closed

add tests

201bc27

jorisvandenbossche reviewed Nov 16, 2019

pandas/tests/groupby/aggregate/test_other.py

jbrockmendel added 2 commits November 16, 2019 07:54

Merge branch 'master' of https://github.com/pandas-dev/pandas into faster-gb7

f4799aa

added test

61f32e5

jreback requested changes Nov 16, 2019

jreback added the Groupby and Refactor (Internal refactoring of code) labels Nov 16, 2019

jreback added this to the 1.0 milestone Nov 16, 2019

jbrockmendel added 4 commits November 16, 2019 14:25

Merge branch 'master' of https://github.com/pandas-dev/pandas into faster-gb7

2f96ac7

avoid _values getattr pattern

fb9f171

Merge branch 'master' of https://github.com/pandas-dev/pandas into faster-gb7

30c1245

todo comment

d542515

WillAyd approved these changes Nov 18, 2019

jreback approved these changes Nov 18, 2019

jreback merged commit e1cadfa into pandas-dev:master Nov 18, 2019

jbrockmendel deleted the faster-gb7 branch November 18, 2019 01:35

jorisvandenbossche reviewed Nov 18, 2019

Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019

REF: make _aggregate_series_pure_python extraction behave like the cython version (pandas-dev#29641)

3c6ca03

jbrockmendel mentioned this pull request Nov 18, 2019

REF: dont _try_cast for user-defined functions #29698

Merged

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

REF: make _aggregate_series_pure_python extraction behave like the cython version (pandas-dev#29641)

7f8b11a

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

REF: make _aggregate_series_pure_python extraction behave like the cython version (pandas-dev#29641)

070ea56

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this pull request Jan 18, 2020

Edited validate_rst_title_capitalization.py for review (pandas-dev#29641)

de06ec8
