
Simplify to_pydatetime() #17592

Merged: 6 commits merged into pandas-dev:master on Sep 22, 2017

Conversation

jbrockmendel (Member):

Up until #17331, Timestamp.microsecond was very slow, so it used to be faster to go through convert_to_tsobject to get a new datetime instance than to just return

    datetime(self.year, self.month, self.day, self.hour, self.minute,
             self.second, self.microsecond, self.tzinfo)

Now the direct construction is slightly faster (and much simpler).
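For context, here is a minimal pure-Python sketch of the simplified approach. The real method lives in pandas' Cython Timestamp code and also handles a warning about discarding nonzero nanoseconds; the free-standing function name here is purely illustrative.

    from datetime import datetime

    def to_pydatetime(ts):
        # Build a stdlib datetime directly from the Timestamp's fields
        # instead of round-tripping through convert_to_tsobject.
        return datetime(ts.year, ts.month, ts.day,
                        ts.hour, ts.minute, ts.second,
                        ts.microsecond, ts.tzinfo)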

Before:

In [2]: ts = pd.Timestamp.now()
In [3]: %timeit ts.to_pydatetime()
The slowest run took 13.88 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.44 µs per loop

After:

In [2]: ts = pd.Timestamp.now()
In [3]: %timeit ts.to_pydatetime()
The slowest run took 8.51 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.29 µs per loop

With a tz-aware Timestamp the speedup is much bigger:

Before:

In [4]: import pytz
In [5]: ts2 = ts.replace(tzinfo=pytz.timezone('US/Pacific'))
In [6]: %timeit ts2.to_pydatetime()
The slowest run took 39.71 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 68.9 µs per loop

After:

In [7]: import pytz
In [8]: ts2 = ts.replace(tzinfo=pytz.timezone('US/Pacific'))
In [9]: %timeit ts2.to_pydatetime()
The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.47 µs per loop
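For anyone wanting to reproduce these comparisons outside IPython, a rough stand-alone script using the stdlib timeit module (absolute timings will of course vary by machine and pandas version):

    import timeit
    import pytz
    import pandas as pd

    ts = pd.Timestamp.now()
    ts_tz = ts.replace(tzinfo=pytz.timezone('US/Pacific'))

    for label, stamp in [('naive', ts), ('tz-aware', ts_tz)]:
        n = 100000
        # Best of three runs of the bound method, converted to us per call.
        best = min(timeit.repeat(stamp.to_pydatetime, number=n, repeat=3))
        print('%s: %.2f us per call' % (label, best / n * 1e6))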

@gfyoung added the Internals (Related to non-user accessible pandas implementation) and Performance (Memory or execution speed performance) labels on Sep 19, 2017

codecov bot commented Sep 19, 2017

Codecov Report

Merging #17592 into master will decrease coverage by 0.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #17592      +/-   ##
==========================================
- Coverage   91.19%   91.18%   -0.02%     
==========================================
  Files         163      163              
  Lines       49627    49627              
==========================================
- Hits        45259    45250       -9     
- Misses       4368     4377       +9
Flag        Coverage Δ
#multiple   88.96% <ø> (ø) ⬆️
#single     40.19% <ø> (-0.07%) ⬇️

Impacted Files         Coverage Δ
pandas/io/gbq.py       25% <0%> (-58.34%) ⬇️
pandas/core/frame.py   97.77% <0%> (-0.1%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b3087ef...235c8d0.

Diff excerpt under review:

    dts.us, ts.tzinfo)

    return datetime(self.year, self.month, self.day,
    self.hour, self.minute, self.second,

Review comment (Contributor): indentation

chris-b1 (Contributor):

LGTM, minus formatting nit.

jreback (Contributor) left a review comment:

Can you verify that there is an asv for this (and add one if not)?

@jreback added this to the 0.21.0 milestone on Sep 20, 2017
jbrockmendel (Member, Author):

There is not, just added one.


pep8speaks commented Sep 20, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 21, 2017 at 22:48 Hours UTC

jbrockmendel (Member, Author):

Looks like someone added one overnight, will add to_pydatetime timing to it.
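For reference, a hedged sketch of what such an asv benchmark could look like; the actual benchmark added to pandas' asv_bench suite may be organized and named differently.

    import pytz
    import pandas as pd

    class TimestampToPyDatetime(object):
        # asv calls setup() before timing each time_* method.
        def setup(self):
            self.ts = pd.Timestamp.now()
            self.ts_tz = self.ts.replace(tzinfo=pytz.timezone('US/Pacific'))

        def time_to_pydatetime(self):
            self.ts.to_pydatetime()

        def time_to_pydatetime_tz(self):
            self.ts_tz.to_pydatetime()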

jbrockmendel (Member, Author):

Is there an error message in the circleci log that I'm missing?

jreback (Contributor) commented on Sep 21, 2017:

looks fine

you can click thru to circleci to see errors

jbrockmendel (Member, Author):

> looks fine
>
> you can click thru to circleci to see errors

I'm not asking just to waste your time; I'm saying I don't see any error message in the log.

jreback (Contributor) commented on Sep 21, 2017:

> I'm not asking just to waste your time; I'm saying I don't see any error message in the log.

I know, but some people don't actually realize you can click through. Yeah, it probably timed out or something.

Wait till I merge #17619 and rebase.

@jreback merged commit 49cfdd7 into pandas-dev:master on Sep 22, 2017
jreback (Contributor) commented on Sep 22, 2017:

thanks!

@jbrockmendel deleted the to_pydatetime branch on October 30, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017