PERF: changed default value of cache parameter to True in to_datetime function #26043
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master   #26043      +/-   ##
==========================================
- Coverage    91.9%   91.89%   -0.01%
==========================================
  Files         175      175
  Lines       52485    52485
==========================================
- Hits        48235    48230       -5
- Misses       4250     4255       +5
```

Continue to review the full report at Codecov.
Codecov Report

```
@@            Coverage Diff             @@
##           master   #26043      +/-   ##
==========================================
+ Coverage   91.79%   91.86%   +0.07%
==========================================
  Files         180      179       -1
  Lines       50934    50728     -206
==========================================
- Hits        46753    46600     -153
+ Misses       4181     4128      -53
```

Continue to review the full report at Codecov.
@jreback apparently I didn't understand enough last time about the reasons for the problems in the CI builds (#25990 (comment)), and the problem was different (maybe in master, from which I made my branch). ASV results will follow.
One downside of changing the default to `True`: ... Additionally, can you show ASVs/performance benchmarks for converting a small number of arguments? I suspect there will be a performance regression with a small number of arguments and `cache=True`.
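To make the tradeoff concrete, here is a minimal sketch (not from the PR itself) of what the `cache` parameter controls: with `cache=True`, `to_datetime` parses each unique string once and reuses the result, which helps duplicate-heavy input but adds bookkeeping overhead when every value is unique.

```python
import pandas as pd

# Heavy duplication: only two unique strings, so caching pays off.
dup = ["2000-01-03", "2000-01-04"] * 1000

with_cache = pd.to_datetime(dup, cache=True)
without_cache = pd.to_datetime(dup, cache=False)

# The parsed result is identical either way; only the runtime differs.
assert (with_cache == without_cache).all()
```

For all-unique input (the case discussed below), the cache can only add overhead, since no parse result is ever reused.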
For a view of the situation as a whole:
@mroeschke I'll see what can be done with #22305. It seems I ran into another kind of error; has it been mentioned before? By "a small number of arguments", do you mean 10-100?
What other error did you run into? And sure, around 50 arguments where there are no duplicate arguments to parse.
Not that branch, therefore I will do it again.
Force-pushed from 29530f4 to 317267c
@mroeschke I have provided a workaround for #22305 in PR #26078. Can you take a look?
First of all, the asv test:

```python
from pandas import date_range, to_datetime


class ToDatetimeCacheSmallCount(object):

    params = [True, False]
    param_names = ['cache']

    def setup(self, cache):
        N = 50
        rng = date_range(start='1/1/2000', periods=N)
        self.unique_date_strings = rng.strftime('%Y-%m-%d').tolist()

    def time_unique_date_strings(self, cache):
        to_datetime(self.unique_date_strings, cache=cache)
```
I also decided to run the tests with 5000 elements (to be more confident in the numbers).
I do not know yet what the error is, and I want first to rebase from master.
Force-pushed from 317267c to d45c434
When I run
Can you merge master and update?
Force-pushed from d45c434 to 12eac47
What is the performance impact for the (rather typical, I think) case with all unique datetimes? @anmyachev can you provide some timings for that? (Or point to the benchmark result in one of your previous comments that represents that case.)
See also the comment by @TomAugspurger on the read_csv PR: #25990 (comment)
@jorisvandenbossche result for the case with all unique datetimes (performance decreases by about 2x):
Benchmark for asv:

```python
from pandas import date_range, to_datetime


class ToDatetimeCacheSmallCount(object):

    params = ([True, False], [50, 500, 5000, 100000])
    param_names = ['cache', 'count']

    def setup(self, cache, count):
        rng = date_range(start='1/1/1971', periods=count)
        self.unique_date_strings = rng.strftime('%Y-%m-%d').tolist()

    def time_unique_date_strings(self, cache, count):
        to_datetime(self.unique_date_strings, cache=cache)
```
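The all-unique case can also be timed outside asv with a quick sketch like the following (absolute numbers depend on the machine; this only illustrates how to compare the two settings):

```python
import timeit

import pandas as pd

# Unique date strings: the worst case for cache=True, since nothing repeats.
strings = pd.date_range(start="1/1/1971", periods=5000).strftime("%Y-%m-%d").tolist()

for cache in (True, False):
    t = timeit.timeit(lambda: pd.to_datetime(strings, cache=cache), number=5)
    print(f"cache={cache}: {t:.3f}s")
```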
Need a whatsnew note in the performance section.
Is this orthogonal to #26097?
Not sure... that PR is trying to address a bug that manifests when
Force-pushed from 12eac47 to 81f54c0
small comment, pls merge master and ping on green. |
Force-pushed from 5210268 to 98e18a8
lgtm. ping on green. |
Hello @anmyachev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
```python
kwds['cache_dates'] = do_cache
read_csv(self.data(self.StringIO_input), header=None,
         parse_dates=[0], **kwds)
try:
```
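A self-contained sketch of the compatibility pattern under discussion (assumed context: an older pandas raises `TypeError` for the then-unknown `cache_dates` keyword, so the benchmark falls back to calling `read_csv` without it):

```python
import io

import pandas as pd

data = "2000-01-01\n2000-01-02\n"
kwds = {"cache_dates": True}
try:
    # Newer pandas accepts the keyword.
    df = pd.read_csv(io.StringIO(data), header=None, parse_dates=[0], **kwds)
except TypeError:
    # Older pandas: retry without the unsupported keyword.
    df = pd.read_csv(io.StringIO(data), header=None, parse_dates=[0])

assert df[0].dtype.kind == "M"  # dates were parsed either way
```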
@TomAugspurger is this an OK method of handling it?
Seems fine.
Although... I worry it would incorrectly catch a TypeError raised inside the function. The other way might be to check pandas.__version__?
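A version gate, as suggested, might look like the following sketch; the version number used here is an assumption for illustration, not taken from the PR:

```python
import pandas as pd


def pandas_at_least(major, minor):
    """Return True if the installed pandas is at least major.minor."""
    parts = pd.__version__.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)


kwds = {}
if pandas_at_least(0, 25):  # assumed: cache_dates appeared around this release
    kwds["cache_dates"] = True
```

This avoids catching a `TypeError` that might originate inside `read_csv` itself, at the cost of hard-coding a version threshold.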
Hmm, let me see what I can do.
Patched an edge case that was showing up in the asvs.
thanks @anmyachev
```shell
git diff upstream/master -u -- "*.py" | flake8 --diff
```