-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: Fix performance regression in infer_dtype #30202
Conversation
Fixes performance regression introduced in aaaac86
Profiling the following function (before/after): def slow_infer_dtype():
df = pd.DataFrame(np.ones((100000, 1000)))
for col in df.columns:
infer_dtype(df[col], skipna=True)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
We should maybe add a asv benchmark to ensure we catch such regression in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an asv benchmark (put in dtypes.py), ideally with a fair number of types that we can test
Co-Authored-By: Joris Van den Bossche <jorisvandenbossche@gmail.com>
I seem to have some issues getting asv running. ASV consistently fails to compile the cython extensions. |
nice speedup! |
@groutr one more linting issue:
|
Were you able to get the benchmarks running? Roughly how long do your new ones take? For testing you can do |
@TomAugspurger oh, sorry, I didn't know we ran black on asv_bench too.
|
Ah one more error linting error @groutr, sorry :) You might want to setup pre-commit: https://dev.pandas.io/docs/development/contributing.html#python-pep8-black @datapythonista I don't recall, but before moving CI checks to GitHub actions, did we run all the checks and only report on failures at the end? So if there are both black and isort issues, you'd get both reported, instead of just the first failure? Sorry if that's a known issue. |
I'm quite sure that when we moved to github actions all checks were running even if they were failures, but that stopped working. I reported it to github one or two weeks ago, but haven't heard from them yet. |
@groutr can you show the asv for master vs this branch. |
EDIT: Please disregard this comment. I had written the benchmarking code incorrectly. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice speedup! Can you add a whatsnew note to 1.0 performance improvements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
thanks |
Fixes major performance regression in
infer_dtype
introduced inaaaac86
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff