PERF: For GH23814, return early in Categorical.__init__ #23888
@@ -314,6 +314,16 @@ class Categorical(ExtensionArray, PandasObject):
    def __init__(self, values, categories=None, ordered=None, dtype=None,
                 fastpath=False):

        # GH23814, for perf, if no optional params used and values already an
I think we can just move this down to where the fastpath check is now; you can add this on, I think. This constructor is already far too complicated.

I think at that point the arg dtype, and maybe categories, will be set. I wanted to use this early return only if none of the optional args were specified (I believe @TomAugsperger was suggesting this in the issue thread).

@eoveson I would still like to investigate consolidating some of this code. This is a very complicated constructor, and more code is not great here. See if you can add it lower down, even if it's slightly lower perf.

@jreback, OK, let me look into this and see if I can consolidate some of the code.

Updated, please check it out when you get a chance.
        # instance of Categorical, simply return the same dtype/codes
        if categories is None and ordered is None and dtype is None:
            if isinstance(values, (ABCSeries, ABCIndexClass)):
                values = values._values
            if isinstance(values, type(self)):
                self._dtype = values.dtype
                self._codes = values.codes.copy()
                return

        # Ways of specifying the dtype (prioritized ordered)
        # 1. dtype is a CategoricalDtype
        #    a.) with known categories, use dtype.categories
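The early-return idea in this hunk can be sketched outside pandas. The following is a minimal, standard-library-only illustration, not the pandas implementation: `MiniCategorical`, its attributes, and the `took_fast_path` flag are hypothetical names introduced here only to make the control flow visible. It assumes the same rule as the diff: take the shortcut only when no optional argument was passed and the input is already an instance of the class.

```python
class MiniCategorical:
    """Toy stand-in for a categorical array (hypothetical, not pandas code)."""

    def __init__(self, values, categories=None, ordered=None):
        # Fast path: no optional arguments were given and the input is
        # already a MiniCategorical, so reuse its categories and copy its
        # codes instead of re-encoding every value.
        if categories is None and ordered is None:
            if isinstance(values, type(self)):
                self.categories = values.categories
                self.ordered = values.ordered
                self.codes = list(values.codes)  # copy, do not share
                self.took_fast_path = True  # illustration-only flag
                return

        # Slow path: derive categories if needed, then encode each value.
        self.took_fast_path = False
        if categories is None:
            categories = sorted({v for v in values if v is not None})
        self.categories = tuple(categories)
        self.ordered = bool(ordered)
        lookup = {cat: i for i, cat in enumerate(self.categories)}
        # -1 marks values that are not among the categories.
        self.codes = [lookup.get(v, -1) for v in values]

    def __iter__(self):
        # Decode codes back to values; unknown values decode to None.
        return (self.categories[c] if c >= 0 else None for c in self.codes)


original = MiniCategorical(["b", "a", "b", "c"])
copy = MiniCategorical(original)  # takes the fast path
reordered = MiniCategorical(original, categories=["c", "b", "a"])  # does not
```

Because the shortcut is skipped as soon as any optional argument is supplied, `reordered` still goes through the full encoding step, which matches the concern raised in the review thread about only returning early when nothing else was specified.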
Say constructor rather than referring to `__init__`.
@jreback, updated the docstring and also added an asv test that exercises the code (the first one didn't, but I left it since it's still useful). You can see my comment to gfyoung about the asv results.