BUG: Categoricals shouldn't allow non-strings when object dtype is passed (#13919) #14047
You already have #14027; you don't need a second one.
# this, however, will raise because the values cannot be sorted
self.assertRaises(
    TypeError, lambda: Categorical.from_array(arr, ordered=True))
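`Categorical.from_array` is no longer available in current pandas, so here is a minimal sketch of the same check using the plain `Categorical` constructor. It assumes a recent pandas. Because the `arr` from the original test is not shown, it uses an int/datetime mix as a stand-in for an unsortable input.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Ints mixed with a datetime: there is no total order over these values.
arr = np.array([1, 2, 3, datetime(2016, 8, 1)], dtype=object)

# Unordered construction works; categories simply are not sorted.
unordered = pd.Categorical(arr, ordered=False)
print(unordered.ordered)

# Ordered construction needs sortable categories, so it raises TypeError.
raised = False
try:
    pd.Categorical(arr, ordered=True)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```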
def test_constructor_object_dtype(self):
Can you add tests for:
- unicode (for py2, include 2-byte characters)
- mixed str and unicode (for py2, this is often the case in Asian countries)
- bool (not sure it is used as a category)
- Period with mixed freq

I think some of these can't be covered because of the `infer_dtype` spec and the current implementation.
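To see which of these cases `infer_dtype` can tell apart, here is a minimal sketch that builds one input per requested case and prints what `pandas.api.types.infer_dtype` reports. It assumes Python 3. There, `str` is already unicode, so the py2 "str and unicode mixed" case is approximated with `str` and `bytes`. The mixed-freq Period result is left unasserted because it may differ between pandas versions.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# One input per requested test case (Python 3 approximations of the py2 ones).
cases = {
    "unicode": ["ä", "é"],                 # non-ASCII, 2 bytes each in UTF-8
    "str_and_bytes_mixed": ["a", b"b"],    # stands in for py2 str + unicode
    "bool": [True, False],
    "period_same_freq": [pd.Period("2016-01", freq="M"),
                         pd.Period("2016-02", freq="M")],
    "period_mixed_freq": [pd.Period("2016-01", freq="M"),
                          pd.Period("2016-01-01", freq="D")],
}

for name, values in cases.items():
    print(name, infer_dtype(values, skipna=True))
```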
@@ -191,6 +192,8 @@ class Categorical(PandasObject):
If an explicit ``ordered=True`` is given but no `categories` and the
`values` are not sortable.

If an `object` dtype is passed and `values` contains dtypes other
mixed dtypes
This change will always fail to build because it conflicts with the way … Regardless of how I implement it, ensuring that Categoricals aren't of mixed dtype will always raise when a MultiIndex is constructed from a list-like with mixed dtypes. Should I disregard this and keep working on / pushing to this PR?
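A minimal sketch of the conflict, assuming a recent pandas. In that setting, `MultiIndex.from_arrays` builds each level by constructing a `Categorical` from the input array internally. That is an implementation detail, not a documented guarantee. Today a level that mixes strings and integers is accepted. A strict mixed-type check inside `Categorical` would turn this call into a `TypeError`.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# First level mixes strings and integers; second level is homogeneous.
mi = pd.MultiIndex.from_arrays(
    [["a", 1, "b"], [10, 20, 30]], names=["key", "num"]
)

level0 = mi.levels[0]
print(level0.dtype)  # object
print(infer_dtype(level0.to_numpy(), skipna=True))  # inferred kind of the mixed level

# The same values also build a Categorical without complaint today.
cat = pd.Categorical(["a", 1, "b"])
print(cat.categories.dtype)  # object
```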
You would have to show an example.
#13854 refactors those routines; build your fix on top of that PR.
The refactored routines still call …
Can you rebase / update?
Categoricals should never be of mixed type; if you have an example, please show it. That should raise; it's not valid.
@jreback So, back to my original problem: I can't think of a clean way to allow … My immediate idea is to pass in some parameter to indicate that a MultiIndex is being created, but as you said in another of my PRs, you can't pass in an extra parameter to determine state.
@jreback from a … Personally, I don't think we should check this at Categorical construction; I would rather check for this in the HDF code itself.
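Here is a sketch of that alternative. Building a mixed-type Categorical stays legal, and only categorical columns about to be written are inspected. It assumes the writer only needs to reject object-dtype categories that are not all strings. `check_categoricals_for_hdf` and its `allowed` parameter are hypothetical and are not part of pandas' HDF code.

```python
import pandas as pd
from pandas.api.types import infer_dtype, is_object_dtype


def check_categoricals_for_hdf(df, allowed=("string",)):
    """Hypothetical pre-write check; not pandas' HDFStore implementation.

    Raises TypeError for a categorical column whose object-dtype categories
    infer to a kind outside ``allowed`` (an assumed list of storable kinds).
    """
    for name, col in df.items():
        if not isinstance(col.dtype, pd.CategoricalDtype):
            continue
        cats = col.cat.categories
        if not is_object_dtype(cats.dtype):
            continue  # non-object categories are left to the writer itself
        kind = infer_dtype(cats.to_numpy(), skipna=True)
        if kind not in allowed:
            raise TypeError(
                f"column {name!r}: categories of inferred type {kind!r} "
                "cannot be written"
            )


df = pd.DataFrame({
    "ok": pd.Categorical(["a", "b", "a"]),
    "mixed": pd.Categorical(["a", 1, "b"]),  # construction still succeeds
})
check_categoricals_for_hdf(df[["ok"]])  # passes
try:
    check_categoricals_for_hdf(df)
except TypeError as exc:
    print(exc)
```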
git diff upstream/master | flake8 --diff
Why this change is needed: Categorical variables are by definition of a single type, so allowing them to take on values of different types is misleading. Object dtype should only be allowed when ALL strings or ALL periods are passed (because of the way they are handled internally).
The result of this PR: a `TypeError` is raised when a Categorical is created with object dtype but its values are not all strings or all periods.
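A minimal sketch of the proposed rule as a standalone check. It assumes the rule is expressed via `pandas.api.types.infer_dtype`, with missing values ignored. `make_object_categorical` is a hypothetical helper, not this PR's actual code and not a pandas API.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# infer_dtype results treated as acceptable for object-dtype input.
# "unicode" was only ever returned on Python 2; kept for completeness.
_ALLOWED_KINDS = {"string", "unicode", "period"}


def make_object_categorical(values):
    """Hypothetical helper (not part of pandas) sketching this PR's rule."""
    kind = infer_dtype(values, skipna=True)  # NaN/None are ignored
    if kind not in _ALLOWED_KINDS:
        raise TypeError(
            "object-dtype Categorical values must be all strings or all "
            f"periods, got inferred type {kind!r}"
        )
    return pd.Categorical(values)


print(make_object_categorical(["a", "b", "a"]).categories.tolist())  # ['a', 'b']
for bad in (["a", 1], [True, "x"]):
    try:
        make_object_categorical(bad)
    except TypeError as exc:
        print(exc)
```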