Adding levels in CategoricalValue call #364

Open
bkamins opened this issue Sep 5, 2021 · 4 comments

bkamins commented Sep 5, 2021

@nalimilan In the following example:

julia> x = categorical([1,2,3])
3-element CategoricalArray{Int64,1,UInt32}:
 1
 2
 3

julia> CategoricalValue(3, x)
CategoricalValue{Int64, UInt32} 3

julia> CategoricalValue(4, x)
ERROR: ArgumentError: level 4 not found in source pool

The question is whether CategoricalValue(4, x) should add the level to the pool of x (provided this would be allowed when assigning to x) rather than throwing an error.

I am not fully convinced though.


bkamins commented Sep 5, 2021

Alternatively, maybe we could allow categorical(scalar) = CategoricalValue(scalar, categorical([scalar])) (plus kwarg support, which I omit here for brevity).


nalimilan commented Sep 5, 2021

Did this come up in a real use case? Throwing an error has the advantage of being explicit. Otherwise, an ordered pool would be made unordered when the level is added, so if you then call < you'd get an error immediately, but the pool would already have been mutated, which can be annoying.

categorical(scalar) is something we have discussed before, IIRC. I guess it could be useful for fixing #363 if we allowed x[1] < categorical(scalar) to work for any pool, as there wouldn't be transitivity issues given that categorical(scalar1) < categorical(scalar2) would not be allowed. Is that the kind of thing you had in mind? EDIT: this would actually break transitivity when comparing categorical(scalar) to values from different pools, unless we remembered the pool the first time a comparison is made and threw an error for subsequent comparisons with other pools.


bkamins commented Sep 5, 2021

> Did this come up in a real use case?

The use case is in JuliaData/DataFrames.jl#2828 where you now have to write e.g.:

unstack(df, :variable, :value, fillvalue=CategoricalValue(0, categorical([0])))

to keep the unstacked columns categorical.

> The question is if maybe we should allow CategoricalValue(4, x) to add level to pool of x

After thinking about it, I am not a fan of this.

> categorical(scalar) is something that we discussed before, IIRC

Yes, and there is no rush to decide what we should do as these things are tricky. My intention was just to have a shorter version of CategoricalValue(scalar, categorical([scalar])) which is verbose and non-obvious.

nalimilan commented

Another situation where it could be useful is e.g. in [v == "a" ? categorical("b") : v for v in cat_array], to ensure that the result is categorical.
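
For reference, the shorthand discussed in this thread can be sketched as a user-side helper. This is only an illustration under stated assumptions: categorical_scalar is a hypothetical name (chosen so as not to add a method to categorical itself), it is not part of CategoricalArrays.jl, and the value it returns carries its own single-level pool, so the ordering and transitivity caveats raised above still apply.

```julia
using CategoricalArrays

# Hypothetical helper, not part of CategoricalArrays.jl: wrap a scalar in a
# fresh one-element categorical array and return it as a CategoricalValue.
# Keyword arguments are forwarded to `categorical`.
categorical_scalar(v; kwargs...) = CategoricalValue(v, categorical([v]; kwargs...))

# Shorter spelling of CategoricalValue(0, categorical([0]))
fill_value = categorical_scalar(0)

# The comprehension use case: replace "a" with a categorical "b" so that
# every element of the result is a CategoricalValue.
cat_array = categorical(["a", "b", "c"])
result = [v == "a" ? categorical_scalar("b") : v for v in cat_array]
```

Note that the elements of result come from different pools (the pool of cat_array and the one-level pool created by the helper), which is exactly the situation where comparisons with < are problematic.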
