-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
@groupby failed on CategoricalString with missing value #278
Comments
Similar behavior found on
It fails with the same ERROR message:
This example works without any error if The same kind of error persists with simpler types of filtering.
Also fails with the same ERROR message. Conclusively, Thanks. |
Same for df |> @select(:price) or df |> @filter(true) (with the above So it seems that the problem occurs when transforming the |
This is actually unrelated to Query.jl, one can get the same error with: using IteratorInterfaceExtensions
it = getiterator(df)
first(it) The implementation for that lives in Tables.jl. The root problem though seems to be that both DataValues.jl and CategoricalArrays.jl add methods to I'm a bit lost what to do here, maybe @quinnj has an idea? Also CCing @nalimilan, the resident categorical expert. Is this a case that needs special casing in Tables.jl? |
I haven't been able to reproduce locally (even after updating all packages). Is there anything special I need to do? I have DataValues 0.4.12. |
Here is what I'm using:
And the code: julia> using DataFrames, IteratorInterfaceExtensions, DataValues
julia> df = DataFrame(fruit= ["Apple", missing, "Cherry", "Apple", "Banana", "Cherry"], price= [1.2, 2.0, 0.4, 1.2, 2.0, 0.4]);
julia> categorical!(df, :fruit);
julia> it = getiterator(df);
julia> first(it)
ERROR: MethodError: convert(::Type{DataValue{CategoricalString{UInt32}}}, ::CategoricalString{UInt32}) is ambiguous. Candidates:
convert(::Type{S}, x::T) where {S, T<:(Union{CategoricalString{R}, CategoricalValue{T,R} where T} where R)} in CategoricalArrays at C:\Users\david\.julia\packages\CategoricalArrays\xjesC\src\value.jl:91
convert(::Type{DataValue{T}}, x::T) where T in DataValues at C:\Users\david\.julia\packages\DataValues\XQWvG\src\scalar\core.jl:31
Possible fix, define
convert(::Type{DataValue{T<:(Union{CategoricalString{R}, CategoricalValue{T,R} where T} where R)}}, ::T<:(Union{CategoricalString{R}, CategoricalValue{T,R} where T} where R))
Stacktrace:
[1] macro expansion at C:\Users\david\.julia\packages\Tables\FXXeK\src\tofromdatavalues.jl:102 [inlined]
[2] iterate(::Tables.DataValueRowIterator{NamedTuple{(:fruit, :price),Tuple{DataValue{CategoricalString{UInt32}},Float64}},Tables.RowIterator{NamedTuple{(:fruit, :price),Tuple{CategoricalArray{Union{Missing, String},1,UInt32,String,CategoricalString{UInt32},Missing},Array{Float64,1}}}}}, ::Tuple{}) at C:\Users\david\.julia\packages\Tables\FXXeK\src\tofromdatavalues.jl:96 (repeats 2 times)
[3] first(::Tables.DataValueRowIterator{NamedTuple{(:fruit, :price),Tuple{DataValue{CategoricalString{UInt32}},Float64}},Tables.RowIterator{NamedTuple{(:fruit, :price),Tuple{CategoricalArray{Union{Missing, String},1,UInt32,String,CategoricalString{UInt32},Missing},Array{Float64,1}}}}}) at .\abstractarray.jl:342
[4] top-level scope at REPL[11]:1 Make sure you load |
Still no error with these versions: (v1.2) pkg> st
Status `~/.julia/environments/v1.2/Project.toml`
[6e4b80f9] BenchmarkTools v0.4.2
[336ed68f] CSV v0.5.0 [`~/.julia/dev/CSV`]
[324d7699] CategoricalArrays v0.5.5 [`~/.julia/dev/CategoricalArrays`]
[a93c6f00] DataFrames v0.19.2
[e7dc6d0d] DataValues v0.4.12
[82899510] IteratorInterfaceExtensions v1.0.0
[2dfb63ee] PooledArrays v0.5.2+ [`~/.julia/dev/PooledArrays`]
[1a8c2f83] Query v0.12.1
[295af30f] Revise v1.0.2
[bd369af6] Tables v0.2.6 |
@nalimilan: here is a reproducer repo that you can run with this: |
Thanks. So this is indeed a legitimate ambiguity which cannot be fixed separately in either package. I think @quinnj's JuliaLang/julia#32613 should help fixing this, but that won't work for current Julia versions. Maybe we should define that Do you think this is a new problem? I still don't understand why I didn't observe it in my original tests. |
Let's say, we try to
@groupby
on aCategoricalString
columnThe result is exactly as expected.
Next, we make the same column contain some missing values.
Then, if failed with an error message stating somewhat like this.
However, this behavior is weird since
String
column with similar missing values does not issue any complaints.This limitation may have to be fixed, since it is natural that
String
columns are recoded asCategorical
columns before beginning any analysis.Thanks.
The text was updated successfully, but these errors were encountered: