Additional Missing gets injected into Schema #373

Open
Moelf opened this issue Jan 6, 2023 · 3 comments · May be fixed by #477

Moelf commented Jan 6, 2023

In the following example, the column's element type doesn't include Missing, but after a round trip through Arrow, Missing shows up in the Union:

julia> rnt.vector_variant_int64_string
5-element RNTupleField{Vector{Union{Int64, String}}}:
 Union{Int64, String}["one"]
 Union{Int64, String}["one", 2]
 Union{Int64, String}["one", 2, 3]
 Union{Int64, String}["one", 2, 3, 4]
 Union{Int64, String}["one", 2, 3, 4, 5]

julia> DataFrame(rnt).vector_variant_int64_string
5-element Vector{Vector{Union{Int64, String}}}:
 ["one"]
 ["one", 2]
 ["one", 2, 3]
 ["one", 2, 3, 4]
 ["one", 2, 3, 4, 5]

julia> Arrow.write(a, DataFrame(rnt))
"/tmp/jl_Lk5W1G92XO"

julia> Arrow.Table(a)
Arrow.Table with 5 rows, 13 columns, and schema:
 :string                       String
 :vector_int32                 Vector{Int32} (alias for Array{Int32, 1})
 :array_float                  Vector{Float32} (alias for Array{Float32, 1})
 :vector_vector_int32          Vector{Vector{Int32}} (alias for Array{Array{Int32, 1}, 1})
 :vector_string                Vector{String} (alias for Array{String, 1})
 :vector_vector_string         Vector{Vector{String}} (alias for Array{Array{String, 1}, 1})
 :variant_int32_string         Union{Missing, Int32, String}
 :vector_variant_int64_string  Vector{Union{Missing, Int64, String}} (alias for Array{Union{Missing, Int64, String}, 1})
 :tuple_int32_string           NamedTuple{(:_0, :_1), Tuple{Int32, String}}
 :pair_int32_string            NamedTuple{(:_0, :_1), Tuple{Int32, String}}
 :vector_tuple_int32_string    Vector{NamedTuple{(:_0, :_1), Tuple{Int32, String}}} (alias for Array{NamedTuple{(:_0, :_1), Tuple{Int32, String}}, 1})
 :lorentz_vector               NamedTuple{(:pt, :eta, :phi, :mass), NTuple{4, Float32}}
 :array_lv                     Vector{NamedTuple{(:pt, :eta, :phi, :mass), NTuple{4, Float32}}} (alias for Array{NamedTuple{(:pt, :eta, :phi, :mass), NTuple{4, Float32}}, 1})

quinnj commented Jan 7, 2023

Hmmmm, yes, I think I remember that for Union types, the Arrow spec makes this hard because it always allows nulls, so we default to including Missing in the Union to account for this. We can/should figure out how to do this more cleanly, though.
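
Until that is resolved, one possible user-side workaround is to narrow the element type after reading. The following is only a sketch in plain Julia, assuming the column read back contains no actual `missing` values. The helper name `strip_missing` is hypothetical and not part of Arrow.jl, and `col` and `nested` are hand-built stand-ins for columns obtained from `Arrow.Table`.

```julia
# Hypothetical helper (not an Arrow.jl API): remove `Missing` from a vector's
# element type, refusing to do so if a `missing` value is actually present.
function strip_missing(v::AbstractVector)
    T = nonmissingtype(eltype(v))   # e.g. Union{Missing,Int64,String} -> Union{Int64,String}
    any(ismissing, v) && throw(ArgumentError("column contains missing values"))
    return convert(Vector{T}, v)
end

# Stand-in for a flat column whose schema gained `Missing` after the round trip.
col = Union{Missing, Int64, String}["one", 2, 3]
fixed = strip_missing(col)

# Stand-in for a vector-of-vectors column: narrow each inner vector.
nested = [Union{Missing, Int64, String}["one"],
          Union{Missing, Int64, String}["one", 2]]
fixed_nested = map(strip_missing, nested)
```

This copies the data and only repairs the Julia-side type; it does not change what is written to the Arrow file.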

Moelf commented Jan 8, 2023

I see, but for a column of Vector{Union{T, T2}} you don't need Missing, right? The empty element would just be a

Union{T,T2}[]

for example, in :vector_variant_int64_string
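
The Julia side of this argument can be checked without Arrow.jl. A small sketch, using `Int64` and `String` in place of `T` and `T2` (the variable names are illustrative only):

```julia
# An empty inner vector is representable without Missing in the element type.
empty_elem = Union{Int64, String}[]

# A column of such vectors, including an empty row, needs no Missing either.
col = [Union{Int64, String}[], Union{Int64, String}["one", 2]]
```

This only shows that the Julia types need no `Missing` to represent an empty row; whether the Arrow union layout can express that without nullable children is a separate question about the format.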

Moelf commented Feb 1, 2024

Correctness bug, bump?
