Assuming we agree at JuliaData/DataAPI.jl#22 on a common framework for general metadata that can be attached to any object, this issue is about defining an interface within that framework so that Table objects can provide and retrieve metadata. This needs some design, in particular for per-column metadata, as implementations need to be able to know which value corresponds to which column, for example to be able to do `hcat(df1, df2)` while preserving column metadata.
Broadly speaking, two approaches can be considered:

1. Store all metadata at the table level. By default, metadata fields refer to the whole table (e.g. year of data collection). Per-column metadata fields can be identified with a special prefix like `#Tables#`, and they are required to be `AbstractDict{Symbol}`-like objects with keys referring to column names. Convenience functions can be provided (in Tables.jl or by implementations) to avoid the need for users to see this prefix.
2. Store metadata referring to the whole table on the table, and metadata referring to specific columns on the column object. This approach has the advantage that metadata moves automatically with the column, but it doesn't work for row-oriented tables.
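
To make the first approach more concrete, here is a minimal sketch using only Base Julia. The `column_key` and `colmetadata` helpers and the example fields are hypothetical names chosen for illustration; they are not part of Tables.jl or DataAPI.jl, and the `#Tables#` prefix is only the tentative one mentioned above.

```julia
# Hypothetical helper: build the table-level key under which a per-column field is stored.
column_key(field::AbstractString) = "#Tables#" * field

# Table-level metadata: plain keys describe the whole table,
# prefixed keys hold a dict mapping column names to values.
meta = Dict{String,Any}(
    "year" => 2020,
    column_key("label") => Dict{Symbol,String}(
        :age => "Age in years",
        :inc => "Yearly income",
    ),
)

# Hypothetical convenience accessor so that users never see the prefix.
function colmetadata(meta::AbstractDict, col::Symbol, field::AbstractString, default=nothing)
    percol = get(meta, column_key(field), nothing)
    percol === nothing && return default
    return get(percol, col, default)
end

age_label = colmetadata(meta, :age, "label")       # label stored for column :age
missing_label = colmetadata(meta, :height, "label") # falls back to the default
```

Because everything lives in one table-level dictionary, this layout works the same way for row-oriented tables that have no column objects.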
So my preference is for option 1, exactly because we are not guaranteed that a table even has an object representing columns.
Now, for metadata that refers to columns but is stored at the table level, I think there should just be an agreed convention for the naming scheme. We would still not, e.g., throw errors when these things go out of sync; it is up to the user to manage this if they want it handled.
The only way in which this column-related metadata would be special is how it is handled when combining tables, e.g. via `hcat`, `vcat` or joins (I think this is the only place where it is relevant). There are probably several possible merging rules, but the distinction is that:

- normal metadata, when conflicting, would be discarded or the last value taken (or whatever rule we decide on; by default `merge` in Base takes the last one);
- column-level "special" metadata would instead be merged recursively: e.g. if the `#Tables#label` key (the `#Tables#` prefix is tentative) is present in both tables, then we merge the dict-like values it points to, instead of dropping them or taking the last one as we would with normal metadata.
The reason for proposing this nested structure is that a user might want to attach many types of metadata to columns.
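
A rough sketch of the merging distinction described above, using only Base Julia. The `merge_metadata` and `iscolumnkey` functions are hypothetical names, and the "last one wins" rule for normal metadata is just one of the candidate rules, chosen here because it matches `merge` in Base.

```julia
# Hypothetical check for the tentative per-column prefix.
iscolumnkey(key::AbstractString) = startswith(key, "#Tables#")

# Combine the metadata of two tables:
# - normal keys: the value from `b` replaces the one from `a` (last one wins);
# - prefixed keys present in both: the dict-like values are merged.
function merge_metadata(a::AbstractDict, b::AbstractDict)
    out = Dict{String,Any}(a)
    for (key, value) in b
        if iscolumnkey(key) && haskey(out, key)
            out[key] = merge(out[key], value)
        else
            out[key] = value
        end
    end
    return out
end

left = Dict{String,Any}(
    "year" => 2019,
    "#Tables#label" => Dict(:age => "Age"),
)
right = Dict{String,Any}(
    "year" => 2020,
    "#Tables#label" => Dict(:inc => "Income"),
)

merged = merge_metadata(left, right)
```

With these inputs, `merged["year"]` takes the value from `right`, while `merged["#Tables#label"]` keeps the labels for both `:age` and `:inc`. Nothing here checks that the column names still exist in the resulting table, in line with leaving that bookkeeping to the user.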
See also the discussion of the DataFrames implementation at JuliaData/DataFrames.jl#2276.
Cc: @bkamins @pdeffebach @quinnj