Interface for metadata (including per-column) #176

nalimilan · 2020-06-03T11:55:37Z

Assuming we agree at JuliaData/DataAPI.jl#22 on a common framework for general metadata that can be attached to any object, this issue is about defining an interface within that framework so that Table objects can provide and retrieve metadata. This needs some design in particular for per-column metadata, as implementations need to be able to know which value corresponds to which column, for example to be able to do hcat(df1, df2) preserving column metadata.

Broadly speaking two approaches can be considered:

1. Store all metadata at the table level. By default, metadata fields refer to the whole table (e.g. year of data collection). Per-column metadata fields can be identified with a special prefix like #Tables#, and they are required to be AbstractDict{Symbol}-like objects with keys referring to column names. Convenience functions can be provided (in Tables.jl or by implementations) so that users never need to see this prefix.
2. Store metadata referring to the whole table on the table, and metadata referring to specific columns on the column object. This approach has the advantage that metadata moves automatically with the column, but it doesn't work for row-oriented tables.
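
To make approach 1 concrete, here is a minimal sketch using plain dictionaries. The names `COLUMN_PREFIX`, `column_metadata` and `set_column_metadata!` are hypothetical and only illustrate the idea; they are not an existing Tables.jl or DataAPI.jl API, and the `#Tables#` prefix is the tentative one mentioned above.

```julia
# Hypothetical sketch of approach 1; none of these names exist in Tables.jl.
const COLUMN_PREFIX = "#Tables#"   # tentative prefix from this discussion

# All metadata lives in a single table-level dictionary.
meta = Dict{String,Any}(
    "year" => 2020,                                  # refers to the whole table
    COLUMN_PREFIX * "label" => Dict{Symbol,Any}(     # refers to individual columns
        :age => "Age in years",
        :income => "Yearly income",
    ),
)

# Convenience accessor that hides the prefix from users.
# Returns `nothing` when the field or the column has no entry.
function column_metadata(meta::AbstractDict, field::AbstractString, col::Symbol)
    percol = get(meta, COLUMN_PREFIX * field, nothing)
    percol === nothing && return nothing
    return get(percol, col, nothing)
end

# Convenience setter that creates the per-column dictionary on demand.
function set_column_metadata!(meta::AbstractDict, field::AbstractString,
                              col::Symbol, value)
    percol = get!(meta, COLUMN_PREFIX * field) do
        Dict{Symbol,Any}()
    end
    percol[col] = value
    return meta
end

set_column_metadata!(meta, "unit", :income, "EUR")
column_metadata(meta, "label", :age)    # label stored for column :age
column_metadata(meta, "unit", :age)     # nothing, no unit was set for :age
```

Since everything is keyed by column name, this layout works for row-oriented tables too, at the cost of having to keep the keys in sync with the columns by hand.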

See also discussion for the DataFrames implementation at JuliaData/DataFrames.jl#2276.

Cc: @bkamins @pdeffebach @quinnj


bkamins · 2020-06-03T13:06:58Z

So my preference is for option 1, exactly because we are not guaranteed that a table even has an object representing columns.

Now for the metadata referring to columns stored at the table level, I think there should just be some agreed convention for their naming scheme, but we still would not e.g. throw errors when these things go out of sync. It is up to the user to manage this if the user wants it handled.
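
As an illustration of what "out of sync" means here, a user who does want it handled could check for stale keys themselves. This is only a sketch under the assumptions of this thread: `stale_column_keys` is a hypothetical helper, and the `#Tables#` prefix is still tentative.

```julia
# Hypothetical helper; not part of Tables.jl or DataAPI.jl.
const COLUMN_PREFIX = "#Tables#"   # tentative prefix from this discussion

# For each per-column metadata field, collect the keys that do not
# correspond to any existing column. Nothing is thrown; the caller decides
# what to do with the result.
function stale_column_keys(meta::AbstractDict, colnames)
    known = Set{Symbol}(colnames)
    stale = Dict{String,Vector{Symbol}}()
    for (key, value) in meta
        startswith(key, COLUMN_PREFIX) || continue   # skip table-level fields
        missing_cols = sort!(Symbol[col for col in keys(value) if !(col in known)])
        isempty(missing_cols) || (stale[key] = missing_cols)
    end
    return stale
end

meta = Dict{String,Any}(
    "year" => 2020,
    "#Tables#label" => Dict{Symbol,Any}(:age => "Age", :old => "Dropped column"),
)

# :old no longer exists in the table, so it is reported for the label field.
stale_column_keys(meta, [:age, :income])
```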

The only way in which these "column" related metadata would be special is how they are handled when mixing tables e.g. via hcat, vcat or joins (as this is the only place where this is relevant I think). There are probably several merging rules but the distinction is that:

- normal metadata, when conflicting, would be discarded, or the last one taken (or whatever rule we decide on; by default merge in Base takes the last one)
- column-level "special" metadata would on the other hand be merged recursively, so e.g. if we get a #Tables#label key (the #Tables# prefix is tentative) in both tables, then we merge the dict-like values that they point to (instead of dropping them or taking the last one, as we would do with normal metadata).

The reason for proposing this nested structure is that a user might want to attach many types of metadata to columns.
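
A rough sketch of the two merging rules, assuming the "last one wins" variant for normal metadata. `merge_metadata` is a hypothetical name, and this is not how any package actually implements hcat or joins; it only shows the distinction between the two kinds of keys.

```julia
# Hypothetical sketch of the merging rules; not an actual hcat/join implementation.
const COLUMN_PREFIX = "#Tables#"   # tentative prefix from this discussion

function merge_metadata(a::AbstractDict, b::AbstractDict)
    result = Dict{String,Any}(a)   # shallow copy, `a` itself is not modified
    for (key, value) in b
        if haskey(result, key) && startswith(key, COLUMN_PREFIX)
            # Column-level metadata: merge the per-column dictionaries.
            # `merge` returns a new dictionary, so the input stays untouched.
            result[key] = merge(result[key], value)
        else
            # Normal metadata: the last one wins, like `merge` in Base.
            result[key] = value
        end
    end
    return result
end

m1 = Dict{String,Any}(
    "year" => 2019,
    "source" => "survey A",
    "#Tables#label" => Dict{Symbol,Any}(:age => "Age in years"),
)
m2 = Dict{String,Any}(
    "year" => 2020,
    "#Tables#label" => Dict{Symbol,Any}(:income => "Yearly income"),
)

# "year" is taken from m2, "source" is kept from m1, and the labels of
# both tables end up in one per-column dictionary.
m = merge_metadata(m1, m2)
```

With a plain `merge(m1, m2)` the label for :age would be lost, because the whole "#Tables#label" value from m2 would replace the one from m1.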

bkamins · 2023-08-03T11:16:48Z

Closing, as this is done.

nalimilan mentioned this issue Jun 3, 2020

metadata method JuliaData/DataAPI.jl#22

Closed

bkamins closed this as completed Aug 3, 2023
