Support Vega-Lite's optional encoding types #2584

joelostblom · 2022-03-30T15:18:25Z

Since 4.14, the encoding type is optional in Vega-Lite and is inferred according to some simple heuristics if not given explicitly. Altair raises an error if no type is provided, but maybe we can remove this check now and just let Vega-Lite handle missing types? This could also make errors such as a misspelled data frame column name clearer in Altair (these currently raise the "field specified without type" error).

Example:

import altair as alt

data = alt.Data(values=[{'x': 'A', 'y': 5},
                        {'x': 'B', 'y': 3},
                        {'x': 'C', 'y': 6},
                        {'x': 'D', 'y': 7},
                        {'x': 'E', 'y': 2}])
alt.Chart(data).mark_bar().encode(
    x='x',
    y='y:Q',
)

ValueError: x encoding field is specified without a type; the type cannot be automatically inferred because the data is not specified as a pandas.DataFrame.

However, the Vega-Lite spec is valid and produces a sensible figure in this case:

{
  "config": {"view": {"continuousWidth": 400, "continuousHeight": 300}},
  "data": {
    "values": [
      {"x": "A", "y": 5},
      {"x": "B", "y": 3},
      {"x": "C", "y": 6},
      {"x": "D", "y": 7},
      {"x": "E", "y": 2}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "x"},
    "y": {"field": "y", "type": "quantitative"}
  },
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json"
}


jakevdp · 2022-03-30T17:18:48Z

Thanks for raising this! That would be great.

One choice we have to make is whether to continue inferring the dtype from pandas dataframes, or just leave all type inference to Vega-Lite. I lean toward the latter, so that the behavior will be the same regardless of how the data is specified. What do you think?

joelostblom · 2022-03-30T18:25:18Z

I can see benefits of both approaches, but overall I am leaning towards keeping (and extending) the support for pandas data types. If Vega-Lite would be able to infer quantitative and temporal data, then I would be more in favor of relying on its type inference (vega/vega-lite#8081). Here are my thoughts in more detail:

As you said, it would be nice to have a consistent syntax regardless of the data source. On the other hand, I think the Vega-Lite type inference is still not on par with what Altair does via pandas, particularly since it uses nominal as the default for all non-aggregated fields, which means that a lot of explicit :Q typing would be needed.
I think it is easier to explain that Altair "understands the data type used in pandas" instead of explaining the default rules in Vega-Lite; especially novices might be somewhat intimidated by this:
With the Vega-Lite type inference, it might be confusing when one needs to be explicit about the data type. Now it is easy: "never" if using pandas. Here I could see an argument for requiring "always regardless of data source" since being explicit about the data types might cause people to think more about what they are trying to visualize, but that would also be slightly less convenient to type out.
I think it would be nice to extend support for Altair data types to also include categorical ordering (my attempt in Represent pandas ordered categoricals as ordinal data #2522), since this would make it even more seamless to use pandas with Altair.
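To make the "nominal unless aggregated" point above concrete, here is a rough sketch of the default-type rules as I understand them from the Vega-Lite documentation. The function name `default_vegalite_type`, the scale-type sets, and the order in which the rules are checked are my own assumptions for illustration; this is not Vega-Lite's or Altair's actual code.

```python
# Simplified, illustrative sketch of Vega-Lite's documented default-type rules.
# The precedence of the checks below is an assumption, not the real implementation.

QUANTITATIVE_SCALES = {"linear", "log", "pow", "sqrt", "symlog"}
TEMPORAL_SCALES = {"time", "utc"}
ORDINAL_SCALES = {"ordinal", "point", "band"}


def default_vegalite_type(channel, field_def):
    """Guess the type Vega-Lite would assume for a field definition."""
    # An explicit type always wins.
    if "type" in field_def:
        return field_def["type"]

    # Some channels imply a type on their own.
    if channel in ("latitude", "longitude"):
        return "quantitative"
    if channel == "order":
        return "ordinal"

    # A custom sort order (a list of values) implies ordinal data.
    if isinstance(field_def.get("sort"), list):
        return "ordinal"

    if field_def.get("timeUnit"):
        return "temporal"

    # String aggregates such as "mean" imply quantitative data;
    # argmin/argmax are given as objects and are excluded here.
    aggregate = field_def.get("aggregate")
    if field_def.get("bin") or isinstance(aggregate, str):
        return "quantitative"

    scale_type = (field_def.get("scale") or {}).get("type")
    if scale_type in QUANTITATIVE_SCALES:
        return "quantitative"
    if scale_type in TEMPORAL_SCALES:
        return "temporal"
    if scale_type in ORDINAL_SCALES:
        return "ordinal"

    # Everything else, including plain numeric columns, falls back to nominal.
    return "nominal"
```

Under these rules a plain numeric column such as `{"field": "y"}` comes out as nominal, which is why `y='y:Q'` would still need to be written out without pandas-based inference.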

joelostblom · 2022-03-30T18:28:39Z

To be clear, I still think it would be a big benefit to support the default Vega-Lite type inference outside of pandas, and I think it would enable us to have a clearer error message for typos in column names when using Altair.
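As a rough idea of what a clearer typo message could look like, here is a minimal sketch. The helper `check_field_exists` is hypothetical and not part of Altair; it assumes the column names are strings and uses `difflib` from the standard library to suggest a close match.

```python
import difflib


def check_field_exists(field, columns):
    """Raise a descriptive error when `field` is not among `columns`.

    Hypothetical helper for illustration; assumes all column names are strings.
    """
    columns = list(columns)
    if field in columns:
        return
    # Suggest the closest column name, if any is reasonably similar.
    suggestions = difflib.get_close_matches(field, columns, n=1)
    hint = f" Did you mean {suggestions[0]!r}?" if suggestions else ""
    raise ValueError(
        f"Field {field!r} was not found in the data columns {columns}.{hint}"
    )
```

A check like this could run before any type inference, so that a misspelled column name is reported as a missing column rather than as a missing type.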

ChristopherDavisUCI · 2022-12-31T17:57:04Z

I was thinking about this a little and unfortunately I don't see a great option. I tried deleting the part of the Altair code that raises an error if there's no type, and for example using data.cars.url vs data.cars() drastically changes the chart.

joelostblom · 2023-01-03T10:31:48Z

That's a good point. In your example it would be difficult to tell what went wrong in the first chart, and it would not be intuitive that the type has to be specified when using the URL, since it is not needed when using the dataframe. If we go ahead with making a change here, we might need to handle URLs and dataframes differently and still always require types for URLs. That could still be worthwhile if it would clear up the error messages.
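A minimal sketch of that split behavior, under stated assumptions: the function `resolve_encoding_type` is hypothetical, a plain string stands in for a URL data source, and a list of dicts stands in for a dataframe so that the example runs without pandas. Altair's real inference for dataframes works on pandas dtypes and is more involved than this.

```python
from numbers import Number


def resolve_encoding_type(field, explicit_type, data):
    """Return the encoding type for `field`, requiring it for URL data.

    Hypothetical sketch: `data` is either a URL string or a list of
    record dicts (a stand-in for a dataframe).
    """
    if explicit_type is not None:
        return explicit_type

    # URL data cannot be inspected here, so the type must be given explicitly.
    if isinstance(data, str):
        raise ValueError(
            f"{field!r} needs an explicit type (for example {field + ':Q'!r}) "
            "because the data is specified as a URL."
        )

    values = [row[field] for row in data if field in row]
    if not values:
        raise ValueError(f"Field {field!r} was not found in the data.")

    # bool is a subclass of int in Python, so exclude it from the numeric check.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "nominal"
```

With this split, the same chart code gives the same result for both data sources whenever it runs at all: local data is inferred, and URL data fails early with a message that says what to add.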

ChristopherDavisUCI · 2023-02-15T18:28:43Z

I just want to make note of two comments by @mattijn that are possibly relevant to this discussion:
#2868 (comment)
and
#2868 (comment)

joelostblom added the enhancement label Mar 30, 2022

joelostblom mentioned this issue Mar 16, 2023: should the use of timeUnit prevent type: "nominal"? #2971 (Open)
