Explicit aesthetics / scales #505

Merged
merged 178 commits into from
Jul 16, 2024
@jkrumbiegel (Member) commented Jun 6, 2024

Problem description

Currently, AlgebraOfGraphics does not really have a concept of "aesthetics" as in ggplot; instead, the logic is based on shared keyword arguments and the conventional use of positional arguments. Arguments 1, 2 and 3 are expected to relate to the X, Y and Z axes, respectively. This does not hold for many plot types, however. For example, HLines has only one argument, but it relates to the Y axis. BarPlot, RainCloud, Density, Violin, Errorbars, Rangebars and probably others can be drawn in two orientations, and which scales their arguments relate to depends on an attribute such as direction or orientation.

The only color attribute that is handled is color; others, such as Scatter's strokecolor, are not. This is partly because the color handling assumed the existence of related attributes like colormap and colorrange, which Makie uses to transform numbers into colors. For better or worse, such attributes often do not exist alongside other color attributes like strokecolor. Currently, the only way to set these to mapped colors is to pass a vector of colors manually.

Another problem with the current implementation is that all layers sharing a mapped attribute are assumed to be connected. So if you have a line plot with mapping(color = :A) and a scatter plot with mapping(color = :B), you will always get a merged legend with lines and scatters overlaid, even if the two plot disjoint sets of data and you would prefer separate legends for scatters and lines.

Related issues

These issues are either fixed directly by this PR, or this PR introduces a new way of solving the problems described therein:

#75
#97
#262
#329
#365
#385
#427
#434
#463
#469
#473
#487
#491
#504

Implemented solution

This PR internally introduces the notion of an Aesthetic; examples are AesX, AesY, AesColor, AesMarker and so on. These are decoupled from any specific keyword or argument position and abstractly represent the visual effect of a plotting function argument. For example, the only argument of HLines affects the AesY aesthetic.

Each plotting function must now declare an aesthetic_mapping. Here's an example for Violin, which flips the mapping of its positional arguments depending on the value of the orientation attribute. (Note that another new function, mandatory_attributes, declares attributes that are strictly necessary to resolve the aesthetic mapping. AlgebraOfGraphics requires these to be set statically rather than pulled in via the theme, because the theme should not change the semantics of a plot.)

function aesthetic_mapping(::Type{Violin})
    dictionary([
        1 => :orientation => dictionary([
            :horizontal => AesX,
            :vertical => AesY,
        ]),
        2 => :orientation => dictionary([
            :horizontal => AesY,
            :vertical => AesX,
        ]),
        :color => AesColor,
    ])
end
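A minimal sketch of how a declaration like the one above could be resolved, assuming a plain `Dict` in place of `dictionary` and hypothetical stand-in types; `resolve_aesthetic` and `VIOLIN_MAPPING` are illustrative names, not part of AlgebraOfGraphics. A missing `orientation` raises an error, which mirrors why mandatory attributes must be set statically:

```julia
# Hypothetical stand-ins for the PR's aesthetic types.
abstract type Aesthetic end
struct AesX <: Aesthetic end
struct AesY <: Aesthetic end
struct AesColor <: Aesthetic end

# Same shape as the Violin declaration: a key maps either directly to an
# aesthetic, or to `attribute => Dict(value => aesthetic)`.
const VIOLIN_MAPPING = Dict{Any,Any}(
    1 => (:orientation => Dict(:horizontal => AesX, :vertical => AesY)),
    2 => (:orientation => Dict(:horizontal => AesY, :vertical => AesX)),
    :color => AesColor,
)

# Look up the aesthetic affected by a positional index or keyword. Attributes the
# mapping depends on must be present in `attributes`, otherwise this throws.
function resolve_aesthetic(mapping, key, attributes)
    entry = mapping[key]
    entry isa Pair || return entry          # fixed aesthetic, e.g. :color => AesColor
    attr, choices = entry
    haskey(attributes, attr) ||
        error("attribute :$attr must be set explicitly to resolve argument $key")
    return choices[attributes[attr]]
end

resolve_aesthetic(VIOLIN_MAPPING, 1, Dict(:orientation => :horizontal))  # AesX
```

Resolving once per layer, before any data is touched, is what lets the later stages work purely in terms of aesthetics.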

Internally, the fitting of categorical or continuous scales is now routed through these aesthetics. This means the orientation keyword for Violin now has the expected effect on the x and y axes:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) * mapping(:x, :y; color = :z) * visual(Violin, orientation = :horizontal) |> draw

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) * mapping(:x, :y; color = :z) * visual(Violin, orientation = :vertical) |> draw
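To illustrate what routing scale fitting through aesthetics means, here is a simplified sketch: numeric columns get a continuous scale, anything else a categorical one, and the result is keyed by aesthetic instead of argument position. The types and functions are hypothetical, and the per-orientation assignment of arguments to X and Y follows the Violin declaration above:

```julia
# Hypothetical sketch, not AlgebraOfGraphics internals: fit one scale per aesthetic.
struct CategoricalScale
    values::Vector{Any}             # sorted unique categories
end
struct ContinuousScale
    limits::Tuple{Float64,Float64}  # (min, max) of the data
end

# Numbers get a continuous scale, anything else a categorical one.
fit_scale(col::AbstractVector{<:Real}) = ContinuousScale(Float64.(extrema(col)))
fit_scale(col::AbstractVector) = CategoricalScale(sort!(unique(col)))

# `aes_of` says which aesthetic each positional argument affects,
# i.e. the resolved aesthetic mapping for a given orientation.
fit_by_aesthetic(columns, aes_of) =
    Dict(aes_of[k] => fit_scale(col) for (k, col) in columns)

columns = Dict(1 => collect(1:4), 2 => ["A", "B", "C", "D"])
horizontal = fit_by_aesthetic(columns, Dict(1 => :X, 2 => :Y))
vertical = fit_by_aesthetic(columns, Dict(1 => :Y, 2 => :X))
```

The same columns produce a categorical X axis in one orientation and a categorical Y axis in the other, without the plotting function itself knowing about axes.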

We can further combine the Violin plot with an HLines plot to mark certain positions of interest. However, when we add a color mapping to get a legend entry, the categories of Violin and HLines merge:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type) *
    visual(HLines) |> draw

This can now be handled by separating the two color scales. For this purpose, the scale function defines an identifier, which can be associated with a mapped variable through a fourth possible option in the => mechanism. Note how the legend splits now that the HLines color is mapped to the :second_color scale identifier:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type => scale(:second_color)) *
    visual(HLines) |> draw
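The effect of `scale(:second_color)` can be thought of as adding an identifier to the key under which categories are collected. In this simplified sketch (the `ScaleKey` type and `merge_categories` function are hypothetical, not AlgebraOfGraphics internals), columns with equal keys are merged into one scale, which is why the HLines category joins the Violin legend until it gets its own identifier:

```julia
# A scale is identified by its aesthetic plus an optional custom identifier.
struct ScaleKey
    aes::Symbol
    id::Union{Symbol,Nothing}   # `nothing` means the default scale
end

# Collect categories per key; equal keys share one scale (and one legend).
function merge_categories(entries)
    scales = Dict{ScaleKey,Vector{String}}()
    for (key, values) in entries
        merged = get!(scales, key, String[])
        append!(merged, values)
        unique!(merged)
    end
    return scales
end

violin = ScaleKey(:Color, nothing) => ["U", "V", "W", "X"]
hlines_shared = ScaleKey(:Color, nothing) => ["threshold"]
hlines_separate = ScaleKey(:Color, :second_color) => ["threshold"]

merged = merge_categories([violin, hlines_shared])      # one scale, five categories
separate = merge_categories([violin, hlines_separate])  # two independent scales
```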

While the legend is now split adequately, both color scales use the same default colors. The old system, which relied on passing palettes to the palette keyword keyed by plotting function arguments, cannot handle this. Therefore, draw gets a new option called scales, which allows passing options keyed by the default or custom identifier of each scale. The default identifiers are X, Y, Color and others; they are capitalized to show that they do not directly mirror keywords like color but relate to abstract aesthetics.

Here we pass a one-element palette containing only the color red for our new second_color scale:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type => scale(:second_color)) *
    visual(HLines) |>
    x -> draw(x, scales(second_color = (; palette = ["red"])))
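A sketch of how options could be looked up per scale, assuming that default identifiers are the aesthetic names without the `Aes` prefix, as the capitalized X, Y and Color suggest. `default_scale_id`, `scale_id`, `palette_for` and `DEFAULT_PALETTE` are hypothetical names, and the options are modelled as a plain `NamedTuple`:

```julia
# Hypothetical sketch of per-scale option lookup.
# Default scales are addressed by the aesthetic name without "Aes" (AesColor -> :Color);
# custom scales by the identifier given to `scale(...)`.
default_scale_id(aes::Symbol) = Symbol(replace(String(aes), r"^Aes" => ""))

scale_id(aes::Symbol, custom::Union{Symbol,Nothing}) =
    custom === nothing ? default_scale_id(aes) : custom

const DEFAULT_PALETTE = ["blue", "orange", "green"]   # placeholder default colors

function palette_for(options::NamedTuple, aes::Symbol, custom = nothing)
    opts = get(options, scale_id(aes, custom), (;))   # options for this scale, if any
    return get(opts, :palette, DEFAULT_PALETTE)
end

options = (; second_color = (; palette = ["red"]))
palette_for(options, :AesColor, :second_color)   # ["red"]
palette_for(options, :AesColor)                  # DEFAULT_PALETTE (scale :Color)
```

Keying options by scale rather than by plot keyword is what allows two color scales to receive different palettes.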

Note that this mechanism also allows changing other scale properties, such as labels. We can use it to label the y axis, which is otherwise unlabelled because Violin and HLines plot different columns there. (In principle we could have overridden the axis attribute ylabel here, but the new mechanism works the same way across all scales, so it is preferable.)

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type => scale(:second_color)) *
    visual(HLines) |>
    x -> draw(x, scales(
        second_color = (; palette = ["red"]),
        Y = (; label = "A custom Y label"),
    ))
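The labelling behaviour described above can be expressed as a small rule: an explicit label wins; otherwise the scale is labelled only when all contributing layers used the same column. This is a hypothetical sketch (`scale_label` is not an AlgebraOfGraphics function):

```julia
# Hypothetical sketch of the scale-labelling rule.
function scale_label(column_names::AbstractVector{<:AbstractString}, options::NamedTuple)
    haskey(options, :label) && return options.label   # explicit override wins
    names = unique(column_names)
    return length(names) == 1 ? only(names) : ""       # blank if columns differ
end

# Vertical Violin maps :x to Y (per the declaration above), HLines maps :y to Y.
scale_label(["x", "y"], (;))                              # ""
scale_label(["x", "y"], (; label = "A custom Y label"))   # "A custom Y label"
```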

The new implementation removes some hacks around the handling of unusual plot types like Heatmap, which uses its third positional argument for color. Aside from an aesthetic mapping that maps argument 3 to the AesColor aesthetic, this also required rewriting the pipeline to avoid early rescaling of input data. While AesColor columns are converted to a Vector{RGBAf} by default, Heatmap cannot currently handle this input, so for Heatmap the conversion is instead handled by passing the colormap and colorrange keywords. Each plot type can define custom to_entry methods to compute the plot specification from the raw input data and the fitted scales. By default, entries receive the aesthetic-converted columns, which now makes it possible to use strokecolor in a mapping for Scatter, for example:

data((; x = 1:4, y = 5:8, z = ["A", "B", "C", "D"])) *
    mapping(:x, :y, strokecolor = :z) *
    visual(Scatter, strokewidth = 5, markersize = 30, color = :transparent) |>
    x -> draw(x, scales(Color = (; palette = :tab20)))
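Converting a categorical column to colors only requires knowing that it belongs to an AesColor scale, not which keyword it came from, which is why strokecolor can now be mapped like color. A simplified sketch with a hypothetical `categorical_to_colors`; cycling the palette when there are more categories than colors is an assumption of this sketch:

```julia
# Hypothetical sketch: map each categorical value to the palette entry of its category.
function categorical_to_colors(values, categories, palette)
    index = Dict(c => i for (i, c) in enumerate(categories))
    # mod1 cycles through the palette when there are more categories than colors.
    return [palette[mod1(index[v], length(palette))] for v in values]
end

# The same function serves color, strokecolor or any other AesColor keyword.
strokecolors = categorical_to_colors(["A", "B", "C", "D"], ["A", "B", "C", "D"],
                                     ["red", "green", "blue"])
```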

Another benefit of being able to address scales directly is the ability to override category values and labels. Currently, one can only use sorter and renamer in mapping to bring categorical values into a certain order and change their labels. However, this is more difficult when multiple mappings are merged and the merged categories cannot be sorted together, or when not all categories that should be shown are present in the data.

Now there is a categories property with which one can override domain, ordering and labels in one go, while also using more flexible label types like LaTeXStrings or rich text:

data((; sex = repeat(["m", "f"], 10), weight = rand(20))) *
    mapping(:sex, :weight, color = :sex) *
    visual(Scatter) |> x -> draw(x, scales(
        X = (;
            categories = ["m" => "male", "d" => L"\sum{diverse}", "f" => rich("female", color = :green)],
        ),
        Color = (;
            categories = [ "f" => rich("female", color = :green), "m" => "male", "d" => L"\sum{diverse}"],
            palette = ["red", "green", "blue"]
        ),
    ))
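A sketch of what a categories override determines, under the assumption that it fully replaces the fitted domain: the keys give the domain and its order, the values give the labels, and categories absent from the data (here "d") are still included. `apply_categories` is hypothetical, and raising an error for data values missing from the list is a choice made for this sketch:

```julia
# Hypothetical sketch of a `categories` override.
function apply_categories(data_values, categories::AbstractVector{<:Pair})
    domain = first.(categories)     # order of the pairs = order on the scale
    labels = last.(categories)      # display labels (any label type)
    unlisted = setdiff(unique(data_values), domain)
    isempty(unlisted) ||
        error("data contains values not listed in categories: $unlisted")
    return domain, labels
end

sexes = repeat(["m", "f"], 10)
domain, labels = apply_categories(sexes, ["m" => "male", "d" => "diverse", "f" => "female"])
# "d" is part of the domain even though no observation has that value.
```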

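For the x and y axes, the "palette" can be overridden, too, in order to introduce visual groupings. Here the palette lists the axis positions assigned to the categories in order, so skipped positions create gaps: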

data((; category = rand(["A1", "A2", "B1", "B2", "B3", missing], 1000))) *
    mapping(:category) *
    frequency() |>
    x -> draw(x, scales(X = (; palette = [1, 2, 3, 5, 6, 8])))
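For positional scales the palette can be read as the list of axis positions assigned to the sorted categories. In this hypothetical sketch the unused positions 4 and 7 produce the gaps, and `missing` sorts last under Julia's `isless`; how AlgebraOfGraphics orders categories internally is not assumed here:

```julia
# Hypothetical sketch: category i of a categorical X scale is drawn at palette[i].
data = ["B2", "A1", missing, "A2", "B3", "B1", "A1"]
categories = sort(unique(data))     # ["A1", "A2", "B1", "B2", "B3", missing]
palette = [1, 2, 3, 5, 6, 8]        # positions 4 and 7 are skipped, leaving gaps
length(palette) >= length(categories) || error("palette has too few positions")

positions = Dict(zip(categories, palette))
xs = [positions[v] for v in data]   # x coordinate of each observation
```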

Discussion points

  • It's maybe a bit confusing that scales = (; color = (; ... does not refer to mapping(color = ...) but to AesColor (there is an internal lookup from symbol to Aesthetic). The problem is that I wanted to keep the generic dict-like configuration structure, so the keys are symbols. Maybe it could be scales = (; Color = ... to signify that it's something different.
  • What about multiple signatures for plotting functions, like errorbars having either symmetrical or asymmetrical bars?
  • What about plotting continuous data on top of categorical? Should it be allowed in a "you're responsible" kind of way? It seems useful enough in some scenarios, for example plotting annotation text between categories.

TODOs

  • decide what to do with continuous data plotted onto otherwise categorical scales (currently works but should it?)
  • tighten the interface around options passed to scales; currently, invalid keywords are silently ignored there
  • think about binned scales and related problems, for example contourf doesn't fit into the current scheme
  • fix old docs
  • write new docs
  • fix old tests
  • add tests for new functionality
