Explicit aesthetics / scales #505

jkrumbiegel · 2024-06-06T11:50:03Z

Problem description

Currently, AlgebraOfGraphics does not really have a concept of "aesthetics" as in ggplot. Instead, the logic is based on shared keyword arguments and the conventional use of positional arguments: arguments 1, 2 and 3 are expected to relate to the X, Y and Z axes. This does not hold for many plots, however. For example, HLines has only one argument, but it relates to the Y axis. BarPlot, RainCloud, Density, Violin, Errorbars, Rangebars and probably others have two different orientations, and which scales the arguments relate to depends on an attribute such as direction or orientation.

The only color attribute that's handled is color, but not others like scatter's strokecolor. This is also because the color handling assumes the existence of related attributes like colormap and colorrange, which help transform numbers to colors on Makie's side. For better or worse, these often do not exist for other color attributes like strokecolor. Currently, the only way to set such attributes is to pass a vector of colors manually.

Another problem with the current implementation is that all layers sharing some variable in their mappings are assumed to be connected. So if you have a line plot with mapping(color = :A) but also a scatter plot with mapping(color = :B), you will always get a merged legend with lines and scatters overlaid, even if the two plot disjoint sets of data and you'd prefer separate legends for scatters and lines.

Related issues

These issues are either fixed directly by this PR, or this PR introduces a new way of solving the problems described therein:

#75
#97
#262
#329
#365
#385
#427
#434
#463
#469
#473
#487
#491
#504

Implemented solution

This PR internally introduces the notion of an Aesthetic. Examples are AesX, AesY, AesColor, AesMarker and so on. These are decoupled from any specific keywords or argument positions and abstractly represent the visual effect of some plotting function argument. For example, the only argument of HLines has an effect on the AesY aesthetic.

Each plotting function now has to have a declared aesthetic_mapping. Here's an example for Violin, which flips the mapping of its positional arguments depending on the value of the orientation attribute. (Note that another new function mandatory_attributes is used to declare attributes that are strictly necessary to resolve the aesthetic mapping, so AlgebraOfGraphics requires these to be set statically and not pulled in via the theme, as the theme should not semantically change the plots.)

function aesthetic_mapping(::Type{Violin})
    dictionary([
        1 => :orientation => dictionary([
            :horizontal => AesX,
            :vertical => AesY,
        ]),
        2 => :orientation => dictionary([
            :horizontal => AesY,
            :vertical => AesX,
        ]),
        :color => AesColor,
    ])
end
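
To make the lookup concrete, here is a minimal, self-contained sketch of how such a nested table could be resolved against a plot's attributes. It is a toy model and not the code in this PR: the `ToyAes*` types, `TOY_VIOLIN_MAPPING` and `resolve_aesthetic` are hypothetical names, and the table simply copies the shape of the Violin declaration above.

```julia
# Toy stand-ins for aesthetics; these are NOT AlgebraOfGraphics' internal types.
abstract type ToyAesthetic end
struct ToyAesX <: ToyAesthetic end
struct ToyAesY <: ToyAesthetic end
struct ToyAesColor <: ToyAesthetic end

# Hypothetical table with the same shape as the Violin declaration above:
# a key maps either directly to an aesthetic, or to `attribute => lookup table`.
const TOY_VIOLIN_MAPPING = Dict{Any,Any}(
    1 => (:orientation => Dict(:horizontal => ToyAesX, :vertical => ToyAesY)),
    2 => (:orientation => Dict(:horizontal => ToyAesY, :vertical => ToyAesX)),
    :color => ToyAesColor,
)

# Resolve one positional index or keyword to a concrete aesthetic. Conditional
# entries consult the plot attributes and fail if the attribute is not set,
# which mirrors the idea behind mandatory attributes.
function resolve_aesthetic(mapping, key, attributes)
    entry = mapping[key]
    entry isa Pair || return entry
    attr, table = entry
    haskey(attributes, attr) || error("attribute $attr must be set explicitly")
    return table[attributes[attr]]
end

resolve_aesthetic(TOY_VIOLIN_MAPPING, 1, (; orientation = :horizontal))  # ToyAesX
resolve_aesthetic(TOY_VIOLIN_MAPPING, :color, (;))                       # ToyAesColor
```

The error branch illustrates why an attribute like orientation would have to be known statically: without it, the aesthetic of a positional argument cannot be decided.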

Internally, the fitting of categorical or continuous scales is now routed through these aesthetics. This means the orientation keyword for Violin now has the expected effect on the x and y axes:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) * mapping(:x, :y; color = :z) * visual(Violin, orientation = :horizontal) |> draw

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) * mapping(:x, :y; color = :z) * visual(Violin, orientation = :vertical) |> draw

We can further combine the Violin plot with an HLines plot to mark certain positions of interest. However, when we add a color mapping to get a legend entry, the categories of Violin and HLines merge:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type) *
    visual(HLines) |> draw

This can now be handled by separating the two color scales. For this purpose, the scale function can be used to define an identifier, which can then be associated with a mapped variable by extending the => mechanism with a fourth possible option. Note how the legend splits now that the HLines color is mapped to the :second_color scale identifier:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type => scale(:second_color)) *
    visual(HLines) |> draw
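
The effect of a scale identifier can be illustrated with a small toy model that does not depend on AlgebraOfGraphics. The function `fit_toy_scales` and the `scale_id`/`values` fields are hypothetical; the sketch only assumes that layers sharing an identifier pool their categories, while a distinct identifier gets its own scale.

```julia
# Toy model, not AlgebraOfGraphics internals: each layer reports which scale
# identifier its color mapping belongs to and which categorical values it holds.
function fit_toy_scales(layers)
    scales = Dict{Symbol,Vector{String}}()
    for layer in layers
        # Layers that share an identifier pool their values into one scale.
        append!(get!(scales, layer.scale_id, String[]), layer.values)
    end
    # One sorted, de-duplicated category list per scale, i.e. one legend group each.
    return Dict(id => sort(unique(vals)) for (id, vals) in scales)
end

violin_values = ["U", "V", "W", "X"]
hlines_values = fill("threshold", 4)

# Both layers on the default color scale: categories merge into one legend group.
merged = fit_toy_scales([
    (scale_id = :Color, values = violin_values),
    (scale_id = :Color, values = hlines_values),
])

# HLines moved to its own identifier: two independent scales.
separated = fit_toy_scales([
    (scale_id = :Color, values = violin_values),
    (scale_id = :second_color, values = hlines_values),
])
```

In this model, `merged` has a single entry containing all five categories, while `separated` has one entry per identifier, matching the merged and split legends described above.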

While the legend is now adequately split, both color scales use the same default colormap. The old system, which relied on passing palettes to the palette keyword, keyed by plotting function arguments, cannot handle this problem. Therefore, a new option to draw called scales is introduced, which allows passing certain options keyed by the default or custom identifiers for each scale (default identifiers are X, Y, Color, and others, capitalized to show that they do not directly mirror keywords like color but rather relate to abstract aesthetics).

Here we pass a one-element palette containing only the color red for our new second_color scale:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type => scale(:second_color)) *
    visual(HLines) |>
    x -> draw(x; scales = (; second_color = (; palette = ["red"])))

Note that this mechanism also allows changing other attributes like scale labels. We can make use of that to define a label for the y axis, which is unlabelled because Violin and HLines plot different columns there. (In principle we could have overridden the axis attribute ylabel here, but this new mechanism works the same across all scales, so it is preferable.)

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type => scale(:second_color)) *
    visual(HLines) |>
    x -> draw(x; scales = (;
        second_color = (; palette = ["red"]),
        Y = (; label = "A custom Y label"),
    ))

The new implementation removes some hacks around the handling of unusual plot types like Heatmap, which uses its third positional argument for color. Aside from an aesthetic mapping which maps argument 3 to the AesColor aesthetic, this also required rewriting the pipeline to avoid early rescaling of input data. While AesColor columns are converted to a Vector{RGBAf} by default, Heatmap currently cannot handle this input, so the conversion has to be handled instead by passing the colormap and colorrange keywords. Each plot type can define custom to_entry methods in order to compute the plot specification given the raw input data and fitted scales. By default, entries are passed the aesthetic-converted columns, which now makes it possible to use strokecolor in a mapping for Scatter, for example:

data((; x = 1:4, y = 5:8, z = ["A", "B", "C", "D"])) *
    mapping(:x, :y, strokecolor = :z) *
    visual(Scatter, strokewidth = 5, markersize = 30, color = :transparent) |>
    x -> draw(x, scales(Color = (; palette = :tab20)))

Another benefit of being able to address scales directly is the ability to override category values and labels. Currently, one can only use sorter and renamer in mapping to bring categorical values into a certain order and change their labels. However, this is more difficult if multiple mappings are merged where the merged categories cannot be sorted together, or when not all categories that are supposed to be shown are present in the data.

Now, there's a categories property with which one can override domain, ordering and labels in one go, while also accessing more flexible label types like LaTeXStrings or rich text:

data((; sex = repeat(["m", "f"], 10), weight = rand(20))) *
    mapping(:sex, :weight, color = :sex) *
    visual(Scatter) |> x -> draw(x, scales(
        X = (;
            categories = ["m" => "male", "d" => L"\sum{diverse}", "f" => rich("female", color = :green)],
        ),
        Color = (;
            categories = [ "f" => rich("female", color = :green), "m" => "male", "d" => L"\sum{diverse}"],
            palette = ["red", "green", "blue"]
        ),
    ))
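
As a rough illustration of what such an override expresses, the following toy sketch (plain Julia, no plotting, with plain-string labels and hypothetical variable names) derives the domain, the labels and the axis positions from a vector of `value => label` pairs. It is an assumption about the idea, not the PR's implementation.

```julia
# Toy model of a `categories` override; labels are plain strings here for simplicity.
categories = ["m" => "male", "d" => "diverse", "f" => "female"]

domain = first.(categories)   # declared values, fixing both membership and order
labels = last.(categories)    # display label for each declared value

data = repeat(["m", "f"], 10)

# Position of each datum on the categorical axis: its index in the declared domain.
positions = [findfirst(==(value), domain) for value in data]
```

Note that "d" keeps its slot in `domain` even though it never occurs in `data`, which is the case that sorter and renamer alone cannot express.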

For the x and y axes, the "palette" can be overridden, too, in order to introduce visual groupings:

data((; category = rand(["A1", "A2", "B1", "B2", "B3", missing], 1000))) *
    mapping(:category) *
    frequency() |>
    x -> draw(x, scales(X = (; palette = [1, 2, 3, 5, 6, 8])))

Discussion points

It's maybe a bit confusing that scales = (; color = (; ... does not mean mapping(color = ...) but AesColor (there's a lookup happening internally from symbol to Aesthetic). The problem is that I wanted to keep the generic dict-like configuration structure, so symbols as keys. Maybe it could be scales = (; Color = ... to signify that it's something different.
~~What about multiple signatures for plotting functions, like errorbars having either symmetrical or asymmetrical bars?~~
What about plotting continuous data on top of categorical? Should it be allowed in a "you're responsible" kind of way? It seems useful enough in some scenarios, for example plotting annotation text between categories.

TODOs

decide what to do with continuous data plotted onto otherwise categorical scales (currently works but should it?)
tighten interface around options passable to scales, currently invalid keywords will be ignored there
think about binned scales and related problems, for example contourf doesn't fit into the current scheme
fix old docs
write new docs
fix old tests
add tests for new functionality

jkrumbiegel added 30 commits May 27, 2024 11:19

introduce positional_mapping field in ProcessedLayer and populate it using `visual`

d4f08e1

fix dictionary specification

1487341

remap positional args where necessary

d555e31

add violin

835d447

add fallback and HLines

759a2ad

also remap labels

c07b4ca

remove positional mapping again

7a471d3

use VisualScale types as keys for scales

0f06183

use visual scales when determining axis labels

f9af04d

first ColorScale

e31af7f

fix continuous colorbar

d1a95eb

determine legendable scales via types

be7dec7

fix layout facetting

e30547e

refactor barplot and violin visual mappings

48b890a

rename VisualScale to Aesthetic to not conflict with cat/cont scales

bf14743

refactor dict type into const

d8f3cac

refactor into MultiAesScaleDict to prepare multiple Aes of same type

e713eb4

add scale_mapping to ProcessedLayer and ScaleID to dissociate

8f3f2ea

add scatter mapping

be83bdb

introduce new legend approach requiring ProcessedLayers in AxisEntries

b50b94b

fix colorbar again

c09184b

fix wrapped layout

b82ef96

fix violin legend

662eee9

remove shows

c016b64

remove display

fa868ee

fix row/col layout

c55e1b7

add scatter marker support back

79413bb

introduce a way to style separate scales

4296e9c

allow specifying scale properties just by symbol

15dc3ad

use symbols to specify default and named scales

41c0efa

jkrumbiegel added 18 commits July 15, 2024 20:46

add visual docstring

244a0ee

add mapping without data example

5e1422f

fix scalar-only mapping without data

144c04b

add quotes to fix yaml parsing

5c64e35

still problems with title format, try without backticks

0f03329

add reference tests

8ea51a3

add more reference tests

5b1d591

qqplot test

3967d55

add DelimitedFiles to test deps

65c846c

arrows reftest

9eb6051

avoid double CI runs on push to PR

1b85be0

cancel in progress

b45f387

remove allequal for 1.6 compat

93fe05c

expectation ref image

5be3528

remove dead code

f18feb2

Add release notes

a79a61b

copy NEWS.md to docs

ab2d500

fix path

3f79402

jkrumbiegel removed a link to an issue Jul 16, 2024

renamer and LaTeXStrings #463

Open

This was linked to issues Jul 16, 2024

Accept a uer-defined domain of variable #469

Closed

[FR] Errorbars #487

Open

jkrumbiegel merged commit e5c05cf into master Jul 16, 2024
4 checks passed

sethaxen mentioned this pull request Aug 11, 2024

Missing aesthetic mapping for Density and ECDFPlot #520

Closed
