Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Colormap doesn't work on lines #329

Closed
haberdashPI opened this issue Jan 26, 2022 · 22 comments · Fixed by #505
Closed

Colormap doesn't work on lines #329

haberdashPI opened this issue Jan 26, 2022 · 22 comments · Fixed by #505

Comments

@haberdashPI
Copy link
Contributor

haberdashPI commented Jan 26, 2022

At the very least, the obvious thing you would want to do (set colormap in visual) doesn't seem to work.

x = rand(100, 4)
df = DataFrame(t = repeat(axes(x, 1), outer=4), x = vec(x), kind=repeat(["a", "b", "c", "d"], inner=100))
plt = data(df) * mapping(:t, :x, color = :kind) * visual(Lines, colormap = :Paired_4)
save("test.png", draw(plt))

This yield output that is identical to the output when you remove colormap.

In contrast, the following Makie-only version appropriately colors the lines.

x = rand(100, 4)
df = DataFrame(t = repeat(axes(x, 1), outer=4), x = vec(x), kind=repeat(["a", "b", "c", "d"], inner=100))
fig = Figure()
Axis(fig[1,1])
for sdf in groupby(df, :kind)
    lines!(sdf.t, sdf.x, color=sdf.kind, colorrange = (1, 4), colormap=:Paired_4)
end
save("test.png", fig)

Here's the output of ]st

  [cbdf2221] AlgebraOfGraphics v0.6.0
  [13f3f980] CairoMakie v0.6.6
  [a93c6f00] DataFrames v1.3.2
  [ee78f7c6] Makie v0.15.3
@palday
Copy link
Contributor

palday commented Jan 26, 2022

Same seems to hold for Scatter.

@dwinkler1
Copy link

I think I am hitting a similar issue. Here the colormap is completely ignored:

dd = DataFrame(x = [rand(10)..., 10 .+ rand(10)...], c = [repeat('a', 10)..., repeat('b', 10)...])
plt = data(dd) * 
mapping(:x, color = :c) * 
AlgebraOfGraphics.density()
fig = draw(plt * visual(colormap = :reds))

@piever
Copy link
Collaborator

piever commented Feb 17, 2022

I think the issue is that categorical color variables (eg, columns of strings) are converted using the default color palette, so AlgebraOfGraphics does actually not look into colormap at all.

For what you are looking for, you should probably do

draw(plt, palettes=(color=cgrad(:Paired_4),))

(see last example here).

There may still be some subtleties to take care of (for example this by default gives a regular legend rather than a colorbar, I'm not 100% sure which one you'd want), but I think those should be discussed at #183, so I'm closing this one.

@piever piever closed this as completed Feb 17, 2022
@haberdashPI
Copy link
Contributor Author

Long-term is this the right design? At least as I understand it, for continuous scales you would normally want to use colormap. Ideally there would be just one keyword you would use to customize the colors used by a plot.

@piever
Copy link
Collaborator

piever commented Feb 22, 2022

At least as I understand it, for continuous scales you would normally want to use colormap.

Is the confusion that continuous and categorical scales have very different syntaxes? I think that's a fair objection, but it's not super easy to come up with a consistent solution. Do you have something in mind?

IMO, allowing colormap for categorical scales is probably a bad idea: all other categorical scales, like marker or linestyle, don't have something like that, so one would need two syntaxes any way.

Conceptually, colormap does not belong to the single layer but to the whole plot (it'd be confusing to use different colormaps on different layers of the same subplot), so a possible alternative would be to have a

draw(plt; palettes=(...), scales=(...))

where scales denotes the continuous scales to be used. The main difficulty for me is figuring out what exactly is a continuous scale. I imagine it would probably contain information about extrema of the data, nonlinear transformation, how scales are linked in facet plots. If one also allows scales for positional arguments, that could probably also remove the need for the facet = (linkxaxes=...,) syntax.

@haberdashPI
Copy link
Contributor Author

haberdashPI commented Feb 24, 2022

Conceptually, colormap does not belong to the single layer but to the whole plot (it'd be confusing to use different colormaps on different layers of the same subplot),

I don't see why a continuous vs. discrete color space changes this: if it's confusing for continuous spaces it's confusing for discrete spaces as well. You may need different domains (one for each type of attribute) but if they map to color, I think they should all map to the same range (a mapping from number range -> linear color space).

For sake of clarity here, let's call a palette a mapping from an attribute to numbers, and a colormap a mapping from said numbers to a 1d colorspace (e.g. :viridis).

I imagine it would probably contain information about extrema of the data, nonlinear transformation, how scales are linked in facet plots...

I think I agree with most of this. To be concrete about how it relates to what I'm saying above about the distinction between palettes and colormaps: for palettes you'd want to specify the mapping of a domain of values to a range of integers (with some sensible default). Optionally this could also include a specification of some non-linear transform. For color mappings you'd want to specify a mapping from the same domain of numbers used in palettes to another domain of numbers representing the colors (where 0 is the first color and 1 is the last). The color mapping would be shared across all layers, while palettes could be specified on a per layer basis (or, for convenience during draw when you don't need a per-layer specification).

To be concrete, here's an example of a syntax that could makes sense to me:

x = rand(100)
y = rand(100)
z = rand(["a", "b", "c", "d"], 100)
df = (; x, y, z)
plt = data(df) * mapping(:x, :y, color=:z)
colors = [colorant"#E24A33", colorant"#348ABD", colorant"#988ED5", colorant"#777777"]
color_pallete = ["a" => 1, "b" => 2]
draw(plt; color_space=(;range=colors, kind=:discrete), palettes=(;color=color_palette))

And for continuous scales:

using AlgebraOfGraphics, CairoMakie
set_aog_theme!()

df = (x=randn(1000), y=randn(1000))
plt = data(df) * mapping(:x, :y) * AlgebraOfGraphics.density(npoints=50)
draw(plt * visual(Heatmap), color_space=(;range=:viridis, kind=:continuous), palettes=(;color=(0,5) => (0,1))

This would clamp density values > 5 to the same (most extreme) color.

In this particular proposal I'm suggesting a keyword different than colormap because presumably there would need to be some post-processing within AoG to map it into the colormap keyword used by Makie. In an ideal world I think you'd want to just use colormap for both rather than color_space, but I have a sense that might make this harder to actually implement.

@haberdashPI
Copy link
Contributor Author

haberdashPI commented Feb 24, 2022

And, if you like this (or some variant), I'd be happy to take a crack at implementing it, at least for colors. I'm not as sure how it would relate to things like linked axes.

@piever
Copy link
Collaborator

piever commented Feb 24, 2022

Ah, I think I understand it now. You would want to split color palettes in two steps. The palette gives you integers, and the colormap turns them into colors. Reopening this because I think that proposal is not tracked anywhere. It seems like a promising idea, in that Makie would keep taking care of continuous scales, which avoids a fair amount of duplication. AlgebraOfGraphics converts categorical things into numbers.

In this scenario, I suppose the default color palette should return numbers (say [1, 2, 3, 4, 5, 6, 7], but returning numbers is the default, so we may not need a color palette at all). The default colorscheme (I like this name a bit better than color_space) would return our default 7 colors. I think it's cleaner to use different names (colorscheme vs colormap) for ease of themeability as well, as they would have different defaults.

At this point though, one may want to reconsider the word palettes: it's mostly about turning things into integers. So, I'm not sure on the API (see also this comment), but I do agree on the split between the "categorical conversion" (to customize "plot by plot") and an overall theme (just a list of colors). I imagine the role of palettes here would essentially replace these helpers. I was not super happy about those helpers either, as they force you to work with weird custom structs in order to be "row-wise" functions.

It's probably a good occasion to also allow different attributes to use a specific "scheme". For example, how does one inform arrowcolor here to use colorscheme? @jkrumbiegel has been mentioning this in the past, and I think we should support it (hopefully this decoupling makes it easier). We may try and take inspiration from the Makie color cycling.

And, if you like this (or some variant), I'd be happy to take a crack at implementing it, at least for colors. I'm not as sure how it would relate to things like linked axes.

I think your proposal is probably easier to implement than my previous rambling :), no need to worry about linked axes and such here. Especially if you want to keep the legend the same (categorical legend, as opposed to categorical colorbar legend for categorical variables) it should be reasonably straightforward. The processing of extra attributes is done in compute_attributes (added documentation for it in #358).

@piever piever reopened this Feb 24, 2022
@jkrumbiegel
Copy link
Member

For plotting practice, it's probably good to offer the ability to map specific categories to integers, so that in different plots that use the same general categories, but maybe different subsets, the value => integer / color mapping stays the same.

So for example, your mapping is ["a" => 1, "b" => 2, "c" => 3] and then if you have another plot with only "b" and "c" such that ["b" => 2, "c" => 3].

However, it would probably be annoying having to specify this mapping for one-off plots, then it would be fine to just go 1:n with the values that are present in the specific plot. So I think the api should make this part optional. You can specify only the color list, or additionally the value => integer mapping for categorical colors.

For the issue of color parameters that are not named color, yeah maybe something like the cycling technique in Makie can work, where I do :patchcolor => :color or even [:fillcolor, :strokecolor] => :color to assign multiple attributes to the same themeable preset.

@piever
Copy link
Collaborator

piever commented Feb 26, 2022

So for example, your mapping is ["a" => 1, "b" => 2, "c" => 3] and then if you have another plot with only "b" and "c" such that ["b" => 2, "c" => 3].

Yeah, I also think this is an important use case. At the moment you can do ["a" => color1, "b" => color2, "c" => color3], but it feels like defining your categories should be decoupled from the choice of colors.

I was also wondering, should this affect the order in which they show up in the legend? That is, if one passes

["a" => 2, "b" => 1, "c" => 3]

should "b" be the first legend entry? Somehow, it makes sense to me that the colors should always appear in the legend in the same order.

Re: optionality, yes, in this plan, one would have defaults for both the order (just enumerating the sorted unique values) and the color scheme (Wong colors), with the possibility of changing each independently. Mapping categoricalvalue => color (ie "a" => colorant"red") should probably be deprecated.

@haberdashPI
Copy link
Contributor Author

should "b" be the first legend entry?

That is what I intuitively assumed. Seems reasonable, and there should be some way to manipulate order since that's a huge legibility issue.

Perhaps a more principled justification is that the numbers represent some conceptual scale for the values (so it's a natural place for order to be specified). This conceptual scale is then mapped to a particular color space that has perceptual meaning.

However, it would probably be annoying having to specify this mapping for one-off plots,

I agree with @piever that "one would have defaults for both the order. and the color scheme".

@haberdashPI
Copy link
Contributor Author

haberdashPI commented Feb 28, 2022

At this point though, one may want to reconsider the word palettes...

Nice! Yeah, maybe the right name is levels? I like the idea of using this keyword rather than using those helpers.

It's probably a good occasion to also allow different attributes to use a specific "scheme". For example, how does one inform arrowcolor here to use colorscheme

For the issue of color parameters that are not named color, yeah maybe something like the cycling technique in Makie can work, where I do :patchcolor => :color or even [:fillcolor, :strokecolor] => :color to assign multiple attributes to the same themeable preset.

I'm not certain I follow this but if I understand I think the issue here is that AoG doesn't know anything special about these attributes so you want a way to communicate that the integers should be mapped to colors (rather than say, a discrete set of linetype names or whatever).

I'm not sure this is how it should be exposed as an API, but I think what this reveals is that there are really three levels:

  1. mapping from data values to integers, controlled by levels
  2. a specification of what scheme a given keyword should map to
  3. The scheme itself: this maps particular levels to particular outputs

I think one API that could make sense would be that you specify the scheme when choosing the levels. So, for example, both of the calls to draw! below, would be valid ways to specify the scheme.

plt = ... * mapping(arrowcolor=:condition) * ...
draw!(..., levels=(;arrowcolor=scheme(:color)))
draw!(..., levels=(;arrowcolor=scheme(:color, ["b" => 2, "c" => 3])))

While I think it makes sense to have this be configurable, why not have a natural default as well? For example, for any keyword assigned in mapping where occursin(r"color"i, string(keyword)) use the colorscheme rather than some other scheme. If this were implemented, then the below two calls would yield the same behavior as the above two calls.

plt = ... * mapping(arrowcolor=:condition) * ...
draw!(...)
draw!(..., levels=(;arrowcolor=["b" => 2, "c" => 3]))

Most of the time then, a user won't have to worry about this, but they could configure it if needed.

This could lead to conflicts where a keyword maps two scheme's regexs. That could be resolved by requiring an explicit scheme for any such keyword. That is, when a keyword matches 0 or >1 regex some error gets thrown if you are trying to use mapping for that keyword.

@haberdashPI
Copy link
Contributor Author

haberdashPI commented Mar 14, 2022

Is this in a place where I should attempt a PR? Based on the conversation above I was thinking it might make sense to wait for a reply, but since it's been a while I'm wondering if it's worth setting up a short proof-of-concept and iterate from feedback based on something more concrete.

@piever
Copy link
Collaborator

piever commented Mar 15, 2022

It took me a little bit, but I also grew convinced that the levels keyword is the correct approach. I think a PR would be great, but for now I would suggesting leaving out the scheme part (eg, just support levels=(; color=["a" => 1, "b" => 2, "c" => 3])).

Regarding the levels API, maybe it's enough to pass a vector of values, eg color = ["a", "b", "c"], and the order of the values determines the levels. Entries of the data passed to the color mapping that do not belong to ["a", "b", "c"] would get values from 4 onward, in order.

scheme could probably be a different keyword to draw altogether (analogous to cycle in Makie), eg

draw(plt, levels=(; arrowcolor=...), scheme = :arrowcolor => :color)

(with potentially some smart opt-in way to infer sensible things), but I suspect it's best to do that in a separate PR.

@jkrumbiegel
Copy link
Member

Entries of the data passed to the color mapping that do not belong to ["a", "b", "c"] would get values from 4 onward, in order.

Possibly it should throw an informative error in that case, as it seems weird to control only part of the levels (probably one was forgotten in the levels then?). But maybe not, I can imagine scenarios where it might be useful

@piever
Copy link
Collaborator

piever commented Mar 15, 2022

But maybe not, I can imagine scenarios where it might be useful

I had also initially thought one should be strict, but for instance @SimonDanisch asked for this feature: allow to have a couple of levels that are important and "fixed", and others that may vary and are not important (just cycle colors there). We currently support that with the pair syntax in the palettes.

OTOH, I think it's reasonable to assume that the "explicit" levels come before the "cycled" levels in general. That way, we can get rid of the pair syntax and pass a simple list of values.

@jkrumbiegel
Copy link
Member

Although one might want to mix these two cases right, let's say three fixed categories, but using colormap colors 1, 3, and 5. Then you'd still need pair syntax.

@piever
Copy link
Collaborator

piever commented Mar 17, 2022

Thinking more about this, I think there's the extra problem that the "levels" are much more a property of the data than they are a property of the plot attribute. As a practical concern, this makes it harder (and error-prone) to change what attribute is styled by a given variable, and makes it a bit more awkward to style several attributes with the same variable.

Compare

c = var => sorter(levels)
m = mapping(color = c, dodge = c)
draw(m * ...)

m = mapping(marker = c)
draw(m * ...)

with

m = mapping(color = var, dodge = var)
draw(m * ..., levels=(color=levels, dodge=levels))

m = mapping(marker = var)
draw(m * ..., levels=(marker=levels,))

Could it be enough to "power up" the sorter helper so that it can handle everything correctly? I tried to describe that a bit here.

@haberdashPI
Copy link
Contributor Author

Could it be enough to "power up" the sorter helper so that it can handle everything correctly?

Ah okay; so if I understand, you would never map levels to values; the palette would always be an array of colors, and any mapping of strings to numbers occurs in the sorter helper (with leftover levels being sorted by some default scheme, and following after the explicitly specified values).

@piever
Copy link
Collaborator

piever commented Apr 28, 2022

Yes, that's correct. It seemed like the clean way to decouple ordering of categorical variables with setting the theme, though we should definitely think hard as to whether there are some important scenarios that become trickier.

One situation that seems potentially troubling is the following. The user has two categorical variables (say species and island), each with their own color palette, i.e. in all the figures island gets a given color scheme and species gets another one. Before one could just pass pairs with either names of islands or species to the color palette, whereas now things get a bit problematic. Maybe that's a separate problem though, and one should just consider supporting separate palettes, i.e.

palettes=(color=(species=[:cyan, :teal, :blue], island=[:yellow, :orange, :red]),)

@haberdashPI
Copy link
Contributor Author

haberdashPI commented Apr 29, 2022

I like the notion of matching it by variable name rather than individual value. Especially because it's entirely possible that the same string could mean different things in different contexts: e.g. ["bat", "bear", "cat"], vs. ["bat", "racket", "club"].

Side note: my adjustment in #384 doesn't actually remove any functionality. Named palettes are still possible there, just not recommended in the documentation.

@haberdashPI
Copy link
Contributor Author

haberdashPI commented Apr 29, 2022

Having looked through palettes recently, I think I understand how to implement this change; I could submit a second PR for this that also deprecates the old behavior of mapping values to colors. (Or just add it as part of #384).

@jkrumbiegel jkrumbiegel linked a pull request Jul 16, 2024 that will close this issue
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants