-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
for string values, the "proportion" reduce uses counts instead of sums #410
Conversation
@yurivish if you want to review |
src/transforms/group.js
Outdated
@@ -201,10 +202,16 @@ const reduceCount = { | |||
} | |||
}; | |||
|
|||
function sumOrCount(I, V) { | |||
return typeof V.find(v => v != null) === "number" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could lead to inconsistent behavior across series (and facets). Would it be possible to do this earlier, on the first non-null value in the entire channel, as we do for other forms of type inference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's what I'm doing here: V is the entire channel. What I don't like is I'm doing the same for each series, without caching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for the clarification.
src/transforms/group.js
Outdated
@@ -213,8 +214,13 @@ const reduceDistinct = { | |||
|
|||
const reduceSum = reduceAccessor(sum); | |||
|
|||
function sumOrCount(I, V) { | |||
if (!("type" in V)) V.type = typeof V.find(v => v != null) === "number"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’ll want a WeakMap here instead of mutating V. (I believe that V can sometimes be an externally-provided value, but even if it isn’t, we generally want to avoid mutating inputs.)
Or another simple approach that I believe will work here is single-value memoization: record the last-passed V value and the last-computed type; if the new V value is the same as the old one, return the previously-computed type.
Also, if we do this, it’ll persist across render calls even if the caller mutates the contained values. That might be surprising, but it’s probably rare enough to not worry about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we’ll want an isNumeric helper here (in mark.js) modeled after isTemporal and isOrdinal, and we should probably use for-of instead of array.find for consistency:
Lines 332 to 338 in 8aafa7c
export function isOrdinal(values) { | |
for (const value of values) { | |
if (value == null) continue; | |
const type = typeof value; | |
return type === "string" || type === "boolean"; | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! and memoization will be useful for all of isOrdinal, isTemporal, isNumeric, not just for isNumeric. In a78d7d4 I've centralized the content type detection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove the memoization until we have evidence that suggests that it is a bottleneck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d prefer to leave the isOrdinal and isTemporal as-is, and just introduce a new isNumeric that follows the similar pattern.
unit test
memoize with a WeakMap
09b2948
to
ea47b5d
Compare
src/transforms/group.js
Outdated
@@ -164,6 +164,7 @@ export function maybeSubgroup(outputs, Z, F, S) { | |||
|
|||
function reduceFunction(f) { | |||
return { | |||
label: f.label, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems unrelated (and untested), so I’m going to revert this part. Elsewhere we don’t promote the reducer name to the axis label so I think it probably isn’t appropriate to promote it here. (For example, when you use the “sum” reducer, we don’t use or want “sum” as the axis label; we want the name of the column being summed.)
test/plots/software-versions.js
Outdated
text: "first" | ||
}, | ||
{ | ||
x: "key", // this is only strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn’t strictly needed here because it doesn’t affect the composition of the groups (which here for groupZ is determined only by the fill channel), i.e. removing this key doesn’t affect the chart (but with it removed, we’re no longer testing the feature added by this PR). It’d be nice to have a more explicit test.
I’m wondering if this PR makes sense? If there’s a value input to the proportion reducer, it means we want a weighted proportion. But if those values are ordinal, there’s no way to know what the appropriate weights are to compute the weighted proportion. We could ignore the input value, but it seems undesirable to silently ignore it? Perhaps it would be better to throw an error in this case since the idiomatic fix is to remove the unnecessary |
Closing as argued in #399 (comment) but please continue the discussion if you are not convinced. (Also, it might still be worth adding this test, since it now works in main.) |
the type detection is consistent (done on the whole set), so mixes of nulls and numbers will not give a weight of 1 to nulls.
however it's not done once, since I don't know how to cache the result — but in most cases it will return immediately.
fixes #399
test build at https://observablehq.com/d/1dd4f1c3e491b6d6
maybe it would be good to add a few test cases mixing nulls, undefined, numbers, strings.