Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

for string values, the "proportion" reduce uses counts instead of sums #410

Closed
wants to merge 7 commits into from

Conversation

Fil
Copy link
Contributor

@Fil Fil commented May 18, 2021

the type detection is consistent (done on the whole set), so mixes of nulls and numbers will not give a weight of 1 to nulls.

however it's not done once, since I don't know how to cache the result — but in most cases it will return immediately.

fixes #399

test build at https://observablehq.com/d/1dd4f1c3e491b6d6

maybe it would be good to add a few test cases mixing nulls, undefined, numbers, strings.

@Fil Fil requested a review from mbostock May 18, 2021 15:02
@Fil
Copy link
Contributor Author

Fil commented May 18, 2021

@yurivish if you want to review

@@ -201,10 +202,16 @@ const reduceCount = {
}
};

function sumOrCount(I, V) {
return typeof V.find(v => v != null) === "number"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could lead to inconsistent behavior across series (and facets). Would it be possible to do this earlier, on the first non-null value in the entire channel, as we do for other forms of type inference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's what I'm doing here: V is the entire channel. What I don't like is I'm doing the same for each series, without caching.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for the clarification.

@@ -213,8 +214,13 @@ const reduceDistinct = {

const reduceSum = reduceAccessor(sum);

function sumOrCount(I, V) {
if (!("type" in V)) V.type = typeof V.find(v => v != null) === "number";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’ll want a WeakMap here instead of mutating V. (I believe that V can sometimes be an externally-provided value, but even if it isn’t, we generally want to avoid mutating inputs.)

Or another simple approach that I believe will work here is single-value memoization: record the last-passed V value and the last-computed type; if the new V value is the same as the old one, return the previously-computed type.

Also, if we do this, it’ll persist across render calls even if the caller mutates the contained values. That might be surprising, but it’s probably rare enough to not worry about.

Copy link
Member

@mbostock mbostock Jul 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we’ll want an isNumeric helper here (in mark.js) modeled after isTemporal and isOrdinal, and we should probably use for-of instead of array.find for consistency:

plot/src/mark.js

Lines 332 to 338 in 8aafa7c

export function isOrdinal(values) {
for (const value of values) {
if (value == null) continue;
const type = typeof value;
return type === "string" || type === "boolean";
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! and memoization will be useful for all of isOrdinal, isTemporal, isNumeric, not just for isNumeric. In a78d7d4 I've centralized the content type detection.

Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove the memoization until we have evidence that suggests that it is a bottleneck.

Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d prefer to leave the isOrdinal and isTemporal as-is, and just introduce a new isNumeric that follows the similar pattern.

@@ -164,6 +164,7 @@ export function maybeSubgroup(outputs, Z, F, S) {

function reduceFunction(f) {
return {
label: f.label,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems unrelated (and untested), so I’m going to revert this part. Elsewhere we don’t promote the reducer name to the axis label so I think it probably isn’t appropriate to promote it here. (For example, when you use the “sum” reducer, we don’t use or want “sum” as the axis label; we want the name of the column being summed.)

text: "first"
},
{
x: "key", // this is only strings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn’t strictly needed here because it doesn’t affect the composition of the groups (which here for groupZ is determined only by the fill channel), i.e. removing this key doesn’t affect the chart (but with it removed, we’re no longer testing the feature added by this PR). It’d be nice to have a more explicit test.

@mbostock
Copy link
Member

mbostock commented Aug 1, 2021

I’m wondering if this PR makes sense? If there’s a value input to the proportion reducer, it means we want a weighted proportion. But if those values are ordinal, there’s no way to know what the appropriate weights are to compute the weighted proportion. We could ignore the input value, but it seems undesirable to silently ignore it? Perhaps it would be better to throw an error in this case since the idiomatic fix is to remove the unnecessary x: "key" from the group transform options.

@mbostock
Copy link
Member

mbostock commented Aug 1, 2021

Closing as argued in #399 (comment) but please continue the discussion if you are not convinced. (Also, it might still be worth adding this test, since it now works in main.)

@mbostock mbostock closed this Aug 1, 2021
@Fil Fil deleted the fil/sum-or-count-399 branch August 4, 2021 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reduce: proportion for counts
2 participants