Fix/histograms don't match the scatter plot #80

Merged
merged 6 commits into from
Oct 29, 2021

Contributor

@schoinh schoinh commented Oct 28, 2021

Problem

The histograms on the main plot didn't always match what the scatter plot displays, because:

The data for the x histogram is just an array of x values like [2.3, null, null, 5, ...], and the data for the y histogram is just an array of y values. If a point has an x value but a null y value, that point is not displayed in the plot (I assume Plotly automatically ignores it), but the x histogram doesn't know that, so it still counts the point. And vice versa for the y histogram.
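
To make the mismatch concrete, here is a minimal sketch with made-up values (the arrays and helper names are illustrative and not taken from the repository). It compares how many entries a histogram would count when it only sees one array against how many points have both coordinates and can be drawn in the scatter plot.

```typescript
type MaybeNumber = number | null;

// Hypothetical sample data: index i of each array describes the same cell.
const xValues: MaybeNumber[] = [2.3, null, null, 5, 1.1];
const yValues: MaybeNumber[] = [null, 4, null, 7, 0];

// Number of entries a histogram would count if it only looked at one array.
function countNonNull(values: MaybeNumber[]): number {
    return values.filter((value) => value !== null).length;
}

// Number of points that have both coordinates, i.e. can appear in the scatter plot.
function countPlottable(xs: MaybeNumber[], ys: MaybeNumber[]): number {
    let count = 0;
    for (let i = 0; i < Math.min(xs.length, ys.length); i++) {
        if (xs[i] !== null && ys[i] !== null) {
            count++;
        }
    }
    return count;
}

const xHistogramCount = countNonNull(xValues); // 3
const yHistogramCount = countNonNull(yValues); // 3
const scatterCount = countPlottable(xValues, yValues); // 2
```

Each histogram counts three values here, while only two points are plottable, which is the discrepancy described above.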

Resolves: Histograms don't accurately reflect what's displayed in main plot

Solution

  • Made a util function that syncs the null values between 2 arrays, and used it to sync the null values between the x and y values in the getMainPlotData selector.
    • Yes, it loops through all the x and y values. Unfortunately, this is a different set of x and y values than the ones we loop through for the checkboxes (that set gets filtered based on the checkbox show/hide states). I didn't see a noticeable slowdown from this change when showing/hiding cells with the checkboxes or when changing axes.
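
A minimal sketch of the null-syncing idea, for illustration only. The signature and the choice to return new arrays are assumptions; the actual util in src/util/index.ts may look different.

```typescript
type MaybeNumber = number | null;

// Assumed shape: returns new arrays in which an index is null in both
// outputs whenever it is null in either input.
function syncNullValues(
    array1: MaybeNumber[],
    array2: MaybeNumber[]
): [MaybeNumber[], MaybeNumber[]] {
    if (array1.length !== array2.length) {
        throw new Error("Arrays must have the same length");
    }
    const synced1 = [...array1];
    const synced2 = [...array2];
    for (let i = 0; i < synced1.length; i++) {
        if (synced1[i] === null || synced2[i] === null) {
            synced1[i] = null;
            synced2[i] = null;
        }
    }
    return [synced1, synced2];
}

const [xSynced, ySynced] = syncNullValues([2.3, null, 5], [null, 4, 7]);
// xSynced is [null, null, 5] and ySynced is [null, null, 7]
```

Keeping the arrays the same length (rather than dropping entries) preserves the index alignment between x and y values.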

Also:

  • Wrote a test for the new util function
  • Fixed some types
  • Fixed some formatting

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • This change requires updated or new tests

Steps to Verify:

  1. npm start and load the "cellsystems_fish_v2021.1" dataset

  2. Change the axes to something mutually exclusive, like MYH7 vs HPRT1

  3. You should see nothing in the scatter plot and nothing in the axis histograms (maybe in the future we can put in some sort of text like "no data to display" in situations like this)

  4. Verify that plot and checkboxes still work as before otherwise.

Key files (delete if not relevant):

  1. src/containers/MainPlotContainer/selectors.ts
  2. src/util/index.ts & src/util/test/index.test.ts
    Everything else is a formatting or type fix.

Thanks for contributing!

Comment on lines +20 to +41
    const state: State = {
        ...newMockState,
        selection: {
            ...newMockState.selection,
            plotByOnX: "apical-proximity",
        },
    };
    const result: (number | null)[] = getFilteredXValues(state);
    const newState = {
        ...state,
        selection: {
            ...newMockState.selection,
            plotByOnX: "cell-segmentation",
        },
    };
    const feature1Values = [-0.25868651080317, -0.1];
    const feature2Values = [1, 0];

    const newResult: (number | null)[] = getFilteredXValues(newState);
    expect(result).to.deep.equal(feature1Values);
    expect(newResult).to.deep.equal(feature2Values);
    expect(result.length).to.equal(newResult.length);
Contributor Author

This is just all whitespace change

    if (array1[i] === null) {
        array2[i] = null;
    } else if (array2[i] === null) {
        array1[i] = null;
    }

Contributor

This seems like it should be happening in the selector that gets the values for the histogram? What's the reason for doing it this way?

Contributor Author

Why is it in a util function? So it's easier to test 😅 Or were you asking another question?

Contributor Author

@schoinh schoinh Oct 28, 2021

The values that go in the histograms are just mainPlotData.x and mainPlotData.y,

    makeHistogramPlotX(mainPlotData.x),

so I figured I would just do this null-syncing operation in the getMainPlotData selector.

Contributor

Yeah sorry, I didn't read this carefully enough and thought the util function was being called from a component, not another selector. What I meant was I was thinking we'd have a "getHistogramData" selector that fed into the main plot data, but this works. I think you could keep the function in the selector file since it's only being called from there, and it doesn't seem like a general function other modules are going to use.

Generally, the convention is that any time you're reading from state, like in a selector, you make a copy instead of directly manipulating the data, and you only manipulate the data through actions. I can't think of any issue this code would cause as is (you could even make the case that we should just be doing this before we save the data in state), but it does look odd to me.

Contributor

My only concern here is that you are actually modifying the array. Is that ok? Are you actually changing data that shouldn't be changed? Like if you have (x=3,y=null) and you change the 3 to a null but then want to plot a different Y feature, is the x=3 gone forever?

Contributor Author

Gotcha, I'll rework this to make copies of the data instead.
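
The concern raised above can be illustrated with a small sketch. The feature names and values are hypothetical, and the in-place function only mirrors the reviewed snippet; it is not the merged code. It shows how syncing directly on stored arrays loses a value for later axis choices, while syncing copies does not.

```typescript
type MaybeNumber = number | null;

// In-place variant, mirroring the snippet under review.
function syncNullValuesInPlace(array1: MaybeNumber[], array2: MaybeNumber[]): void {
    for (let i = 0; i < array1.length; i++) {
        if (array1[i] === null) {
            array2[i] = null;
        } else if (array2[i] === null) {
            array1[i] = null;
        }
    }
}

// Hypothetical per-feature arrays as they might sit in application state.
const featureData = {
    featureA: [3, 1] as MaybeNumber[],
    featureB: [null, 2] as MaybeNumber[],
    featureC: [8, 9] as MaybeNumber[],
};

// Plotting A vs B directly on the stored arrays overwrites featureA[0].
syncNullValuesInPlace(featureData.featureA, featureData.featureB);
const lostValue = featureData.featureA[0]; // null, so A vs C would now miss the point (3, 8)

// Copying first keeps the stored data intact.
const xCopy = [...featureData.featureC];
const yCopy = [...featureData.featureB];
syncNullValuesInPlace(xCopy, yCopy);
const preservedValue = featureData.featureC[0]; // still 8
```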

Contributor

I should say that I am not aware of whether you are already starting with copies when you actually arrive in this function, or not.

Contributor Author

No, not starting with copies. But I should. Either that or produce copies in here.

// Only preserve values at indices where both x and y values are not null,
// because a coordinate like (3, null) won't be plotted anyway and produces
// inaccurate histograms.
syncNullValues(xValues, yValues);
// for datasets that have a lot of null values,
// if the whole array is null it throws an error
if (!filter(xValues).length) {
Contributor

There is something fishy to me about this filter call, too. Is this just supposed to be a check for whether the array contains all null values? (What if it contains all 0 values?) filter seems to create a whole new array, which means we're doing a very unnecessary data copy here. I feel like this and the syncNullValues call could be combined into something more efficient.
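
A short sketch of the all-zeros pitfall. It assumes `filter` in the snippet is a truthiness filter (for example lodash's `filter` called without a predicate; the import is not shown in this thread), and uses the native `Array.prototype.filter(Boolean)` as a stand-in with the same behavior. The helper names are made up for illustration.

```typescript
type MaybeNumber = number | null;

const allZeros: MaybeNumber[] = [0, 0, 0];
const allNulls: MaybeNumber[] = [null, null, null];

// Stand-in for a truthiness filter: drops null, but also drops 0,
// and allocates a new array just to read its length.
const truthyCount = (values: MaybeNumber[]): number => values.filter(Boolean).length;

// Explicit null check: no new array, and it stops at the first real value.
const hasAnyValue = (values: MaybeNumber[]): boolean =>
    values.some((value) => value !== null);

const zerosLookEmpty = truthyCount(allZeros) === 0; // true: valid data misread as empty
const zerosHaveData = hasAnyValue(allZeros); // true
const nullsHaveData = hasAnyValue(allNulls); // false
```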

Contributor Author

That's a good point about all 0 values (and the data copy). I should be able to combine the two operations.
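
One way the two operations might be combined, as a hedged sketch rather than the code that was merged: a single pass that builds synced copies and records whether any index has both values. The function name `syncAndCheck` and the return shape are hypothetical.

```typescript
type MaybeNumber = number | null;

interface SyncedValues {
    x: MaybeNumber[];
    y: MaybeNumber[];
    hasData: boolean; // true if at least one index has both an x and a y value
}

function syncAndCheck(xValues: MaybeNumber[], yValues: MaybeNumber[]): SyncedValues {
    const length = Math.min(xValues.length, yValues.length);
    // Start with all nulls and fill in only the complete pairs.
    const x = new Array<MaybeNumber>(length).fill(null);
    const y = new Array<MaybeNumber>(length).fill(null);
    let hasData = false;
    for (let i = 0; i < length; i++) {
        const xValue = xValues[i] ?? null;
        const yValue = yValues[i] ?? null;
        if (xValue !== null && yValue !== null) {
            x[i] = xValue;
            y[i] = yValue;
            hasData = true; // 0 counts as data; only null is treated as missing
        }
    }
    return { x, y, hasData };
}

const withZero = syncAndCheck([0, null, 2], [0, 5, null]);
// withZero.x is [0, null, null], withZero.y is [0, null, null], withZero.hasData is true
const disjoint = syncAndCheck([1, null], [null, 2]);
// disjoint.hasData is false
```

This leaves the inputs untouched and avoids the extra array that a separate emptiness check would allocate.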

Contributor

I realize it is outside the scope of the PR but I made the mistake of trying to actually understand the code. Hopefully it is not too involved to clean it up. :(

Contributor Author

schoinh commented Oct 28, 2021

@meganrm @toloudis Let me know if it looks better now.

Contributor

@toloudis toloudis left a comment

Thank you so much for fixing this!

@schoinh schoinh merged commit 75532e7 into main Oct 29, 2021
@schoinh schoinh deleted the fix/histograms branch October 29, 2021 00:02
3 participants