Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only create MatrixStatsResults on final reduction #38130

Merged

Conversation

polyfractal
Copy link
Contributor

MatrixStatsResults is the "final" result object, and runs an additional computation in it's ctor to calculate covariance, etc. This means it should only run on the final reduction instead of on every reduce. But today each round of reductions will create a new MatrixStatsResults object and execute compute(), which I think is where the error in #37587 is coming from.

I'm not 100% certain I understand how MatrixStats works, but based on the test failure and the suggested fix in this PR, I think that's what's going on.

As an aside, we should probably do some refactoring on MatrixStats, so that it doesn't need to pass around a null MatrixStatsResults to distinguish between "intermediate" and "final" status.

Closes #37587

MatrixStatsResults is the "final" result object, and runs an additional
computation in it's ctor to calculate covariance, etc.  This means
it should only run on the final reduction instead of on every reduce.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

@polyfractal
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/2
@elasticmachine run elasticsearch-ci/default-distro

@polyfractal
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/2

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should refactor this so it doesn't rely on the null results value to differentiate between intermediate and final results especially as null results is also used to indicate an empty result too

Copy link
Contributor

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

@polyfractal
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/default-distro

@polyfractal
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/2

weizijun pushed a commit to weizijun/elasticsearch that referenced this pull request Feb 22, 2019
MatrixStatsResults is the "final" result object, and runs an additional
computation in it's ctor to calculate covariance, etc.  This means
it should only run on the final reduction instead of on every reduce.
weizijun pushed a commit to weizijun/elasticsearch that referenced this pull request Feb 22, 2019
MatrixStatsResults is the "final" result object, and runs an additional
computation in it's ctor to calculate covariance, etc.  This means
it should only run on the final reduction instead of on every reduce.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MatrixStatsAggregatorTests#testTwoFields fails if aggs are reduced
8 participants