Only create MatrixStatsResults on final reduction #38130
MatrixStatsResults is the "final" result object, and runs an additional computation in its ctor to calculate covariance, etc. This means it should only run on the final reduction instead of on every reduce.
Pinging @elastic/es-analytics-geo
@elasticmachine run elasticsearch-ci/2
@elasticmachine run elasticsearch-ci/2
I agree that we should refactor this so it doesn't rely on a null results value to differentiate between intermediate and final results, especially since a null results value is also used to indicate an empty result.
👍 LGTM
@elasticmachine run elasticsearch-ci/default-distro
@elasticmachine run elasticsearch-ci/2
MatrixStatsResults is the "final" result object, and runs an additional computation in its ctor to calculate covariance, etc. This means it should only run on the final reduction instead of on every reduce. But today each round of reductions will create a new MatrixStatsResults object and execute compute(), which I think is where the error in #37587 is coming from.
I'm not 100% certain I understand how MatrixStats works, but based on the test failure and the suggested fix in this PR, I think that's what's going on.
As an aside, we should probably do some refactoring on MatrixStats, so that it doesn't need to pass around a null MatrixStatsResults to distinguish between "intermediate" and "final" status.
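For illustration, here is a rough, self-contained sketch of the idea: intermediate reductions only merge the running sums, and the derived statistics are computed once, on the final reduction. Everything in it is hypothetical (ReduceSketch, RunningStats, FinalResults, Partial, and the isFinalReduce parameter are simplified stand-ins, not the actual Elasticsearch classes or this PR's diff), and it tracks a single field's mean and variance rather than a full covariance matrix.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for illustration only.
class ReduceSketch {

    /** Mergeable running sums for a single field. */
    static final class RunningStats {
        long count;
        double sum;
        double sumOfSquares;

        void add(double value) {
            count++;
            sum += value;
            sumOfSquares += value * value;
        }

        void merge(RunningStats other) {
            count += other.count;
            sum += other.sum;
            sumOfSquares += other.sumOfSquares;
        }
    }

    /** Derived statistics; the constructor does the extra computation. */
    static final class FinalResults {
        final double mean;
        final double variance; // population variance

        FinalResults(RunningStats stats) {
            mean = stats.sum / stats.count;
            variance = stats.sumOfSquares / stats.count - mean * mean;
        }
    }

    /** One aggregation result; results stays null until the final reduce. */
    static final class Partial {
        final RunningStats stats;
        final FinalResults results;

        Partial(RunningStats stats, FinalResults results) {
            this.stats = stats;
            this.results = results;
        }
    }

    static Partial reduce(List<Partial> partials, boolean isFinalReduce) {
        RunningStats merged = new RunningStats();
        for (Partial partial : partials) {
            merged.merge(partial.stats);
        }
        // Only pay for (and only trust) the derived statistics on the final pass.
        // An empty result also leaves results as null, as in the discussion above.
        FinalResults results = (isFinalReduce && merged.count > 0)
            ? new FinalResults(merged)
            : null;
        return new Partial(merged, results);
    }
}
```

Note that this sketch still uses a null results field to mark both "intermediate" and "empty", so it does not address the refactoring suggested above.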
Closes #37587