[FEATURE][ML] auc_roc cannot be calculated when there are no inliers/… #42853

dimitris-athanasiou
Contributor

…outliers

Also fixes a bug with the matching query for binary soft classification

@dimitris-athanasiou dimitris-athanasiou added the :ml Machine learning label Jun 4, 2019
@elasticmachine
Collaborator

Pinging @elastic/ml-core

@dimitris-athanasiou dimitris-athanasiou force-pushed the auc_roc-cannot-be-calculated-when-there-are-no-outliers branch from b5e4c80 to 15bec23 Compare June 4, 2019 15:11
@dimitris-athanasiou
Contributor Author

@elasticmachine test this

@benwtrent benwtrent self-requested a review June 4, 2019 15:42
@dimitris-athanasiou
Contributor Author

run tests

double[] tpPercentiles = percentilesArray(classAgg.getAggregations().get(PERCENTILES),
"[" + getMetricName() + "] requires at least one actual_field to have the value [" + classInfo.getName() + "]");
double[] fpPercentiles = percentilesArray(restAgg.getAggregations().get(PERCENTILES),
"[" + getMetricName() + "] requires at least one actual_field not to have the value [" + classInfo.getName() + "]");
Member

Suggested change
- "[" + getMetricName() + "] requires at least one actual_field not to have the value [" + classInfo.getName() + "]");
+ "[" + getMetricName() + "] requires at least one actual_field NOT to have the value [" + classInfo.getName() + "]");

Contributor Author

I'm actually thinking it might be better to go with "... requires at least one actual_field to have a different value than [x]". I'll commit that.
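For readers following the thread, the check being discussed can be sketched roughly as below. This is a minimal, self-contained illustration and not the code from this PR: it assumes an undefined percentile shows up as `NaN`, it takes a plain `double[]` in place of the `Percentiles` aggregation, and it throws `IllegalArgumentException` where the real code presumably returns a request error. The class name, the `errorMessage` helper, and the `outlier` class name are hypothetical.

```java
import java.util.Arrays;

final class PercentilesCheckSketch {

    // Hypothetical helper: builds the message using the wording proposed in the comment above.
    static String errorMessage(String metricName, String className, boolean classMustBePresent) {
        return "[" + metricName + "] requires at least one actual_field to have "
            + (classMustBePresent ? "the value [" : "a different value than [")
            + className + "]";
    }

    // Assumption: a percentile that could not be computed is represented as NaN.
    static double[] percentilesArray(double[] rawPercentiles, String errorIfUndefined) {
        for (double value : rawPercentiles) {
            if (Double.isNaN(value)) {
                // Fail with the caller-supplied message instead of producing a meaningless score.
                throw new IllegalArgumentException(errorIfUndefined);
            }
        }
        return Arrays.copyOf(rawPercentiles, rawPercentiles.length);
    }

    public static void main(String[] args) {
        String message = errorMessage("auc_roc", "outlier", false);
        try {
            // No document has a different class, so every percentile is undefined.
            percentilesArray(new double[] {Double.NaN, Double.NaN}, message);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Passing the message in from the caller lets the same helper report which side (the class itself or everything else) had no documents.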

List<AucRocPoint> aucRocCurve = buildAucRocCurve(tpPercentiles, fpPercentiles);
double aucRocScore = calculateAucScore(aucRocCurve);
return new Result(aucRocScore, includeCurve ? aucRocCurve : Collections.emptyList());
}

- private static double[] percentilesArray(Percentiles percentiles) {
+ private static double[] percentilesArray(Percentiles percentiles, String errorIfUndefined) {
Contributor

[my random thought]
So, IIUC, if all the labels end up having the same value (inlier or outlier), the "evaluate" request will fail.
Have you considered an alternative approach in which the error is returned as part of EvaluationMetricResult? This way, the request could succeed and other metrics (like precision, recall, etc.) could be calculated and returned to the user.

Contributor Author

It's a good one; I've thought about it too. However, it would take much more effort, and I'm not convinced it's worth it: the workaround is simple and the chances of this edge case happening are low. If we see users requesting a different behaviour here, we can make this change in the future.
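To make the alternative raised in this thread concrete, here is a rough sketch of what reporting the error per metric could look like. It is purely illustrative and was not implemented in this PR: `MetricOutcome`, `evaluateAll`, and the use of `DoubleSupplier` are hypothetical stand-ins, not the actual `EvaluationMetricResult` API, and the precision value is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

final class PerMetricErrorSketch {

    // Hypothetical result holder: carries either a score or an error message, never both.
    static final class MetricOutcome {
        final Double value;
        final String error;

        private MetricOutcome(Double value, String error) {
            this.value = value;
            this.error = error;
        }

        static MetricOutcome success(double value) {
            return new MetricOutcome(value, null);
        }

        static MetricOutcome failure(String error) {
            return new MetricOutcome(null, error);
        }

        boolean isError() {
            return error != null;
        }
    }

    // Evaluates every metric; a failure in one is recorded in its outcome rather than propagated.
    static Map<String, MetricOutcome> evaluateAll(Map<String, DoubleSupplier> metrics) {
        Map<String, MetricOutcome> outcomes = new LinkedHashMap<>();
        for (Map.Entry<String, DoubleSupplier> entry : metrics.entrySet()) {
            try {
                outcomes.put(entry.getKey(), MetricOutcome.success(entry.getValue().getAsDouble()));
            } catch (IllegalArgumentException e) {
                outcomes.put(entry.getKey(), MetricOutcome.failure(e.getMessage()));
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        Map<String, DoubleSupplier> metrics = new LinkedHashMap<>();
        metrics.put("precision", () -> 0.75); // made-up value for illustration
        metrics.put("auc_roc", () -> {
            throw new IllegalArgumentException(
                "[auc_roc] requires at least one actual_field to have a different value than [outlier]");
        });
        evaluateAll(metrics).forEach((name, outcome) ->
            System.out.println(name + " -> " + (outcome.isError() ? "error: " + outcome.error : outcome.value)));
    }
}
```

The trade-off matches the one described in the reply: the request as a whole would succeed and the other metrics would still be returned, at the cost of every consumer having to check each metric for an error.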

@dimitris-athanasiou dimitris-athanasiou merged commit 44856a5 into elastic:feature-ml-data-frame-analytics Jun 5, 2019
@dimitris-athanasiou dimitris-athanasiou deleted the auc_roc-cannot-be-calculated-when-there-are-no-outliers branch June 5, 2019 14:20
Labels
:ml Machine learning
4 participants