[FEATURE][ML] auc_roc cannot be calculated when there are no inliers/… #42853

dimitris-athanasiou · 2019-06-04T14:23:38Z

…outliers

Also fixes a bug with the matching query for binary soft classification
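
For context on why the metric is undefined in this case, here is a minimal sketch. It uses a pairwise definition of AUC ROC rather than the percentile-based computation in AucRoc.java, and the class and method names are hypothetical. With no documents in one of the two classes there are no (positive, negative) pairs to rank, so the score has a zero denominator.

```java
// Illustrative only: a pairwise AUC ROC, not the percentile-based approach used in AucRoc.java.
final class AucRocPairwiseSketch {

    // Returns the fraction of (positive, negative) pairs in which the positive
    // document has the higher score; ties count as half a correctly ranked pair.
    static double aucRoc(double[] positiveScores, double[] negativeScores) {
        long pairs = (long) positiveScores.length * negativeScores.length;
        if (pairs == 0) {
            // One class is empty, so there is nothing to rank against.
            throw new IllegalArgumentException("auc_roc is undefined when one of the classes has no documents");
        }
        double correctlyRanked = 0;
        for (double p : positiveScores) {
            for (double n : negativeScores) {
                if (p > n) {
                    correctlyRanked += 1.0;
                } else if (p == n) {
                    correctlyRanked += 0.5;
                }
            }
        }
        return correctlyRanked / pairs;
    }

    public static void main(String[] args) {
        System.out.println(aucRoc(new double[] {0.9, 0.8}, new double[] {0.1, 0.85}));
        try {
            aucRoc(new double[] {0.9, 0.8}, new double[0]);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```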

elasticmachine · 2019-06-04T14:23:42Z

Pinging @elastic/ml-core

dimitris-athanasiou · 2019-06-04T15:26:30Z

@elasticmachine test this

dimitris-athanasiou · 2019-06-04T15:44:57Z

run tests

benwtrent · 2019-06-04T15:52:16Z

...ain/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/softclassification/AucRoc.java

+        double[] tpPercentiles = percentilesArray(classAgg.getAggregations().get(PERCENTILES),
+            "[" + getMetricName() + "] requires at least one actual_field to have the value [" + classInfo.getName() + "]");
+        double[] fpPercentiles = percentilesArray(restAgg.getAggregations().get(PERCENTILES),
+            "[" + getMetricName() + "] requires at least one actual_field not to have the value [" + classInfo.getName() + "]");


Suggested change

-            "[" + getMetricName() + "] requires at least one actual_field not to have the value [" + classInfo.getName() + "]");
+            "[" + getMetricName() + "] requires at least one actual_field NOT to have the value [" + classInfo.getName() + "]");

dimitris-athanasiou · 2019-06-05

I'm actually thinking it might be better to go with "... requires at least one actual_field to have a different value than [x]". I'll commit that.
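
The calls in the diff above pass a caller-supplied message to percentilesArray. A minimal sketch of what such a check might look like follows. It works on a plain double[] instead of the Elasticsearch Percentiles aggregation, it assumes that percentiles computed over zero documents come back as NaN, and the class name and the [outlier] value in the message are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in for the PR's percentilesArray helper; not the actual Elasticsearch code.
final class PercentilesCheckSketch {

    // Copies the percentile values, rejecting any undefined (NaN) entry with the caller's message.
    // Assumption: an aggregation over zero documents yields NaN percentiles.
    static double[] percentilesArray(double[] percentileValues, String errorIfUndefined) {
        for (double value : percentileValues) {
            if (Double.isNaN(value)) {
                throw new IllegalArgumentException(errorIfUndefined);
            }
        }
        return Arrays.copyOf(percentileValues, percentileValues.length);
    }

    public static void main(String[] args) {
        double[] defined = percentilesArray(new double[] {0.1, 0.5, 0.9}, "unused");
        System.out.println(defined.length);
        try {
            percentilesArray(new double[] {Double.NaN},
                "[auc_roc] requires at least one actual_field to have the value [outlier]");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Passing the message in from the call site lets each class (the one being evaluated and the rest) report which side had no documents.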

przemekwitek · 2019-06-04T19:02:45Z

...ain/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/softclassification/AucRoc.java

        List<AucRocPoint> aucRocCurve = buildAucRocCurve(tpPercentiles, fpPercentiles);
        double aucRocScore = calculateAucScore(aucRocCurve);
        return new Result(aucRocScore, includeCurve ? aucRocCurve : Collections.emptyList());
    }

-    private static double[] percentilesArray(Percentiles percentiles) {
+    private static double[] percentilesArray(Percentiles percentiles, String errorIfUndefined) {


[my random thought]
So, IIUC, if all the labels end up having the same value (inlier or outlier), the "evaluate" request will fail.
Have you considered an alternative approach in which the error is returned as part of EvaluationMetricResult? This way, the request could succeed and other metrics (like precision, recall, etc.) could be calculated and returned to the user.

dimitris-athanasiou · 2019-06-05

It's a good one, I've thought about it too. However, it would take much more effort and I'm not convinced it's worth it, as the workaround is simple and the chances of this edge case happening are low. If we see users requesting a different behaviour here, we can make this change in the future.
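
For reference, the alternative discussed in this thread could be sketched roughly as below. This is not what the PR implements, and every name here (PerMetricErrorSketch, MetricOutcome, evaluate) is hypothetical. The idea is that each metric is computed independently and a failure is recorded next to the metric name instead of failing the whole request.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

// Hypothetical sketch of the alternative; none of these names exist in Elasticsearch.
final class PerMetricErrorSketch {

    // Holds either a metric value or the reason it could not be computed.
    static final class MetricOutcome {
        final Double value;  // null when the metric failed
        final String error;  // null when the metric succeeded

        private MetricOutcome(Double value, String error) {
            this.value = value;
            this.error = error;
        }
    }

    // Evaluates every metric on its own so that one failure does not fail the others.
    static Map<String, MetricOutcome> evaluate(Map<String, DoubleSupplier> metrics) {
        Map<String, MetricOutcome> outcomes = new LinkedHashMap<>();
        for (Map.Entry<String, DoubleSupplier> metric : metrics.entrySet()) {
            try {
                outcomes.put(metric.getKey(), new MetricOutcome(metric.getValue().getAsDouble(), null));
            } catch (IllegalArgumentException e) {
                outcomes.put(metric.getKey(), new MetricOutcome(null, e.getMessage()));
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        Map<String, DoubleSupplier> metrics = new LinkedHashMap<>();
        metrics.put("precision", () -> 0.75);
        metrics.put("auc_roc", () -> {
            throw new IllegalArgumentException("only one class present");
        });
        evaluate(metrics).forEach((name, outcome) ->
            System.out.println(name + ": " + (outcome.error == null ? outcome.value : "error: " + outcome.error)));
    }
}
```

The trade-off raised in the reply still applies: callers would have to inspect each outcome for an error, which is more work on both sides than a single failed request.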

dimitris-athanasiou added the :ml Machine learning label Jun 4, 2019

[FEATURE][ML] auc_roc cannot be calculated when there are no inliers/…

15bec23

…outliers Also fixes a bug with the matching query for binary soft classification

dimitris-athanasiou force-pushed the auc_roc-cannot-be-calculated-when-there-are-no-outliers branch from b5e4c80 to 15bec23 June 4, 2019 15:11

benwtrent self-requested a review June 4, 2019 15:42

benwtrent approved these changes Jun 4, 2019

przemekwitek reviewed Jun 4, 2019

Rephrase error message

9dd01b0

dimitris-athanasiou merged commit 44856a5 into elastic:feature-ml-data-frame-analytics Jun 5, 2019

dimitris-athanasiou deleted the auc_roc-cannot-be-calculated-when-there-are-no-outliers branch June 5, 2019 14:20
