Adds recall@k metric to rank eval API #52577
Conversation
I did an initial pass, but haven't yet done a detailed review as it sounds like you're still sorting out some details.
For me, the new confusion matrix abstraction adds more complexity than it’s worth:
- We compute true negatives, but none of our metrics actually use that value yet. To calculate true negatives, we must add a new 'total hits' parameter to EvaluationMetric#evaluate.
- To me it doesn't make it easier to understand each calculation (although maybe others have a different intuition).
Perhaps we could hold off on adding an abstraction until we add more metrics that would benefit from sharing logic?
Yeah, I stuck it in to see how it would look, but I agree it's more than we need and can take it out.
That's basically why I added it: I find it easier to reason about these calculations with a confusion matrix, thinking about true positives, false positives, etc. instead of more abstract definitions like recall being "the proportion of relevant documents in the top-k vs all possible relevant documents", but maybe that's just me 😄 I'm not very opinionated about that detail so I'm happy either way.
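For concreteness, here is a small standalone sketch of the two framings being discussed. It is not the PR's actual implementation; the RatedHit type and field names are made up for illustration. It shows that the confusion-matrix view and the direct "relevant retrieved over total relevant" view compute the same recall@k value.

```java
import java.util.List;

// Hypothetical minimal representation of a rated search hit, for illustration only.
record RatedHit(boolean relevant) {}

class RecallSketch {

    // Confusion-matrix framing: recall = TP / (TP + FN), where
    // TP = relevant docs retrieved in the top k, and
    // FN = relevant docs that exist but were not retrieved in the top k.
    static double recallFromConfusionMatrix(List<RatedHit> topK, long totalRelevant) {
        long truePositives = topK.stream().filter(RatedHit::relevant).count();
        long falseNegatives = totalRelevant - truePositives;
        return truePositives + falseNegatives == 0 ? 0.0
            : (double) truePositives / (truePositives + falseNegatives);
    }

    // Direct definition: the proportion of all relevant documents that appear
    // in the top k results. Algebraically identical to the method above,
    // since TP + FN is just the total number of relevant documents.
    static double recallDirect(List<RatedHit> topK, long totalRelevant) {
        long relevantRetrieved = topK.stream().filter(RatedHit::relevant).count();
        return totalRelevant == 0 ? 0.0 : (double) relevantRetrieved / totalRelevant;
    }
}
```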
zOMG, builds are finally passing. I'm removing the confusion matrix abstraction, adding some better recall tests, then I'll ping for a review again after that. @sbourke this should make adding MAP a bit easier, so you can base it off this change set.
I'm adding docs as well. I just realised I can do that in the same PR. Will get to that tomorrow though.
This is looking good to me, just left a few small comments. Some higher-level notes:
- There is quite a bit of duplication between PrecisionAtK and RecallAtK in terms of serialization code, getters + setters, etc. (the kind of boilerplate sketched below). We could try a refactor to pull out some shared code, but to me it's best to keep the implementation straightforward for now.
- I think the current approach is fine from a backwards-compatibility perspective. When nodes containing this new metric try to communicate with nodes without the implementation, we will simply fail (because we won't find a matching 'named writeable').
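To illustrate the duplication being discussed, here is a rough sketch of the shape of the per-metric serialization boilerplate. It is not the PR's actual code; the class name, field names, and exact wire format are illustrative, but PrecisionAtK and RecallAtK each end up carrying a near-identical pattern like this.

```java
// Rough shape of the per-metric boilerplate; not copied from the PR.
public class RecallAtKSketch /* implements EvaluationMetric */ {
    public static final String NAME = "recall";

    private final int relevantRatingThreshold;
    private final int k;

    public RecallAtKSketch(int relevantRatingThreshold, int k) {
        this.relevantRatingThreshold = relevantRatingThreshold;
        this.k = k;
    }

    // Read side: a StreamInput constructor mirroring writeTo.
    public RecallAtKSketch(org.elasticsearch.common.io.stream.StreamInput in) throws java.io.IOException {
        this.relevantRatingThreshold = in.readVInt();
        this.k = in.readVInt();
    }

    // Write side: serialize the same fields in the same order.
    public void writeTo(org.elasticsearch.common.io.stream.StreamOutput out) throws java.io.IOException {
        out.writeVInt(relevantRatingThreshold);
        out.writeVInt(k);
    }

    // The name nodes use to look up the matching 'named writeable';
    // an older node without this metric registered fails that lookup.
    public String getWriteableName() {
        return NAME;
    }

    public int getRelevantRatingThreshold() { return relevantRatingThreshold; }
    public int getK() { return k; }
}
```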
(Resolved review comment threads on: modules/rank-eval/src/main/java/org/elasticsearch/index/rankeval/RecallAtK.java, modules/rank-eval/src/test/java/org/elasticsearch/index/rankeval/PrecisionAtKTests.java, modules/rank-eval/src/test/java/org/elasticsearch/index/rankeval/RankEvalSpecTests.java)
Yeah, that's why I tried to refactor it out in the confusion matrix version. It's hard to find well-factored patterns for the serialization code since it relies on a lot of static methods and fields. I would vote to leave it as-is for now; perhaps after we add MAP/GMAP we'll start to see better patterns to refactor to. In general, I would prefer to group these metrics into something like:
Maybe we can find better ways to factor things based on these kinds of abstractions. The confusion matrix metric was a first attempt to do that, and it's also how we have done these metrics in the ML plugin for classification, based on a confusion matrix metric. Of course it's a bit different for rank metrics, since you have to deal with top-k situations (e.g. for recall) that don't come up in the ML use cases, which makes it a bit more effort and not as clean to code.
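As a rough illustration of the kind of shared abstraction being discussed, a top-k-aware confusion matrix might look like the sketch below. This is hypothetical, not code from the PR or the ML plugin, and it reuses the same illustrative RatedHit record as in the earlier snippet.

```java
import java.util.List;

// Same illustrative RatedHit as in the earlier sketch.
record RatedHit(boolean relevant) {}

// Hypothetical top-k-aware confusion matrix that precision@k and recall@k could share.
record TopKConfusionMatrix(long truePositives, long falsePositives, long falseNegatives) {

    static TopKConfusionMatrix of(List<RatedHit> topK, long totalRelevant) {
        long tp = topK.stream().filter(RatedHit::relevant).count();
        long fp = topK.size() - tp;
        // In a top-k setting, relevant documents outside the top k count as false negatives.
        long fn = totalRelevant - tp;
        return new TopKConfusionMatrix(tp, fp, fn);
    }

    double precision() {
        long retrieved = truePositives + falsePositives;
        return retrieved == 0 ? 0.0 : (double) truePositives / retrieved;
    }

    double recall() {
        long relevant = truePositives + falseNegatives;
        return relevant == 0 ? 0.0 : (double) truePositives / relevant;
    }
}
```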
> I would vote to leave it as-is for now and perhaps after we add MAP/GMAP we start to see better patterns to refactor to.
Sounds good!
This looks good to me, I think it's ready to merge after you add a note to the docs. If you agree, it'd be good to remove the 'WIP' label, and add the appropriate labels (area, type of change, plus target versions).
Pinging @elastic/es-search (:Search/Ranking)
This change adds the recall@k metric and refactors precision@k to match the new metric.
Recall@k is an important metric to use for learning to rank (LTR). Candidate generation / first phase ranking functions are often optimized for high recall, in order to generate as many relevant candidates in the top-k as possible for a second phase of ranking using LTR or a less efficient ranking function. Adding this metric allows tuning the candidate generation for LTR.
See: #51676
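For reference, a usage sketch with the Java high-level REST client is below. It assumes the rank eval client classes (RankEvalRequest, RankEvalSpec, RatedRequest, RatedDocument) and a RecallAtK constructor taking a relevant-rating threshold and k; treat the exact constructor arguments, index name, field name, and document ids as assumptions for illustration rather than confirmed details of this change.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.rankeval.RankEvalRequest;
import org.elasticsearch.index.rankeval.RankEvalResponse;
import org.elasticsearch.index.rankeval.RankEvalSpec;
import org.elasticsearch.index.rankeval.RatedDocument;
import org.elasticsearch.index.rankeval.RatedRequest;
import org.elasticsearch.index.rankeval.RecallAtK;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class RecallAtKExample {

    // Evaluate recall@20 of a candidate-generation query against labelled documents.
    // Index name, field name, and document ids are placeholders.
    static double evaluateRecall(RestHighLevelClient client) throws IOException {
        List<RatedDocument> ratings = Arrays.asList(
            new RatedDocument("my_index", "doc-1", 1),   // relevant
            new RatedDocument("my_index", "doc-2", 0));  // not relevant

        SearchSourceBuilder candidateQuery = new SearchSourceBuilder()
            .query(QueryBuilders.matchQuery("title", "example search"));

        RatedRequest ratedRequest = new RatedRequest("query_1", ratings, candidateQuery);

        // Assumed constructor arguments: (relevant rating threshold, k).
        RecallAtK metric = new RecallAtK(1, 20);

        RankEvalSpec spec = new RankEvalSpec(Arrays.asList(ratedRequest), metric);
        RankEvalRequest request = new RankEvalRequest(spec, new String[] { "my_index" });

        RankEvalResponse response = client.rankEval(request, RequestOptions.DEFAULT);
        return response.getMetricScore();
    }
}
```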