Ranking Evaluation API: Add MAP and recall@k metrics #51676
Comments
Pinging @elastic/es-search (:Search/Ranking)
Potentially a duplicate of #29653
By mean average precision, did you mean the one described in the Stanford IR course (https://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-1-per.pdf)? The book itself does not appear to talk about MAP. 🤷‍♂️ I've added it here: sbourke@652ff11#diff-5fb623709353794e709a58f45104baec. I've added recall as well. If that's what you're generally looking for, please let me know and I'll clean up the code. The tests are based on the precision tests.
Hang tight, I've also got a branch with some other things cleaned up in it. Let's sync up after I have a PR in.
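For reference, a minimal sketch of average precision and MAP as defined in the linked Stanford handout, assuming each query is represented by a ranked list of doc IDs plus a set of relevant doc IDs. The function names and inputs are illustrative, not the Elasticsearch implementation:

```python
def average_precision(ranked_doc_ids, relevant_doc_ids):
    """AP: mean of precision@i taken at each rank i where a relevant doc appears."""
    relevant = set(relevant_doc_ids)
    if not relevant:
        return 0.0
    hits = 0
    precisions = []
    for i, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / i)  # precision at this cut-off
    # Relevant docs that were never retrieved contribute 0, so divide by the
    # total number of relevant docs rather than the number of hits.
    return sum(precisions) / len(relevant)


def mean_average_precision(per_query_results):
    """MAP: arithmetic mean of AP over all queries."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in per_query_results]
    return sum(aps) / len(aps) if aps else 0.0
```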
@sbourke thanks for your interest in contributing! Perhaps we can first work to integrate @joshdevins's PR to add recall@k (#52577), then you could follow up with a PR to add mAP. Feel free to add any suggestions to the recall@k PR. It's generally nice to separate changes into small PRs, so I think it's fine to add each metric separately. It would also be great to get @cbuescher's thoughts on the proposed metrics to make sure we're happy to add them.
@sbourke I think that MAP definition matches my understanding. I'm looking at the TREC definition and I think it's the same.
@joshdevins Your explicit confusion matrix is much nicer than what I was doing. I'll look at the code changes more closely today. Do you have GMAP as well, or should I do that?
Have a look at the PR (#52577) again — it's ready to merge so you can use it as a basis for the next change set if you want.
I think from the ML perspective, it's typical for how we evaluate and calculate metrics. We decided to remove it for now though, as it introduces a bit of unnecessary indirection in the code. We might put it back later after the MAP implementation. See the related discussion in the PR. After removing the confusion matrix, I normalized the variables and the way of calculating metrics in both the precision and recall implementations.
I haven't done anything for (G)MAP yet, so you are welcome to contribute if you want. Let me know if you are still interested in doing a feature branch for that work. If you haven't already, have a look at CONTRIBUTING.md for details on how we take contributions through PRs. We should be able to implement GMAP as an option on the MAP metric, much as the DCG metric provides an NDCG option.
This change adds the recall@k metric and refactors precision@k to match the new metric. Recall@k is an important metric for learning to rank (LTR) use cases. Candidate generation or first-phase ranking functions are often optimized for high recall, in order to generate as many relevant candidates in the top-k as possible for a second phase of ranking. Adding this metric allows tuning that base query for LTR. See: #51676
Backports: #52577
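As an aside, a rough sketch of recall@k next to precision@k helps show why a first-phase (candidate generation) query is tuned for recall. Names and inputs here are hypothetical, not the actual Java implementation in the PR:

```python
def recall_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of all relevant docs that appear in the top-k retrieved results."""
    relevant = set(relevant_doc_ids)
    if not relevant:
        return 0.0
    top_k = set(ranked_doc_ids[:k])
    return len(top_k & relevant) / len(relevant)


def precision_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of the top-k retrieved results that are relevant."""
    relevant = set(relevant_doc_ids)
    top_k = ranked_doc_ids[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)
```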
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Several information retrieval "tasks" use a few common evaluation metrics, including mean average precision (MAP) [1] and recall@k, in addition to what is already supported (e.g. ERR, nDCG, MRR). Sometimes the geometric MAP (GMAP) variant is used, and if it's an easy option to add (like how NDCG is an option on DCG), we should add this option. These are standard measures in many TREC and related tasks (e.g. MS MARCO). In particular, reranking tasks use recall@k to tune the base query which is input to a reranker (e.g. tuning BM25 or RM3 parameters).
[1] "GMAP is the geometric mean of per-topic average precision, in contrast with MAP which is the arithmetic mean"
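To illustrate the quote in [1], a tiny sketch of GMAP as the geometric mean of per-topic AP values. The epsilon floor for zero AP values is an assumption (a common convention to keep the geometric mean defined), not something specified in this issue:

```python
import math


def gmap(average_precisions, eps=1e-5):
    """GMAP: geometric mean of per-topic AP, with a small floor for zero values."""
    if not average_precisions:
        return 0.0
    vals = [max(ap, eps) for ap in average_precisions]
    return math.exp(sum(math.log(v) for v in vals) / len(vals))
```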