ndcg@k metric is inaccurate #6707

Closed
PascalIversen opened this issue Feb 15, 2021 · 2 comments · Fixed by #8822
Labels
LTR Learning to rank

Comments

@PascalIversen

Hey,
I am using xgboost version 1.3.3 and I am getting invalid ndcg@k metric values. The values are > 1, while ndcg@k must lie in [0, 1] by definition.

Example:

# creating dummy ranking data
import numpy as np

groups = [60, 50, 32, 22, 22, 22, 20, 12]
labels = []
features = []
for group in groups:
    # per-group labels are the ranks 0 .. group_size - 1, so they reach 59 in the largest group
    label_group = list(np.argsort(np.argsort(np.random.randn(group))))
    labels = labels + label_group
    for label in label_group:
        features.append([label + 10, np.random.randn(), np.random.randn()])
features = np.stack(features)


# fitting a ranking model
import xgboost as xgb

model = xgb.XGBRanker(
    objective="rank:pairwise",
    max_depth=10,
    n_estimators=1000,
    verbosity=2)

ranker = model.fit(X=features, y=labels, group=groups,
                   eval_set=[(features, labels)], eval_group=[groups],
                   eval_metric="ndcg@10", early_stopping_rounds=2, verbose=True)
[16:15:01] INFO: ../src/tree/updater_prune.cc:101: tree pruning end, 146 extra nodes, 0 pruned nodes, max_depth=10
[0]	validation_0-ndcg@10:1.32964
[16:15:01] INFO: ../src/tree/updater_prune.cc:101: tree pruning end, 112 extra nodes, 0 pruned nodes, max_depth=10
[1]	validation_0-ndcg@10:112.34529
[16:15:01] INFO: ../src/tree/updater_prune.cc:101: tree pruning end, 130 extra nodes, 0 pruned nodes, max_depth=10
[2]	validation_0-ndcg@10:1.69752
[16:15:01] INFO: ../src/tree/updater_prune.cc:101: tree pruning end, 92 extra nodes, 0 pruned nodes, max_depth=10
@trivialfis added the LTR Learning to rank label Mar 30, 2021
@lcschv commented Mar 3, 2023

Hey, I have faced the same issue. After some investigation, I think the problem is that the ideal ranking is computed incorrectly.

For instance, take a query q and a set of relevant documents for this query, RelevantDocs = {d1, d2, d3}. Let's assume the relevance grades are the following: {d1=2, d2=1, d3=1}.

In order to compute the ideal ranking, we have to sort the relevant documents in the best possible order (based only on the set of relevant documents).

So, in this case, the ideal ranking is
R_Ideal = {d1, d2, d3} or, because of ties, {d1, d3, d2}.

Now, imagine we have a retrieval model (say BM25) which returns a ranking R = {d3, d2, d1}.
In order to compute NDCG, we first have to compute DCG(R_Ideal) and then DCG(R).
Assuming the discount is simply the rank position, each document contributes rel / rank.

DCG(R_Ideal) = (2/1) + (1/2) + (1/3) ≈ 2.83
DCG(R) = (1/1) + (1/2) + (2/3) ≈ 2.17

Then we can compute

NDCG = DCG(R) / DCG(R_Ideal), which leads to:
NDCG ≈ 0.765

The issue happening during the computation of ndcg@k is probably due to a wrong calculation of the ideal ranking. Basically, imagine now we want to compute ndcg@2. The result should be computed as follows:

DCG@2(R_Ideal) = (2/1) + (1/2) = 2.5
DCG@2(R) = (1/1) + (1/2) = 1.5
NDCG@2 = 0.60
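
As a sanity check, here is a minimal sketch of that textbook computation, using the simplified rel / rank gain from the example above (this is not XGBoost's implementation): the ideal DCG is taken over all relevant documents sorted by grade, and only then truncated at k.

def dcg_at_k(grades, k):
    # grades are relevance grades in ranked order; the discount is simply the rank position
    return sum(g / (i + 1) for i, g in enumerate(grades[:k]))

def ndcg_at_k(ranked_grades, k):
    ideal = sorted(ranked_grades, reverse=True)  # ideal ordering over ALL documents
    return dcg_at_k(ranked_grades, k) / dcg_at_k(ideal, k)

print(ndcg_at_k([1, 1, 2], k=2))  # R = {d3, d2, d1} with grades d1=2, d2=1, d3=1 -> 0.6
print(ndcg_at_k([1, 1, 2], k=3))  # full ranking -> ~0.765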

However, I think the ideal ranking is instead being computed by sorting only the documents that appear in the top-k of ranking R. So, with a cut-off at k=2, R_Ideal would look like this: R_Ideal = {d3, d2}.

DCG@2(R_Ideal) = (1/1) + (1/2) = 1.5
DCG@2(R) = (1/1) + (1/2) = 1.5
NDCG@2 = 1.0

In addition, it looks like the initial ranking (the baseline) is used to extract this top-k of relevant documents. So, if after re-sorting with LTR you get a ranking like R_LTR = {d1, d2}, then since the ideal ranking is still being computed as DCG@2(R_Ideal) = (1/1) + (1/2) = 1.5, when you compute DCG(R_LTR) you get something like this:

DCG@2(R_Ideal) = (1/1) + (1/2) = 1.5
DCG@2(R_LTR) = (2/1) + (1/2) = 2.5
NDCG@2 = 1.66

which results in an invalid NDCG@2 value greater than 1.
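
To make the hypothesis concrete, here is the same arithmetic in a few lines (an illustration of the suspected behaviour only, not XGBoost's actual code): freezing the ideal DCG@2 from the baseline top-2 {d3, d2} instead of the true ideal {d1, d2} lets a better re-ranking score above 1.

frozen_idcg_at_2 = 1 / 1 + 1 / 2        # 1.5, "ideal" taken from the baseline top-2 (grades 1, 1)
dcg_ltr_at_2 = 2 / 1 + 1 / 2            # 2.5, R_LTR = {d1, d2} (grades 2, 1)
print(dcg_ltr_at_2 / frozen_idcg_at_2)  # 1.666... > 1, an invalid NDCG@2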

This is what I think is happening.

Has anyone else faced this same issue and found a fix?

@trivialfis (Member)

Hi, thank you for sharing a detailed analysis. For the original issue posted here, the problem is just overflow. The exponential gain 2^{rel} - 1 is computed with a 32-bit integer, which limits the maximum relevance label to 31. The existing LTR implementation doesn't have a check on the input labels, hence the weird output.
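
For illustration only (this mirrors the explanation above, not the actual C++ code): with the exponential gain 2^rel - 1 held in a 32-bit integer, any label above 31 no longer fits. The reproduction at the top uses per-group ranks as labels, so they reach 59 in the largest group.

import numpy as np

int32_max = np.iinfo(np.int32).max  # 2**31 - 1, the largest gain a 32-bit signed int can hold
for rel in (10, 31, 59):
    gain = 2**rel - 1  # exact value; Python ints do not overflow
    print(rel, gain, "fits in int32" if gain <= int32_max else "overflows int32")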
