
Enable ROUGEScore to evaluate hypotheses against multiple references. #667

Closed
stancld opened this issue Dec 7, 2021 · 8 comments · Fixed by #680
Labels: enhancement (New feature or request), topic: Text
Milestone: v0.7

Comments

@stancld
Contributor

stancld commented Dec 7, 2021

🚀 Feature

Enable ROUGEScore to evaluate hypotheses against multiple references.

Motivation

In the original paper, Lin (2004) proposes evaluating a hypothesis against multiple references, with the maximum pairwise score eventually used.
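
For concreteness, a minimal sketch of the max-pairwise aggregation described above. `single_reference_rouge` is a hypothetical stand-in for the existing single-reference scorer, not the torchmetrics implementation:

```python
from typing import Callable, Dict, Sequence


def max_pairwise_rouge(
    hypothesis: str,
    references: Sequence[str],
    single_reference_rouge: Callable[[str, str], Dict[str, float]],
) -> Dict[str, float]:
    """Score the hypothesis against each reference and keep the best value per key."""
    best: Dict[str, float] = {}
    for reference in references:
        for key, value in single_reference_rouge(hypothesis, reference).items():
            best[key] = max(best.get(key, float("-inf")), value)
    return best
```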

Pitch

Enable ROUGEScore to evaluate hypotheses against multiple references, as is the case for other text metrics.

Alternatives

Leave it as it is.

Additional context

Ideal for #new-contributors

stancld added the enhancement label on Dec 7, 2021
Borda added this to the v0.7 milestone on Dec 8, 2021
@ashutoshml
Contributor

I can take a look at this

@ashutoshml
Contributor

ashutoshml commented Dec 9, 2021

I started working on it. The requirements say: "maximum pairwise score is used". Depending on rouge_types (1, 2, L, ...), there are different highest values, each with its own precision, recall, and fmeasure. Which rouge_type's highest value should be used (1, 2, L), and which measure (precision, recall, or fmeasure)?

Also, should we have an avg. version (instead of just maximum pairwise)? It could be passed as an argument during init.
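
One possible reading, sketched below, is to take the maximum (or average) independently for every rouge_type and every measure, with the aggregator chosen at init time. The dict layout here is only an illustration of the question, not the final API:

```python
from statistics import mean
from typing import Dict, List


def aggregate(
    per_reference: List[Dict[str, Dict[str, float]]], accumulate: str = "max"
) -> Dict[str, Dict[str, float]]:
    """per_reference[i]["rouge1"]["fmeasure"] is the score against reference i."""
    reducer = max if accumulate == "max" else mean
    measures = ("precision", "recall", "fmeasure")
    return {
        rouge_type: {m: reducer(scores[rouge_type][m] for scores in per_reference) for m in measures}
        for rouge_type in per_reference[0]
    }
```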


@ashutoshml
Contributor

@stancld I have written the code for maximum pairwise.
I wanted to know if we can quickly run just test_rouge.py using pytest; make test seems to take a lot of time.
I'll run the full test once test_rouge.py works.

@stancld
Contributor Author

stancld commented Dec 9, 2021

Hi @ashutoshml, thanks a lot for your effort! O:] I'll have a look tomorrow, but you can run:

pytest tests/text/test_rouge.py

in the project directory to run ROUGE test only :]

@ashutoshml
Contributor

Currently, we have

def update(self, preds: Union[str, List[str]], targets: Union[str, List[str]]) -> None:

Should we convert it into

def update(self, preds: Union[str, List[str]], targets: Union[str, List[str], List[List[str]]]) -> None:

?

This would handle cases where we have a list of predictions and a list of lists of references.
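
A minimal sketch of how the broadened signature could normalize its inputs. `_normalize_inputs` is a hypothetical helper for illustration; the merged implementation may look different:

```python
from typing import List, Tuple, Union


def _normalize_inputs(
    preds: Union[str, List[str]],
    targets: Union[str, List[str], List[List[str]]],
) -> Tuple[List[str], List[List[str]]]:
    """Coerce every accepted input shape to (list of preds, list of reference lists)."""
    if isinstance(preds, str):
        preds = [preds]
    if isinstance(targets, str):
        targets = [[targets]]
    elif targets and isinstance(targets[0], str):
        # assume one reference per prediction; wrap each in its own list
        targets = [[target] for target in targets]
    return preds, targets
```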

@stancld
Contributor Author

stancld commented Dec 10, 2021

Hi @ashutoshml, yes, I think we may start with something like this. Feel free to open a draft pull request once ready and we'll have a proper look :]

@ashutoshml
Contributor

@stancld During testing, we compare our score against the scores given by the rouge-score==0.0.4 package, which does not have a multiple-reference version. How do we write the test cases for this, i.e., what would be the baseline?
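
One option (not necessarily what the PR ends up doing) is to build the baseline by looping the single-reference rouge-score scorer over the references and keeping the best score per sample. `reference_baseline` is a hypothetical test helper:

```python
from typing import Dict, List

from rouge_score import rouge_scorer


def reference_baseline(pred: str, references: List[str], rouge_types: List[str]) -> Dict[str, float]:
    """Best fmeasure per rouge type across all references, using the single-reference scorer."""
    scorer = rouge_scorer.RougeScorer(rouge_types, use_stemmer=False)
    best = {rouge_type: float("-inf") for rouge_type in rouge_types}
    for ref in references:
        scores = scorer.score(ref, pred)  # rouge-score expects (target, prediction)
        for rouge_type in rouge_types:
            best[rouge_type] = max(best[rouge_type], scores[rouge_type].fmeasure)
    return best
```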

@ashutoshml
Contributor

@stancld Opened a draft pull request. Kindly check.
