
Fix BLEURT evaluation errors #316

Merged · 24 commits · Oct 16, 2024
Conversation

chuandudx (Contributor):

These changes address the issues described in #315.

I built these changes on top of the BERTScore changes (#311), which haven't been merged yet, so those changes also appear here. Please let me know if you'd prefer that I remove them from this PR. Thank you!

NathanHB (Member) left a comment:

Thanks! Only a few nits to fix and we'll be good.

@@ -123,12 +123,14 @@ class Metrics(Enum):
corpus_level_fn=np.mean,
higher_is_better=True,
)
def compute_mean(x):
NathanHB (Member):

Use a lambda function inside the task def
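
For reference, a sketch of what this suggestion amounts to in the metric definition. The field values are taken from the BLEURT entry shown further down in this diff; SampleLevelMetric, MetricCategory, MetricUseCase and BLEURT are assumed to already be in scope in metrics.py, so only numpy is imported here.

import numpy as np

bleurt = SampleLevelMetric(
    metric_name="bleurt",
    sample_level_fn=BLEURT().compute,
    category=MetricCategory.GENERATIVE,
    use_case=MetricUseCase.TRANSLATION,
    # inline lambda instead of a module-level compute_mean helper
    corpus_level_fn=lambda scores: np.mean(scores),
    higher_is_better=True,
)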

chuandudx (Contributor, author):

Thanks for the review! Would you mind taking a look at the errors described in items 3 and 4 of #315? I was having trouble making the lambda function work and would appreciate some advice on how to do it!

NathanHB (Member):

Sure! Can you tell me which task you are running so that I can replicate? :)

chuandudx (Contributor, author), Sep 24, 2024:

Thanks so much! Could you try this example task: https://github.com/chuandudx/lighteval/blob/replicate_bleurt/community_tasks/replicate_bleurt_error_eval.py

When I tried to run it from lighteval, I ran into the formatted_doc error again, and it's also happening for bert_score, where it hadn't occurred before.

TypeError: BLEURT.compute() got an unexpected keyword argument 'formatted_doc'

I thought this was fixed by adding **kwargs to the compute function, but I wasn't able to identify what had changed. Do you have any insight into why this may be, e.g. whether there were other backend updates or incompatibilities? I was unfortunately unable to get to the bottom of it yet.

When I ran it from my repo, it did work, so please let me know if this works for you. Thanks!
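
For context, this is the kind of signature being discussed: a sketch of the BLEURT class's sample-level compute absorbing extra keyword arguments such as formatted_doc via **kwargs. The body mirrors the diff further down; whether the final signature actually includes **kwargs is not shown in that hunk, and self.model / self.tokenizer are assumed to be set up in __init__.

def compute(self, golds: list[str], predictions: list[str], **kwargs) -> float:
    # **kwargs swallows extra keyword arguments (e.g. formatted_doc)
    # that the evaluation pipeline may pass to every sample-level metric.
    if len(predictions) == 1:
        predictions = predictions * len(golds)
    scores = self.model(**self.tokenizer(golds, predictions, return_tensors="pt"))[0].squeeze()
    return scores.item()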

NathanHB (Member):

I just took a look this morning and your fork should work. You only need to remove this function and call np.mean directly.

chuandudx (Contributor, author):

Thank you Nathan! I just implemented this change :)

src/lighteval/metrics/imports/bert_scorer.py (outdated thread, resolved)
chuandudx (Contributor, author):

Hi @NathanHB :) Just wanted to follow up on this PR and welcome any feedback on the concerns we previously discussed. Thank you!

chuandudx (Contributor, author):

Merged the latest changes from main and fixed the style error. Thank you!

clefourrier (Member):

Thanks a lot! FYI, we're working on some other important features at the moment, but we'll come back to this PR as soon as we can.

category=MetricCategory.GENERATIVE,
use_case=MetricUseCase.TRANSLATION,
corpus_level_fn=lambda x: np.mean(x.flatten()), # flatten, then average
corpus_level_fn=compute_mean,
NathanHB (Member):

If you return scores.item(), you can simply use np.mean here.
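
In other words, once the sample-level function returns a plain Python float (via .item()), the corpus-level aggregation sees a flat list of floats and np.mean works without any flatten(). A self-contained toy to illustrate the shape issue (the numeric values are made up):

import numpy as np
import torch

# model(...)[0].squeeze() on a single gold/prediction pair yields a 0-dim tensor
score_tensor = torch.tensor([[0.73]]).squeeze()
score = score_tensor.item()  # plain Python float

per_sample_scores = [score, 0.61, 0.58]  # what the corpus-level fn receives
print(np.mean(per_sample_scores))        # plain mean, no flatten() needed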

bleurt = SampleLevelMetric(
metric_name="bleurt",
sample_level_fn=BLEURT.compute,
sample_level_fn=BLEURT().compute,
NathanHB (Member):

Good!
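
The change from BLEURT.compute to BLEURT().compute is the usual bound-versus-unbound-method distinction in Python; a self-contained toy (unrelated to BLEURT internals) to illustrate why the instance is needed:

class Scorer:
    def __init__(self):
        self.offset = 1.0  # stands in for loading a model/tokenizer once

    def compute(self, value: float) -> float:
        return value + self.offset

bound_fn = Scorer().compute   # bound method: self is already supplied
print(bound_fn(2.0))          # 3.0
# Scorer.compute(2.0) would fail: it's a plain function that expects an instance as self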

@@ -715,8 +715,7 @@ def compute(self, golds: list[str], predictions: list[str]) -> float:
if len(predictions) == 1:
predictions = predictions * len(golds)
scores = self.model(**self.tokenizer(golds, predictions, return_tensors="pt"))[0].squeeze()

return scores
return scores.item()
NathanHB (Member):

This works! Thanks for finding this bug.

NathanHB merged commit d84c378 into huggingface:main on Oct 16, 2024. 2 checks passed.