Add few shot and multiple choice to ICL evaluation #1876

bmosaicml · 2023-01-10T18:01:09Z

What does this PR do?

This PR extends the existing ICL-LM evaluator framework to support few-shot prompting and multiple-choice tasks (e.g. PIQA).
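For context on the multiple-choice side: a common way to score tasks like PIQA is to compute the model's log-probability of each answer choice given the shared context, normalize by the choice's token count, and count the example correct if the best-scoring choice is the gold one. The pure-Python sketch below illustrates that scheme under stated assumptions: per-token log-probs are already computed, and the function names are hypothetical. It is not necessarily the exact rule the new metric in this PR uses.

```python
from typing import Sequence, Tuple


def multiple_choice_correct(choice_logprobs: Sequence[Sequence[float]], gold_index: int) -> bool:
    """Return True if the best-scoring answer choice is the gold one.

    choice_logprobs[i] holds the per-token log-probabilities the model assigns
    to answer choice i, conditioned on the shared question/context
    (a hypothetical, precomputed input format).
    """
    if not choice_logprobs or any(len(lp) == 0 for lp in choice_logprobs):
        raise ValueError("every answer choice needs at least one token")
    # Length-normalize: mean log-prob per token, so longer choices are not
    # penalized merely for having more tokens.
    scores = [sum(lp) / len(lp) for lp in choice_logprobs]
    # Highest mean log-prob wins; max() keeps the first (lowest) index on ties.
    predicted = max(range(len(scores)), key=scores.__getitem__)
    return predicted == gold_index


def multiple_choice_accuracy(examples: Sequence[Tuple[Sequence[Sequence[float]], int]]) -> float:
    """Fraction of (choice_logprobs, gold_index) examples answered correctly."""
    if not examples:
        raise ValueError("no examples to score")
    correct = sum(multiple_choice_correct(c, g) for c, g in examples)
    return correct / len(examples)
```

Picking the highest mean log-probability is equivalent to picking the choice with the lowest per-token cross-entropy (lowest perplexity). Summing log-probs without normalizing is the other common option, but it tends to favor shorter answers.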

I created a custom multiple-choice in-context learning data loader and a dedicated NLP metric for ICL multiple choice, and extended the existing ICL-LM data loader to support few-shot prompting.
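Few-shot prompting, in outline, means prefixing each evaluated context with a handful of solved examples drawn from the same dataset, taking care never to use the evaluated example as its own demonstration. The sketch below shows that idea; the helper names, the `context`/`continuation` record keys, and the newline delimiter are assumptions for illustration, not this PR's actual data loader API.

```python
import random
from typing import Dict, List


def sample_fewshot_indices(num_examples: int, num_fewshot: int, exclude_idx: int,
                           rng: random.Random) -> List[int]:
    """Draw num_fewshot distinct demonstration indices, never the evaluated example."""
    candidates = [i for i in range(num_examples) if i != exclude_idx]
    if num_fewshot > len(candidates):
        raise ValueError("not enough examples for the requested number of shots")
    return rng.sample(candidates, num_fewshot)


def build_few_shot_prompt(dataset: List[Dict[str, str]], idx: int, num_fewshot: int,
                          rng: random.Random, example_delimiter: str = "\n") -> str:
    """Prefix example idx's context with num_fewshot solved demonstrations.

    Assumes each continuation already carries its leading whitespace
    (e.g. " word"), so context + continuation reads naturally.
    """
    shots = sample_fewshot_indices(len(dataset), num_fewshot, idx, rng)
    demos = [dataset[i]["context"] + dataset[i]["continuation"] for i in shots]
    # The evaluated context goes last; its continuation is what the model is scored on.
    return example_delimiter.join(demos + [dataset[idx]["context"]])
```

Passing an explicit `random.Random` keeps demonstration sampling reproducible across runs (and, if seeded identically, across ranks), and with `num_fewshot=0` the prompt reduces to the plain zero-shot context.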

Manual evaluation of GPT Neo confirms LAMBADA accuracy of 57.23% and PIQA accuracy of 71%.
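LAMBADA accuracy is conventionally scored as exact match: an example counts as correct only if the model's greedy (argmax) prediction reproduces every token of the target continuation. The sketch below shows that check in pure Python, assuming per-position logits are already available as nested lists. The function names are hypothetical and this is not Composer's metric class; it mainly illustrates the causal-LM off-by-one between logit positions and target tokens.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def lm_continuation_correct(logits: Sequence[Sequence[float]], token_ids: Sequence[int],
                            cont_start: int, cont_end: int) -> bool:
    """True if greedy decoding reproduces every continuation token.

    Causal-LM convention: logits[t] scores the token at position t + 1, so the
    continuation token at position j is predicted by logits[j - 1].
    [cont_start, cont_end) is the continuation's span within token_ids.
    """
    if not 1 <= cont_start < cont_end <= len(token_ids):
        raise ValueError("continuation span must be non-empty and follow some context")
    return all(argmax(logits[j - 1]) == token_ids[j] for j in range(cont_start, cont_end))
```

Accuracy over a dataset is then the fraction of examples for which this returns True.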

What issue(s) does this change relate to?

Before submitting

[x] Have you read the contributor guidelines?
[ ] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
[ ] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
[x] Did you update any related docs and document your change?
[x] Did you update any related tests and add any new tests related to your change? (see testing)
[x] Did you run the tests locally to make sure they pass?
[x] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

dakinggg

Approving to unblock since I will be gone for the next week, but left a bunch of comments.

composer/metrics/nlp.py

composer/datasets/in_context_learning_evaluation.py

tests/datasets/test_in_context_learning_datasets.py

composer/metrics/nlp.py

mvpatel2000 · 2023-01-11T07:08:01Z

Same as #1879: please ensure that new tests take no longer than 5s on CPU/GPU!

abhi-mosaic

Looking good! I added some comments about EOS padding and some (maybe) superfluous functions. @bmosaicml I think it would also be good to get one more reviewer from the NLP team who is more familiar with eval to look over the tests... I am not an expert here, so I could have missed something.
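On EOS padding: GPT-style tokenizers often have no dedicated pad token, so a common workaround is to pad with the EOS id and rely on the attention mask to tell padding apart from real tokens. The comment does not say which side or which id this PR pads with, so the right-padding scheme and the hypothetical `pad_batch` helper below are assumptions, shown only to illustrate the idea.

```python
from typing import List, Optional, Sequence, Tuple


def pad_batch(sequences: Sequence[Sequence[int]], pad_id: int,
              max_len: Optional[int] = None) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token id sequences to a common length and build attention masks.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens and
    0 for padding, so padded positions can be ignored by the model.
    """
    if not sequences:
        raise ValueError("empty batch")
    target = max_len if max_len is not None else max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        if len(seq) > target:
            raise ValueError(f"sequence of length {len(seq)} exceeds max_len={target}")
        n_pad = target - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask
```

When `pad_id` equals the EOS id, the mask, not the token id, is what marks a position as padding; a genuine EOS inside the input keeps a mask value of 1.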

composer/datasets/in_context_learning_evaluation.py

composer/trainer/trainer.py

composer/datasets/in_context_learning_evaluation.py

dakinggg

LGTM, though it might be good to have another NLP person take a glance. I'm also trusting that you and abhi worked out the attention mask issue; I didn't look at that. I'm not sure why the alibi tests are failing; I can have a look at some point if you can't figure it out. The FSDP one is being fixed, so you can ignore it.

composer/trainer/dist_strategy.py

composer/datasets/in_context_learning_evaluation.py

composer/models/huggingface.py

tests/datasets/test_in_context_learning_datasets.py

composer/datasets/in_context_learning_evaluation.py

composer/models/huggingface.py

composer/metrics/nlp.py

composer/models/huggingface.py

dakinggg

LGTM once you resolve the last open conversations. Could you please also put in the PR description the numbers you get from a manual test with a pretrained model and the full dataset?

bmosaicml

lgtm!

mvpatel2000

~~One more small nit: can we change from JSONL to JSON to clearly indicate file size limit (which applies bc we use Github)~~ nvm I misunderstood something

A few small nits... I'll stamp once we figure out the release process.

setup.py

composer/metrics/__init__.py

composer/metrics/nlp.py

composer/datasets/in_context_learning_evaluation.py

bmosaicml marked this pull request as ready for review January 10, 2023 18:03

bmosaicml requested review from a team, knighton and karan6181 as code owners January 10, 2023 18:03

bmosaicml requested review from abhi-mosaic and dakinggg January 10, 2023 18:03

dakinggg approved these changes Jan 11, 2023

bmosaicml force-pushed the feature/fewshot_lambada branch 2 times, most recently from afc977a to e456ff1 on January 12, 2023 18:16

bmosaicml requested a review from alextrott16 January 12, 2023 19:06

bmosaicml force-pushed the feature/fewshot_lambada branch 2 times, most recently from 48bf8e2 to 9eebf98 on January 16, 2023 19:48

abhi-mosaic reviewed Jan 17, 2023

bmosaicml force-pushed the feature/fewshot_lambada branch 2 times, most recently from d44cc3f to ce4ab20 on January 24, 2023 18:36

bmosaicml requested review from abhi-mosaic and dakinggg January 24, 2023 18:37

dakinggg approved these changes Jan 25, 2023

dakinggg reviewed Jan 25, 2023

composer/datasets/in_context_learning_evaluation.py

bmosaicml force-pushed the feature/fewshot_lambada branch from ce4ab20 to ffa007e on January 30, 2023 23:24

dakinggg reviewed Jan 31, 2023

composer/models/huggingface.py

alextrott16 reviewed Jan 31, 2023

composer/metrics/nlp.py

alextrott16 reviewed Jan 31, 2023

composer/metrics/nlp.py

bmosaicml added 6 commits January 31, 2023 15:48

new branch

6b8fe91

unittest multi gpu

9fa367d

add testing for batch padding and idx sampling

27b93c0

update

325f2ae

change naming of file

4b87c1d

reimplement lambada

fe0ff27

bmosaicml force-pushed the feature/fewshot_lambada branch 2 times, most recently from 6fec21e to b30d7ae on January 31, 2023 21:58

dakinggg reviewed Jan 31, 2023

composer/models/huggingface.py

bmosaicml force-pushed the feature/fewshot_lambada branch from b30d7ae to 3001f88 on January 31, 2023 22:03

dakinggg reviewed Jan 31, 2023

composer/models/huggingface.py

dakinggg reviewed Jan 31, 2023

composer/models/huggingface.py

bmosaicml force-pushed the feature/fewshot_lambada branch from 3001f88 to 2f006e6 on January 31, 2023 23:18

fix broken tests

014ae26

bmosaicml force-pushed the feature/fewshot_lambada branch from 2f006e6 to 014ae26 on January 31, 2023 23:23

dakinggg reviewed Jan 31, 2023

composer/models/huggingface.py

dakinggg approved these changes Feb 1, 2023

dakinggg added 4 commits January 31, 2023 17:38

Merge branch 'dev' into feature/fewshot_lambada

ce61a76

yapf

826060d

fix comment

7e8c4f5

move label shifting back where it was and dont store labels in the no logits case

a932bdd

bmosaicml commented Feb 1, 2023

bmosaicml force-pushed the feature/fewshot_lambada branch from 864cfcf to 799daba on February 1, 2023 05:24

fix duplicated dep in merge;

f3b4317

bmosaicml force-pushed the feature/fewshot_lambada branch from 799daba to f3b4317 on February 1, 2023 05:29

mvpatel2000 reviewed Feb 1, 2023

setup.py

composer/metrics/__init__.py

mvpatel2000 self-requested a review February 1, 2023 19:30

mvpatel2000 approved these changes Feb 1, 2023

knighton reviewed Feb 1, 2023

composer/metrics/nlp.py

knighton approved these changes Feb 1, 2023

fix nits

f842876

dakinggg reviewed Feb 1, 2023

composer/datasets/in_context_learning_evaluation.py

fix nits

bdcfaf4

bmosaicml force-pushed the feature/fewshot_lambada branch from 1d07ece to bdcfaf4 on February 1, 2023 22:03

Merge branch 'dev' into feature/fewshot_lambada

b5fb4c7

bmosaicml merged commit d24be14 into mosaicml:dev Feb 1, 2023
