Add few shot and multiple choice to ICL evaluation #1876
Approving to unblock since I will be gone for the next week, but left a bunch of comments.
Same as #1879: please ensure that new tests take no longer than 5s on CPU/GPU!
Looking good! I added some comments about EOS padding and some (maybe) superfluous functions. @bmosaicml I think it would also be good to get one more NLP team reviewer who is more familiar with eval to look over the tests... I am not an expert here, so I could have missed something.
LGTM, but it might be good to have another NLP person take a glance. I'm also trusting that you and Abhi worked out the attention mask issue; I didn't look at that. I'm not sure why the ALiBi tests are failing; I can have a look at some point if you can't figure it out. The FSDP one is being fixed, so you can ignore it.
LGTM once you resolve the last open conversations. Could you please also add to the PR description the numbers you get from a manual test with a pretrained model on the full dataset?
lgtm!
One more small nit: can we change from JSONL to JSON to clearly indicate the file size limit (which applies because we use GitHub)? Nvm, I misunderstood something.
A few small nits... I'll stamp once we figure out the release process.
What does this PR do?
This PR extends the existing ICL-LM evaluator framework to support few-shot prompting and multiple-choice tasks (e.g., PIQA).
I created a custom multiple-choice in-context learning data loader and a dedicated ICL multiple-choice NLP metric, and extended the existing ICL-LM data loader to support few-shot prompting.
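To make the two additions concrete, here is a minimal Python sketch. It is not the code in this PR. The function names `build_few_shot_prompt`, `mean_log_likelihood`, and `multiple_choice_accuracy`, the `context`/`continuation`/`choice_logprobs`/`gold` field names, and the delimiters are all hypothetical. The metric assumes a common approach: score each answer choice by the model's mean per-token log-probability of that continuation, then predict the highest-scoring choice. The PR's metric may normalize scores differently.

```python
from typing import Dict, List, Sequence


def build_few_shot_prompt(
    demos: Sequence[Dict[str, str]],
    query: str,
    example_delimiter: str = "\n",
    continuation_delimiter: str = " ",
) -> str:
    """Prepend k solved demonstrations to the query context.

    Hypothetical format: each demo is "<context><continuation_delimiter><continuation>",
    and demos are separated by example_delimiter. The model then scores
    continuation_delimiter + answer as the continuation of the returned prompt.
    """
    shots = [d["context"] + continuation_delimiter + d["continuation"] for d in demos]
    return example_delimiter.join(shots + [query])


def mean_log_likelihood(token_logprobs: Sequence[float]) -> float:
    """Length-normalized score of one answer choice (assumed normalization)."""
    return sum(token_logprobs) / len(token_logprobs)


def multiple_choice_accuracy(examples: Sequence[Dict]) -> float:
    """Fraction of examples whose highest-scoring choice is the gold answer.

    Each example holds per-token log-probs for every choice, plus the gold index.
    """
    correct = 0
    for ex in examples:
        choice_logprobs: List[List[float]] = ex["choice_logprobs"]
        scores = [mean_log_likelihood(lp) for lp in choice_logprobs]
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct += int(pred == ex["gold"])
    return correct / len(examples)


if __name__ == "__main__":
    demos = [{"context": "Q: 2+2?", "continuation": "4"}]
    print(repr(build_few_shot_prompt(demos, "Q: 3+3?")))  # 'Q: 2+2? 4\nQ: 3+3?'

    examples = [
        # Choice 0 scores -0.15 vs -1.5, and gold is 0, so this is correct.
        {"choice_logprobs": [[-0.1, -0.2], [-1.0, -2.0]], "gold": 0},
        # Choice 1 scores -0.5 vs -0.9, but gold is 0, so this is incorrect.
        {"choice_logprobs": [[-0.9], [-0.4, -0.6]], "gold": 0},
    ]
    print(multiple_choice_accuracy(examples))  # 0.5
```

Averaging per token keeps the comparison from penalizing longer answer choices simply for having more tokens. Summing log-probabilities is the usual alternative, and the two can rank choices differently.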
Manual evaluation of GPT-Neo confirms LAMBADA at 57.23% and PIQA at 71%.
What issue(s) does this change relate to?
Before submitting
Did you run `pre-commit` on your change? (see the `pre-commit` section of prerequisites)