Using different models for evaluating a model-graded eval and for generating the completion #1393
Comments
I recently struggled to get this to work too, so I can share what I found. This is currently implemented in the GitHub version of this repo, but not in the PyPI release you get by installing the library through a package manager: that release is many months out of date and still hard-codes gpt-3.5-turbo as the grader. Lines 29-32 of evals/elsuite/modelgraded/classify.py show how the feature works: the last completion_fn given is treated as the evaluation function. Completion functions in turn can be specified as a comma-separated string; the logic for this is in evals/cli/oaieval.py, lines 142-145. Concretely, a string like "gpt-4,gpt-3.5-turbo" seems to work for me to make gpt-4 the completer and gpt-3.5-turbo the model grading the responses. Be warned, though, that there seems to be a bug where model-graded eval execution can hang for a long time in a way that other evals don't (and that appears unrelated to rate limits).
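As a rough illustration of the comma-splitting step, here is a minimal sketch; it is illustrative only: the function name below is made up, and the real code in evals/cli/oaieval.py resolves each name through the evals registry rather than returning plain strings.

# Illustrative sketch of splitting a comma-separated completion-fn argument.
# The real CLI turns each name into a CompletionFn object via the registry;
# plain strings are kept here for simplicity.
def parse_completion_fn_arg(arg: str) -> list[str]:
    return [name.strip() for name in arg.split(",") if name.strip()]

print(parse_completion_fn_arg("gpt-4,gpt-3.5-turbo"))
# ['gpt-4', 'gpt-3.5-turbo']  -- the last entry later becomes the grader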
I opened a PR last week (#1418) that addresses this issue, but forgot to mention it here.
Regarding #1418: A new PR is not necessary for setting the evaluating model (though the feature really should be documented), since the full relevant lines (https://github.com/openai/evals/blob/7400b0ee3934d64ff6efd9d4ec04be631625c014/evals/elsuite/modelgraded/classify.py#L29C1-L29C1) are:

# treat last completion_fn as eval_completion_fn
self.eval_completion_fn = self.completion_fns[-1]
if len(self.completion_fns) > 1:
    self.completion_fns = self.completion_fns[:-1]

If you pass in many completion fns (in a comma-separated list) into completion_fns, then the last one will be treated as the evaluating model.
But wouldn't the task be run on the passed completion functions if doing so?
If you want to run the eval with modelA and run the grading with modelB, you can pass in the string "modelA,modelB" as the name of the completer.
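To make the ordering explicit, here is a tiny worked sketch of what the classify.py snippet quoted above does with such a string (plain strings stand in for real completion-fn objects; modelA and modelB are placeholders, not real model names):

# Worked example of the classify.py behaviour quoted above, with plain
# strings standing in for real CompletionFn objects.
completion_fns = "modelA,modelB".split(",")   # ["modelA", "modelB"]

eval_completion_fn = completion_fns[-1]       # "modelB" does the grading
if len(completion_fns) > 1:
    completion_fns = completion_fns[:-1]      # ["modelA"] generates the completions

assert eval_completion_fn == "modelB"
assert completion_fns == ["modelA"]

So the grader is always the last name in the comma-separated list, and it is dropped from the list of completers before the samples are run.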
Can anyone please help me with this? #1564
Describe the feature or improvement you're requesting
build_eval.md says:
However, I can't find any documentation on how to do this. Is this currently implemented?
Additional context
No response