OpenAI v1.0 breaks the MT-bench evaluation #2657

Closed
robertgshaw2-neuralmagic opened this issue Nov 8, 2023 · 3 comments · Fixed by #2658

@robertgshaw2-neuralmagic

Hello, it appears that the new openai client library (v1.0) released this week breaks the MT-bench evaluation script:

(test-env) rshaw@gpuprod:~/FastChat/fastchat/llm_judge$ python gen_judgment.py --model-list claude-v1 gpt-3.5-turbo --parallel 1
Stats:
{
    "bench_name": "mt_bench",
    "mode": "single",
    "judge": "gpt-4",
    "baseline": null,
    "model_list": [
        "claude-v1",
        "gpt-3.5-turbo"
    ],
    "total_num_questions": 80,
    "total_num_matches": 320,
    "output_path": "data/mt_bench/model_judgment/gpt-4_single.jsonl"
}
Press Enter to confirm...
  0%|          | 0/320 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/rshaw/FastChat/fastchat/llm_judge/common.py", line 411, in chat_compeletion_openai
    response = openai.ChatCompletion.create(
AttributeError: module 'openai' has no attribute 'ChatCompletion'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rshaw/FastChat/fastchat/llm_judge/gen_judgment.py", line 309, in <module>
    play_a_match_func(match, output_file=output_file)
  File "/home/rshaw/FastChat/fastchat/llm_judge/common.py", line 199, in play_a_match_single
    score, user_prompt, judgment = run_judge_single(
  File "/home/rshaw/FastChat/fastchat/llm_judge/common.py", line 163, in run_judge_single
    judgment = chat_compeletion_openai(model, conv, temperature=0, max_tokens=2048)
  File "/home/rshaw/FastChat/fastchat/llm_judge/common.py", line 420, in chat_compeletion_openai
    except openai.error.OpenAIError as e:
AttributeError: module 'openai' has no attribute 'error'

Downgrading to the last version prior to 1.0 seems to fix the issue:

pip install openai==0.28.1
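
For anyone who cannot pin the old release: the first failure in the traceback comes from openai 1.0 dropping the module-level `openai.ChatCompletion.create` call in favor of a method on a client object, with the response returned as an object instead of a dict. Below is a minimal sketch of a helper that handles both styles. The name `judge_completion` and its parameters are hypothetical and chosen for illustration; this is not FastChat's code and not necessarily what #2658 does. The version string is passed in explicitly so the branching can be exercised without the package installed.

```python
def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '0.28.1'."""
    return int(version.split(".")[0])


def judge_completion(openai_module, version, model, messages,
                     temperature=0, max_tokens=2048):
    """Hypothetical helper: call the chat API in the style `version` implies."""
    if major_version(version) < 1:
        # openai<1.0: module-level ChatCompletion, dict-style response.
        response = openai_module.ChatCompletion.create(
            model=model,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        return response["choices"][0]["message"]["content"]

    # openai>=1.0: calls go through a client object, attribute-style response.
    client = openai_module.OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
```

In real use this would be called as `judge_completion(openai, importlib.metadata.version("openai"), "gpt-4", messages)`. The second failure in the traceback has the same cause: the `openai.error` submodule is gone in 1.0, and the base exception is exposed as `openai.OpenAIError` instead, so any `except` clause needs the same kind of branching.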
@robertgshaw2-neuralmagic
Author

You may want to update the dependencies for llm-judge:

https://github.com/lm-sys/FastChat/blob/main/pyproject.toml#L25
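
Until the dependency constraint is updated, a script that still relies on the pre-1.0 interface could fail fast with a clearer message than the `AttributeError` above. The following is only a sketch under that assumption; `require_legacy_openai` is a hypothetical name, not something that exists in FastChat.

```python
def require_legacy_openai(installed_version: str) -> None:
    """Raise a descriptive error if the installed openai is 1.0 or newer."""
    major = int(installed_version.split(".")[0])
    if major >= 1:
        raise RuntimeError(
            f"openai {installed_version} is installed, but this script "
            "expects openai<1.0; try: pip install openai==0.28.1"
        )


# Intended use at the top of a script (left commented out so the sketch
# does not depend on openai being installed):
#   from importlib import metadata
#   require_legacy_openai(metadata.version("openai"))
```
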

Just figured I would let you know!

Thanks for the great library

@infwinston infwinston mentioned this issue Nov 8, 2023
@infwinston
Member

infwinston commented Nov 8, 2023

@rsnm2 Thanks for reporting this issue! Does the change in #2658 look good to you?

@robertgshaw2-neuralmagic
Author

Verified that this change solves my issue.
