OpenAI v1.0 breaks the MT-bench evaluation #2657

robertgshaw2-neuralmagic · 2023-11-08T23:04:52Z

Hello - it appears that the new openai client library released this week breaks the MT-bench evaluation script

(test-env) rshaw@gpuprod:~/FastChat/fastchat/llm_judge$ python gen_judgment.py --model-list claude-v1 gpt-3.5-turbo --parallel 1
Stats:
{
    "bench_name": "mt_bench",
    "mode": "single",
    "judge": "gpt-4",
    "baseline": null,
    "model_list": [
        "claude-v1",
        "gpt-3.5-turbo"
    ],
    "total_num_questions": 80,
    "total_num_matches": 320,
    "output_path": "data/mt_bench/model_judgment/gpt-4_single.jsonl"
}
Press Enter to confirm...
  0%|                                                                                                                                                                                                                                                                       | 0/320 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/rshaw/FastChat/fastchat/llm_judge/common.py", line 411, in chat_compeletion_openai
    response = openai.ChatCompletion.create(
AttributeError: module 'openai' has no attribute 'ChatCompletion'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rshaw/FastChat/fastchat/llm_judge/gen_judgment.py", line 309, in <module>
    play_a_match_func(match, output_file=output_file)
  File "/home/rshaw/FastChat/fastchat/llm_judge/common.py", line 199, in play_a_match_single
    score, user_prompt, judgment = run_judge_single(
  File "/home/rshaw/FastChat/fastchat/llm_judge/common.py", line 163, in run_judge_single
    judgment = chat_compeletion_openai(model, conv, temperature=0, max_tokens=2048)
  File "/home/rshaw/FastChat/fastchat/llm_judge/common.py", line 420, in chat_compeletion_openai
    except openai.error.OpenAIError as e:
AttributeError: module 'openai' has no attribute 'error'

Downgrading to the last version prior to 1.0 seems to fix the issue

pip install openai==0.28.1

robertgshaw2-neuralmagic · 2023-11-08T23:06:09Z

you may want to update the dependencies for llm-judge

https://github.com/lm-sys/FastChat/blob/main/pyproject.toml#L25

Just figured I would let you know!

Thanks for the great library

infwinston · 2023-11-08T23:07:36Z

@rsnm2 thanks for reporting this issue! does this change #2658 look good to you?

robertgshaw2-neuralmagic · 2023-11-08T23:11:38Z

verified this change solves my issue

infwinston mentioned this issue Nov 8, 2023

Pin openai version < 1 #2658

Merged

3 tasks

infwinston closed this as completed in #2658 Nov 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OpenAI v1.0 breaks the MT-bench evaluation #2657

OpenAI v1.0 breaks the MT-bench evaluation #2657

robertgshaw2-neuralmagic commented Nov 8, 2023

robertgshaw2-neuralmagic commented Nov 8, 2023

infwinston commented Nov 8, 2023 •

edited

Loading

robertgshaw2-neuralmagic commented Nov 8, 2023

OpenAI v1.0 breaks the MT-bench evaluation #2657

OpenAI v1.0 breaks the MT-bench evaluation #2657

Comments

robertgshaw2-neuralmagic commented Nov 8, 2023

robertgshaw2-neuralmagic commented Nov 8, 2023

infwinston commented Nov 8, 2023 • edited Loading

robertgshaw2-neuralmagic commented Nov 8, 2023

infwinston commented Nov 8, 2023 •

edited

Loading