Unexpected space character #2346
Comments
Hi! This is because we default to |
Since I am mostly running leaderboard tasks at the moment, I measured the impact of this subtle change, the " " in front of the target answer. Here are the results: Without the space, the scores now perfectly match the HF Leaderboard scores (https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard). Notice that with the space, the 70B model is almost as bad as the 8B one, which definitely seems unexpected. The only change I made was to add |
I can confirm that in our fork (the one we use to run the leaderboard evals; you can find the command in our doc) we've made the delimiter always None for chat template tasks.
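For anyone who wants to see the effect of that change in isolation, here is a minimal sketch. It is not the harness's or the fork's actual code: `build_continuation` and its parameters are hypothetical names, and the single-space default is only an assumption taken from the `" "` mentioned in the comment above.

```python
from typing import Optional


def build_continuation(
    answer: str,
    target_delimiter: Optional[str] = " ",  # assumed default, see lead-in
    apply_chat_template: bool = False,
) -> str:
    """Return the string that gets scored as the continuation of the prompt.

    Hypothetical helper: with a chat template, the assistant header already
    separates the prompt from the answer, so the delimiter is forced to None.
    """
    if apply_chat_template:
        target_delimiter = None
    # Treat None as "no delimiter at all".
    return f"{target_delimiter or ''}{answer}"


if __name__ == "__main__":
    print(repr(build_continuation("I")))
    print(repr(build_continuation("I", apply_chat_template=True)))
```

With the delimiter dropped, the scored answer has the same shape as the few-shot answers rendered by the chat template, i.e. no leading space.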
Hi,
While running `leaderboard_mmlu_pro` evals I've noticed an unexpected space character. Here is an example request:
This is a 5-shot example, so looking at the first shot in `arguments`, the correct answer is formatted as:
More specifically, notice that the correct answer is presented as `<|end_header_id|>\n\nA<|eot_id|>` (no space before `A`).
Unfortunately, contrary to the few-shot examples, the answer to the actual question has a leading space character: `...<|end_header_id|>\n\n", ' I')`.
Before trying to go down the rabbit hole to find where this diff is coming from, I wanted to reach out here in case you are already familiar with this.
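The mismatch described above can be checked mechanically on a (context, continuation) pair. The sketch below is only an illustration: the function names are hypothetical, the marker strings are copied from the example in this issue, and it assumes answers are single letters in the range A to J.

```python
import re

# Marker strings as they appear in the rendered prompt quoted in this issue.
ASSISTANT_HEADER = "<|end_header_id|>\n\n"
EOT = "<|eot_id|>"


def few_shot_answers(context: str) -> list[str]:
    """Return the raw text of every turn whose whole content is one answer letter."""
    # Assumption: answer letters are A-J; leading whitespace is kept in the match.
    pattern = re.escape(ASSISTANT_HEADER) + r"(\s*[A-J])" + re.escape(EOT)
    return re.findall(pattern, context)


def has_whitespace_mismatch(context: str, continuation: str) -> bool:
    """True if the target has a leading space but the few-shot answers do not, or vice versa."""
    shots = few_shot_answers(context)
    if not shots:
        return False
    shots_have_space = any(a != a.lstrip() for a in shots)
    target_has_space = continuation != continuation.lstrip()
    return shots_have_space != target_has_space


if __name__ == "__main__":
    ctx = (
        "<|start_header_id|>user<|end_header_id|>\n\nQuestion 1<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\nA<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\nQuestion 2<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    print(has_whitespace_mismatch(ctx, " I"))
    print(has_whitespace_mismatch(ctx, "I"))
```

This only inspects the strings, so it says nothing about where the space is introduced; it just flags requests where the few-shot answers and the scored answer are formatted differently.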
My guess is that this is probably coming from the infamous `add_prefix_space` "feature" of HF-tokenizers and the fact that answers from few-shot samples are tokenized as part of a larger sequence, whereas the answer of the actual question is tokenized on its own as a single character.