Performance evaluation #2

Shannen3206 · 2024-09-04T11:50:26Z

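Which tool was used to measure the TinyLlama results in Table 7 and Table 8 of the paper? I evaluated with opencompass, using the model TinyLlama-1.1B-intermediate-step-1431k-3T. My MMLU, CEVAL, CMMLU, and ARC scores come out about the same as in the paper, but I get 1.22 on HumanEval and 1.60 on MBPP, and for GaoKao I tested the MathQA, CHE, and BIO question-answering subsets and the scores are all 0?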

survivi · 2024-10-01T03:01:55Z

LLMBox.
