Skip to content

TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning

Notifications You must be signed in to change notification settings

Ashura5/TreeEval

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Introduction

This is the code repository corresponding to the paper "TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning"

Our work has been accepted by AAAI2025, and everyone is welcome to follow up.

here is an overview of our method 图片说明

TreeEval naturally avoids the problem of test data leakage by discarding the fixed test set.

Install

Refer to the installation of FastChat

The model we use can be found in huggingface: Yi-34B-Chat Xwin-LM-13B-V0.1 vicuna-33b-v1.3 Mistral-7B-Instruct-v0.2 WizardLM-13B-V1.2

Run steps

  1. start the server of fastchat
    1. Modify log_dir in fastchat.sh
    2. bash fastchat.sh
    3. python3 -m fastchat.serve.openai_api_server --host localhost --port 23261 --controller-address http://localhost:23241
  2. Configure the config.yaml file, copy the config.yaml file, and modify it to config_modelname.yaml
  3. python main.py

Citation

if you find this useful for your work, please cite:

@article{li2024treeeval,
      title={TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning}, 
      author={Xiang Li and Yunshi Lan and Chao Yang},
      year={2024},
      eprint={2402.13125},
      archivePrefix={arXiv},
      journal={arXiv preprint arXiv:2402.13125},
      primaryClass={cs.CL}
}

About

TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published