This is the code repository for the paper "TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning".
Our work has been accepted at AAAI 2025; we welcome follow-up work.
Here is an overview of our method:
TreeEval naturally avoids the problem of test data leakage by discarding the fixed test set.
Refer to the FastChat installation instructions.
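A minimal installation sketch, assuming the `fschat` pip package with the `model_worker` extra as described in the FastChat README (verify against the FastChat repository for your environment):

```bash
# Install FastChat with model worker dependencies
# (package name and extras assumed from the FastChat README)
pip3 install "fschat[model_worker,webui]"
```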
The models we use can be found on Hugging Face: Yi-34B-Chat, Xwin-LM-13B-V0.1, vicuna-33b-v1.3, Mistral-7B-Instruct-v0.2, and WizardLM-13B-V1.2.
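As an illustration, one way to fetch a model locally is the Hugging Face CLI; the repository ID and target directory below are assumptions based on the model names above and should be checked on the Hub:

```bash
# Download one of the evaluated models to a local directory
# (repo ID lmsys/vicuna-33b-v1.3 and the local path are assumptions)
huggingface-cli download lmsys/vicuna-33b-v1.3 --local-dir ./models/vicuna-33b-v1.3
```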
- Start the FastChat server (see the commands below)
- Modify log_dir in fastchat.sh
```bash
bash fastchat.sh
python3 -m fastchat.serve.openai_api_server --host localhost --port 23261 --controller-address http://localhost:23241
```
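For reference, a typical FastChat launch sequence behind a script like fastchat.sh looks roughly like the following. The controller port 23241 matches the --controller-address above; the model path and worker port are placeholders. This is a sketch, not the repository's actual fastchat.sh:

```bash
# Controller on the port referenced by --controller-address above
python3 -m fastchat.serve.controller --host localhost --port 23241 &

# One model worker per evaluated model
# (model path and worker port are placeholders)
python3 -m fastchat.serve.model_worker \
    --model-path ./models/vicuna-33b-v1.3 \
    --controller-address http://localhost:23241 \
    --host localhost --port 23251 \
    --worker-address http://localhost:23251 &
```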
- Configure the evaluation: copy config.yaml to config_modelname.yaml and edit it for the model under evaluation
```bash
python main.py
```
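As an illustrative end-to-end example of this step (the concrete model name in the copied file is only a placeholder):

```bash
# Copy the template config and adapt it to the model under evaluation
# (the file name below is only an example)
cp config.yaml config_Mistral-7B-Instruct-v0.2.yaml
# edit config_Mistral-7B-Instruct-v0.2.yaml for your model and server, then run main.py as above
```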
If you find this useful for your work, please cite:
```
@article{li2024treeeval,
  title={TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning},
  author={Xiang Li and Yunshi Lan and Chao Yang},
  journal={arXiv preprint arXiv:2402.13125},
  year={2024},
  eprint={2402.13125},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```