This is the repository for TaxoGlimpse, a benchmark evaluating LLMs' performance on taxonomies.
In order to deploy the LLMs and install the requirements for the data processing scripts, we need to create the following conda environments: llama (for the Llama-2 models), vicuna-self (for the Vicuna models), falcon (for the Falcon models), flan-t5 (for the Flan-T5 models), LLM-probing (for the GPTs, Claude-3, and data processing), mixtral (for Mistral and Mixtral), llama3 (for the Llama-3 models), and llms4ol (for LLMs4OL). We now introduce how to create these environments with Anaconda.
$ conda create -n llama python=3.10
$ conda activate llama
$ cd LLMs/llama
$ pip install -e .
$ conda create -n vicuna-self python=3.10
$ conda activate vicuna-self
$ cd LLMs/vicuna/FastChat
$ pip3 install -e ".[model_worker,webui]"
$ conda create -n falcon python=3.10
$ conda activate falcon
$ cd requirements
$ pip install -r falcon.txt
$ conda create -n flan-t5 python=3.10
$ conda activate flan-t5
$ cd requirements
$ pip install -r flan-t5.txt
$ conda create -n LLM-probing python=3.10
$ conda activate LLM-probing
$ cd requirements
$ pip install -r LLM-probing.txt
$ conda create -n mixtral python=3.8
$ conda activate mixtral
$ cd requirements
$ pip install -r mixtral.txt
$ conda create -n llama3 python=3.10
$ conda activate llama3
$ cd requirements
$ pip install -r llama3.txt
$ conda create -n llms4ol python=3.9
$ conda activate llms4ol
$ cd requirements
$ pip install -r llms4ol.txt
The data collection process for the taxonomies is as follows:
We crawled the eBay taxonomy from link. For details, please refer to the README.md in TaxoGlimpse/LLM-taxonomy/shopping/.
We obtained the Google Product Category taxonomy from link and crawled the product instances to perform the additional instance typing experiment. For details, please refer to the README.md in TaxoGlimpse/LLM-taxonomy/shopping/.
We crawled Amazon's Product Category taxonomy and the product instances from browsenodes.com. We provide the detailed scripts; for details, please refer to the README.md in TaxoGlimpse/LLM-taxonomy/shopping/.
We downloaded the Schema.org data from link. For details, please refer to the README.md in TaxoGlimpse/LLM-taxonomy/general/.
The ACM-CCS taxonomy was obtained from the following link. For details, please refer to the README.md in TaxoGlimpse/LLM-taxonomy/academic/.
We downloaded the GeoNames data from link. For details, please refer to the README.md in TaxoGlimpse/LLM-taxonomy/geography/.
The Glottolog taxonomy (Version 4.8) was obtained from the following link. We provide the data we used; see the README.md in TaxoGlimpse/LLM-taxonomy/language/.
We accessed the ICD-10-CM taxonomy through the simple-icd-10 package (version 2.0.1); for detailed usage, please refer to the GitHub repo of simple-icd-10 and the minimal sketch below. For details, please refer to the README.md in TaxoGlimpse/LLM-taxonomy/medical/.
We downloaded the OAE taxonomy from link. For details, please refer to the README.md in TaxoGlimpse/LLM-taxonomy/OAE/.
The NCBI taxonomy was downloaded through the official download page. We provide the September 2023 version, as discussed in the README.md in TaxoGlimpse/LLM-taxonomy/biology/.
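For reference, the snippet below is a minimal sketch of how simple-icd-10 exposes the ICD hierarchy; the code "A00" is only an illustration, and the actual question-pool scripts live under TaxoGlimpse/LLM-taxonomy/medical/.
import simple_icd_10 as icd
code = "A00"                       # illustrative ICD-10 category (Cholera)
print(icd.get_description(code))   # human-readable label of the node
print(icd.get_parent(code))        # parent node in the taxonomy
print(icd.get_children(code))      # direct children of the node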
We introduce how to deploy the LLMs used in our benchmark.
Please refer to steps 3 to 5 of the Quick Start in the Llama-2 README.md to download the model weights (7B-chat, 13B-chat, and 70B-chat).
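Once downloaded, the weights can be loaded with the example API of the llama repository; the snippet below is a minimal sketch (the paths, prompt, and generation settings are placeholders, not the exact values used by our probing scripts, which are launched via torchrun as shown later).
from llama import Llama
# placeholders: point these at the downloaded checkpoint and tokenizer
generator = Llama.build(
    ckpt_dir="llama-2-7b-chat/",
    tokenizer_path="tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)
dialogs = [[{"role": "user", "content": "example"}]]
results = generator.chat_completion(dialogs, temperature=0, max_gen_len=64)
print(results[0]["generation"]["content"])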
Please refer to the Model Weights section in the README.md of Vicuna to download the weights (lmsys/vicuna-7b-v1.5, lmsys/vicuna-13b-v1.5, and lmsys/vicuna-33b-v1.3).
Use the following Python code to deploy the Flan-T5 models:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model_3b = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl").cuda() # 3B
tokenizer_3b = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model_11b = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl").cuda() # 11B
tokenizer_11b = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
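A minimal generation sketch for the loaded Flan-T5 models; the prompt is only an illustration.
prompt = "Is a rose a type of flower? Answer with yes or no."
inputs = tokenizer_3b(prompt, return_tensors="pt").to("cuda")
outputs = model_3b.generate(**inputs, max_new_tokens=8)
print(tokenizer_3b.decode(outputs[0], skip_special_tokens=True))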
Use the following Python code to deploy the Falcon models:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_7b = "tiiuae/falcon-7b-instruct" # 7B
tokenizer_7b = AutoTokenizer.from_pretrained(model_7b)
falcon_7b = AutoModelForCausalLM.from_pretrained(model_7b, torch_dtype=torch.bfloat16, device_map="auto")  # illustrative loading settings
model_40b = "tiiuae/falcon-40b-instruct" # 40B
tokenizer_40b = AutoTokenizer.from_pretrained(model_40b)
falcon_40b = AutoModelForCausalLM.from_pretrained(model_40b, torch_dtype=torch.bfloat16, device_map="auto")  # illustrative loading settings
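As with Flan-T5, a minimal generation sketch for the 7B model loaded above; the prompt is only an illustration.
prompt = "Is a rose a type of flower? Answer with yes or no."
inputs = tokenizer_7b(prompt, return_tensors="pt").to(falcon_7b.device)
outputs = falcon_7b.generate(**inputs, max_new_tokens=8, pad_token_id=tokenizer_7b.eos_token_id)
print(tokenizer_7b.decode(outputs[0], skip_special_tokens=True))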
Use the following Python code to query the GPT models; the first snippet uses Azure OpenAI (GPT-3.5) and the second uses the OpenAI API (GPT-4):
from openai import AzureOpenAI
client = AzureOpenAI(
    azure_endpoint='https://hkust.azure-api.net',
    api_key='xxxxx',
    api_version="2023-05-15"
)
def generateResponse(prompt, gpt_name):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=gpt_name,
        temperature=0,
        messages=messages
    )
    return response.choices[0].message.content
generateResponse("example", "gpt-35-turbo")
from openai import OpenAI
client = OpenAI(
    base_url='xxxx',
    api_key='xxxx'
)
def generateResponse(prompt, gpt_name):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=gpt_name,
        temperature=0,
        messages=messages
    )
    return response.choices[0].message.content
generateResponse("example", "gpt-4-1106-preview")
Use the following Python code to query Claude-3 (via LiteLLM):
import os
from litellm import completion
os.environ["ANTHROPIC_API_KEY"] = "XXX"
def generateResponse(prompt):
    messages = [{"role": "user", "content": prompt}]
    response = completion(model="claude-3-opus-20240229", messages=messages, api_base="https://api.openai-proxy.org/anthropic/v1/messages", temperature=0)
    return response['choices'][0]['message']['content']
generateResponse("example")
For the Llama-3 models, please refer to the README.md for a quick start.
Use the following Python code to deploy the Mistral and Mixtral models:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_mistral = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2").cuda()
tokenizer_mistral = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model_mixtral = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1", device_map="auto")
tokenizer_mixtral = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
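A minimal usage sketch with the instruction-tuned chat template; the prompt is only an illustration, and it assumes a transformers version that supports apply_chat_template.
messages = [{"role": "user", "content": "Is a rose a type of flower? Answer with yes or no."}]
input_ids = tokenizer_mistral.apply_chat_template(messages, return_tensors="pt").to(model_mistral.device)
outputs = model_mistral.generate(input_ids, max_new_tokens=8, pad_token_id=tokenizer_mistral.eos_token_id)
print(tokenizer_mistral.decode(outputs[0], skip_special_tokens=True))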
You can deploy the model with the same code as for Flan-T5-3B by modifying the path to the model weights.
You can generate the question pools from scratch by referring to the README page for each domain under the sub-folders in TaxoGlimpse/LLM-taxonomy.
To conduct the experiments, please follow these steps.
We introduce the steps for Llama-2-7B, Llama-2-13B, and Llama-2-70B respectively, including the main experiments and the instance typing experiment.
$ conda activate llama
$ cd TaxoGlimpse/LLMs/llama/
$ ### main experiments
$ torchrun --nproc_per_node 1 tf-variants.py >> ../logs/llama-2-7b-chat/tf-log.txt
$ torchrun --nproc_per_node 1 mcq-variants.py >> ../logs/llama-2-7b-chat/mcq-log.txt
$ ### instance typing experiment
$ torchrun --nproc_per_node 1 instance.py >> ../logs/llama-2-7b-chat/instance-log.txt
$ conda activate llama
$ cd TaxoGlimpse/LLMs/llama/
$ ### main experiments
$ torchrun --nproc_per_node 2 tf-variants.py >> ../logs/llama-2-13b-chat/tf-log.txt
$ torchrun --nproc_per_node 2 mcq-variants.py >> ../logs/llama-2-13b-chat/mcq-log.txt
$ ### instance typing experiment
$ torchrun --nproc_per_node 2 instance.py >> ../logs/llama-2-13b-chat/instance-log.txt
$ conda activate llama
$ cd TaxoGlimpse/LLMs/llama/
$ ### main experiments
$ torchrun --nproc_per_node 8 tf-variants.py >> ../logs/llama-2-70b-chat/tf-log.txt
$ torchrun --nproc_per_node 8 mcq-variants.py >> ../logs/llama-2-70b-chat/mcq-log.txt
$ ### instance typing experiment
$ torchrun --nproc_per_node 8 instance.py >> ../logs/llama-2-70b-chat/instance-log.txt
We introduce the steps for Vicuna-7B, Vicuna-13B, and Vicuna-33B respectively, including the main experiments and the instance typing experiment.
$ conda activate vicuna-self
$ cd TaxoGlimpse/LLMs/vicuna/FastChat/
$ ### main experiments
$ python3 -m fastchat.serve.tf-variants --model-path lmsys/vicuna-7b-v1.5 >> ../logs/vicuna-7b/tf-log.txt
$ python3 -m fastchat.serve.mcq-variants --model-path lmsys/vicuna-7b-v1.5 >> ../logs/vicuna-7b/mcq-log.txt
$ ### instance typing experiments
$ python3 -m fastchat.serve.instance --model-path lmsys/vicuna-7b-v1.5 >> ../logs/vicuna-7b/instance-log.txt
$ conda activate vicuna-self
$ cd TaxoGlimpse/LLMs/vicuna/FastChat/
$ ### main experiments
$ python3 -m fastchat.serve.tf-variants --model-path lmsys/vicuna-13b-v1.5 >> ../logs/vicuna-13b/tf-log.txt
$ python3 -m fastchat.serve.mcq-variants --model-path lmsys/vicuna-13b-v1.5 >> ../logs/vicuna-13b/mcq-log.txt
$ ### instance typing experiments
$ python3 -m fastchat.serve.instance --model-path lmsys/vicuna-13b-v1.5 >> ../logs/vicuna-13b/instance-log.txt
$ conda activate vicuna-self
$ cd TaxoGlimpse/LLMs/vicuna/FastChat/
$ ### main experiments
$ python3 -m fastchat.serve.tf-variants --model-path lmsys/vicuna-33b-v1.3 >> ../logs/vicuna-33b/tf-log.txt
$ python3 -m fastchat.serve.mcq-variants --model-path lmsys/vicuna-33b-v1.3 >> ../logs/vicuna-33b/mcq-log.txt
$ ### instance typing experiments
$ python3 -m fastchat.serve.instance --model-path lmsys/vicuna-33b-v1.3 >> ../logs/vicuna-33b/instance-log.txt
We introduce the steps for Flan-T5-3B and Flan-T5-11B respectively, including the main experiments and the instance typing experiment.
$ conda activate flan-t5
$ cd TaxoGlimpse/LLMs/flan-t5
$ ### main experiments
$ python tf-variants.py >> ../logs/flan-t5/tf-log.txt
$ python mcq-variants.py >> ../logs/flan-t5/mcq-log.txt
$ ### instance typing experiments
$ python instance.py >> ../logs/flan-t5/instance-log.txt
We introduce the steps for Falcon-7B and Falcon-40B respectively, including the main experiments and the instance typing experiment.
$ conda activate falcon
$ cd TaxoGlimpse/LLMs/falcon/7B
$ ### main experiments
$ python tf-variants.py >> ../logs/falcon-7b/tf-log.txt
$ python mcq-variants.py >> ../logs/falcon-7b/mcq-log.txt
$ ### instance typing experiments
$ python instance.py >> ../logs/falcon-7b/instance-log.txt
$ conda activate falcon
$ cd TaxoGlimpse/LLMs/falcon/40B
$ ### main experiments
$ python tf-variants.py >> ../logs/falcon-40b/tf-log.txt
$ python mcq-variants.py >> ../logs/falcon-40b/mcq-log.txt
$ ### instance typing experiments
$ python instance.py >> ../logs/falcon-40b/instance-log.txt
We introduce the steps for GPT-3.5 and GPT-4 respectively, including the main experiments and the instance typing experiment.
Please input your Azure or OpenAI API keys at the beginning of the Python files.
$ conda activate LLM-probing
$ cd TaxoGlimpse/LLMs/GPT3.5
$ ### main experiments
$ python tf-variants.py >> ../logs/gpt-3.5/tf-log.txt
$ python mcq-variants.py >> ../logs/gpt-3.5/mcq-log.txt
$ ### instance typing experiments
$ python instance.py >> ../logs/gpt-3.5/instance-log.txt
$ conda activate LLM-probing
$ cd TaxoGlimpse/LLMs/GPT4
$ ### main experiments
$ python tf-variants.py >> ../logs/gpt-4/tf-log.txt
$ python mcq-variants.py >> ../logs/gpt-4/mcq-log.txt
$ ### instance typing experiments
$ python instance.py >> ../logs/gpt-4/instance-log.txt
We introduce the steps for Claude-3, including the main experiments and the instance typing experiment.
Please input your Anthropic API key at the beginning of the Python files.
$ conda activate LLM-probing
$ cd TaxoGlimpse/LLMs/Claude
$ ### main experiments
$ python tf-variants.py >> ../logs/Claude/tf-log.txt
$ python mcq-variants.py >> ../logs/Claude/mcq-log.txt
$ ### instance typing experiments
$ python instance.py >> ../logs/Claude/instance-log.txt
We introduce the steps for Llama-3-8B and Llama-3-70B respectively, including the main experiments and the instance typing experiment.
$ conda activate llama3
$ cd TaxoGlimpse/LLMs/llama3/
$ ### main experiments
$ torchrun --nproc_per_node 1 tf-variants.py >> ../logs/llama-3-8b/tf-log.txt
$ torchrun --nproc_per_node 1 mcq-variants.py >> ../logs/llama-3-8b/mcq-log.txt
$ ### instance typing experiment
$ torchrun --nproc_per_node 1 instance.py >> ../logs/llama-3-8b/instance-log.txt
$ conda activate llama3
$ cd TaxoGlimpse/LLMs/llama3/
$ ### main experiments
$ torchrun --nproc_per_node 8 tf-variants.py >> ../logs/llama-3-70b/tf-log.txt
$ torchrun --nproc_per_node 8 mcq-variants.py >> ../logs/llama-3-70b/mcq-log.txt
$ ### instance typing experiment
$ torchrun --nproc_per_node 8 instance.py >> ../logs/llama-3-70b/instance-log.txt
We introduce the steps for Mistral and Mixtral respectively, including the main experiments and the instance typing experiment.
$ conda activate mixtral
$ cd TaxoGlimpse/LLMs/Mistral-Mixtral/
$ ### main experiments
$ python tf-variants.py >> ../logs/mistral/tf-log.txt
$ python mcq-variants.py >> ../logs/mistral/mcq-log.txt
$ ### instance typing experiments
$ python instance.py >> ../logs/mistral/instance-log.txt
$ conda activate mixtral
$ cd TaxoGlimpse/LLMs/Mistral-Mixtral/
$ ### main experiments
$ python tf-variants.py >> ../logs/mixtral/tf-log.txt
$ python mcq-variants.py >> ../logs/mixtral/mcq-log.txt
$ ### instance typing experiments
$ python instance.py >> ../logs/mixtral/instance-log.txt
We introduce the steps for LLMs4OL, including the main experiments and the instance typing experiment.
$ conda activate llms4ol
$ cd TaxoGlimpse/LLMs/LLMs4OL/tuning
$ ### instruction tuning for main experiments
$ python3 trainer.py
$ ### instruction tuning for instance typing experiments
$ python3 trainer-instance.py
$ cd TaxoGlimpse/LLMs/LLMs4OL/taxoglimpse
$ ### main experiments
$ python tf-variants.py >> ../logs/llms4ol/tf-log.txt
$ python mcq-variants.py >> ../logs/llms4ol/mcq-log.txt
$ ### instance typing experiments
$ python instance.py >> ../logs/llms4ol/instance-log.txt