zwongstjude/opensource_llm_testbench


LLM Testing

This repo mainly explores the relationship between VRAM usage, quantization, model parameters, and token generation speed.

Prerequisites

This repo uses Hugging Face transformers, as well as bitsandbytes and accelerate for quantization. Install the dependencies with pip install -r requirements.txt. Some models may require extra packages; install them when prompted.

Running the test

Run test_model.py:

python test_model.py --model=<model card from HF> --quantization=<4, 8, or 16> --token=<HF_TOKEN>

After the model runs through the stored prompts, the script enters a loop that lets users type in custom prompts.
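The quantization flag presumably selects a bitsandbytes configuration inside test_model.py. As a minimal, stdlib-only sketch, the flag could map to from_pretrained keyword arguments like this (the function name and exact kwargs are assumptions for illustration, not the repo's actual code):

```python
def quant_kwargs(quantization: int) -> dict:
    """Map the --quantization flag (4 or 8, anything else -> 16-bit)
    to keyword arguments for transformers' from_pretrained."""
    if quantization == 4:
        # 4-bit loading via bitsandbytes
        return {"load_in_4bit": True}
    if quantization == 8:
        # 8-bit loading via bitsandbytes
        return {"load_in_8bit": True}
    # default: half precision, no bitsandbytes quantization
    return {"torch_dtype": "float16"}

# Hypothetical usage inside the loader:
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
#                                              token=hf_token, **quant_kwargs(args.quantization))
```

Lower-bit loading trades some output quality for a roughly proportional drop in VRAM usage, which is the relationship this repo measures.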

Writing new prompts

prompt.json is a simple JSON file that stores prompts one may want to feed into the LLM. Make sure the labels field corresponds one-to-one with the prompts field.

Reading outputs

out.txt contains the model outputs. The first line lists the tokens per second for each prompt, in order; the next line is the memory usage after running through all the prompts. nvidia-smi shows GPU stats when the GPU is idle.
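For post-processing results, out.txt can be parsed under the layout assumed above (first line: whitespace-separated tokens-per-second values in prompt order; second line: memory usage). This parser is a sketch of that assumption, not code from the repo:

```python
def parse_out(text: str) -> tuple[list[float], str]:
    """Split out.txt into per-prompt tokens/sec and the memory-usage line."""
    lines = text.splitlines()
    tps = [float(x) for x in lines[0].split()]  # one value per prompt, in order
    memory = lines[1].strip()                   # memory usage after all prompts
    return tps, memory
```

With parsed numbers in hand, runs at different quantization levels can be compared directly.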
