Tenstorrent Inference Server (tt-inference-server) is the repository of available model APIs for deploying on Tenstorrent hardware.
https://github.com/tenstorrent/tt-inference-server
Please follow the setup instructions found in each model folder's README.md.
| Model | Hardware |
|---|---|
| Llama 3.1 70B | TT-QuietBox & TT-LoudBox |
| Mistral 7B | n150 and n300 |