Benchmarking

Repository for AI model benchmarking on accelerator hardware.

Performance Table

| Model | Input Size | Batch | Grayskull e75 | Grayskull e150 | Wormhole n150 | Wormhole n300 (single-chip) | Wormhole n300 (dual-chip) | TT-LoudBox/TT-QuietBox (4 MMIO chip) | TT-LoudBox/TT-QuietBox (8 chip) |
|---|---|---|---|---|---|---|---|---|---|
| BERT-Large (sen/s) | 384 | 64 | 81 | 99 | 118 | 103 | x | x | x |
| T5-Large (tok/s) | 64 | 1 | 25 | 30 | 75 | 68 | x | x | x |
| FLAN-T5-Large (tok/s) | 64 | 1 | 9 | 25 | 71 | 52 | x | x | x |
| Whisper-Small (tok/s) | 30s | 1 | 3.4 | 3.7 | 16 | 10 | x | x | x |
| Falcon-7B (tok/s) | 128 | 32 | x | x | 76 | 77 | x | x | x |
| SD v1-4 (s/img) | 512x512 | 1 | x | x | 50 | 50 | x | x | x |
| ResNet50 (img/s) | 3x224x224 | 256 | 1106 | 1410 | 2891 | 1060 | 2000 | 3315 | 4711 |
| VoVNet-V2 (img/s) | 3x224x224 | 128 | 518 | 819 | 1603 | 1197 | 1931 | 2860 | 3294 |
| MobileNetV1 (img/s) | 3x224x224 | 128 | 2468 | 2924 | 3102 | 2338 | 2978 | 4334 | 4347 |
| MobileNetV2 (img/s) | 3x224x224 | 256 | 1141 | 1491 | 2721 | 2439 | 4332 | 4579 | 6800 |
| MobileNetV3 (img/s) | 3x224x224 | 64 | 1192 | 1741 | 1981 | 1670 | 2017 | 2695 | 1688 |
| HRNet-V2 (img/s) | 3x224x224 | 128 | 197 | 233 | 324 | 257 | 269 | 845 | 262 |
| ViT-Base (img/s) | 3x224x224 | 64 | 301 | 363 | 540 | 447 | 546 | 970 | 1311 |
| DeiT-Base (img/s) | 3x224x224 | 64 | 301 | 363 | 539 | 446 | 545 | 973 | 1317 |
| YOLOv5-Small (img/s) | 3x320x320 | 128 | 290 | 232 | 1190 | 1090 | 1435 | x | x |
| OpenPose-2D (img/s) | 3x224x224 | 64 | 828 | 1098 | 1252 | 1204 | 1805 | 1542 | 1438 |
| U-Net (img/s) | 3x256x256 | 48 | 222 | 268 | 490 | 344 | 547 | 455 | x |
| Inception-v4 (img/s) | 3x224x224 | 128 | 371 | 458 | 1061 | 1116 | 1810 | 2795 | 3162 |
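
The table lends itself to simple device-to-device comparisons. The following is a minimal Python sketch, not part of this repository: `speedup` is a hypothetical helper, the three values are copied from the ResNet50 row, and `None` is used here as a stand-in for an `x` cell.

```python
# Throughput in img/s, copied from the ResNet50 row of the table.
# None would represent an "x" cell (no figure reported).
resnet50 = {
    "Grayskull e75": 1106,
    "Grayskull e150": 1410,
    "Wormhole n150": 2891,
}


def speedup(row, device, baseline):
    """Return throughput of `device` relative to `baseline`, or None if either is missing."""
    a, b = row.get(device), row.get(baseline)
    if a is None or b is None:
        return None
    return a / b


ratio = speedup(resnet50, "Wormhole n150", "Grayskull e150")
print(f"n150 vs e150 on ResNet50: {ratio:.2f}x")
```

This ratio reads as "higher is better" only for throughput rows (sen/s, tok/s, img/s). The SD v1-4 row is reported in s/img, where a lower value is better.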

Setup Instructions

Installation

First, either create a Python virtual environment with PyBuda installed or run from a Docker container with PyBuda installed.

Installation instructions can be found at Install TT-BUDA.

Next, install the model requirements:

pip install -r requirements.txt --constraint constraints.txt
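
Before running any benchmarks, it can help to confirm that the active environment actually sees PyBuda. A minimal sketch, assuming the package is importable under the name `pybuda`; `is_installed` is a hypothetical helper, not part of this repository:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located in the active environment."""
    return importlib.util.find_spec(module_name) is not None


# "pybuda" is assumed to be the import name of the PyBuda package.
print("pybuda importable:", is_installed("pybuda"))
```

If this prints `False`, the virtual environment is probably not active, or PyBuda was installed into a different interpreter.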

Installing on a System with a GPU

If your system contains a GPU device, you can use the requirements-cuda.txt file to install the correct dependencies.

To install packages, make sure your virtual environment is active.

To add additional CUDA packages

pip install -r requirements-cuda.txt

Environment Setup

Setup Access to HuggingFace Hub

To access the benchmarking datasets, follow these steps to set up your access to the HuggingFace Hub:

1. Create a HuggingFace Account:
   - Visit Hugging Face and create an account if you haven't already.
2. Generate User Access Token:
   - Follow the steps outlined in the HuggingFace Docs - Security Tokens to generate a User Access Token.
3. Install huggingface_hub Library:
   - Install the huggingface_hub library by running:
   ```
   pip install huggingface_hub
   ```
4. Login to HuggingFace CLI:
   - Log in to the HuggingFace CLI using your User Access Token:
   ```
   huggingface-cli login
   ```
   - Enter your User Access Token when prompted.
5. Validate Setup:
   - Run the following command to verify your login status:
   ```
   huggingface-cli whoami
   ```
   - If your username is displayed, you are logged in.
6. Dataset Access:
   - Visit HuggingFace Datasets - ImageNet-1k and follow the instructions there to obtain access to the ImageNet-1k dataset.

Validation Steps

After completing the setup process, ensure that everything is working correctly:

1. Verify Hugging Face Hub Login:
   - Run the following command to verify that you are logged in to the Hugging Face Hub:
   ```
   huggingface-cli whoami
   ```
   - If your username is displayed, you are logged in.
2. Check Dataset Access:
   - Visit the HuggingFace Datasets - ImageNet-1k page.
   - Make sure you can view and access the dataset details without any permission errors.
3. Accept Dataset Access (If Required):
   - If you encounter any permission errors while accessing the ImageNet-1k dataset, follow the instructions on the dataset page to obtain access.
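
The login check can also be done from Python, which is convenient inside a setup script. A minimal sketch, assuming the `huggingface_hub` package is installed; `hub_username` is a hypothetical helper, and the `whoami_fn` parameter exists only so the function can be exercised without network access:

```python
def hub_username(whoami_fn=None):
    """Return the logged-in Hugging Face username, or None if the lookup fails."""
    if whoami_fn is None:
        # Imported lazily so the helper can be defined without the package present.
        from huggingface_hub import whoami
        whoami_fn = whoami
    try:
        info = whoami_fn()
    except Exception:
        # Typically raised when no valid token is stored or the Hub is unreachable.
        return None
    return info.get("name")
```

Calling `hub_username()` with no arguments queries the Hub using the stored token. A return value of `None` suggests running `huggingface-cli login` again.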

Benchmarking Datasets Setup

To set up the three required datasets for running benchmarking tests within this repository, follow these steps for each dataset:

HuggingFace datasets: these are downloaded into ${HF_HOME}/datasets/ once HuggingFace Hub access is set up.
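
To see where `${HF_HOME}/datasets/` resolves on a given machine, the path can be computed as follows. This is a minimal sketch: `hf_datasets_dir` is a hypothetical helper, it assumes the usual default of `~/.cache/huggingface` when `HF_HOME` is unset, and it ignores other overrides such as `HF_DATASETS_CACHE`.

```python
import os
from pathlib import Path


def hf_datasets_dir(environ=None):
    """Resolve ${HF_HOME}/datasets, falling back to ~/.cache/huggingface (assumed default)."""
    environ = os.environ if environ is None else environ
    hf_home = environ.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home).expanduser() / "datasets"


print(hf_datasets_dir())
```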

COCO Dataset: Download the COCO dataset from images.cocodataset.org. No login is required, and with the commands below the dataset is cached in ~/.cache/mldata/coco.

To download the COCO dataset, follow these steps:

# use another location for MLDATA_DIR if desired, below is default
# Create the `coco` directory inside the cache directory:
mkdir -p ~/.cache/mldata/coco

# Navigate to the `coco` directory:
cd ~/.cache/mldata/coco

# Create the `images` directory:
mkdir images
cd images

# Download the COCO validation images:
wget http://images.cocodataset.org/zips/val2017.zip

# Unzip the downloaded file:
unzip val2017.zip

# Move back to the `coco` directory:
cd ..

# Download the COCO train/val annotations:
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip

# Unzip the downloaded file:
unzip annotations_trainval2017.zip
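
After the commands above finish, the resulting layout can be checked with a short script. This is a sketch under assumptions: `coco_missing` is a hypothetical helper, `MLDATA_DIR` is treated as an environment variable defaulting to `~/.cache/mldata`, and the two archives are assumed to unpack into `images/val2017` and `annotations`.

```python
import os
from pathlib import Path


def coco_missing(mldata_dir=None):
    """Return the expected COCO directories that do not exist under `mldata_dir`."""
    base = mldata_dir or os.environ.get("MLDATA_DIR", "~/.cache/mldata")
    root = Path(base).expanduser() / "coco"
    expected = [
        root / "images" / "val2017",  # assumed output of val2017.zip
        root / "annotations",         # assumed output of annotations_trainval2017.zip
    ]
    return [str(p) for p in expected if not p.is_dir()]


print("missing:", coco_missing())
```

An empty list means both directories are in place.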

LGG Segmentation Dataset: This dataset must be downloaded manually from https://www.kaggle.com/datasets/mateuszbuda/lgg-mri-segmentation, which requires a Kaggle login. The commands below extract the downloaded archive into the correct location: ~/.cache/mldata/lgg_segmentation/kaggle_3m.

# Use another location for MLDATA_DIR if desired; the path below is the default.
mkdir -p ~/.cache/mldata/lgg_segmentation
cd ~/.cache/mldata/lgg_segmentation
# Download the archive from Kaggle, move it here, then unzip it:
unzip archive.zip
# The archive appears to contain two equivalent copies of the dataset; remove the extra one:
rm -r lgg-mri-segmentation
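
The outcome of these steps can be checked in the same spirit. A minimal sketch with a hypothetical helper, `lgg_status`, which only confirms that `kaggle_3m` exists and that the duplicate `lgg-mri-segmentation` folder is gone; it does not inspect the dataset contents.

```python
from pathlib import Path


def lgg_status(lgg_root="~/.cache/mldata/lgg_segmentation"):
    """Report whether kaggle_3m is present and the duplicate copy has been removed."""
    root = Path(lgg_root).expanduser()
    return {
        "kaggle_3m_present": (root / "kaggle_3m").is_dir(),
        "duplicate_removed": not (root / "lgg-mri-segmentation").exists(),
    }


print(lgg_status())
```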

Run Benchmarking

As part of the environment setup, you may need to add the repository root to PYTHONPATH (run this from the repository root):

export PYTHONPATH=.

Benchmarking Script

benchmark.py provides an easy way to benchmark the performance of a supported model while varying configurations and compiler options. The script measures real end-to-end time on the host, starting after compilation when the first input is pushed and ending when the last output is received.
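
The measurement window described above can be illustrated with a short sketch. This is not the code in benchmark.py: `measure_throughput`, `push_input`, and `pop_output` are hypothetical names, and the example assumes compilation has already happened before the function is called.

```python
import time
from collections import deque


def measure_throughput(push_input, pop_output, inputs, clock=time.perf_counter):
    """Time from just before the first input is pushed until the last output is received."""
    start = clock()                           # window opens at the first push
    for item in inputs:
        push_input(item)
    outputs = [pop_output() for _ in inputs]  # block until every output is back
    elapsed = clock() - start                 # window closes after the last output
    return len(outputs) / elapsed, outputs


# Stand-in "device": a queue that returns inputs unchanged.
queue = deque()
rate, _ = measure_throughput(queue.append, queue.popleft, list(range(1000)))
print(f"{rate:.0f} samples/s")
```

Because the clock starts after compilation, the reported rate reflects steady-state host-to-host throughput rather than one-time setup cost.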

To specify the device to run on ("tt", "cpu", or "cuda"), include the -d argument flag.

The script optionally outputs a .json file with the benchmark results and the options used to run the test. If the file already exists, the results are appended, allowing the user to run multiple benchmarks back-to-back, for example:

python benchmark.py -d tt -m bert -c base --task text_classification --save_output

or

python benchmark.py -d cuda -m bert -c base --task text_classification --save_output
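
The append behavior can be pictured as follows. This README does not show the actual layout of the output file, so the sketch assumes a JSON list of records; `append_result` and the record fields are hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path


def append_result(path, result):
    """Append one result record to a JSON list at `path`, creating the file if needed."""
    path = Path(path)
    results = json.loads(path.read_text()) if path.exists() else []
    results.append(result)
    path.write_text(json.dumps(results, indent=2))
    return results
```

Under this assumption, running the `tt` and `cuda` commands in turn would leave two records in the same file, ready for a side-by-side comparison.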

To see which models and configurations are currently supported, run:

python benchmark.py --list

Run Examples

You can find example commands for various conditions in the following files:

run_benchmark_tt_perf for TT devices and run_benchmark_cuda for GPU and CPU devices

Running On Multi-Device

To run the benchmarks on multichip or multicard systems, set the following:

n300 (multichip)
- Set the environment variable PYBUDA_N300_DATA_PARALLEL=1.
TT-LoudBox / TT-QuietBox (multicard)
- Set the environment variable PYBUDA_FORCE_THREADS=1 and pass the option --parallel_tti device_images/.
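
When launching runs from a script, these settings can be applied per invocation instead of being exported in the shell. A minimal sketch: `multi_device_run` and the system labels `"n300"` and `"loudbox"` are hypothetical; only the environment variables and the `--parallel_tti device_images/` option come from the instructions above.

```python
import os


def multi_device_run(system, base_cmd, environ=None):
    """Return (env, cmd) with the multi-device settings for `system` applied."""
    env = dict(os.environ if environ is None else environ)
    cmd = list(base_cmd)
    if system == "n300":
        env["PYBUDA_N300_DATA_PARALLEL"] = "1"
    elif system == "loudbox":  # hypothetical label for TT-LoudBox / TT-QuietBox
        env["PYBUDA_FORCE_THREADS"] = "1"
        cmd += ["--parallel_tti", "device_images/"]
    else:
        raise ValueError(f"unknown system: {system}")
    return env, cmd
```

The returned pair can be passed to `subprocess.run(cmd, env=env, check=True)`, which keeps the variables scoped to that one benchmark process.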

Contributing

We are excited to move our development to the public, open-source domain. However, we are not adequately staffed to review contributions in an expedient and manageable time frame at this time. In the meantime, please review the contributor's guide for more information about contribution standards.

Communication

If you would like to formally propose a new feature, report a bug, or have issues with permissions, please file through GitHub issues.

Please access the Discord community forum for updates, tips, live troubleshooting support, and more!
