Repository for our ACM SIGGRAPH i3D'2024 paper: ShaderPerFormer: Platform-independent Context-aware Shader Performance Predictor.
Useful links:
BibTeX command for citation
author = {Liu, Zitan and Huang, Yikai and Liu, Ligang},
title = {ShaderPerFormer: Platform-independent Context-aware Shader Performance Predictor},
year = {2024},
issue_date = {May 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {7},
number = {1},
url = {},
doi = {10.1145/3651295},
abstract = {The ability to model and predict the execution time of GPU computations is crucial for real-time graphics application development and optimization. While there are many existing methodologies for graphics programmers to provide such estimates, those methods are often vendor-dependent, require the platforms to be tested, or fail to capture the contextual influences among shader instructions. To address this challenge, we propose ShaderPerFormer, a platform-independent, context-aware deep-learning approach to model GPU performance and provide end-to-end performance predictions on a per-shader basis. To provide more accurate predictions, our method contains a separate stage to gather platform-independent shader program trace information. We also provide a dataset consisting of a total of 54,667 fragment shader performance samples on 5 different platforms. Compared to the PILR and SH baseline methods, our approach reduces the average MAPE across five platforms by 8.26\% and 25.25\%, respectively.},
journal = {Proc. ACM Comput. Graph. Interact. Tech.},
month = {may},
articleno = {2},
numpages = {17},
keywords = {GPU, performance modeling, shader performance prediction}
- Upload model & intermediate files
- vkPredict notebook clean-up
- More instructions on vkToy
- Code tidy-up
contains the code for our paper.
contains the dataset collected for our paper.
contains models trained in our paper.
First, be sure to read our paper for the bigger picture. The guides below help with reproducing our paper and analysing your own shaders.
The following might be a good start: pip install -r requirements.txt
OR the following
pip install accelerate==0.25.0 \
huggingface-hub==0.19.4 \
nvidia-cublas-cu12== \
nvidia-cuda-cupti-cu12==12.1.105 \
nvidia-cuda-nvrtc-cu12==12.1.105 \
nvidia-cuda-runtime-cu12==12.1.105 \
nvidia-cudnn-cu12== \
nvidia-cufft-cu12== \
nvidia-curand-cu12== \
nvidia-cusolver-cu12== \
nvidia-cusparse-cu12== \
nvidia-nccl-cu12==2.18.1 \
nvidia-nvjitlink-cu12==12.3.101 \
nvidia-nvtx-cu12==12.1.105 \
peewee==3.17.0 \
regex==2023.10.3 \
tokenizers==0.15.0 \
torch==2.1.1 \
torch-tb-profiler==0.4.3 \
torchinfo==1.8.0 \
transformers==4.35.2 \
triton==2.1.0
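To confirm that an environment matches the pins above, a small check along these lines may help. It is only a sketch: `PINS` (an illustrative subset of the list) and the `check_pins` helper are hypothetical names, not part of this repository. It relies on the standard `importlib.metadata` module and ignores local version suffixes such as `+cu121`.

```python
from importlib import metadata

# Illustrative subset of the pinned versions listed above.
PINS = {
    "torch": "2.1.1",
    "transformers": "4.35.2",
    "tokenizers": "0.15.0",
    "accelerate": "0.25.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Drop a local suffix such as "+cu121" before comparing.
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `check_pins(PINS)` returns an empty dict when every listed package is installed at the pinned version.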
Please refer to code/vkExecute/ for more information.
If you want to reproduce our paper, you can extract the shaders from our snapshot (the link is available inside dataset/), which holds the shaders we gathered from the website around Feb. 2023.
Otherwise, you can make your own. Rough steps below:
- Register an account on Shadertoy and apply for an API key
- Put the key into
cd toyDb && python --amend noop
After this, you can run python imageonly-shaders
to see how many image-only shaders were gathered. toyDb provides further hints on the format of the Shadertoy website APIs.
You can measure the performance of the collections of shaders running on your machine by using the following command:
# An example command setting the iTime parameter to 7 and the iFrame parameter to 420,
# using 10 trials (refer to our paper for what num_trials and num_cycles exactly mean).
# Other parameters take their default values.
python run --iTime 7 --iFrame 420 --comment YOU_CAN_ADDITIONALLY_PUT_COMMENT_HERE --num-trials 10 --save-images --database-file ${REPO_ROOT}/dataset/experiments.db 2> measurement_log.log
# This will run the "Instruction Tracing" stage for all runs in environment with ID=1
python trace --save-images --environment-id 1 --database-file ${REPO_ROOT}/dataset/experiments.db 2> trace_log.log
Locking the GPU frequency helps when measuring performance; see toyDb for more information.
You can also download the data used in the paper (a SQLite database file, experiments.db). See dataset/ for more information.
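Before relying on any particular schema, it can help to look at what the database file contains. The sketch below uses only Python's built-in `sqlite3` module to list the tables and their row counts. The helper name `list_tables` is hypothetical, and no table names from the actual dataset are assumed.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists all schema objects; skip SQLite's internal tables.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names remain valid SQL.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, `list_tables("dataset/experiments.db")` would summarize the file if it was saved at that (assumed) location. Note that `sqlite3.connect` creates an empty database when the path does not exist, so check the path first.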
cd code/ && pip install --editable .
vkPredict requires functionality inside toyDb to access the experiments.db file, and toyDb previously lived under code/vkPredict in our original repo. The editable install is a workaround that avoids changing our original code too much while removing the need to manually manipulate PYTHONPATH.
Do the exporting as in
. Fork your own copy for your new data. Also, do read the implementations of those filters before you edit.
You can also download the data used in the paper (a series of .dat files in dataset/intermediates/). See dataset/ for more information.
See vkPredict for more info.
You can also download our model trained in the paper. See dataset/ for more information.
Refer to CustomTester.ipynb and for more info.
Refer to BaselineMethodsValTimeFiltered.ipynb for more info.
This includes the Simple Heuristics and Per-Inst Linear Regression baselines described in the paper.
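The paper reports accuracy as MAPE (mean absolute percentage error). When comparing your own predictions against measured times, the metric can be sketched as below. The function name is illustrative and is not taken from the notebooks, which may compute it differently (for example with tensors).

```python
def mape(measured, predicted):
    """Mean absolute percentage error, in percent.

    `measured` holds the ground-truth times and must not contain zeros.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if not measured:
        raise ValueError("at least one sample is required")
    total = 0.0
    for m, p in zip(measured, predicted):
        if m == 0:
            raise ValueError("measured values must be non-zero")
        total += abs((p - m) / m)
    return 100.0 * total / len(measured)
```

Each term is the relative error of one prediction, so a shader predicted at 110 when it measured 100 contributes 10% before averaging.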
See code/vkToy for more info.
contains some useful information on analyzing and experimenting. However, some attempts failed and did not make it into our final proposed method (e.g. simple augmentation, contrastive learning, MLM pretraining).
Also be aware of any potential path and import issues.
While we believe our method is generalizable, this work currently supports only fragment shaders that consist of a single image pass and take only uniform block inputs (i.e. no texture reads or writes).
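As a rough pre-check for this restriction, a shader record can be screened before measuring it. The sketch below assumes the JSON layout returned by the Shadertoy API, where a shader carries a `renderpass` list whose entries have a `type` and an `inputs` list. The function name is hypothetical, and the filtering actually done in toyDb may differ.

```python
def is_supported(shader):
    """True if the shader is a single image pass with no channel inputs.

    `shader` is assumed to be a dict shaped like one Shadertoy API shader
    record (with a "renderpass" list); this layout is an assumption.
    """
    passes = shader.get("renderpass", [])
    if len(passes) != 1:
        return False  # multi-pass shaders (buffers, sound, ...) are out of scope
    only = passes[0]
    # An empty "inputs" list means no textures or other channels are bound.
    return only.get("type") == "image" and not only.get("inputs")
```

A record with one `image` pass and an empty `inputs` list passes the check. Anything with buffer passes or bound channels is rejected.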
- Works by us: Licensed under MIT license.
- Third-party: Licenses may vary.
- Shadertoy: Shadertoy shaders have their respective licenses; for details, please refer to each shader's own license.