Machine Learning, hardware-software co-design, GPU kernel development, ML performance modeling.
- AMD, Inc.
- Austin, TX
- https://ramjana.github.io
Pinned
- Tensile (Public, forked from ROCm/Tensile)
  Stretching GPU performance for GEMMs and tensor contractions.
  Python
- fp8_quant_simulation (Public)
  Analytical model for FP8 quantization (power of exponent).
  Jupyter Notebook
- LLM-Inference-Modeling (Public)
  Inference throughput modeling for AMD GPU(s).
  Jupyter Notebook
- composable_kernel (Public, forked from ROCm/composable_kernel)
  Composable C++ template abstractions for implementing tensor contraction operators (GEMM, iGEMM).
  C++