- "ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs". Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu. Proceedings of the 37th IEEE International Parallel & Distributed Processing Symposium (Best Paper), May 2023.
- "Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU". Muhammad Osama, Duane Merrill, Cris Cecka, Michael Garland, John D. Owens. arXiv, January 2023.
- "GPU Load Balancing". Muhammad Osama. Doctoral dissertation, University of California, Davis, December 2022.
- "Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production". Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla. Proceedings of the Third Workshop on Simple and Efficient Natural Language Processing, December 2022.
- "Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance". Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu. Proceedings of the 5th MLSys Conference, August 2022.
- "Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance". Hiroyuki Ootomo, Rio Yokota. The International Journal of High Performance Computing Applications, March 2022.
- "Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads". Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Saarikivi. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, February 2022.
- "Arithmetic-intensity-guided fault tolerance for neural network inference on GPUs". Jack Kosaian, K. V. Rashmi. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2021.
- "Real-time Neural Radiance Caching for Path Tracing". Thomas Müller, Fabrice Rousselle, Jan Novák, Alex Keller. ACM Trans. Graph., August 2021.
- "Scalable Knowledge Graph Analytics at 136 Petaflop/s". Ramakrishnan Kannan, Piyush Sao, Hao Lu, Drahomira Herrmannova, Vijay Thakkar, Robert Patton, Richard Vuduc, Thomas Potok. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2020.
- "Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity". Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2020.
- "Strassen's Algorithm Reloaded on GPUs". Jianyu Huang, Chenhan D. Yu, Robert A. van de Geijn. ACM Transactions on Mathematical Software, March 2020.