A few CUDA kernels written for the V100, mainly used to demonstrate optimization methods.
To keep dependencies minimal, all executables are built with a Makefile.
// Reduce operation
reduce/
// Scan operation
scan/
// Square matrix transpose
transpose/
// General matrix multiply C = A * B
sgemm/
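
To illustrate the C = A * B operation listed above, here is a minimal, unoptimized sketch of what a baseline SGEMM kernel might look like. It is not the code in sgemm/; the row-major layout, the 16x16 thread block, and the names sgemm_naive and sgemm_naive_matches_cpu are all assumptions made for this example. A baseline of this kind is typically the starting point that optimized versions are measured against.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Naive SGEMM: each thread computes one element of C = A * B.
// Assumed layout (not taken from the repository): row-major,
// A is M x K, B is K x N, C is M x N.
__global__ void sgemm_naive(const float* A, const float* B, float* C,
                            int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            acc += A[row * K + k] * B[k * N + col];
        }
        C[row * N + col] = acc;
    }
}

// Hypothetical helper: runs the kernel and compares it with a CPU loop.
bool sgemm_naive_matches_cpu(int M, int N, int K) {
    std::vector<float> hA(M * K), hB(K * N), hC(M * N), ref(M * N, 0.0f);
    for (size_t i = 0; i < hA.size(); ++i) hA[i] = 0.5f * (float)(i % 7);
    for (size_t i = 0; i < hB.size(); ++i) hB[i] = 0.5f * (float)(i % 5);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dB, hB.size() * sizeof(float));
    cudaMalloc(&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    // Round the grid up so partial tiles at the edges are still covered.
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    sgemm_naive<<<grid, block>>>(dA, dB, dC, M, N, K);
    cudaError_t launchErr = cudaGetLastError();

    // The blocking copy also waits for the kernel to finish.
    cudaError_t copyErr = cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float),
                                     cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (launchErr != cudaSuccess || copyErr != cudaSuccess) return false;

    // CPU reference using the same row-major indexing.
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                ref[i * N + j] += hA[i * K + k] * hB[k * N + j];

    for (size_t i = 0; i < hC.size(); ++i) {
        if (std::fabs(hC[i] - ref[i]) > 1e-3f * (1.0f + std::fabs(ref[i])))
            return false;
    }
    return true;
}

int main() {
    // Sizes deliberately not multiples of 16 to exercise the bounds check.
    bool ok = sgemm_naive_matches_cpu(33, 17, 29);
    std::printf("%s\n", ok ? "match" : "mismatch");
    return ok ? 0 : 1;
}
```

Assuming the file is saved as sgemm_naive.cu (a hypothetical name), it should build with something like nvcc -arch=sm_70 sgemm_naive.cu -o sgemm_naive, where sm_70 is the compute capability of the V100.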