matmul

Sandbox for matrix multiplication, just having fun with fundamental algorithms.

The CPU version (matmul.cpp) implements the IJK, IJK (with transposed B), and IKJ approaches. If you want to compare against cblas_dgemm(...), you need to have CBLAS installed.
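Below is a minimal sketch of the three loop orderings. It assumes square n x n matrices stored row-major in 1D arrays, and the function names are hypothetical (not taken from matmul.cpp). All three compute the same C = A * B. The only difference is access order: plain IJK walks a column of B with stride n in its innermost loop, while IJK with transposed B and IKJ keep the innermost loop at stride 1, which is usually friendlier to the cache.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumption: square n x n matrices, row-major, element (i, j) at i * n + j.
// Function names are hypothetical, chosen for illustration only.

// IJK: inner loop reads a row of A (stride 1) and a column of B (stride n).
void matmul_ijk(const double* A, const double* B, double* C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

// IJK with transposed B: copy B^T into Bt so both operands are read with stride 1.
void matmul_ijk_bt(const double* A, const double* B, double* C, std::size_t n) {
    std::vector<double> Bt(n * n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            Bt[c * n + r] = B[r * n + c];
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * Bt[j * n + k];
            C[i * n + j] = sum;
        }
}

// IKJ: hold A(i, k) in a register and stream across row k of B and row i of C.
void matmul_ikj(const double* A, const double* B, double* C, std::size_t n) {
    std::fill(C, C + n * n, 0.0);  // C is accumulated into, so zero it first
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
}
```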
The GPU version (matmul.cu) uses the tiling approach, with shared memory to ensure data reuse. Of course, in practice the best option is to call cublasDgemm(...) instead of reinventing the wheel.
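To illustrate the tiling idea without a GPU, here is a CPU analogue in plain C++. It is a sketch under stated assumptions, not the CUDA kernel from matmul.cu: square n x n row-major matrices, and a hypothetical `tile` parameter. Each tile of A and B is copied into a small local buffer (standing in for `__shared__` memory) and every loaded element is then reused `tile` times; edge tiles are zero-padded so n need not be a multiple of the tile size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU analogue of shared-memory tiling (hypothetical helper, not from matmul.cu).
// Assumption: square n x n row-major matrices; tile is the tile edge length.
void matmul_tiled(const double* A, const double* B, double* C,
                  std::size_t n, std::size_t tile = 16) {
    assert(tile > 0);
    std::vector<double> As(tile * tile), Bs(tile * tile);  // "shared" tile buffers
    std::fill(C, C + n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t jj = 0; jj < n; jj += tile)
            for (std::size_t kk = 0; kk < n; kk += tile) {
                // Load one tile of A and one of B, zero-padding past the edge.
                for (std::size_t r = 0; r < tile; ++r)
                    for (std::size_t c = 0; c < tile; ++c) {
                        As[r * tile + c] = (ii + r < n && kk + c < n)
                                               ? A[(ii + r) * n + kk + c] : 0.0;
                        Bs[r * tile + c] = (kk + r < n && jj + c < n)
                                               ? B[(kk + r) * n + jj + c] : 0.0;
                    }
                // Multiply the two tiles and accumulate into the C block.
                for (std::size_t r = 0; r < tile && ii + r < n; ++r)
                    for (std::size_t c = 0; c < tile && jj + c < n; ++c) {
                        double sum = 0.0;
                        for (std::size_t k = 0; k < tile; ++k)
                            sum += As[r * tile + k] * Bs[k * tile + c];
                        C[(ii + r) * n + jj + c] += sum;
                    }
            }
}
```

In a typical CUDA tiled kernel, the `r`/`c` loops become the thread indices within a block, the load and multiply phases are separated by `__syncthreads()`, and each thread accumulates one element of C.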

NOTE: In both implementations I use one-dimensional dynamically allocated arrays.
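A minimal sketch of what such a one-dimensional layout looks like, assuming row-major order (element (i, j) of an m x n matrix at offset i * n + j). The `Matrix` struct and its `at` accessor are hypothetical names for illustration, not code from this repository.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical wrapper: an m x n matrix in one dynamically allocated 1D array.
// Assumption: row-major layout, so element (i, j) is at offset i * cols + j.
struct Matrix {
    std::size_t rows, cols;
    std::unique_ptr<double[]> data;

    Matrix(std::size_t r, std::size_t c)
        : rows(r), cols(c),
          data(std::make_unique<double[]>(r * c)) {}  // value-initialized to 0.0

    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};
```

The storage order matters when calling the libraries: cblas_dgemm(...) takes an explicit CblasRowMajor or CblasColMajor layout argument, while cuBLAS assumes column-major storage, so a row-major array passed to cublasDgemm(...) is interpreted as its transpose.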
