Tianqi Chen edited this page Apr 6, 2014 · 26 revisions

This is the documentation for mshadow: A Lightweight CPU/GPU Matrix/Tensor Template Library in C++/CUDA.

Getting started

  • Read Tutorial to get started and understand the data structures and basic usage.
  • [Expression API](Expression API) lists the expressions supported in mshadow.
  • Read the rest of this page for other information.
  • Example code is in example.

What (documents) to read when using mshadow

  • All files ending in -inl.hpp or -inl.cuh are implementation details and can be ignored if you are only using mshadow.
  • Files ending in .h are heavily commented in doxygen format and can be used to generate the corresponding documentation.
  • For the headers related to different types of functions, read the comments directly or the doxygen-generated documentation; an online version of the doxygen-generated documentation is available Here.
  • Doxygen works well for the normal API headers listed above, except for the expression-related code. That code also comes with detailed comments, but they may not be straightforward to understand; read [Expression API](Expression API) for documentation of the expressions.

Use and Customize mshadow

  • Use mshadow: mshadow is a pure template library; add #include "mshadow/tensor.h" to your code to use it.
  • Example Makefile: check out example/neuralnet/Makefile to see how mshadow is compiled.
  • Package dependency: mshadow needs MKL or another CBLAS implementation for matrix multiplication. The package dependencies can be customized via macros in mshadow/tensor_base.h, for example:
    • To compile with CBLAS (ATLAS), add -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=1 to CFLAGS
    • To compile without CUDA and use CPU mode only, add -DMSHADOW_USE_CUDA=0 to CFLAGS
    • To use only element-wise operations and remove the dependency on all packages, add -DMSHADOW_STAND_ALONE=1 to CFLAGS
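For instance, a minimal Makefile fragment for a CPU-only build against CBLAS/ATLAS might look like the sketch below (the file names and the -lcblas link flag are illustrative assumptions, not taken from the mshadow examples):

```makefile
# Hypothetical Makefile fragment: CPU-only build, CBLAS/ATLAS instead of MKL.
CFLAGS = -O3 -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_CUDA=0

main: main.cpp
	g++ $(CFLAGS) -o main main.cpp -lcblas
```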
  • SSE support: mshadow uses SSE2 optimization for simple element-wise operations. However, due to the incompatibility of nvcc and emmintrin.h, SSE is only supported when not compiling with nvcc. To make use of SSE while keeping the code device-invariant:
    • Write the implementation as a template, say Learner<xpu>, in learner.hpp.
    • Create learner.cu and learner.cpp, include learner.hpp in both files, and return Learner<cpu> from the .cpp file and Learner<gpu> from the .cu file via a factory function.
    • Compile the .cpp file with g++ and the .cu file with nvcc, then link everything together.
    • Example: the CXXNET project (check out cxxnet_nnet .cpp and .cu) uses this approach to create device-invariant code with mshadow.
    • Note that not using SSE does not hurt the performance of most projects: if a project's major computation is matrix multiplication, that work is delegated to the optimized MKL/CBLAS routines, and SSE is also irrelevant if you only care about GPU performance.
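The steps above can be sketched as a single self-contained C++ file; in a real project the pieces would live in learner.hpp, learner.cpp, and learner.cu as described. The cpu/gpu tag structs here are stand-ins for mshadow's device types, and ILearner and the CreateLearner* factories are hypothetical names for illustration, not mshadow's actual API:

```cpp
#include <cassert>
#include <string>

// Stand-in device tags (assumption: mshadow uses similar empty structs
// named cpu and gpu as template parameters).
struct cpu { static const char* name() { return "cpu"; } };
struct gpu { static const char* name() { return "gpu"; } };

// Device-neutral interface seen by callers; it hides the xpu parameter.
class ILearner {
 public:
  virtual ~ILearner() {}
  virtual std::string device() const = 0;
};

// learner.hpp: one template implementation shared by both devices.
template <typename xpu>
class Learner : public ILearner {
 public:
  std::string device() const { return xpu::name(); }
};

// learner.cpp, compiled with g++ (SSE enabled), would contain:
ILearner* CreateLearnerCPU() { return new Learner<cpu>(); }

// learner.cu, compiled with nvcc (SSE disabled), would contain:
ILearner* CreateLearnerGPU() { return new Learner<gpu>(); }
```

A caller picks a factory at runtime and never names the device type directly, so the shared template is compiled twice, once per compiler with the appropriate flags, and the two objects are linked into one binary.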