
Commit 788c280

batchnorm specify channel axis and performance optimizations for batchnorm (#6411)

* Add channel_axis to batch norm, performance improvements
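
The headline change is the new channel-axis parameter on BatchNorm. As a rough usage sketch of what it enables (hedged: this assumes the Python symbol API exposes the operator's new `axis` parameter; the layer names are illustrative):

```python
import mxnet as mx

data = mx.sym.Variable('data')

# Default behaviour: channels live on axis 1, i.e. the usual NCHW layout.
bn_nchw = mx.sym.BatchNorm(data=data, name='bn_nchw')

# With the new parameter, normalization can run over a different axis,
# e.g. axis=3 (or axis=-1) for NHWC-shaped input.
bn_nhwc = mx.sym.BatchNorm(data=data, axis=3, name='bn_nhwc')
```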

* rearrange tests a bit

* rearrange tests a bit

* CR changes

* cpp package link issue

* Fix: MSVC wants all parallel OMP loop indices to be int

* CR comments, expand legal negative axes

* lint

* lint

* Fix download link (#6431)

* Fix Download Button

* Small fix

* Add release note (#6434)

* Fixing tutorials. (#6436)

Most of the fixes should be self-evident. For the tutorial on pre-trained models, one of the images doesn't exist anymore, so I selected a new one. Long term, we should put such images in the web-data repo, but that is for another day.

For the handwritten-digit tutorial, we are missing a couple of imports in the recently created test_utils.py.

Note that for the pre-trained model tutorial, we get a softmax_label warning and the probability scores are not really probabilities. That issue will be dealt with in another PR.

Testing:
I've tried to test all the notebooks with this change and things look fine.

* Formatting fixes (#6433)

* Formatting fixes

* lint fixed

* fix

* doc bash 2-5, for pack, unpack, pack_img and unpack_img (#6140)

* doc bash for pack, unpack, pack_img and unpack_img

* Add comments noting that labels can be a 1-d list (a usage sketch of these recordio helpers follows below)

* Update recordio.py

* Update recordio.py

* Update recordio.py

fixing text

* Update recordio.py

fixing text

* remove empty line
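
For context on the pack/unpack helpers these doc commits touch, here is a rough usage sketch (assumptions: the mx.recordio module layout, OpenCV being available for pack_img, and made-up data):

```python
import numpy as np
import mxnet as mx

# The label in a record header may be a scalar or a 1-d list/array.
header = mx.recordio.IRHeader(flag=0, label=np.array([1.0, 2.0]), id=0, id2=0)

img = np.zeros((32, 32, 3), dtype=np.uint8)                     # dummy image
packed = mx.recordio.pack_img(header, img, quality=95, img_fmt='.jpg')

header2, img2 = mx.recordio.unpack_img(packed)                  # round-trip
```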

* Improve style (#6445)

* Correction (#6444)

* CSVIter example correction

* fix

* Update documentation for MXNetDataIter in io.py (#6000) (#6113)

* Update documentation for MXNetDataIter in io.py (#6000)

* [DOC] Respond to feedback (#6113)

* Fix minor issues with api pages. (#6410)

1. In the notes section for ndarray, references did not make it clear whether they referred to mxnet.ndarray or numpy.ndarray. Added the package names as prefixes to make it more obvious.
2. "share the same C++ operator source codes" => "share the same code", since we don't really need to include more detail than required.
3. Other relatively minor language changes that will be obvious from the diff.

Note that I'm not entirely sure about the need for 1, since it makes things more verbose. Let me know if it's unnecessary and I'll remove it.

* fixing the early stop for maximize = T  (#5915)

close #4587

* Update documentation for mxnet.ndarray.GridGenerator. (#6430)

* Update documentation for mxnet.ndarray.GridGenerator.

Thanks @Lyken17 for #6147

* Fix lint error.

* Minor fix.

* Remove the example.

* Update documentation for deconvolution operation. (#6184)

* Update documentation for deconvolution operation.

* Add examples.

* Remove the example.

* skip lines that have %matplotlib (#6451)

* Fixing some more broken links before v0.10 release (#6449)

* close #4838 (#6452)

* Fix linear regression (#6432)

* Fix Linear Regression Tutorial

* Small fix

* Pre-trained model tutorial fixes. (#6453)

Before the change, running the tutorial for the first time produced: "UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])". It also showed probabilities of >>1 due to incorrect usage of np.argsort(); a corrected pattern is sketched below.
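
The np.argsort issue is the usual one: argsort returns indices in ascending order, not sorted probabilities, so the indices must be used to look the probabilities back up. A minimal illustration with made-up numbers (not the tutorial's actual code):

```python
import numpy as np

prob = np.array([0.05, 0.70, 0.10, 0.15])      # softmax output for one image

# argsort sorts ascending; reverse it and take the first k for the top-k classes
top_k = np.argsort(prob)[::-1][:3]
for idx in top_k:
    print('class %d, probability %.4f' % (idx, prob[idx]))
```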

* Nightly test tutorial (#6447)

* Add tutorial test

* Fix pylint

* Small fix

* fix member variable name: make them end with underline (#6438)

* [R] captcha example (#6443)

* skip lines that have %matplotlib (#6459)

* Fix cudnn_deconv not guarding no_bias (#6456)

* Fixing up issues in install guide (#6463)

* Fixing copy code functionality for bash command (#6465)

* Residual unroll (#6397)

* residual unroll

* unroll for residual cell

* merge_outputs fix
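
As a rough sketch of what the residual unroll change is about (assuming the mx.rnn cell API; the prefix and sizes are illustrative):

```python
import mxnet as mx

# Wrap a base RNN cell in a ResidualCell; this change lets the wrapper itself
# be unrolled, including with merge_outputs.
base = mx.rnn.LSTMCell(num_hidden=64, prefix='lstm_')
cell = mx.rnn.ResidualCell(base)

data = mx.sym.Variable('data')
outputs, states = cell.unroll(length=5, inputs=data, merge_outputs=True)
```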

* Linear regression Tutorial link (#6468)

* Fixing a link in the linear regression tutorial.

The link was initially pointing to mxnet-test.readthedocs.io. Changed it to mxnet.io/api.

* More appropriate language.

* bump up version number for release (#6462)

* bump up version number for release

* update version for scala/R/backend

* [R][DOC] update R installation guide (#6457)

* Use sphinx==1.3.5 in Dockerfile.doc (#6470)

changed PR name

* Add 0.10 release info to README.md and NEWS.md (#6471)

@nswamy wants to merge it immediately, so I'm going to do it now. I also changed the PR title.

* fix batchNorm cpp example (#6454)

* Update im2rec.py (#6473)

Updated Line 107 of 'im2rec.py' to read the image file as binary.
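
A sketch of the pattern the fix describes: opening the image file in binary mode before packing it into a record (the path and header here are placeholders, not the script's actual variables):

```python
import mxnet as mx

path = 'example.jpg'                                  # placeholder path
header = mx.recordio.IRHeader(flag=0, label=0.0, id=0, id2=0)

with open(path, 'rb') as f:                           # 'rb' rather than 'r'
    img_bytes = f.read()

record = mx.recordio.pack(header, img_bytes)          # bytes pass through unchanged
```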

* Change Interface of  NDArray & TBlob for DLPack Compatible (#6345)

* Change Interface of NDArray & TBlob for DLPack Compatible

Fix for cudnn operator

Fix cpp tests

* Update nnvm

* Fix for MKL mem

* Fix for windows macro

* Bump up version number to 0.10.1

* Update NDArray Save&Load

* trigger update

* Add test for legacy data load

* Use LegacyTShapeLoad

* trigger update

* Update tensor_blob.h

* change 'channel_axis' parameter to 'axis'

* Change DEFAULT_CHANNEL_AXIS to DEFAULT_AXIS

* wait for dlpack PR to go through

* Trigger build
cjolivier01 authored and piiswrong committed Jun 5, 2017
1 parent b2e7c3a commit 788c280
Showing 9 changed files with 838 additions and 375 deletions.
6 changes: 5 additions & 1 deletion CMakeLists.txt
@@ -144,6 +144,11 @@ include_directories("nnvm/include")
include_directories("dmlc-core/include")
include_directories("dlpack/include")

# commented out until PR goes through
#if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/dlpack)
# add_subdirectory(dlpack)
#endif()

if(NOT MSVC)
set(BEGIN_WHOLE_ARCHIVE -Wl,--whole-archive)
set(END_WHOLE_ARCHIVE -Wl,--no-whole-archive)
@@ -426,7 +431,6 @@ if(USE_PROFILER)
add_definitions(-DMXNET_USE_PROFILER)
endif()

# Do tests after chrpath so that we use the "real" cuda driver
add_subdirectory(tests)

# AUTO_INSTALL_DIR -> Optional: specify post-build install directory
3 changes: 2 additions & 1 deletion cpp-package/example/CMakeLists.txt
@@ -3,7 +3,8 @@ if(NOT MSVC)
endif()

set(CPP_EXAMPLE_LIBS
${BEGIN_WHOLE_ARCHIVE} mxnet ${END_WHOLE_ARCHIVE}
${BEGIN_WHOLE_ARCHIVE} mxnet_static ${END_WHOLE_ARCHIVE}
${BEGIN_WHOLE_ARCHIVE} dmlc ${END_WHOLE_ARCHIVE}
${mxnet_LINKER_LIBS}
)

6 changes: 3 additions & 3 deletions src/common/cuda_utils.h
@@ -9,8 +9,6 @@
#include <dmlc/logging.h>
#include <mshadow/base.h>

#if MXNET_USE_CUDA

/*! \brief Macros/inlines to assist CLion to parse Cuda files (*.cu, *.cuh) */
#ifdef __JETBRAINS_IDE__
#define __CUDACC__ 1
@@ -22,12 +20,14 @@
inline void __syncthreads() {}
inline void __threadfence_block() {}
template<class T> inline T __clz(const T val) { return val; }
struct __cuda_fake_struct { int x; int y; };
struct __cuda_fake_struct { int x; int y; int z; };
extern __cuda_fake_struct blockDim;
extern __cuda_fake_struct threadIdx;
extern __cuda_fake_struct blockIdx;
#endif

#if MXNET_USE_CUDA

#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <curand.h>
153 changes: 145 additions & 8 deletions src/operator/batch_norm-inl.h
@@ -10,6 +10,7 @@
#include <dmlc/logging.h>
#include <dmlc/parameter.h>
#include <mxnet/operator.h>
#include <mshadow/base.h>
#include <map>
#include <vector>
#include <string>
@@ -30,6 +31,9 @@ namespace batchnorm {
enum BatchNormOpInputs {kData, kGamma, kBeta}; // kGamma: weights, kBeta: biases
enum BatchNormOpOutputs {kOut, kMean, kVar}; // req, out_data
enum BatchNormOpAuxiliary {kMovingMean, kMovingVar}; // aux_states

/*! \brief Default channel axis if none specified in the params */
constexpr int DEFAULT_AXIS = 1;
} // namespace batchnorm

/*! \brief Parameters for BatchNorm operator */
@@ -39,6 +43,7 @@ struct BatchNormParam : public dmlc::Parameter<BatchNormParam> {
bool fix_gamma;
bool use_global_stats;
bool output_mean_var;
int axis;
bool cudnn_off;
DMLC_DECLARE_PARAMETER(BatchNormParam) {
DMLC_DECLARE_FIELD(eps).set_default(1e-3f)
@@ -54,6 +59,8 @@ struct BatchNormParam : public dmlc::Parameter<BatchNormParam> {
"This will force change batch-norm into a scale shift operator.");
DMLC_DECLARE_FIELD(output_mean_var).set_default(false)
.describe("Output All,normal mean and var");
DMLC_DECLARE_FIELD(axis).set_default(mxnet::op::batchnorm::DEFAULT_AXIS)
.describe("Specify which shape axis the channel is specified");
DMLC_DECLARE_FIELD(cudnn_off).set_default(false)
.describe("Do not select CUDNN operator, if available");
}
@@ -187,7 +194,7 @@ class BatchNormOp : public Operator {
}; // class BatchNormOp

template<typename xpu>
Operator *CreateOp(const BatchNormParam& param, const int dtype, const TShape& shape);
Operator *CreateOp(BatchNormParam param, const int dtype, const TShape& shape);

#if DMLC_USE_CXX11
class BatchNormProp : public OperatorProperty {
@@ -207,21 +214,28 @@ class BatchNormProp : public OperatorProperty {
CHECK_EQ(in_shape->size(), 3U) << "Input:[data, gamma, beta]";
const TShape &dshape = in_shape->at(0);

const size_t channelAxis = static_cast<size_t>(param_.axis < 0
? static_cast<int>(dshape.ndim()) + param_.axis
: param_.axis);
CHECK_LT(channelAxis, dshape.ndim()) << "Channel axis out of range: " << param_.axis;

const int channelCount = dshape[channelAxis];

if (dshape.ndim() == 0) {
return false;
}

in_shape->at(1) = TShape(Shape1(dshape[1]));
in_shape->at(2) = TShape(Shape1(dshape[1]));
in_shape->at(1) = TShape(Shape1(channelCount));
in_shape->at(2) = TShape(Shape1(channelCount));

out_shape->clear();
out_shape->push_back(dshape); // kOut
out_shape->push_back(Shape1(dshape[1])); // kMean
out_shape->push_back(Shape1(dshape[1])); // kVar
out_shape->push_back(dshape); // kOut
out_shape->push_back(Shape1(channelCount)); // kMean
out_shape->push_back(Shape1(channelCount)); // kVar

aux_shape->clear();
aux_shape->push_back(Shape1(dshape[1])); // kMovingMean
aux_shape->push_back(Shape1(dshape[1])); // kMovingVar
aux_shape->push_back(Shape1(channelCount)); // kMovingMean
aux_shape->push_back(Shape1(channelCount)); // kMovingVar
return true;
}

@@ -329,6 +343,129 @@ class BatchNormProp : public OperatorProperty {
BatchNormParam param_;
}; // class BatchNormProp

namespace batchnorm {

template<typename DType>
class BNTensor3 {
enum { OUTER, CHANNEL, INNER, COUNT };

public:
inline BNTensor3(const TBlob& blob, const int indexOfChannel)
: dptr_(blob.dptr<DType>())
, indexOfChannel_(static_cast<size_t>(indexOfChannel < 0
? (static_cast<int>(blob.shape_.ndim()) + indexOfChannel)
: indexOfChannel)) {
shape_[OUTER] = 1;
for (size_t i = 0; i < indexOfChannel_; ++i) {
shape_[OUTER] *= blob.shape_[i];
}
shape_[CHANNEL] = blob.shape_[indexOfChannel_];
shape_[INNER] = 1;
for (size_t i = indexOfChannel_ + 1, n = blob.shape_.ndim(); i < n; ++i) {
shape_[INNER] *= blob.shape_[i];
}
}

inline BNTensor3(DType *p, const TShape& shape, const int indexOfChannel)
: dptr_(p)
, indexOfChannel_(static_cast<size_t>(indexOfChannel < 0
? (static_cast<int>(shape.ndim()) + indexOfChannel)
: indexOfChannel)) {
shape_[OUTER] = 1;
for (size_t i = 0; i < indexOfChannel_; ++i) {
shape_[OUTER] *= shape[i];
}
shape_[CHANNEL] = shape[indexOfChannel_];
shape_[INNER] = 1;
for (size_t i = indexOfChannel_ + 1, n = shape.ndim(); i < n; ++i) {
shape_[INNER] *= shape[i];
}
}

MSHADOW_FORCE_INLINE bool IsEmpty() const {
return dptr_ == nullptr;
}

MSHADOW_XINLINE size_t Size() const {
size_t n = 1;
for (int i = 0; i < COUNT; ++i) {
n *= shape_[i];
}
return n;
}

MSHADOW_XINLINE size_t ChannelCount() const {
return shape_[CHANNEL];
}

MSHADOW_XINLINE size_t OuterSize() const {
return shape_[OUTER];
}

MSHADOW_XINLINE size_t InnerSize() const {
return shape_[INNER];
}

/*! \brief start of a given channel's spatial data */
MSHADOW_XINLINE size_t StartOffset(const size_t channel) const {
return channel * InnerSize();
}

/*! \brief This is the amount to skip to the next same-channel data
* This is the number of elements to skip from one past the end of the current spatial data
* to the next start of the same channel's "spatial data"
* It is assumed that the pointer being calculated points just beyond the
* end of the last block of spatial data
* i.e. RGBRGB <-- 2
* RRGGBB <-- 4
**/
MSHADOW_XINLINE size_t SkipLengthToNextSameChannelData() const {
return (ChannelCount() - 1) * InnerSize();
}

MSHADOW_XINLINE size_t offset(const size_t outer,
const size_t channel,
const size_t i) const {
const size_t spatial_size = InnerSize();
const size_t skip_length = SkipLengthToNextSameChannelData();
size_t off = StartOffset(channel);
off += outer * shape_[CHANNEL] * shape_[INNER];
const size_t skips = i / spatial_size;
off += (1 + skip_length) * skips;
off += i % spatial_size;
return off;
}

MSHADOW_XINLINE DType& get_ref(const size_t batch,
const size_t channel,
const size_t i) {
const size_t off = offset(batch, channel, i);
return dptr_[off];
}

MSHADOW_XINLINE const DType& get_ref(const size_t batch,
const size_t channel,
const size_t i) const {
const size_t off = offset(batch, channel, i);
return dptr_[off];
}

DType *dptr_;
size_t indexOfChannel_;
size_t shape_[COUNT];
};

inline int GetRealAxis(const TShape& shape, int axis) {
if (axis < 0) {
axis += shape.ndim();
}
return axis;
}

extern volatile bool disable_mkl;

} // namespace batchnorm

#endif // DMLC_USE_CXX11
} // namespace op
} // namespace mxnet
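
To make the BNTensor3 bookkeeping above easier to follow, here is a numpy sketch (not part of the patch) of the outer/channel/inner factorization it performs around the channel axis, using the same negative-axis normalization as GetRealAxis:

```python
import numpy as np

def bn_tensor3_shape(shape, axis):
    """Collapse an N-d shape into (outer, channel, inner) around the channel axis."""
    axis = axis + len(shape) if axis < 0 else axis   # same normalization as GetRealAxis
    outer = int(np.prod(shape[:axis], dtype=np.int64))
    channel = shape[axis]
    inner = int(np.prod(shape[axis + 1:], dtype=np.int64))
    return outer, channel, inner

# NCHW with the default axis=1: outer = N, channel = C, inner = H*W
print(bn_tensor3_shape((8, 3, 32, 32), 1))     # (8, 3, 1024)

# NHWC with axis=-1 (i.e. 3): outer = N*H*W, channel = C, inner = 1
print(bn_tensor3_shape((8, 32, 32, 3), -1))    # (8192, 3, 1)
```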
