Add job=time in trainer, refine cudnn_conv to reduce gpu memory and speed up training. #218

qingqing01 · 2016-10-17T14:26:15Z

Add job=time to the trainer, which prints timing information without having to turn on the WITH_TIMER option.
Add ConvProjection to reduce GPU memory for GoogLeNet.
Use TmpMatrix in CudnnConvLayer and ConvProjection to reduce GPU memory.

This pull request extracts the Paddle code from pull request #217.

luotao1 · 2016-10-18T03:13:09Z

Could the duplicated code in MixedLayer and ConcatenateLayer2 be abstracted into a BaseProjectionLayer?

hedaoyuan · 2016-10-20T07:50:49Z

paddle/cuda/src/hl_cuda_cudnn.cc

@@ -242,7 +242,7 @@ void hl_conv_workspace(hl_tensor_descriptor input,
    CHECK_NOTNULL(conv);

    // Specify workspace limit directly
-    size_t memoryLimitBytes = 8 * 1024 * 1024;
+    size_t memoryLimitBytes = 1LL << 30;


A configurable parameter would be better than a hard-coded limit.
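A minimal sketch of what a configurable limit could look like. The helper name `convWorkspaceLimitBytes` and the environment variable `PADDLE_CUDNN_WORKSPACE_MB` are hypothetical and exist only for this illustration; in Paddle itself a command-line flag would probably be the more natural place for such a setting.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not part of Paddle): returns the cuDNN workspace limit
// in bytes. The limit is read in MiB from an environment variable whose name
// is an assumption of this sketch; defaultMb is used when the variable is
// unset or does not hold a positive decimal number.
inline std::size_t convWorkspaceLimitBytes(
    const char* envName = "PADDLE_CUDNN_WORKSPACE_MB",
    std::size_t defaultMb = 8) {
  std::size_t mb = defaultMb;
  const char* value = std::getenv(envName);
  if (value != nullptr && value[0] >= '0' && value[0] <= '9') {
    char* end = nullptr;
    unsigned long long parsed = std::strtoull(value, &end, 10);
    // Accept the value only if the whole string was a positive number.
    if (*end == '\0' && parsed > 0) {
      mb = static_cast<std::size_t>(parsed);
    }
  }
  return mb * 1024 * 1024;  // no overflow check: very large values are not handled
}
```

With the variable unset, a call such as `convWorkspaceLimitBytes("PADDLE_CUDNN_WORKSPACE_MB", 8)` yields the old 8 MiB default, and a default of 1024 corresponds to the 1 GiB value in the diff.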

hedaoyuan · 2016-10-20T08:38:13Z

paddle/gserver/layers/ConcatenateLayer.cpp

@@ -110,6 +112,8 @@ class ConcatenateLayer2 : public Layer {
  std::vector<std::unique_ptr<Projection>> projections_;
  std::vector<Argument> projOutput_;
  std::vector<std::pair<size_t, size_t>> projCol_;
+  bool isConvProj_;


Use sharedBiases_ here, and set it to True in config_parser.py.
Then ConcatenateLayer2 does not need to know whether its input projections_ are ConvProjection.

hedaoyuan · 2016-10-20T09:02:10Z

paddle/gserver/layers/CudnnConvLayer.cpp

@@ -186,6 +164,12 @@ void CudnnConvLayer::forward(PassType passType) {
  reshape(batchSize);
  resetOutput(batchSize, outputH_ * outputW_ * numFilters_);

+  void* workSpace = NULL;
+  if (workSpaceInBytes_ > 0) {
+    MatrixPtr tmpMat = Matrix::getTmpMatrix(1, workSpaceInBytes_, true);


Using getTmpMatrix is not very suitable here.
What is needed is a device memory buffer used only by the ConvLayers, so a global device buffer protected by a mutex would be more suitable.
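The idea of one shared workspace guarded by a mutex can be sketched as follows. This is an illustration under assumptions, not Paddle code: the class name `SharedConvWorkspace` is hypothetical, and host memory (`std::vector<char>`) stands in for the device allocation so that the sketch compiles without CUDA.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical process-wide workspace shared by all convolution layers.
// Host memory stands in for a device allocation in this sketch.
class SharedConvWorkspace {
 public:
  static SharedConvWorkspace& instance() {
    static SharedConvWorkspace ws;  // initialization is thread-safe since C++11
    return ws;
  }

  // Runs fn(ptr, bytes) while holding the lock, growing the buffer first if
  // it is smaller than the requested size. The buffer never shrinks.
  template <typename Fn>
  void run(std::size_t bytes, Fn fn) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (buffer_.size() < bytes) {
      buffer_.resize(bytes);
    }
    fn(static_cast<void*>(buffer_.data()), bytes);
  }

  // Current buffer size in bytes. Must not be called from inside run(),
  // because the mutex is not recursive.
  std::size_t capacity() {
    std::lock_guard<std::mutex> guard(mutex_);
    return buffer_.size();
  }

 private:
  SharedConvWorkspace() = default;
  std::mutex mutex_;
  std::vector<char> buffer_;
};
```

Because every layer reuses the same buffer, the footprint is the largest single request rather than the sum over layers; the cost is that convolution calls needing the workspace are serialized by the lock.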

hedaoyuan · 2016-10-20T11:50:32Z

paddle/gserver/layers/MixedLayer.cpp

@@ -15,11 +15,39 @@ limitations under the License. */

 #include "paddle/utils/Stat.h"
 #include "MixedLayer.h"
+#include "ConvProjection.h"


Use sharedBiases_ here as well, and do not include "ConvProjection.h".

hedaoyuan · 2016-10-20T11:57:56Z

paddle/math/Matrix.h

+   * temporarily, i.e. do not store it or use it as return value.
+   * Do NOT use large amount of tmp matrix.
+   */
+  static MatrixPtr getTmpMatrix(


getTmpMatrix is not needed. MemoryHandle already allocates memory from, and frees it back to, a pool.

hedaoyuan · 2016-10-20T12:00:27Z

paddle/parameter/Argument.h

@@ -144,8 +144,8 @@ struct Argument {
  }
  size_t getFrameHeight() const { return frameHeight; }
  size_t getFrameWidth() const { return frameWidth; }
-  void setFrameHeight(size_t h) { frameHeight = h; }
-  void setFrameWidth(size_t w) { frameWidth = w; }
+  void setFrameHeight(size_t h) const { frameHeight = h; }


Why use const?

out_ is const in gserver/layers/Projection.h, and these two functions are called on out_ in conv_projection.

This modification will puzzle other readers: a const member function that modifies member variables.
Using const_cast is more suitable.

Revert the modification and use const_cast.
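A minimal sketch of the const_cast approach, assuming out_ is a pointer to a const Argument. The `Argument` struct below is a cut-down stand-in for the real one, and the helper `setFrameSize` is a hypothetical name used only for illustration.

```cpp
#include <cstddef>

// Cut-down stand-in for paddle's Argument; only the frame-size fields are modeled.
struct Argument {
  std::size_t frameHeight = 0;
  std::size_t frameWidth = 0;
  std::size_t getFrameHeight() const { return frameHeight; }
  std::size_t getFrameWidth() const { return frameWidth; }
  // The setters stay non-const, as in the original code.
  void setFrameHeight(std::size_t h) { frameHeight = h; }
  void setFrameWidth(std::size_t w) { frameWidth = w; }
};

// Hypothetical helper showing what a projection holding a pointer to const
// Argument could do. This is only valid when the pointed-to object was not
// itself declared const; otherwise the write is undefined behavior.
inline void setFrameSize(const Argument* out, std::size_t h, std::size_t w) {
  Argument* mutableOut = const_cast<Argument*>(out);
  mutableOut->setFrameHeight(h);
  mutableOut->setFrameWidth(w);
}
```

This keeps the cast at the single call site that needs it, instead of weakening the setter signatures for every caller.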

luotao1 · 2016-10-25T03:48:48Z

paddle/math/Matrix.cpp

+  for (size_t i = 0; i < numSamples; i++) {
+    for (size_t c = 0; c < channel; c++) {
+      for (size_t j = 0; j < dim; j++) {
+        B[c] += scale * A[i * channel * dim + c * dim + j];


You can compute channel * dim outside the loop, which reduces the computation. Maybe:

    size_t channel_dim = channel * dim;
    for (size_t i = 0; i < numSamples; i++) {
      size_t index = channel_dim * i;
      for (size_t c = 0; c < channel; c++) {
        size_t index2 = c * dim;
        for (size_t j = 0; j < dim; j++) {
          B[c] += scale * A[index + index2 + j];
        }
      }
    }
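For reference, the hoisted version of the loop can be written as a standalone function. This is a sketch on plain `std::vector<float>` rather than Paddle's matrix types, and the name `collectSharedBiasSketch` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// A is laid out as numSamples x channel x dim (row-major). For each channel c,
// accumulates B[c] += scale * A[i][c][j] over all samples i and positions j.
// B must already hold at least `channel` elements.
inline void collectSharedBiasSketch(const std::vector<float>& A,
                                    std::vector<float>& B,
                                    std::size_t numSamples,
                                    std::size_t channel,
                                    std::size_t dim,
                                    float scale) {
  const std::size_t channelDim = channel * dim;  // hoisted out of all loops
  for (std::size_t i = 0; i < numSamples; i++) {
    const std::size_t sampleOffset = channelDim * i;  // once per sample
    for (std::size_t c = 0; c < channel; c++) {
      const float* row = A.data() + sampleOffset + c * dim;  // once per channel
      for (std::size_t j = 0; j < dim; j++) {
        B[c] += scale * row[j];
      }
    }
  }
}
```

An optimizing compiler may well hoist these products on its own, so the gain is mostly in readability unless profiling shows otherwise.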

luotao1 · 2016-10-25T03:52:17Z

paddle/gserver/layers/MixedLayer.cpp

+      outV->addBias(*(biases_->getW()), 1);
+    }
+  }
+


You could wrap lines 146~150 into a new function.

luotao1 · 2016-10-25T03:53:16Z

paddle/gserver/layers/MixedLayer.cpp

+      biases_->getWGrad()->collectSharedBias(*getOutputGrad(), 1);
+    } else {
+      biases_->getWGrad()->collectBias(*getOutputGrad(), 1);
+    }



You could wrap lines 167~171 into a new function.

hedaoyuan · 2016-10-27T06:38:33Z

paddle/cuda/include/hl_device_functions.cuh

+ *
+ * @param[in,out]  smem       input data, better to use __shared__ memory.
+ * @param[in]      tid        local thread index.
+ * @param[in]      blockDimX  the size of blockDim.x.


The comment is incorrect.

I have modified blockDimX.

hedaoyuan · 2016-10-27T09:04:34Z

paddle/gserver/layers/ConcatenateLayer.cpp

@@ -107,9 +108,13 @@ class ConcatenateLayer2 : public Layer {
  virtual void backward(const UpdateCallback& callback = nullptr);

 protected:
+  bool sharedBiases_;


hedaoyuan

Some small fixes are still needed, but I approve the convolution part of the code. @reyoung please review the Python code.

reyoung · 2016-10-31T03:50:48Z

I will review them today. Do not merge it now.

reyoung · 2016-10-19T05:47:48Z

paddle/cuda/src/hl_cuda_cudnn.cc

@@ -242,7 +242,7 @@ void hl_conv_workspace(hl_tensor_descriptor input,
    CHECK_NOTNULL(conv);

    // Specify workspace limit directly
-    size_t memoryLimitBytes = 8 * 1024 * 1024;
+    size_t memoryLimitBytes = 1LL << 30;


The style on the left side of the diff is better, and size_t may not be the same type as unsigned long long.

With the clang compiler, size_t is unsigned long.
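To keep the shift style without mixing `long long` and `size_t`, the constant can be built in `size_t` directly. A small sketch; the constant names are hypothetical, and it assumes `size_t` is at least 32 bits wide.

```cpp
#include <cstddef>

// 1 GiB limit where the shift is performed in size_t itself, so nothing
// depends on size_t and long long having the same width or signedness.
constexpr std::size_t kWorkspaceLimitBytes = static_cast<std::size_t>(1) << 30;

// The multiplicative style of the original line, scaled to 1 GiB for comparison.
constexpr std::size_t kWorkspaceLimitBytesMul =
    static_cast<std::size_t>(1024) * 1024 * 1024;

static_assert(kWorkspaceLimitBytes == kWorkspaceLimitBytesMul,
              "both spellings denote 1 GiB");
```

Either spelling gives the same value; the multiplicative form matches the surrounding code, while the shift form is shorter.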

reyoung · 2016-10-31T03:47:58Z

python/paddle/trainer/config_parser.py

+
+    def calc_parameter_dims(self, input_size, output_size):
+        return None
+        # or [self.proj_conf.conv_conf.channels *


Either uncomment it or remove it.

reyoung · 2016-10-31T04:35:26Z

paddle/gserver/layers/ConcatenateLayer.cpp

@@ -137,6 +139,15 @@ bool ConcatenateLayer2::init(const LayerMap& layerMap,
  }
  CHECK_EQ(getSize(), endCol);

+  /* initialize biases_ */
+  if (biasParameter_.get() != NULL) {
+    if (config_.has_shared_biases()) {


Add a default value directly in the proto; then this check can be removed.

in proto

optional bool shared_biases = 1 [default=false];

in cpp

sharedBias_ = config.shared_biases();

reyoung · 2016-10-31T04:35:45Z

paddle/gserver/layers/ConcatenateLayer.cpp

@@ -97,7 +97,8 @@ void ConcatenateLayer::backward(const UpdateCallback& callback) {
 */
 class ConcatenateLayer2 : public Layer {
 public:
-  explicit ConcatenateLayer2(const LayerConfig& config) : Layer(config) {}
+  explicit ConcatenateLayer2(const LayerConfig& config) :
+      Layer(config), sharedBias_(false) {}


For sharedBias, add the default value directly in the proto.

reyoung · 2016-10-31T04:36:49Z

paddle/gserver/layers/ConcatenateLayer.cpp

+  }
+
+  /* add the bias-vector */
+  if (biases_.get() != NULL) {


if (biases_) { }
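The shorter form works because `std::unique_ptr` has an explicit conversion to `bool` that is true exactly when it holds a non-null pointer. A small sketch; the `Weight` struct is a stand-in for the real class and `hasBias` is a hypothetical helper.

```cpp
#include <memory>

// Stand-in for paddle's Weight; the real class is not modeled here.
struct Weight {
  int size;
};

// Returns true when a bias has been created. In a condition, unique_ptr
// converts to bool, so `if (biases)` is equivalent to
// `if (biases.get() != NULL)`.
inline bool hasBias(const std::unique_ptr<Weight>& biases) {
  if (biases) {
    return true;
  }
  return false;
}
```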

reyoung · 2016-10-31T05:12:07Z

python/paddle/trainer/config_parser.py

@@ -2528,8 +2569,20 @@ def __init__(
            record_operator_conf = self.config.operator_confs.add()
            record_operator_conf.CopyFrom(operator_conf)

+        shared_biases=None


reyoung · 2016-10-31T05:12:16Z

python/paddle/trainer/config_parser.py

+                psize += input.calc_bias_size()
+
+        if shared_biases is not None:
+            self.config.shared_biases = shared_biases


Remove these two lines.

reyoung · 2016-10-31T05:13:22Z

python/paddle/trainer/config_parser.py

+          for input_index in xrange(len(self.inputs) - 1):
+              input = self.inputs[input_index + 1]
+              config_assert(isinstance(input, ConvProjection),
+                  "All the inputs of ConcatenateLayer2 should be ConvProjection.")


This assert message is wrong! The inputs of the ConcateLayer are either all other projections or all ConvProjection.

As written, the message says that all inputs of the ConcateLayer must be ConvProjection.

reyoung · 2016-10-31T05:13:34Z

python/paddle/trainer/config_parser.py

+
+        psize = self.config.size
+        if isinstance(self.inputs[0], ConvProjection):
+            shared_biases = True


reyoung · 2016-10-31T05:18:58Z

python/paddle/trainer_config_helpers/networks.py

+                mixed_layer_attr=None,
+                gru_cell_attr=None
+                ):
+    """


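Shouldn't simple_gru itself be changed directly to the memory-based version?

simple_gru is used in the seq2seq demo, and a model is provided for it. Changing it directly to the memory-based version would cause compatibility problems.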

reyoung · 2016-11-01T08:52:26Z

paddle/gserver/layers/CudnnConvLayer.cpp

+  numFilters_ = config_.num_filters();
+  CHECK(config_.shared_biases());
+  for (size_t i = 0; i < inputLayers_.size(); i++) {
+    ProjectionConfig* conf = new ProjectionConfig();


Memory leak.
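One way to avoid leaking the per-input configs is to hold them in `std::unique_ptr`, so that ownership is explicit and destruction is automatic. A sketch under assumptions: `ProjectionConfig` here is a stand-in for the protobuf-generated class, and `makeProjectionConfigs` is a hypothetical helper.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for the protobuf-generated ProjectionConfig; the field is illustrative.
struct ProjectionConfig {
  int numFilters = 0;
};

// Builds one config per input. The returned vector owns every config, so each
// one is destroyed when the vector (or its owning layer) is destroyed.
inline std::vector<std::unique_ptr<ProjectionConfig>> makeProjectionConfigs(
    std::size_t numInputs, int numFilters) {
  std::vector<std::unique_ptr<ProjectionConfig>> configs;
  configs.reserve(numInputs);
  for (std::size_t i = 0; i < numInputs; i++) {
    std::unique_ptr<ProjectionConfig> conf(new ProjectionConfig());
    conf->numFilters = numFilters;
    configs.push_back(std::move(conf));
  }
  return configs;
}
```

Code that only needs to read a config can take `configs[i].get()` or a reference, leaving ownership with the vector.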

reyoung · 2016-11-01T11:05:08Z

paddle/trainer/TrainerBenchmark.cpp

+See the License for the specific language governing permissions and
+limitations under the License. */
+
+#undef PADDLE_DISABLE_TIMER


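When building with the timer enabled, wouldn't this produce an error?

The usual pattern is: if defined, then undefine.

I just tried it with clang; it seems that #undef on a macro that does not exist is harmless.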
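Both forms are valid: per the C and C++ standards, `#undef` is ignored when the name is not currently defined as a macro, so the guard is a matter of style. A small sketch; the helper `timerDisabled` is hypothetical and only makes the result observable.

```cpp
// Guarded form: undefine only if the macro is currently defined.
#ifdef PADDLE_DISABLE_TIMER
#undef PADDLE_DISABLE_TIMER
#endif

// Unguarded form: the directive is ignored if the name is not a macro,
// so repeating it here is harmless.
#undef PADDLE_DISABLE_TIMER

// Reports whether the macro is defined at this point in the translation unit.
inline bool timerDisabled() {
#ifdef PADDLE_DISABLE_TIMER
  return true;
#else
  return false;
#endif
}
```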

reyoung · 2017-03-24T10:04:34Z

paddle/gserver/layers/MixedLayer.cpp

  /* initialize biases_ */
  if (biasParameter_.get() != NULL) {
-    biases_ = std::unique_ptr<Weight>(new Weight(1, getSize(), biasParameter_));
+    sharedBias_ = config_.shared_biases();
+    size_t psize = config_.bias_size();


@qingqing01 Why was this changed this way? This change causes issue #1700.

Is the error in #1700 caused by biasParameter_ being present while config_.bias_size == 0?

qingqing01 changed the title ~~Add job=time in trainer, refine cudnn_conv to reduce gnu memory and speed up training.~~ Add job=time in trainer, refine cudnn_conv to reduce gpu memory and speed up training. Oct 17, 2016

qingqing01 mentioned this pull request Oct 17, 2016

Add benchmark and reduce GPU memory for cudnn_conv and speed up cudnn_conv. #217

Closed

qingqing01 assigned emailweixu, hedaoyuan, luotao1, reyoung and gangliao Oct 18, 2016

qingqing01 force-pushed the conv_mem_time branch from 9bbb50b to 7e3fa4c Compare October 18, 2016 02:50

hedaoyuan reviewed Oct 20, 2016

qingqing01 force-pushed the conv_mem_time branch 2 times, most recently from 7c82f0b to e8eaca8 Compare October 25, 2016 02:03

luotao1 reviewed Oct 25, 2016

reyoung changed the base branch from master to develop October 26, 2016 08:13

hedaoyuan reviewed Oct 27, 2016

hedaoyuan approved these changes Oct 27, 2016

qingqing01 force-pushed the conv_mem_time branch from e8eaca8 to 2e3c49c Compare October 28, 2016 10:24

reyoung requested changes Oct 31, 2016

qingqing01 added 3 commits October 31, 2016 21:36

Add benchmark for PaddlePaddle, tensorflow and caffe

df90edc

ConvProjection to reduce memory for goolenet

763fcd1

Add unit test for ConvProjection.

6351ac7

1. unit test in test_LayerGrad. 2. compare the ConvPorjection and CudnnConvLayer, also compare the concat_layer+img_conv_layer and concat_layer_conv_projection.

qingqing01 added 5 commits October 31, 2016 21:36

Reduce cudnn_conv memory and add benchmark document.

2aabb73

1. Use TmpMatrix as the workspace in cudnn_conv to reduce gpu memory. It reduce lots of memory. 2. Add benchmark document. 3. fix smallnet_mnist_cifar.py in paddle.

Add job=time and refine cudnn_conv to reduce gpu memroy and speed up

d5d8caf

Refine cudnn_conv and shared biases operation in concat_layer and mix…

b6a948f

…ed_layer.

follow comments

61e21c3

follow comments

27e89df

qingqing01 force-pushed the conv_mem_time branch from 2e3c49c to 27e89df Compare October 31, 2016 13:37

reyoung requested changes Nov 1, 2016

qingqing01 added 2 commits November 1, 2016 17:18

Use unique_ptr to prevent memory leaks in CudnnConvLayer.

a7cd9be

Merge branch 'develop' of https://github.com/baidu/Paddle into conv_m…

78d6a76

…em_time

reyoung approved these changes Nov 1, 2016

qingqing01 merged commit 45c81a4 into PaddlePaddle:develop Nov 2, 2016

gangliao mentioned this pull request Mar 17, 2017

Improve Docker Images #1630

Closed

5 tasks

reyoung reviewed Mar 24, 2017

reyoung mentioned this pull request Apr 17, 2017

Traffic flow prediction training reports F0323 13:21:31.180212 14883 Weight.cpp:28] Check failed: param->getSize() == width * height (64 vs. 0) #1700

Closed

zhhsplendid pushed a commit to zhhsplendid/Paddle that referenced this pull request Sep 25, 2019

Merge pull request PaddlePaddle#218 from tink2123/api_guide_1024

49a1bab

update new_guides

thisjiang pushed a commit to thisjiang/Paddle that referenced this pull request Oct 28, 2021

Add Resnet test and debug. (PaddlePaddle#218)

46deb17

zmxdream pushed a commit to zmxdream/Paddle that referenced this pull request Feb 24, 2023

fix FillInferBuf (PaddlePaddle#218)

8fa81f6

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Sep 13, 2023

fix FillInferBuf (PaddlePaddle#218)

afec065

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
