Add model average optimizer for fluid #9082

wanghaoshuang · 2018-03-14T17:05:53Z

fix #9172
The results of some experiments are attached in #9172.

qingqing01

The review has not been completed yet.

qingqing01 · 2018-03-19T02:22:43Z

paddle/fluid/operators/average_accumulates_op.cc

+             "accumulating sums of parameter values with the same shape as "
+             "input(param).");
+    AddInput("in_num_accumulates",
+             "Input(Tensor): The accumulating times of current window with "


Tensor<int64_t>

qingqing01 · 2018-03-19T02:23:51Z

paddle/fluid/operators/average_accumulates_op.cc

+  AverageAccumulatesOpMaker(OpProto* proto, OpAttrChecker* op_checker)
+      : OpProtoAndCheckerMaker(proto, op_checker) {
+    AddInput("param",
+             "Input(Tensor or LoDTensor): The parameter to be accumulated.");


Input(Tensor or LoDTensor) -> (Tensor or LoDTensor)

There should be no "Input" before the "(". See:

https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/mul_op.cc#L79

The same applies to the descriptions below.

qingqing01 · 2018-03-19T02:27:10Z

paddle/fluid/operators/average_accumulates_op.cc

+    AddInput("param",
+             "Input(Tensor or LoDTensor): The parameter to be accumulated.");
+    AddInput("in_sum_1",
+             "Input(Tensor or LoDTensor): A tensor used to store the parameter "


For now, all the inputs and outputs are probably plain Tensors.

qingqing01 · 2018-03-19T02:31:39Z

paddle/fluid/operators/average_accumulates_op.cc

+
+    AddComment(R"DOC(
+AverageAccumulates Operator.
+Accumulate the sum of parameter whtin sliding window. The size of sliding window is determined by 'average_window', 'max_average_window' and 'min_average_window'.


Need more details here to show how the average is computed.

qingqing01 · 2018-03-19T02:50:13Z

paddle/fluid/operators/average_accumulates_op.h

+using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
+
+template <typename DeviceContext>
+void getAccumulators(const framework::ExecutionContext& ctx,


getAccumulators -> GetAccumulators

qingqing01 · 2018-03-19T02:50:24Z

paddle/fluid/operators/average_accumulates_op.h

+                     int64_t& old_num_accumulates);
+
+template <typename DeviceContext>
+void setAccumulators(const framework::ExecutionContext& ctx,


setAccumulators -> SetAccumulators

qingqing01 · 2018-03-19T04:44:01Z

paddle/fluid/operators/average_accumulates_op.h

+ public:
+  void Compute(const framework::ExecutionContext& ctx) const override {
+    // It is used to avoid loss of precision
+    static const int64_t kMaxNumAccumulates = 16384;


Is there any reference paper for kMaxNumAccumulates 16384?

It seems that 16384 is an experimental value. There are no reference papers.

qingqing01

Excellent work!!

qingqing01 · 2018-03-19T04:59:47Z

paddle/fluid/operators/average_accumulates_op.cc

+              "before this batch with shape [1].");
+
+    AddAttr<float>("average_window",
+                   "The rate of average window size relative to num_updates.");


Set 0. as the default value here.

qingqing01 · 2018-03-19T05:01:11Z

paddle/fluid/operators/average_accumulates_op.cc

+    AddAttr<float>("average_window",
+                   "The rate of average window size relative to num_updates.");
+    AddAttr<int64_t>("max_average_window", "Maximum size of average window.");
+    AddAttr<int64_t>("min_average_window", "Minimu size of average window.");


Set 10000L as the default value for min_average_window ?

qingqing01 · 2018-03-19T05:06:33Z

paddle/fluid/operators/average_accumulates_op.h

+    out_sum_2_tensor.device(place) = in_sum_2_tensor;
+    out_sum_3_tensor.device(place) = in_sum_3_tensor;
+    if (num_updates % kMaxNumAccumulates == 0) {
+      out_sum_2_tensor.device(place) = in_sum_2_tensor + in_sum_1_tensor;


Add a comment before line 87:

Move the sum to a different buffer to avoid loss of precision due to too many sums.

qingqing01 · 2018-03-19T05:08:03Z

paddle/fluid/operators/average_accumulates_op.h

+    if (num_accumulates >= min_average_window &&
+        num_accumulates >= std::min<int64_t>(max_average_window,
+                                             num_updates * average_window)) {
+      out_sum_3_tensor.device(place) = in_sum_1_tensor + in_sum_2_tensor;


Add a comment before line 94:

Now the average window is too long, discard the old sum.
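A minimal sketch of the windowed accumulation discussed in the comments above, written in Python with scalar parameters for brevity. Only the two `if` conditions and the value 16384 come from the quoted hunks. The `AverageState` class, the counter updates, the zeroing of the buffers, and `averaged_value` are assumptions made for illustration, not the operator's actual code. The sketch also folds the already-updated `sum_1` into `sum_2`, which is a choice of this sketch.

```python
from dataclasses import dataclass

# Value quoted in the review; described there as experimental.
K_MAX_NUM_ACCUMULATES = 16384


@dataclass
class AverageState:
    """Hypothetical container for the three sum buffers and the counters."""
    sum_1: float = 0.0
    sum_2: float = 0.0
    sum_3: float = 0.0
    num_accumulates: int = 0
    old_num_accumulates: int = 0
    num_updates: int = 0


def accumulate(state, param, average_window, min_average_window,
               max_average_window,
               max_num_accumulates=K_MAX_NUM_ACCUMULATES):
    """Add one parameter value to the running sums (illustrative only)."""
    state.num_updates += 1
    state.num_accumulates += 1
    state.sum_1 += param

    # Move the sum to a different buffer to avoid loss of precision
    # due to too many sums.
    if state.num_updates % max_num_accumulates == 0:
        state.sum_2 += state.sum_1
        state.sum_1 = 0.0

    # Now the average window is too long, discard the old sum.
    window = min(max_average_window, state.num_updates * average_window)
    if (state.num_accumulates >= min_average_window
            and state.num_accumulates >= window):
        state.sum_3 = state.sum_1 + state.sum_2
        state.sum_1 = 0.0
        state.sum_2 = 0.0
        state.old_num_accumulates = state.num_accumulates
        state.num_accumulates = 0


def averaged_value(state):
    """Average over the current window plus the previous one (assumed)."""
    total = state.num_accumulates + state.old_num_accumulates
    return (state.sum_1 + state.sum_2 + state.sum_3) / total
```

With `average_window=0.5`, `min_average_window=2`, and `max_average_window=100`, the window in this sketch restarts every two updates, so the average after four updates covers only the two most recent parameter values.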

qingqing01 · 2018-03-19T06:27:20Z

python/paddle/fluid/optimizer.py

+            self._append_average_accumulate_op(param)
+
+    def _add_average_apply_op(self, block, param_grad):
+        param = block.clone_variable(param_grad[0])


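Why use clone here? Judging from the clone implementation, the Variable's name and stored content (Tensor) are identical, so why is a clone needed? Can the original Variable be used directly?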

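When an op runs InferShape, it looks up its input variables in the current block. The clone_variable function is therefore needed to clone a copy of the variable desc into the current block and to set variable.block to the current block. Otherwise, InferShape fails with an "Input not found" error.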

qingqing01 · 2018-03-19T06:29:50Z

python/paddle/fluid/framework.py

+        """
+        assert isinstance(var, Variable)
+        return self.create_var(
+            name=var.name,


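My understanding is that a "cloned" var and the input var would be two separate pieces of memory. Here the two vars have the same name, so this looks more like "sharing" the same var.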
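The block-lookup argument above can be illustrated with a small toy model. This is a sketch under stated assumptions: `ToyVariable`, `ToyBlock`, and `find_input` are hypothetical stand-ins, not the framework's classes. Only the idea that `clone_variable` calls `create_var` with the source variable's name comes from the quoted hunk.

```python
class ToyVariable:
    """Hypothetical variable descriptor: metadata only, no tensor storage."""

    def __init__(self, block, name, shape, dtype):
        self.block = block
        self.name = name
        self.shape = shape
        self.dtype = dtype


class ToyBlock:
    """Hypothetical block that owns a name -> descriptor table."""

    def __init__(self, idx):
        self.idx = idx
        self.vars = {}

    def create_var(self, name, shape, dtype):
        var = ToyVariable(self, name, shape, dtype)
        self.vars[name] = var
        return var

    def clone_variable(self, var):
        # Register a descriptor with the same name in *this* block.
        # The name is shared, so at run time both descriptors would
        # refer to the same underlying storage.
        assert isinstance(var, ToyVariable)
        return self.create_var(name=var.name, shape=var.shape,
                               dtype=var.dtype)

    def find_input(self, name):
        # Stand-in for the lookup done during shape inference.
        if name not in self.vars:
            raise KeyError("Input not found: " + name)
        return self.vars[name]


train_block = ToyBlock(0)
apply_block = ToyBlock(1)
param = train_block.create_var("fc_0.w_0", [10, 10], "float32")
cloned = apply_block.clone_variable(param)
```

In this toy model, looking up `"fc_0.w_0"` in `apply_block` succeeds only after the clone, which matches both remarks in the thread: the descriptor is copied, while the name (and hence the data) is shared.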

qingqing01 · 2018-03-19T06:58:26Z

paddle/fluid/operators/average_accumulates_op.cc

+
+    AddAttr<float>("average_window",
+                   "The rate of average window size relative to num_updates.");
+    AddAttr<int64_t>("max_average_window", "Maximum size of average window.");


Please revise the comment here so that it tells users to set this value manually to the total number of mini-batches in one pass/epoch.

qingqing01 · 2018-03-19T06:59:31Z

python/paddle/fluid/optimizer.py

+            model_average.apply()
+            for data in test_reader():
+                exe.run(inference_program...)
+            model_average.restore(exe)


The model_average.restore call can be hidden by using the `with model_average.apply()` syntax.
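A minimal sketch of how a `with` block can hide the restore call, using `contextlib.contextmanager` from the Python standard library. The `ToyModelAverage` class and its dictionary-based parameters are hypothetical and exist only to show the pattern; the signature of the real `apply` may differ.

```python
import contextlib


class ToyModelAverage:
    """Hypothetical stand-in that swaps parameters for their averages."""

    def __init__(self, params, averaged):
        self.params = params      # dict: name -> current value
        self.averaged = averaged  # dict: name -> averaged value
        self._backup = None

    @contextlib.contextmanager
    def apply(self):
        # Save the current parameters, then switch to the averaged ones.
        self._backup = dict(self.params)
        self.params.update(self.averaged)
        try:
            yield
        finally:
            # Runs on normal exit and on exceptions alike.
            self.restore()

    def restore(self):
        # Put the saved training parameters back.
        self.params.clear()
        self.params.update(self._backup)
        self._backup = None


params = {"w": 1.0}
model_average = ToyModelAverage(params, {"w": 0.5})
with model_average.apply():
    value_inside = params["w"]  # averaged value is visible here
value_after = params["w"]       # training value is back
```

Because the restore runs in a `finally` clause, the training parameters are put back even if evaluation inside the block raises.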

qingqing01 · 2018-03-19T07:01:26Z

paddle/fluid/operators/average_accumulates_op.cc

+             "shape [1].");
+    AddInput("in_num_updates",
+             "Input(Tensor): The total number of batches used by trainning "
+             "before this batch with shape [1].");


in_num_accumulates
in_old_num_accumulates
in_num_updates

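When these three scalars are initialized with fill_constant, the force_cpu attribute can be used to keep them on the CPU at all times, so that no copy is needed during GPU computation.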

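If an op is implemented by inheriting from OperatorWithKernel, then before execution the framework checks whether all inputs are on the expected device and transfers them there if they are not.
However, the automatic transfer provided by OperatorWithKernel does not support the case where an input and an output share memory.
Not inheriting from OperatorWithKernel would probably require a fair amount of rework, which can be left to a follow-up PR.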

Understood, let's keep it as it is for now. I think a better solution would be to support variables such as Variable<int/float> as op inputs.

qingqing01

Since the model average feature uses a lot of memory, support for do_average_in_cpu is needed in the next PR.

qingqing01

Please create issues for these two problems before merging this PR.

qingqing01 · 2018-03-22T01:32:21Z

python/paddle/fluid/optimizer.py

+        params_grads: A list of parameter-grad variable pairs.
+        average_window_rate: The rate of average window.
+        min_average_window: The minimum size of average window.
+        max_average_window: The maximum size of average window.


The user documentation needs to be refined: it should tell users how to set average_window_rate, min_average_window, max_average_window, and so on.

wanghaoshuang added 7 commits March 15, 2018 01:03

Add sum accumulator with window for model average

8a64568

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

d7e5e1f

… average_model

Add clone_variable function for Block class.

aee6867

Add python API for sum op.

016d0eb

Add ModelAverage class to optimizer.py

87fe52c

Refine average accumulates op

e0b136c

1. Rename inputs and outputs 2. Add some comments

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

92a01d4

… average_model

wanghaoshuang changed the title ~~Add sum accumulator with window for model average~~ Add model average optimizer for fluid Mar 18, 2018

wanghaoshuang mentioned this pull request Mar 18, 2018

Add model average optimizer for fluid #9172

Closed

wanghaoshuang requested a review from qingqing01 March 18, 2018 15:05

qingqing01 reviewed Mar 19, 2018

wanghaoshuang added 6 commits March 19, 2018 16:40

Refine initial and API of ModelAverage API

cad4d7f

1. Implement 'with model_average.apply()' syntax 2. Init apply_program and restore_program in __init__ function of ModelAverage

Refine sum_accumulates_op.

d22f4de

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

e01c770

… average_model

Fix error while params_grads[1]==None

68c9f6e

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

ad63722

… average_model

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

edb4e29

… average_model

qingqing01 approved these changes Mar 22, 2018

This was referenced Mar 22, 2018

Support model average in cpu. #9311

Closed

Refine doc of model average optimizer #9312

Closed

wanghaoshuang merged commit b594251 into PaddlePaddle:develop Mar 22, 2018

wanghaoshuang deleted the average_model branch May 20, 2022 03:59
