
Refactor and simplify hook design & add Tensor.register_hook API #31775

Merged

Conversation

Contributor

@chenwhql chenwhql commented Mar 22, 2021

PR types

New features

PR changes

APIs

Describe

Refactor and simplify hook design & add Tensor.register_hook API

1. Refactor

Simplify Hook class design

  • Original classes:
    - OpBasePreHook
      - PyOpBasePreHook (Implement later)
      - CppOpBasePreHook (Implement later)
    - GradAccumulatorPostHook
      - PyGradAccumulatorPostHook (Implement later)
      - CppGradAccumulatorPostHook (Implement later)
      - LambdaGradAccumulatorPostHook
    - InteriorVarHookPipeline
    - LeafVarHookPipeline
  • New classes:
    - VariableWrapperHook
      - PyVariableWrapperHook
    - InplaceVariableWrapperHook
      - PyInplaceVariableWrapperHook (Implement later)
      - LambdaInplaceVariableWrapperHook
  • The hook's input is a VariableWrapper, so hooks are now managed entirely by VariableWrapper itself (see the sketch after this list)
  • Remove weak_ptr from OpBase and GradientAccumulator
  • Remove several hook-related methods
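
As a rough sketch only (the class and method signatures below are assumptions for illustration, not the actual Paddle implementation), the simplified hierarchy boils down to one functional hook interface plus an inplace variant:

#include <memory>

namespace paddle {
namespace imperative {

class VariableWrapper;  // wrapper around a grad var (forward declaration)

// Base hook: takes a VariableWrapper and returns a (possibly new) one,
// e.g. a gradient transformed by a user-defined Python function.
class VariableWrapperHook {
 public:
  virtual ~VariableWrapperHook() = default;
  virtual std::shared_ptr<VariableWrapper> operator()(
      const std::shared_ptr<VariableWrapper>& var) = 0;
};

// Inplace variant: mutates the VariableWrapper and returns nothing, used for
// post-accumulation work such as gradient reduction in data-parallel training.
class InplaceVariableWrapperHook {
 public:
  virtual ~InplaceVariableWrapperHook() = default;
  virtual void operator()(VariableWrapper* var) = 0;
};

}  // namespace imperative
}  // namespace paddle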

2. Add Tensor.register_hook method

  • Support registering backward hooks on Tensors in Python:
from __future__ import print_function

import paddle

# hook function that returns None
def print_hook_fn(grad):
    print(grad)

# hook function that returns a Tensor
def double_hook_fn(grad):
    grad = grad * 2
    return grad

x = paddle.to_tensor([0., 1., 2., 3.], stop_gradient=False)
y = paddle.to_tensor([4., 5., 6., 7.], stop_gradient=False)
z = paddle.to_tensor([1., 2., 3., 4.])

# one Tensor can register multiple hooks
h = x.register_hook(print_hook_fn)
x.register_hook(double_hook_fn)

w = x + y
# register a hook using a lambda function
w.register_hook(lambda grad: grad * 2)

o = z.matmul(w)
o.backward()
# output printed by print_hook_fn during backward:
# Tensor(shape=[4], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
#        [2., 4., 6., 8.])

print("w.grad:", w.grad) # w.grad: [1. 2. 3. 4.] - not changed
print("x.grad:", x.grad) # x.grad: [ 4.  8. 12. 16.] - processed by two *2 hooks
print("y.grad:", y.grad) # y.grad: [2. 4. 6. 8.] - processed by one *2 hook

# remove the hook via its handle
h.remove()

3. Doc

Related Chinese doc: PaddlePaddle/docs#3390

[screenshot of the Chinese doc preview]

The English doc cannot be previewed right now due to a doc-extraction issue.

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

Contributor

@JiabinYang JiabinYang left a comment


some comments

@@ -408,9 +412,25 @@ void BasicEngine::Execute() {
}
}

for (auto& pair : tmp_ins) {
Contributor


How about creating tmp_ins only when it is needed? It seems to make too many temporary variable_wrapper copies here.

Contributor Author


done
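
A rough, self-contained sketch of the suggestion above, using stand-in types (Var, VarMap, and ApplyHooks are illustrative names only, not the real BasicEngine code): copy the inputs only when some input grad var actually has a hook registered.

#include <map>
#include <memory>
#include <string>
#include <vector>

struct Var {
  bool has_hook = false;
  bool HasHook() const { return has_hook; }
};
using VarMap = std::map<std::string, std::vector<std::shared_ptr<Var>>>;

// Stand-in for running the registered hooks over the inputs (returns a copy).
VarMap ApplyHooks(const VarMap& ins) { return ins; }

// Return the original inputs untouched unless at least one var has a hook;
// only then materialize the temporary, hook-transformed copy in *tmp_ins.
const VarMap& PrepareIns(const VarMap& ins, VarMap* tmp_ins) {
  for (const auto& name_and_vars : ins) {
    for (const auto& var : name_and_vars.second) {
      if (var && var->HasHook()) {
        *tmp_ins = ApplyHooks(ins);
        return *tmp_ins;
      }
    }
  }
  return ins;  // no hooks registered: no temporary copy is made
}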

accumulator->CallBackwardPostHooks();
}
// 3. Call backward Hooks for `var_`
accumulator->CallReduceHooks();
Contributor


Bad name, or maybe use inheritance to fix it? CallHooks suggests invoking all hooks, so having CallReduceHooks next to it is confusing to me.

Contributor Author


done, renamed CallHooks -> CallGradientHooks

platform::errors::InvalidArgument("Leaf Tensor's inner var "
"is not initialized when "
"call gradient hook."));
if (var_->HasHook()) {
Contributor


Encapsulate this, or differentiate it from the similar code in Execute.

Contributor Author


only the for loop is similar

}
}

void GradientAccumulator::CallReduceHooks() {
Contributor


Add some checks to distinguish it from the normal gradient hooks.

Contributor Author


done

* parallel multi-card training.
*/

void CallHooks();
Contributor


These two funcs are not a parallel structure, so they should not have such closely related names.

Contributor Author


done, thx

*/
class OpBasePreHook {
class VariableWrapperHook {
Contributor


How about making an abstract Hook base class to encapsulate the different kinds of hooks?

Contributor Author


I have tried that; it is not a good idea.

int64_t next_hook_id_{0};
// Hooks used to register hook for grad var, support adding and removing,
// key is the accumulated int64_t value
std::map<int64_t, std::shared_ptr<VariableWrapperHook>> hooks_;
Contributor


Why use a map here?

Contributor Author


The hook remove helper needs to hold the hook id so that it can remove the hook correctly (see the sketch below).
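
A minimal sketch of that design choice, using illustrative names (HookRegistry, AddHook, and RemoveHook are not the actual Paddle API): an auto-increasing id keys the map, registration returns the id, and the remove helper later erases exactly that entry.

#include <cstdint>
#include <map>
#include <memory>

class VariableWrapperHook;  // as in the hierarchy sketch above

class HookRegistry {
 public:
  // Register a hook and return the id that a remove helper can hold on to.
  int64_t AddHook(std::shared_ptr<VariableWrapperHook> hook) {
    int64_t id = next_hook_id_++;
    hooks_.emplace(id, std::move(hook));
    return id;
  }

  // Removing by id only touches that one entry, regardless of which other
  // hooks were added or removed in between.
  bool RemoveHook(int64_t id) { return hooks_.erase(id) > 0; }

 private:
  int64_t next_hook_id_{0};
  std::map<int64_t, std::shared_ptr<VariableWrapperHook>> hooks_;
};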

ForFishes previously approved these changes Mar 30, 2021
Member

@ForFishes ForFishes left a comment


LGTM

* If the gradient has been calculated by previous graph,
* it should be added to the previous graph result.
* If the leaf gradient has been calculated done, the inner_var_
* should be added to the var_.
*/
if (!var_->IsLeafGrad() || !SumGradCompleted() || !HasInnerVar()) {
Contributor


The !HasInnerVar() check should be removable now.

Contributor Author


I don't think so; each call to AccumulatedGrad still requires the InnerVar for now.

Contributor


Right, agreed.

for (const auto& hook_pair : var_->GetHooks()) {
tmp_var = (*hook_pair.second)(tmp_var);
}
inner_var_ = tmp_var;
Contributor


For a leaf node, calling CallGradientHooks inside GradientAccumulator replaces its own inner_var_, which is effectively inplace, right?

Contributor Author


Yes, it was inplace to begin with. The main purpose of this change is to unify hook management and invocation under one base class. If an InplaceHook were used here, the old HookPipeline classes would still be needed, and both the data structures and the logic would be more complex.

Contributor


OK

zhwesky2010 previously approved these changes Mar 30, 2021
TCChenlong previously approved these changes Mar 31, 2021
Contributor

@TCChenlong TCChenlong left a comment


LGTM

JiabinYang previously approved these changes Mar 31, 2021
Contributor

@JiabinYang JiabinYang left a comment


LGTM

lanxianghit previously approved these changes Mar 31, 2021
@chenwhql chenwhql merged commit dbeb3ea into PaddlePaddle:develop Apr 1, 2021