-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sequence expand op #11618
Fix sequence expand op #11618
Conversation
@@ -151,8 +151,8 @@ struct SequenceExpandGradFunctor<platform::CPUDeviceContext, T> { | |||
const framework::Vector<size_t>& x_lod, /*expand source lod*/ | |||
const framework::Vector<size_t>& ref_lod, /*expand referenced lod*/ | |||
LoDTensor* dx) { | |||
math::SetConstant<platform::CPUDeviceContext, T> set_zero; | |||
set_zero(context, dx, static_cast<T>(0)); | |||
// math::SetConstant<platform::CPUDeviceContext, T> set_zero; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove these two lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! Actually, the sequence expand op
may also give wrong gradient value even memory optimizer is off.
@wanghaoshuang Please have a test on the attention-based OCR model to make sure that this change solves the problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! thanks!
Sequence expand op's GPU grad kernel implementation is not robust enough if memory optimizer is on.
The GPU kernel directly computed the sum of gradient without checking the initial value in d_x tensor.
In this PR, I moved the "set zero" function outside the functor to guarantee d_x is set to zero both on CPU and GPU.