[Speed Compiling]: Reduce NVCC compiling files. #5573
Conversation
Force-pushed from 916a9b9 to 37ee17c
Force-pushed from 37ee17c to 5f21709
[self.input_size[0]]]
self.output_represention = 8  # output feature size

#class TestSeqProjectCase1(TestSeqProject):
Can these lines of code be removed?
Done. Resumed these unit tests.
Force-pushed from e97a5fb to e9082bb
    input.data<T>(), bias.data<T>(), output->data<T>(), in_dims[0], size);
  }
};
Why not use Eigen to do this?
Done. Moved RowwiseAdd to paddle/operators/math/math_function and implemented it with Eigen for both CPU and GPU.
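For reference, the semantics of the RowwiseAdd functor can be sketched as a plain loop. This is only an illustration of the computation, not Paddle's implementation; the actual functor in math_function expresses the same broadcast with Eigen so that one source serves both CPU and GPU.

```cpp
#include <cassert>
#include <vector>

// Sketch of RowwiseAdd semantics: add a length-w bias vector to every
// row of an h-by-w matrix stored row-major. The real Paddle functor
// implements this with Eigen broadcasting; the explicit loops here are
// only for illustration.
void RowwiseAdd(const std::vector<float>& input,
                const std::vector<float>& bias,
                std::vector<float>* output, int h, int w) {
  output->resize(input.size());
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      (*output)[i * w + j] = input[i * w + j] + bias[j];
    }
  }
}
```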
paddle/operators/gru_op.h (outdated)
math::SetConstant<Place, T> set;
set(dev_ctx, &ones, static_cast<T>(1));
math::gemv<Place, T>(dev_ctx, true, m, n, 1., batch_gate_grad.data<T>(),
                     ones.data<T>(), 0., bias_grad->data<T>());
Since this is used in many places, should we add a function similar to RowwiseAdd to do this?
Done. Added a ColwiseSum functor in paddle/operators/math/math_function.h to compute the gradient of the bias.
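The computation itself is just a column-wise sum: the transposed gemv with a ones vector above evaluates ones^T * batch_gate_grad, which sums each column of the gradient matrix. A rough sketch of that semantics (plain loops for illustration; the Paddle functor is implemented in math_function, and avoids allocating the auxiliary ones tensor at every call site):

```cpp
#include <cassert>
#include <vector>

// Sketch of ColwiseSum semantics: sum an m-by-n row-major matrix down
// each column, producing a length-n vector. This is equivalent to the
// gemv call ones^T * input used to accumulate the bias gradient.
void ColwiseSum(const std::vector<float>& input, std::vector<float>* out,
                int m, int n) {
  out->assign(n, 0.f);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      (*out)[j] += input[i * n + j];
    }
  }
}
```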
void operator()(const platform::DeviceContext& context,
                const framework::Tensor& input, const framework::Tensor& bias,
                framework::Tensor* output);
};
Why is this in sequence2batch.h?
At first, this functor was only used in lstm_op and gru_op. I have now moved it to paddle/operators/math/math_function.
Another possibility is to merge .cc and .cu.cc into a single .cc. What does everyone think?
I agree with this.
Can this PR be merged first, since it reduces NVCC compile time and many files would otherwise conflict with it? How about merging .cc and .cu.cc into a single .cc in the next PR?
Since this PR conflicts with many files, we will merge it first. The unfinished work will be continued in the next PR.
Related to #5491
Testing: local machine
PaddlePaddle compilation is now much slower, about 30 minutes on a GPU server for each build. I measured the NVCC and g++ compile times per operator: NVCC is more than a minute slower than g++. Some operators' GPU kernel implementations can in fact be compiled by g++, so I changed about 25 .cu operator files to .cu.cc so that g++ compiles them, which speeds up the build. But I'm not sure whether this is necessary.