
[Speed Compiling]: Reduce NVCC compiling files. #5573

Merged · 12 commits · Nov 16, 2017

Conversation

@qingqing01 (Contributor) commented Nov 11, 2017

Related to #5491

Testing: local machine

  • env: CentOS, CUDA 7.5, make -j8, WITH_GPU=ON
  • time: 31m24.320s -> 26m43.523s

Compiling PaddlePaddle is currently quite slow, about 30 minutes on a GPU server each time. I measured the NVCC and G++ compile times per operator, and NVCC is more than one minute slower than G++. Some operators' GPU kernel implementations can be compiled by G++, so I changed about 25 .cu operator files to .cu.cc so that they are compiled by G++, to speed up compilation. But I'm not sure whether this is necessary.
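
To illustrate the kind of file the rename targets, here is a hedged sketch (the operator some_op, SomeKernel, and its header are hypothetical; REGISTER_OP_GPU_KERNEL and GPUPlace follow the operator registration convention of that era): a GPU source file that has no __global__ kernels and no <<<...>>> launches, because its device work goes through Eigen or shared math functors, is plain C++ and can be renamed from .cu to .cu.cc so that G++ compiles it.

// some_op.cu.cc  (previously some_op.cu) -- hypothetical example, not from this PR.
// No CUDA-specific syntax here: no __global__ kernels, no <<<...>>> launches.
// The kernel body lives in a header and expresses its device work with Eigen,
// so this translation unit can be handed to G++ instead of NVCC.
#include "paddle/operators/some_op.h"  // hypothetical header defining SomeKernel<Place, T>

namespace ops = paddle::operators;

REGISTER_OP_GPU_KERNEL(some_op,
                       ops::SomeKernel<paddle::platform::GPUPlace, float>);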

@qingqing01 changed the title from "[Speed Compiling Test]: Reduce NVCC compiling files." to "[Speed Compiling]: Reduce NVCC compiling files." on Nov 13, 2017
@hedaoyuan previously approved these changes Nov 13, 2017
[self.input_size[0]]]
self.output_represention = 8 # output feature size

#class TestSeqProjectCase1(TestSeqProject):
Contributor:

Can this code be removed?

Contributor Author:

Resumed these unit tests.

input.data<T>(), bias.data<T>(), output->data<T>(), in_dims[0], size);
}
};

Collaborator:

Why not use Eigen to do it?

Contributor Author:

Done. Moved RowwiseAdd to paddle/operators/math/math_function and used Eigen for both CPU and GPU.

math::SetConstant<Place, T> set;
set(dev_ctx, &ones, static_cast<T>(1));
math::gemv<Place, T>(dev_ctx, true, m, n, 1., batch_gate_grad.data<T>(),
                     ones.data<T>(), 0., bias_grad->data<T>());
Collaborator:

Since this is used in many places, should we add a function similar to RowwiseAdd to do this?

Contributor Author:

Done. Added a ColwiseSum functor in paddle/operators/math/math_function.h to compute the gradient of the bias.

void operator()(const platform::DeviceContext& context,
                const framework::Tensor& input, const framework::Tensor& bias,
                framework::Tensor* output);
};
Collaborator:

Why is this in sequence2batch.h?

Contributor Author:

At first, this functor was only used in lstm_op and gru_op. Now it has been moved to paddle/operators/math/math_function.

@emailweixu (Collaborator):

Another possibility is to merge .cc and .cu.cc into a single .cc. What does everyone think?

@hedaoyuan (Contributor):

I agree with this.

@luotao1 (Contributor) commented Nov 16, 2017:

Can this PR be merged first, since it reduces NVCC compile time and many files would otherwise conflict with it? How about merging .cc and .cu.cc into a single .cc in the next PR?

@luotao1 (Contributor) left a review comment:

As this PR conflicts with many files, we will merge it first. The unfinished work will be continued in the next PR.
