
Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. #35510

Merged
20 commits merged into PaddlePaddle:develop from aosewski/sgd_axpy_op on Sep 20, 2021

Conversation

@arogowie-intel (Contributor) commented Sep 6, 2021

PR types

Performance optimization

PR changes

OPs

Describe

This PR optimizes the current implementations of the SGD and SUM operators, mainly for the BF16 data type, when SelectedRows (sparse) tensors are used, by reusing a oneDNN handler.
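For illustration, here is a minimal sketch of the reuse pattern (the class name `StatefulAxpyHandler`, its member layout, and the reorder-with-scale-plus-sum-post-op formulation are my assumptions, not the PR's actual `OneDNNAXPYHandler` code). The expensive oneDNN objects are built once in the constructor, and each call only swaps data handles:

```cpp
// Minimal sketch of a stateful AXPY handler: y = alpha * x + y realized as
// a oneDNN reorder with an output scale (alpha) and a sum post-op (+= y).
// The descriptor and primitive are created once in the constructor and
// reused on every invocation.
#include <cstdint>
#include "dnnl.hpp"

class StatefulAxpyHandler {  // hypothetical stand-in for OneDNNAXPYHandler
 public:
  StatefulAxpyHandler(const dnnl::engine& eng, int64_t n, float alpha)
      : md_({n}, dnnl::memory::data_type::f32, dnnl::memory::format_tag::x),
        src_mem_(md_, eng),
        dst_mem_(md_, eng) {
    dnnl::primitive_attr attr;
    attr.set_output_scales(0, {alpha});  // multiply x by alpha
    dnnl::post_ops ops;
    ops.append_sum(1.0f);                // accumulate into existing y
    attr.set_post_ops(ops);
    auto pd = dnnl::reorder::primitive_desc(eng, md_, eng, md_, attr);
    axpy_ = dnnl::reorder(pd);           // primitive built exactly once
  }

  // The PR makes the handler non-copyable/non-movable; mirrored here.
  StatefulAxpyHandler(const StatefulAxpyHandler&) = delete;
  StatefulAxpyHandler& operator=(const StatefulAxpyHandler&) = delete;

  // Safe to call many times: only the data pointers change between calls.
  void operator()(const float* x, float* y, dnnl::stream& strm) {
    src_mem_.set_data_handle(const_cast<float*>(x));
    dst_mem_.set_data_handle(y);
    axpy_.execute(strm, src_mem_, dst_mem_);
    strm.wait();
  }

 private:
  dnnl::memory::desc md_;
  dnnl::memory src_mem_, dst_mem_;
  dnnl::reorder axpy_;
};
```

A caller constructs one handler per (length, alpha) pair and invokes it repeatedly, which is what makes per-row updates on SelectedRows tensors cheap.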

The performance results for the word2vec model on a CPX 6348 machine with a single thread are as follows:

| type | commit  | engine | words/sec |
|------|---------|--------|-----------|
| bf16 | c56d697 | oneDNN | 18142.32  |
| bf16 | this PR | oneDNN | 20814.31  |
| fp32 | c56d697 | CPU    | 27680.99  |

This gives a ~15% speedup for bf16 (20814.31 / 18142.32 ≈ 1.15).

From profiling:

| type | commit    | SGD total [ms] | SGD % | SUM total [ms] | SUM % |
|------|-----------|----------------|-------|----------------|-------|
| fp32 | c56d697   | 117.77         | 2.5   | 395.31         | 8     |
| bf16 | c56d697   | 954.94         | 8.6   | 5151.35        | 47    |
| bf16 | w/o cache | 339.28         | 4.6   | 2159.22        | 29    |

@paddle-bot-old (bot) commented Sep 6, 2021

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

* Stop using PP cache mechanism to store memory and primitive objects.
* Handler object stores and reuses the needed descriptors & primitives.
@arogowie-intel arogowie-intel marked this pull request as ready for review September 9, 2021 06:36
@jczaja (Contributor) left a comment

LGTM

@jczaja (Contributor) commented Sep 10, 2021

@wzzju Please take a look at the changes in this PR.

@jczaja jczaja requested a review from wzzju September 10, 2021 08:15
@lidanqing-intel (Contributor) commented:

@wzzju Hi, please take a look at this PR; it uses oneDNN SUM and SGD.

@jczaja (Contributor) commented Sep 14, 2021

@tsocha Please review this PR

@chenwhql (Contributor) left a comment

LGTM for PADDLE_ENFORCE

@jczaja (Contributor) commented Sep 20, 2021

@chenwhql Could you please approve PR-CI-APPROVAL?

@jczaja jczaja merged commit 799f386 into PaddlePaddle:develop Sep 20, 2021
@arogowie-intel arogowie-intel deleted the aosewski/sgd_axpy_op branch September 21, 2021 07:31
ghost pushed a commit to piotrekobi/Paddle that referenced this pull request Sep 24, 2021
Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (PaddlePaddle#35510)

* Create stateful OneDNNAXPYHandler object.

This makes it possible to call it multiple times without recreating the
oneDNN primitives every time.

* Prepare SGDOpKernel to reuse its implementation from the oneDNN kernel.

* OneDNN SGD kernel.

* Update call sites to use the new OneDNNAXPYHandler object API.

* Set up seed in the proper place.

* Enable OneDNN kernel only for a single case.

* For dense param and sparse grad (see the sparse-update sketch after this list).

* Small refactor.

* Enable oneDNN by op attr or by cmd line flag.

* Use int64_t type for number of elements.

* Support dense param and grad from OneDNN kernel.

* Enable SGD OneDNN kernel when using the MP BF16 optimizer.

* Force non-copyable/movable OneDNNAXPYHandler.

* Reuse OneDNNAXPYHandler for sparse tensors in SUM op.

* Fix SFINAE rules (see the short illustration after this list).

* Remove recording event inside AXPY.

* Get rid of internal primitive caching.

* Stop using PP cache mechanism to store memory and primitive objects.
* Handler object stores and reuses the needed descriptors & primitives.

* Do not derive from MKLDNNHandlerT
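To make the "dense param and sparse grad" case above concrete, here is a hedged sketch: the function name `SparseSgdUpdate` and its signature are hypothetical, and it builds on the `StatefulAxpyHandler` sketch from the PR description above. It creates the handler once and applies it per gradient row:

```cpp
// Assumes dnnl.hpp and StatefulAxpyHandler from the earlier sketch.
#include <cstdint>
#include <vector>

// Dense parameter updated by a sparse (SelectedRows) gradient: only the
// rows listed in `rows` carry gradient data. The AXPY primitive is built
// once and reused for every row, which is the saving this PR targets.
void SparseSgdUpdate(float* param,                      // dense table
                     int64_t row_width,                 // columns per row
                     const std::vector<int64_t>& rows,  // touched row ids
                     const float* grad,                 // rows.size() x row_width
                     float lr,
                     const dnnl::engine& eng,
                     dnnl::stream& strm) {
  // alpha = -lr turns y = alpha * x + y into param_row -= lr * grad_row.
  StatefulAxpyHandler axpy(eng, row_width, -lr);
  for (size_t i = 0; i < rows.size(); ++i) {
    const float* g = grad + i * row_width;   // i-th sparse gradient row
    float* p = param + rows[i] * row_width;  // matching parameter row
    axpy(g, p, strm);                        // same primitive every iteration
  }
}
```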
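The "Fix SFINAE rules" item refers to template constraints on the handler; as a generic illustration only (the PR's actual constraint targets float/bfloat16 within Paddle's type system), SFINAE gating of a kernel to supported element types looks like:

```cpp
#include <cstdint>
#include <type_traits>

// Illustration: this overload participates in resolution only for
// floating-point T, analogous to restricting a kernel to float/bfloat16.
template <typename T,
          typename std::enable_if<std::is_floating_point<T>::value,
                                  bool>::type = true>
void NaiveAxpy(int64_t n, T alpha, const T* x, T* y) {
  for (int64_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}
```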
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 29, 2021
Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (PaddlePaddle#35510)