Operators profiling in Transformer model. #697

Closed
peterzhang2029 opened this issue Mar 8, 2018 · 3 comments

peterzhang2029 commented Mar 8, 2018

Operator profiling:

Profiled over 1 pass.

GPU: TITAN X (Pascal, 12 GB global memory)

------------------------->     Profiling Report     <-------------------------

Place: CUDA
Time unit: ms
Sorted by total time in descending order in the same thread

Event                            Calls       Total       Min.        Max.        Ave.
thread0::mul_grad                43941       13300.3     0.171008    5.76614     0.302686
thread0::layer_norm_grad         13590       12576.8     0.786432    58.5759     0.925442
thread0::layer_norm              13590       9943.62     0.661504    0.884736    0.731687
thread0::mul                     43941       6476.36     0.0768      3.70176     0.147388
thread0::softmax                 453         3779.52     7.69248     9.76282     8.3433
thread0::matmul_grad             16308       3431.95     0.145408    0.321536    0.210446
thread0::elementwise_add_grad    33522       2000.51     0.004096    0.2816      0.0596776
thread0::adam                    82899       1963.99     0.003072    0.421888    0.0236914
thread0::sum                     22197       1780.81     0.01024     0.611328    0.0802277
thread0::matmul                  16308       1413.82     0.034816    3.33517     0.0866951
thread0::transpose               32616       1371.65     0.028672    2.69722     0.0420544
thread0::transpose_grad          32616       1370.53     0.028672    0.070656    0.0420202
thread0::elementwise_add         33522       1045.33     0.007168    24.4326     0.0311834
thread0::dropout_grad            22650       644.103     0.009216    0.060416    0.0284372
thread0::softmax_grad            453         611.858     0.900096    2.48218     1.35068
thread0::dropout                 22650       600.103     0.006144    3.95059     0.0264946
thread0::scale                   17214       395.886     0.003072    0.043008    0.0229979
thread0::relu_grad               5436        355.256     0.04608     0.105472    0.0653524
thread0::fill_zeros_like         49830       311.703     0.002048    0.026624    0.00625532
thread0::elementwise_mul         83352       282.403     0.002976    0.048128    0.00338808
thread0::relu                    5436        258.603     0.031744    10.8749     0.0475723
thread0::elementwise_div_grad    8154        249.431     0.02048     0.062464    0.0305901
thread0::reduce_sum_grad         8607        245.62      0.003072    0.070656    0.0285372
thread0::lookup_table_grad       1812        191.795     0.043008    0.264192    0.105847
thread0::reduce_sum              8607        136.825     0.004096    0.070656    0.015897
thread0::exp_grad                8154        117.174     0.007168    0.043008    0.0143702
thread0::elementwise_div         8154        96.2836     0.007168    0.046976    0.0118081
thread0::cross_entropy_grad      453         86.9844     0.135168    0.362496    0.192019
thread0::exp                     8154        76.608      0.004096    0.03072     0.00939514
thread0::lookup_table            1812        73.7901     0.023552    0.106496    0.040723
thread0::reshape                 33975       66.6537     0.001024    3.75808     0.00196184
thread0::reshape_grad            33975       63.0589     0.001024    0.01536     0.00185604
thread0::cross_entropy           453         2.26896     0.004096    0.013312    0.00500874
thread0::elementwise_mul_grad    453         1.62397     0.003072    0.014336    0.00358492

CPU: Single thread

------------------------->     Profiling Report     <-------------------------

Place: CPU
Time unit: ms
Sorted by total time in descending order in the same thread

Event                            Calls       Total       Min.        Max.        Ave.
thread0::mul_grad                43941       1.37484e+06 14.1704     934.205     31.2884
thread0::transpose_grad          32616       1.02324e+06 20.8854     122.053     31.3724
thread0::transpose               32616       1.0213e+06  20.8805     88.0507     31.3129
thread0::mul                     43941       670698      7.00583     390.276     15.2636
thread0::softmax                 453         462634      696.503     1808.74     1021.27
thread0::layer_norm_grad         13590       375420      18.1468     95.6174     27.6247
thread0::adam                    82899       318720      0.012018    177.504     3.84468
thread0::layer_norm              13590       207723      10.2002     47.4344     15.285
thread0::reduce_sum_grad         8607        205651      0.016318    131.23      23.8935
thread0::softmax_grad            453         148503      222.244     589.22      327.821
thread0::dropout                 22650       105028      1.1015      19.8472     4.637
thread0::matmul_grad             16308       49075.7     1.43275     21.8065     3.0093
thread0::elementwise_add         33522       34885.7     0.172053    33.2651     1.04068
thread0::sum                     22197       33028.7     0.203313    22.527      1.48798
thread0::elementwise_add_grad    33522       30067.1     0.109021    16.5107     0.896935
thread0::relu_grad               5436        23078.8     2.31625     20.2764     4.24554
thread0::dropout_grad            22650       20590.6     0.169235    8.56589     0.909077
thread0::matmul                  16308       20140.8     0.559672    8.58716     1.23502
thread0::elementwise_div_grad    8154        12677.3     0.598271    9.90769     1.55473
thread0::fill_zeros_like         49830       12354.7     0.002088    9.29022     0.247938
thread0::scale                   17214       11808.7     0.001299    5.37838     0.685996
thread0::relu                    5436        7071.9      0.691306    14.388      1.30094
thread0::lookup_table_grad       1812        5678.67     0.146436    21.6215     3.13392
thread0::cross_entropy_grad      453         5495.68     5.93957     38.3655     12.1317
thread0::reduce_sum              8607        5443.33     0.004744    4.33712     0.632431
thread0::exp                     8154        3926.4      0.191484    3.60489     0.48153
thread0::elementwise_div         8154        3443.53     0.1601      4.35008     0.422312
thread0::exp_grad                8154        3120.79     0.124565    3.20178     0.382731
thread0::lookup_table            1812        1550.32     0.497613    2.62074     0.855583
thread0::elementwise_mul         83352       335.148     0.002086    0.041859    0.00402087
thread0::reshape_grad            33975       308.679     0.003699    2.49831     0.00908548
thread0::reshape                 33975       114.659     0.001462    2.63728     0.00337482
thread0::cross_entropy           453         91.1172     0.110631    0.720848    0.201142
thread0::elementwise_mul_grad    453         4.59584     0.007498    0.025791    0.0101453
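
For readers who want to rank operators by their share of total time, here is a minimal sketch that parses rows in the report format shown above. The helper names `parse_profile` and `time_share` are hypothetical and not part of Paddle; the sample rows are copied from the GPU report, so the computed shares are relative to those three rows only, not to the full report.

```python
# Hypothetical helpers (not part of Paddle) for reading a profiling report
# in the plain-text layout shown above.

SAMPLE = """\
Event                            Calls       Total       Min.        Max.        Ave.
thread0::mul_grad                43941       13300.3     0.171008    5.76614     0.302686
thread0::layer_norm_grad         13590       12576.8     0.786432    58.5759     0.925442
thread0::softmax                 453         3779.52     7.69248     9.76282     8.3433
"""


def parse_profile(report):
    """Return one dict per operator row; header and blank lines are skipped."""
    rows = []
    for line in report.splitlines():
        parts = line.split()
        # A data row has six fields and an event name such as "thread0::mul".
        if len(parts) != 6 or "::" not in parts[0]:
            continue
        total, min_ms, max_ms, avg_ms = (float(x) for x in parts[2:])
        rows.append({
            "op": parts[0].split("::", 1)[1],
            "calls": int(parts[1]),
            "total_ms": total,
            "min_ms": min_ms,
            "max_ms": max_ms,
            "avg_ms": avg_ms,
        })
    return rows


def time_share(rows):
    """Return (op, fraction of summed total time) pairs, largest first."""
    grand_total = sum(r["total_ms"] for r in rows)
    shares = [(r["op"], r["total_ms"] / grand_total) for r in rows]
    return sorted(shares, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for op, share in time_share(parse_profile(SAMPLE)):
        print(f"{op:<20} {share:6.1%}")
```

Pasting the full GPU or CPU report into `SAMPLE` would give shares over all operators. Scientific-notation totals such as `1.37484e+06` in the CPU report are handled by `float`.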

peterzhang2029 commented Mar 13, 2018

Training time:

Training time on a single GPU (seconds):

GPU: TITAN X (Pascal, 12 GB global memory)

pass_id    Fluid       PyTorch
0          96.8643     106.2836
1          109.6547    99.4232
2          109.3142    106.3398
3          102.0829    101.7508
4          99.5857     106.6458
5          98.8272     103.1086
6          97.3120     106.4803
7          98.2273     105.1168
8          98.6190     99.5186
9          98.6494     100.9658
avg        100.9137    103.5633

Training time on CPU (seconds):

pass_id    Fluid        PyTorch
0          5281.8969    6718.3502
1          5245.1814    6896.3676
2          5165.9476    7008.7306
avg        5231.0086    6874.4828
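
The averages in the CPU table can be checked, and the two frameworks compared, with a short sketch like the one below. The per-pass times are copied from the table above; the function names `summarize` and `relative_time` are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Per-pass CPU training times in seconds, copied from the table above.
cpu_times = {
    "Fluid": [5281.8969, 5245.1814, 5165.9476],
    "PyTorch": [6718.3502, 6896.3676, 7008.7306],
}


def summarize(times):
    """Average pass time per framework (hypothetical helper)."""
    return {name: mean(values) for name, values in times.items()}


def relative_time(averages, baseline="Fluid"):
    """Each framework's average pass time divided by the baseline's."""
    base = averages[baseline]
    return {name: avg / base for name, avg in averages.items()}


if __name__ == "__main__":
    averages = summarize(cpu_times)
    for name, ratio in relative_time(averages).items():
        print(f"{name:<8} avg={averages[name]:.4f}s  relative={ratio:.3f}")
```

A relative value above 1 means that framework took longer per pass than the baseline on this run. With only three passes, this is a rough comparison rather than a statistically robust one.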


peterzhang2029 commented Mar 13, 2018

Convergence:

Please refer to #700

@shanyi15

Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up on this question after it is closed, please feel free to reopen it, and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
