The current SE-Resnext 152 benchmark result on P40 #10706

Closed
chengduoZH opened this issue May 16, 2018 · 5 comments
@chengduoZH
Contributor

chengduoZH commented May 16, 2018

Env

Network config

  • SE-ResNeXt 152
  • Data read operations excluded from timing
  • mem_opt: ON
  • flower dataset
  • batch_size: 320 (40/card)

| Method | 8 cards: sec/batch | 8 cards: image/sec | Single card: sec/batch | Single card: image/sec | Acceleration ratio |
|---|---|---|---|---|---|
| parallel_do | 3.100722447 | 103.2017555 | 1.783022 | 22.43382303 | 4.600275014 |
| parallel_exe | 2.870211447 | 111.4900438 | 1.71482642 | 23.32597605 | 4.779651817 |
| parallel_exe + balance_param_opt | 2.864137779 | 111.7264687 | - | - | 4.78978751 |

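
For readers checking the numbers, the derived columns appear to follow from the timings: image/sec is the batch size divided by sec/batch, and the acceleration ratio is the multi-card image/sec divided by the single-card image/sec. A minimal Python sketch, assuming the single-card runs used a batch size of 40 (inferred from "40/card", not stated explicitly); the function names are made up for illustration:

```python
def images_per_sec(batch_size, sec_per_batch):
    """Throughput in images per second for one training step."""
    return batch_size / sec_per_batch


def acceleration_ratio(multi_card_ips, single_card_ips):
    """Multi-card throughput relative to the single-card baseline."""
    return multi_card_ips / single_card_ips


# parallel_do row of the 8-card table.
# Assumption: the single-card run used batch_size 40 (one card's share of 320).
ips_8 = images_per_sec(320, 3.100722447)
ips_1 = images_per_sec(40, 1.783022)
ratio = acceleration_ratio(ips_8, ips_1)

print(f"8 cards: {ips_8:.2f} image/sec")
print(f"1 card:  {ips_1:.2f} image/sec")
print(f"ratio:   {ratio:.3f}")
```

The parallel_exe + balance_param_opt row has no single-card measurement of its own; its ratio is consistent with using the parallel_exe single-card throughput as the baseline.
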
@typhoonzero
Contributor

Can you paste the output of nvidia-smi topo -m? If not all GPUs are connected through a single PCIe switch, performance may be reduced. In that case, you can test the 4-GPU acceleration ratio for reference.

@chengduoZH
Contributor Author

@typhoonzero Thanks!
The result is:

GPU0	 X 	PIX	PIX	PIX	PXB	PXB	PXB	PXB	SOC	0-13
GPU1	PIX	 X 	PIX	PIX	PXB	PXB	PXB	PXB	SOC	0-13
GPU2	PIX	PIX	 X 	PIX	PXB	PXB	PXB	PXB	SOC	0-13
GPU3	PIX	PIX	PIX	 X 	PXB	PXB	PXB	PXB	SOC	0-13
GPU4	PXB	PXB	PXB	PXB	 X 	PIX	PIX	PIX	SOC	0-13
GPU5	PXB	PXB	PXB	PXB	PIX	 X 	PIX	PIX	SOC	0-13
GPU6	PXB	PXB	PXB	PXB	PIX	PIX	 X 	PIX	SOC	0-13
GPU7	PXB	PXB	PXB	PXB	PIX	PIX	PIX	 X 	SOC	0-13
mlx5_0	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	 X

Legend:

  X   = Self
  SOC  = Connection traversing PCIe as well as the SMP link between CPU sockets(e.g. QPI)
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing a single PCIe switch
  NV#  = Connection traversing a bonded set of # NVLinks
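
To make the grouping in the matrix above easier to see, here is a rough Python sketch that parses the pasted rows and groups GPUs that reach each other through a single PCIe switch (PIX). The pasted output has no header row, so the sketch assumes the columns are in the same order as the rows (GPU0 to GPU7, then mlx5_0); the helper names are hypothetical.

```python
TOPO = """\
GPU0 X PIX PIX PIX PXB PXB PXB PXB SOC 0-13
GPU1 PIX X PIX PIX PXB PXB PXB PXB SOC 0-13
GPU2 PIX PIX X PIX PXB PXB PXB PXB SOC 0-13
GPU3 PIX PIX PIX X PXB PXB PXB PXB SOC 0-13
GPU4 PXB PXB PXB PXB X PIX PIX PIX SOC 0-13
GPU5 PXB PXB PXB PXB PIX X PIX PIX SOC 0-13
GPU6 PXB PXB PXB PXB PIX PIX X PIX SOC 0-13
GPU7 PXB PXB PXB PXB PIX PIX PIX X SOC 0-13
mlx5_0 SOC SOC SOC SOC SOC SOC SOC SOC X
"""


def parse_gpu_links(text):
    """Return {gpu: {other_gpu: link_type}} for the GPU rows only."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("GPU"):
            rows[fields[0]] = fields[1:]
    names = list(rows)  # assumption: column order matches row order
    # zip stops after the GPU columns, dropping mlx5_0 and CPU affinity.
    return {name: dict(zip(names, rows[name])) for name in names}


def groups_by_link(matrix, link="PIX"):
    """Group GPUs whose connection to the group's first member is `link`."""
    groups = []
    for gpu, row in matrix.items():
        for group in groups:
            if row[group[0]] == link:
                group.append(gpu)
                break
        else:
            groups.append([gpu])
    return groups


groups = groups_by_link(parse_gpu_links(TOPO))
for group in groups:
    print(" ".join(group))
```

On this matrix the GPUs fall into two groups of four, GPU0-GPU3 and GPU4-GPU7; traffic between the two groups crosses multiple PCIe switches (PXB).
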

@chengduoZH
Contributor Author

The following is the performance comparison for four cards:

| Method | 4 cards: s/batch | 4 cards: image/s | Single card: s/batch | Single card: image/s | Acceleration ratio |
|---|---|---|---|---|---|
| parallel_do | 2.30841 | 69.31177737 | 1.783022 | 22.43382303 | 3.089610598 |
| parallel_exe | 2.192076294 | 72.99016027 | 1.71482642 | 23.32597605 | 3.129136381 |
| parallel_exe + balance_param_opt | 2.179436251 | 73.41348019 | - | - | 3.147284385 |

@typhoonzero
Contributor

It seems that going from 4 to 8 cards only added the equivalent of about 1.5 GPUs. I'm not sure whether this is caused by the topology.
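
The "about 1.5" figure can be reproduced from the acceleration ratios reported in the two tables. A small Python sketch of that arithmetic; the per-card efficiency (ratio divided by card count) is an extra metric added here for illustration, not something measured in the thread:

```python
# Acceleration ratios copied from the 4-card and 8-card tables.
SPEEDUP = {
    "parallel_do": {4: 3.089610598, 8: 4.600275014},
    "parallel_exe": {4: 3.129136381, 8: 4.779651817},
}


def marginal_gain(speedups, small=4, large=8):
    """Single-card equivalents gained by going from `small` to `large` cards."""
    return speedups[large] - speedups[small]


def efficiency(speedups, cards):
    """Fraction of ideal linear scaling achieved at `cards` cards."""
    return speedups[cards] / cards


for name, s in SPEEDUP.items():
    print(
        f"{name}: +{marginal_gain(s):.2f} card-equivalents from the extra 4 cards, "
        f"efficiency {efficiency(s, 4):.1%} at 4 cards and {efficiency(s, 8):.1%} at 8"
    )
```

The gain is roughly 1.51 for parallel_do and 1.65 for parallel_exe, so the second group of four cards contributes well under half of what linear scaling would give.
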

@shanyi15
Collaborator

Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up on this question after it is closed, please feel free to reopen it, and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
