
Do not enable peer access when GPUs are located across QPI #3319

Closed

Conversation

buaaliyi
Contributor

I've found that on some hardware architectures, such as several types of HP server, GPUs separated by QPI can still enable P2P access with each other; however, the bandwidth is quite low (less than 200 MB/s instead of the normal 6~8 GB/s). So I think Caffe should not enable peer access between GPUs plugged into different I/O Hubs (IOH), to ensure at least normal cudaMemcpy performance.

The following quote is from our hardware engineer.

quote:
"""
NVIDIA GPUs are designed to take full advantage of the PCI-e Gen2 standard, including Peer-to-Peer communication, but the IOH chipset does not support the full PCI-e Gen2 specification for P2P communication with other IOH chipsets.
The cudaPeerEnable() API call will return an error code if the application tries to establish a P2P relationship between two GPUs that would require P2P communication over QPI. The cudaMemcpy() function for P2P Direct Transfers automatically falls back to using a Device-to-Host-to-Device path, but there is no automatic fallback for P2P Direct Access (P2P load/store instructions in device code).
One known example system is the HP Z800 workstation with dual IOH chipsets which can run the simpleP2P example, but bandwidth is very low (100s of MB/s instead of several GB/s) because of the fallback path.
NVIDIA is investigating whether GPU P2P across QPI can be supported by adding functionality to future GPU architectures.
"""

@flx42
Contributor

flx42 commented Nov 12, 2015

What is your driver version?
I think this issue was fixed in a recent driver update, and this sort of low-level hack should not be part of Caffe IMO.
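For reference, the CUDA driver API version can be queried from code as below; the display-driver release number itself (e.g. 346.46) is reported by nvidia-smi, not by this call.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  int driver = 0, runtime = 0;
  cudaDriverGetVersion(&driver);    // CUDA version supported by the driver, e.g. 7050
  cudaRuntimeGetVersion(&runtime);  // CUDA runtime the program was built against
  printf("driver API version: %d, runtime version: %d\n", driver, runtime);
  return 0;
}
```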

@buaaliyi
Contributor Author

@flx42 The driver version is 346.46, which is the release version with CUDA 7.0.

@flx42
Contributor

flx42 commented Nov 13, 2015

I don't remember exactly when it was fixed; could you try 352.39, the official version for CUDA 7.5?

@buaaliyi
Contributor Author

OK, I'll try it. Thank you for your advice.

@shelhamer
Member

Closing given new parallelism in #4563

@shelhamer shelhamer closed this Mar 23, 2017