Does DeepSpeed support automatic selection of different types of network cards, such as Ethernet and high-speed IB cards? #3723
Unanswered
pengshuang asked this question in Q&A
Replies: 0 comments
Our scenario involves two heterogeneous GPU clusters, Cluster A and Cluster B, each consisting of 20 GPU machines (A100-80G). Machines in Cluster A have both high-speed IB cards and regular Ethernet cards, while machines in Cluster B have high-speed RoCE cards and regular Ethernet cards. Because a high-speed IB network cannot be established between Cluster A and Cluster B, communication between the two clusters can only go over Ethernet using TCP/IP (sockets).
Our objective is for machines within each cluster to communicate over the high-speed cards (IB in Cluster A, RoCE in Cluster B), while traffic between Cluster A and Cluster B goes over regular Ethernet. Does DeepSpeed support automatic network card detection and configuration to enable distributed training across such heterogeneous clusters? If not, would it be possible to achieve this by modifying the source code?
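To make the intended topology concrete, here is a minimal sketch of the link selection we have in mind. It is only an illustration of the desired behavior, not DeepSpeed or NCCL code: the node numbering (0-19 for Cluster A, 20-39 for Cluster B) and the names `Cluster`, `cluster_of`, and `desired_link` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass

NODES_PER_CLUSTER = 20  # from the description: 20 GPU machines per cluster


@dataclass(frozen=True)
class Cluster:
    name: str
    fabric: str  # high-speed interconnect available inside this cluster


# Assumed numbering: nodes 0-19 belong to Cluster A, nodes 20-39 to Cluster B.
CLUSTERS = (Cluster("A", "IB"), Cluster("B", "RoCE"))


def cluster_of(node: int) -> Cluster:
    """Return the cluster that a (hypothetical) global node index belongs to."""
    if not 0 <= node < NODES_PER_CLUSTER * len(CLUSTERS):
        raise ValueError(f"node index out of range: {node}")
    return CLUSTERS[node // NODES_PER_CLUSTER]


def desired_link(src: int, dst: int) -> str:
    """Return the transport we would like to be used between two nodes."""
    a, b = cluster_of(src), cluster_of(dst)
    # Same cluster: use that cluster's high-speed fabric.
    # Different clusters: only Ethernet with TCP/IP sockets is available.
    return a.fabric if a == b else "TCP"


if __name__ == "__main__":
    for src, dst in [(0, 19), (20, 39), (19, 20)]:
        print(f"node {src} <-> node {dst}: {desired_link(src, dst)}")
```

In other words, the transport would be chosen per pair of machines rather than once for the whole job, and the question is whether DeepSpeed can detect and apply this automatically.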
I am unsure of the feasibility and would greatly appreciate your response. Thank you very much!