Thank you for reporting the issue. It seems both instances are related to slow performance with NCCL. To investigate further, we need additional information. Please provide the following context:
GPU Topology:
Please run the command nvidia-smi topo -mp and share the output. This will help us understand the GPU interconnect topology.
Reference: Understanding NVIDIA GPU Topologies
NCCL Logs:
Before running ScaleLLM, enable NCCL logging by setting NCCL_DEBUG=INFO; the resulting detailed logs will help diagnose the issue.
Reference: NCCL Environment Variables
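The logging step above can be sketched as follows. This is a minimal illustration of enabling NCCL debug output before launching a process; the launch command itself is a placeholder you should replace with your actual ScaleLLM invocation.

```python
# Sketch: prepare an environment with NCCL debug logging enabled.
import os

def nccl_debug_env(base=None):
    """Return a copy of the environment with NCCL debug logging turned on."""
    env = dict(base if base is not None else os.environ)
    env["NCCL_DEBUG"] = "INFO"                  # print NCCL init/collective details
    env["NCCL_DEBUG_FILE"] = "nccl.%h.%p.log"   # one log file per host (%h) and pid (%p)
    return env

# Usage (placeholder command -- substitute your real ScaleLLM launch):
# import subprocess
# subprocess.run(["python", "your_scalellm_launch_script.py"], env=nccl_debug_env())
```

Writing the logs to per-process files via NCCL_DEBUG_FILE keeps multi-GPU output from interleaving on stdout, which makes the logs easier to share in an issue report.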
Additionally, it would be beneficial if you could measure the inter-GPU connection speed for your node. Here’s a guide on how to measure it: Measuring Inter-GPU Connection Speed
This information will greatly aid us in pinpointing the root cause of the performance issue. Thank you.
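For the bandwidth measurement, a rough PyTorch-based sketch is below. It is an illustration only, not a substitute for NVIDIA's p2pBandwidthLatencyTest, and it assumes torch is installed and at least two CUDA devices are visible; the transfer size and iteration count are arbitrary choices.

```python
# Sketch: estimate device-to-device copy bandwidth between two GPUs.
import time

def bandwidth_gbps(num_bytes, seconds):
    """Convert a transfer size and elapsed time to GB/s (decimal GB)."""
    return num_bytes / seconds / 1e9

def measure_p2p(src=0, dst=1, size_mb=256, iters=10):
    import torch  # local import so bandwidth_gbps stays usable without CUDA
    nbytes = size_mb * 1024 * 1024
    x = torch.empty(nbytes, dtype=torch.uint8, device=f"cuda:{src}")
    torch.cuda.synchronize()
    start = time.monotonic()
    for _ in range(iters):
        _ = x.to(f"cuda:{dst}")  # device-to-device copy
    torch.cuda.synchronize()
    elapsed = time.monotonic() - start
    return bandwidth_gbps(nbytes * iters, elapsed)
```

On a PCIe-only topology (no NVLink), this number is bounded by the PCIe link speed, which is one common cause of slow multi-GPU NCCL collectives.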
Model: Qwen-14B-Chat (QWen2)
Dataset: https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese/blob/main/open_qa.jsonl
Environment: 2× A30 GPUs
Issue 1:
Error: the model fails to initialize correctly. After disabling CUDA Graph, the launch succeeds, which suggests a bug in CUDA Graph during distributed inference.
Issue 2:
With CUDA Graph disabled, the model launches successfully, but TTFT (Time to First Token) is nearly 50 seconds, which is far too slow.
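For reproducibility, TTFT can be measured as the delay between sending a streaming request and receiving the first non-empty chunk. The sketch below is a generic illustration; the endpoint URL and request payload are assumptions (an OpenAI-compatible completions endpoint on localhost), so adjust them to however your ScaleLLM server is actually exposed.

```python
# Sketch: measure time-to-first-token (TTFT) from a streaming HTTP response.
import json
import time
import urllib.request

def time_to_first_token(chunks, clock=time.monotonic):
    """Return seconds from now until the first non-empty chunk arrives."""
    start = clock()
    for chunk in chunks:
        if chunk:  # first non-empty piece of the streamed response
            return clock() - start
    return None  # stream ended without producing any output

def stream_completion(url, prompt):
    """Yield raw chunks from a streaming completion endpoint (assumed OpenAI-compatible)."""
    body = json.dumps({"prompt": prompt, "stream": True, "max_tokens": 64}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        while True:
            chunk = resp.read(1024)
            if not chunk:
                break
            yield chunk

# Usage (assumed local server address):
# ttft = time_to_first_token(stream_completion("http://localhost:8080/v1/completions", "Hello"))
# print(f"TTFT: {ttft:.2f}s")
```

Reporting TTFT measured this way, together with the prompt length used, makes it possible to separate prefill cost from networking or scheduling overhead.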