Thank you for reporting the issue. It seems both instances are related to slow performance with NCCL. To investigate further, we need additional information. Please provide the following context:
GPU Topology:
Please run the command nvidia-smi topo -mp and share the output. This will help us understand the GPU interconnect topology.
Reference: Understanding NVIDIA GPU Topologies
NCCL Logs:
Before running ScaleLLM, enable NCCL logging by setting NCCL_DEBUG=INFO; the resulting detailed logs will help diagnose the issue.
Reference: NCCL Environment Variables
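The logging step above can be sketched as follows. This is a minimal illustration of enabling NCCL debug output before launching a process; the launch command itself is a placeholder you should replace with your actual ScaleLLM invocation.

```python
# Sketch: prepare an environment with NCCL debug logging enabled.
import os

def nccl_debug_env(base=None):
    """Return a copy of the environment with NCCL debug logging turned on."""
    env = dict(base if base is not None else os.environ)
    env["NCCL_DEBUG"] = "INFO"                  # print NCCL init/collective details
    env["NCCL_DEBUG_FILE"] = "nccl.%h.%p.log"   # one log file per host (%h) and pid (%p)
    return env

# Usage (placeholder command -- substitute your real ScaleLLM launch):
# import subprocess
# subprocess.run(["python", "your_scalellm_launch_script.py"], env=nccl_debug_env())
```

Writing the logs to per-process files via NCCL_DEBUG_FILE keeps multi-GPU output from interleaving on stdout, which makes the logs easier to share in an issue report.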
Additionally, it would be beneficial if you could measure the inter-GPU connection speed for your node. Here’s a guide on how to measure it: Measuring Inter-GPU Connection Speed
This information will greatly aid us in pinpointing the root cause of the performance issue. Thank you.
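For the bandwidth measurement, a rough PyTorch-based sketch is below. It is an illustration only, not a substitute for NVIDIA's p2pBandwidthLatencyTest, and it assumes torch is installed and at least two CUDA devices are visible; the transfer size and iteration count are arbitrary choices.

```python
# Sketch: estimate device-to-device copy bandwidth between two GPUs.
import time

def bandwidth_gbps(num_bytes, seconds):
    """Convert a transfer size and elapsed time to GB/s (decimal GB)."""
    return num_bytes / seconds / 1e9

def measure_p2p(src=0, dst=1, size_mb=256, iters=10):
    import torch  # local import so bandwidth_gbps stays usable without CUDA
    nbytes = size_mb * 1024 * 1024
    x = torch.empty(nbytes, dtype=torch.uint8, device=f"cuda:{src}")
    torch.cuda.synchronize()
    start = time.monotonic()
    for _ in range(iters):
        _ = x.to(f"cuda:{dst}")  # device-to-device copy
    torch.cuda.synchronize()
    elapsed = time.monotonic() - start
    return bandwidth_gbps(nbytes * iters, elapsed)
```

On a PCIe-only topology (no NVLink), this number is bounded by the PCIe link speed, which is one common cause of slow multi-GPU NCCL collectives.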
Model: Qwen-14B-Chat (QWen2)
Dataset: https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese/blob/main/open_qa.jsonl
Environment: 2× A30 GPUs
Issue 1:
Error: the model fails to initialize correctly. After disabling CUDA Graph, the launch succeeds, which suggests a bug in CUDA Graph during distributed inference.
Issue 2:
With CUDA Graph disabled, the model launches successfully, but TTFT (Time to First Token) is nearly 50 seconds, which is far too slow.
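For reproducibility, TTFT can be measured as the delay between sending a streaming request and receiving the first non-empty chunk. The sketch below is a generic illustration; the endpoint URL and request payload are assumptions (an OpenAI-compatible completions endpoint on localhost), so adjust them to however your ScaleLLM server is actually exposed.

```python
# Sketch: measure time-to-first-token (TTFT) from a streaming HTTP response.
import json
import time
import urllib.request

def time_to_first_token(chunks, clock=time.monotonic):
    """Return seconds from now until the first non-empty chunk arrives."""
    start = clock()
    for chunk in chunks:
        if chunk:  # first non-empty piece of the streamed response
            return clock() - start
    return None  # stream ended without producing any output

def stream_completion(url, prompt):
    """Yield raw chunks from a streaming completion endpoint (assumed OpenAI-compatible)."""
    body = json.dumps({"prompt": prompt, "stream": True, "max_tokens": 64}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        while True:
            chunk = resp.read(1024)
            if not chunk:
                break
            yield chunk

# Usage (assumed local server address):
# ttft = time_to_first_token(stream_completion("http://localhost:8080/v1/completions", "Hello"))
# print(f"TTFT: {ttft:.2f}s")
```

Reporting TTFT measured this way, together with the prompt length used, makes it possible to separate prefill cost from networking or scheduling overhead.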