
[Issue] Qwen-14B-Chat init fail and performance issue. #275

Open
liutongxuan opened this issue Jul 16, 2024 · 2 comments

@liutongxuan (Collaborator)

Model: Qwen-14B-Chat (QWen2)
Dataset: https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese/blob/main/open_qa.jsonl
Environment: 2× A30 GPUs

Issue 1:
Error: the model fails to initialize. After disabling CUDA Graph, the launch succeeds. This suggests a bug in CUDA Graph during distributed inference.

Issue 2:
With CUDA Graph disabled, the model launches successfully, but TTFT (Time to First Token) is nearly 50 seconds, which is far too slow.

@guocuimi (Collaborator)

Thank you for reporting the issue. It seems both instances are related to slow performance with NCCL. To investigate further, we need additional information. Please provide the following context:

  1. GPU Topology:
     Please run `nvidia-smi topo -mp` and share the output. This will help us understand the GPU interconnect topology.
     Reference: Understanding NVIDIA GPU Topologies

  2. NCCL Logs:
     Before running ScaleLLM, enable NCCL logs with `NCCL_DEBUG=INFO`. This will provide detailed NCCL logs that can assist in diagnosing the issue.
     Reference: NCCL Environment Variables
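To illustrate what step 1 produces, here is a minimal, hypothetical Python sketch that parses a `nvidia-smi topo -mp` connectivity matrix and reports the link type between each GPU pair. The `SAMPLE` text is invented example output, not captured from the reporter's node; the real matrix format may vary slightly across driver versions.

```python
# Invented example of `nvidia-smi topo -mp` output for a 2-GPU node.
SAMPLE = (
    "\tGPU0\tGPU1\tCPU Affinity\n"
    "GPU0\tX\tPHB\t0-31\n"
    "GPU1\tPHB\tX\t0-31\n"
    "\n"
    "Legend:\n"
    "  X   = Self\n"
    "  PHB = Connection traversing PCIe as well as a PCIe Host Bridge\n"
)

def parse_topo_matrix(text):
    """Return {(gpu_a, gpu_b): link_type} for each distinct GPU pair."""
    lines = [l for l in text.splitlines() if l.strip()]
    # Header row: keep only the cells naming GPU columns.
    header = [c.strip() for c in lines[0].split("\t") if c.strip().startswith("GPU")]
    links = {}
    for line in lines[1:]:
        cells = [c.strip() for c in line.split("\t")]
        if not cells[0].startswith("GPU"):
            continue  # skip legend and other non-matrix lines
        row = cells[0]
        for col, val in zip(header, cells[1:]):
            if val != "X" and row < col:  # record each pair once
                links[(row, col)] = val
    return links

for (a, b), link in sorted(parse_topo_matrix(SAMPLE).items()):
    print(f"{a} <-> {b}: {link}")  # -> GPU0 <-> GPU1: PHB
```

Link types further down the legend (PHB, NODE, SYS) indicate progressively slower PCIe paths, which is directly relevant to the NCCL performance question here.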
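For step 2, a minimal sketch of launching a child process with NCCL logging enabled. The child command below is only a stand-in that echoes the variable; in a real run you would substitute your actual ScaleLLM launch command.

```python
import os
import subprocess
import sys

# Enable verbose NCCL logging for the child process. NCCL_DEBUG_SUBSYS narrows
# the output to the init and network subsystems, which is usually enough.
env = dict(os.environ, NCCL_DEBUG="INFO", NCCL_DEBUG_SUBSYS="INIT,NET")

# Stand-in child command: a real run would launch the ScaleLLM server here.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['NCCL_DEBUG'])"],
    env=env, capture_output=True, text=True,
)
print(result.stdout.strip())  # -> INFO
```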

Additionally, it would be beneficial if you could measure the inter-GPU connection speed for your node. Here’s a guide on how to measure it: Measuring Inter-GPU Connection Speed
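Once you have a measured transfer, the sanity check is just bytes over time. As a reference point (an assumption based on the PCIe 4.0 spec, not measured on this node), a PCIe 4.0 x16 link, which the A30 uses, peaks around 32 GB/s per direction:

```python
def bandwidth_gb_s(num_bytes: int, seconds: float) -> float:
    """Convert a timed transfer into GB/s (decimal gigabytes)."""
    return num_bytes / seconds / 1e9

# Example: a 2 GiB buffer copied between GPUs in 0.5 s.
print(round(bandwidth_gb_s(2 * 1024**3, 0.5), 2))  # -> 4.29, far below the PCIe 4.0 peak
```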

This information will greatly aid us in pinpointing the root cause of the performance issue. Thank you.

@guocuimi (Collaborator) commented Aug 9, 2024

You can try running `python -m scalellm.utils.collect_env` to collect environment info.
