Describe the bug

We followed Accelerated-RLHF.md to accelerate PPO training with TensorRT-LLM. After launching the reward model and critic server, we launched the initial policy and the PPO actor training. At the beginning of the 2nd step of the PPO actor training, we hit an error raised by setNamedWeights.

According to the documentation comment for setNamedWeights in TensorRT v9.3.0, setNamedWeights may fail for two reasons:

1. The name of the weights is nullptr or does not correspond to any refittable weights.
2. The number of weights is inconsistent with the original specification.

To debug this error, we retrieved the existing weights for the given name in TensorRT-LLM (see commit aff4e0f5) and found that they are empty, i.e., values == nullptr and count == 0. Note that the 1st step completed successfully, which implies that the engine was compiled and was able to generate responses.
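As a minimal, self-contained sketch of the check (independent of the actual NeMo-Aligner/TensorRT-LLM code path; engine_path, weight_name, and new_weights below are placeholders), one can probe both documented failure modes through the TensorRT Python refitter API:

```python
import numpy as np
import tensorrt as trt

# Standalone probe for the two documented setNamedWeights failure modes.
# Note: for a TensorRT-LLM engine, the TRT-LLM plugin library has to be
# loaded before deserialization will succeed.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def probe_refit(engine_path: str, weight_name: str, new_weights: np.ndarray) -> None:
    with open(engine_path, "rb") as f:
        engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())

    refitter = trt.Refitter(engine, TRT_LOGGER)

    # Failure reason 1: the name does not correspond to any refittable weights.
    refittable = refitter.get_all_weights()
    if weight_name not in refittable:
        print(f"'{weight_name}' is not refittable ({len(refittable)} refittable names)")
        return

    # Some TensorRT releases expose the stored Weights directly; in our failing
    # run this is what comes back with values == nullptr and count == 0.
    if hasattr(refitter, "get_named_weights"):
        print("stored weights:", refitter.get_named_weights(weight_name))

    # Failure reason 2: the element count (or dtype) is inconsistent with the
    # original specification, in which case set_named_weights returns False.
    arr = np.ascontiguousarray(new_weights)  # must stay alive until refit is done
    ok = refitter.set_named_weights(weight_name, trt.Weights(arr))
    print(f"set_named_weights('{weight_name}') -> {ok}")

    missing = refitter.get_missing_weights()
    if ok and not missing:
        print("refit_cuda_engine ->", refitter.refit_cuda_engine())
    else:
        print("weights still missing before refit:", missing)
```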
Steps/Code to reproduce bug

We can reproduce this bug on a p4de instance with 8 A100 GPUs.

Attached are the model_config.yaml files of the 3 models for which we hit this issue: GPT-2B, Mistral-7B-Instruct-v0.2, and tulu-2-7b. Note that GitHub does not allow uploading .yaml files, so we changed the file extension to .txt.
Expected behavior
We expect the PPO actor training job to succeed.
Environment overview (please complete the following information)
docker pull & docker run commands used

Environment details
If an NVIDIA Docker image is used, you don't need to specify these. Otherwise, please provide:
Additional context
8 NVIDIA A100-SXM4-80GB