System Info
GPU: NVIDIA RTX 4090
TensorRT-LLM 0.13
root@docker-desktop:/llm/tensorrt-llm-0.13.0/examples/chatglm# python3 convert_checkpoint.py --chatglm_version glm4 --model_dir "/llm/other/models/glm-4-9b-chat" --output_dir "/llm/other/trt-model" --dtype float16 --use_weight_only --int8_kv_cache --weight_only_precision int8
[TensorRT-LLM] TensorRT-LLM version: 0.13.0
0.13.0
Inferring chatglm version from path...
Chatglm version: glm4
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████| 10/10 [04:35<00:00, 27.53s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Calibration: 100%|█████████████████████████████████████████████████████████████████████████| 64/64 [00:05<00:00, 10.68it/s]
Traceback (most recent call last):
  File "/llm/tensorrt-llm-0.13.0/examples/chatglm/convert_checkpoint.py", line 263, in <module>
    main()
  File "/llm/tensorrt-llm-0.13.0/examples/chatglm/convert_checkpoint.py", line 255, in main
    convert_and_save_hf(args)
  File "/llm/tensorrt-llm-0.13.0/examples/chatglm/convert_checkpoint.py", line 213, in convert_and_save_hf
    ChatGLMForCausalLM.quantize(args.model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/chatglm/model.py", line 351, in quantize
    convert.quantize(hf_model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/chatglm/convert.py", line 723, in quantize
    weights = load_weights_from_hf_model(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/chatglm/convert.py", line 438, in load_weights_from_hf_model
    np.array([qkv_vals_int8['scale_y_quant_orig']],
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 1084, in __array__
    return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.