adi7820 changed the title from "DJL 0.30 Sagemaker Endpoint Deployment of quantized model parameter option.quantization is not working" to "DJL 0.30 Sagemaker Endpoint Deployment using vllm of quantized model parameter option.quantization is not working" on Nov 28, 2024
Hello,
I'm trying to deploy a LLaMA 3.2 Vision 4-bit bitsandbytes-quantized model as a SageMaker endpoint, but I've encountered an error regarding quantization.
As shown in the above image, the engine reports that it is receiving quantization as 'None', even though I set that property in the serving.properties I created for the SageMaker endpoint:
%%writefile serving.properties
engine=Python
option.model_id=unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit
option.rolling_batch=vllm
option.dtype=bf16
option.max_model_len=8192
option.max_num_seqs=1
option.enforce_eager=True
option.gpu_memory_utilization=0.9
option.quantization=bitsandbytes
option.load_format=bitsandbytes
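For context, this is roughly how I package and deploy that configuration; a minimal sketch using the SageMaker Python SDK, where the DJL LMI container URI, the S3 model-artifact path, the instance type, and the endpoint name are placeholders I've filled in for illustration, not the exact values from my setup:

import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()
session = sagemaker.Session()

# Placeholder values (assumptions): replace with the real DJL LMI 0.30 image URI
# and the S3 path of the tarball that contains serving.properties.
lmi_image_uri = "<djl-lmi-0.30-container-uri>"
model_data = "s3://<my-bucket>/llama32-vision/model.tar.gz"

model = Model(
    image_uri=lmi_image_uri,
    model_data=model_data,
    role=role,
    sagemaker_session=session,
)

# Deploy on a single-GPU instance; the instance type here is only an example.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="llama32-vision-bnb-4bit",
)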
When I searched for this error in the vLLM GitHub repo, I found that the actual cause is that the quantization parameter never receives its value, i.e. the option.quantization setting is not being passed through to the vLLM engine.
Can you help me with a possible solution, or do I need to wait for another version?
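To isolate the problem, the serving.properties options above correspond roughly to the following vLLM engine arguments; this is a minimal sketch of calling vLLM directly (outside DJL Serving) to confirm the model loads when quantization and load_format are actually set, with the prompt text being just an example:

from vllm import LLM, SamplingParams

# Mirrors the serving.properties options. If this works standalone, the issue is
# in how DJL Serving forwards option.quantization to the vLLM engine args.
llm = LLM(
    model="unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",
    dtype="bfloat16",
    max_model_len=8192,
    max_num_seqs=1,
    enforce_eager=True,
    gpu_memory_utilization=0.9,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)

outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)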