I built the engine with two separate LoRA adapters on top of the base Llama 3.1 model. The output from the build is rank0.engine, config.json, and a lora folder with the following structure:
lora
├── 0
│   ├── adapter_config.json
│   └── adapter_model.safetensors
└── 1
    ├── adapter_config.json
    └── adapter_model.safetensors
Is this expected? I figured there would be per-rank engines for the LoRA adapters as well. I passed the LoRA directories to the engine build like this:
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_tp1 --output_dir /opt/tensorrt_llm_engine --gemm_plugin auto --lora_plugin auto --max_batch_size 8 --max_input_len 512 --max_seq_len 562 --lora_dir "/opt/lora_1" "/opt/lora_2" --max_lora_rank 8 --lora_target_modules attn_q attn_k attn_v
Any advice is appreciated.
Yes, the output folder is expected. You built the engine with TP=1, so there is a single rank0.engine. The LoRA weights are saved in adapter_model.safetensors under each LoRA folder.
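For comparison, a build with tensor parallelism produces one engine file per rank. A minimal sketch, assuming the Llama example's convert_checkpoint.py and a hypothetical model path:

python convert_checkpoint.py --model_dir ./llama-3.1-8b \
    --output_dir ./tllm_checkpoint_2gpu_tp2 \
    --dtype float16 --tp_size 2
trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_tp2 \
    --output_dir /opt/tensorrt_llm_engine_tp2 --gemm_plugin auto

The output directory would then contain rank0.engine, rank1.engine, and config.json; with TP=1 you only ever get rank0.engine.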
Thanks for the response. Yes, I have a rank0.engine file and a config.json. My question now is: when I deploy onto a container, say NVIDIA Triton, do I have to include the LoRA weights, or have those been baked into rank0.engine?
Yes, you have to include the LoRA weights. They are not baked into the engine, because TRT-LLM supports multi-LoRA and therefore has to load LoRA weights dynamically at runtime.
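As a quick way to see the dynamic loading in action outside Triton, the run.py script under examples/ in the TensorRT-LLM repo accepts LoRA directories and per-request task UIDs; the tokenizer path below is a placeholder, and --lora_task_uids picks an adapter (by build-order index) for each input:

python examples/run.py --engine_dir /opt/tensorrt_llm_engine \
    --tokenizer_dir ./llama-3.1-8b \
    --lora_dir "/opt/lora_1" "/opt/lora_2" \
    --lora_task_uids 0 1 \
    --max_output_len 50 \
    --input_text "prompt for adapter 0" "prompt for adapter 1"

A task UID of -1 maps the request to the base model with no adapter, which is handy for comparing against the LoRA outputs.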
Thank you! So, after running my engine build I had the aforementioned folder structure. If I were to deploy on NVIDIA Triton, would I include the LoRA weights in the 1/ subfolder where my rank0.engine file and config.json are, or would they be placed on a different path?
I believe this is the container we are going with on deployment.
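For context, this is the model-repository layout we are planning, based on the inflight_batcher_llm example in the tensorrtllm_backend repo (the surrounding model names are my assumption; only the engine and its config go under the version directory):

triton_model_repo
├── ensemble
├── preprocessing
├── postprocessing
└── tensorrt_llm
    ├── config.pbtxt
    └── 1
        ├── rank0.engine
        └── config.json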