How to integrate Multi-LoRA Setup at Inference with NVIDIA Triton / TensorRT-LLM? I built the engine... #2371

Open
JoJoLev opened this issue Oct 24, 2024 · 4 comments

JoJoLev commented Oct 24, 2024

I built the engine with two separate LoRA adapters on top of the base Llama 3.1 model. The output from the build is rank0.engine, config.json, and a lora folder with the following structure:
lora
|-- 0
|   |-- adapter_config.json
|   `-- adapter_model.safetensors
`-- 1
    |-- adapter_config.json
    `-- adapter_model.safetensors

Is this expected? I figured there would be rank engines for the LoRA weights as well. These are the LoRA directories I passed to the engine build:
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_tp1 --output_dir /opt/tensorrt_llm_engine --gemm_plugin auto --lora_plugin auto --max_batch_size 8 --max_input_len 512 --max_seq_len 562 --lora_dir "/opt/lora_1" "/opt/lora_2" --max_lora_rank 8 --lora_target_modules attn_q attn_k attn_v

Any advice is appreciated.

@Superjomn added the question, triaged, and build labels on Oct 26, 2024

syuoni commented Oct 28, 2024

Hi @JoJoLev,

I suppose the output folder is expected. You built the engine with TP=1, so there is one rank0.engine. The LoRA weights are saved in adapter_model.safetensors under each LoRA folder.
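As a quick sanity check before moving to Triton, the example runner in the TensorRT-LLM repo can exercise both adapters by LoRA task UID. This is only a sketch based on the repo's multi-LoRA example; it assumes the numbered folders 0 and 1 follow the order of the --lora_dir arguments at build time, and that your version of examples/run.py accepts --lora_task_uids (some versions also want --lora_dir at run time):

# one UID per input prompt; -1 runs that prompt on the plain base model
python3 examples/run.py \
    --engine_dir /opt/tensorrt_llm_engine \
    --tokenizer_dir <path_to_base_llama3.1> \
    --max_output_len 50 \
    --input_text "prompt for adapter 0" "prompt for adapter 1" "prompt without LoRA" \
    --lora_task_uids 0 1 -1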


JoJoLev commented Oct 28, 2024

Hi @syuoni

Thanks for the response. Yes, I have a rank0.engine file and a config. My question now is: when I deploy to a container, say NVIDIA Triton, do I have to include the LoRA weights, or have they been baked into the rank0.engine?


syuoni commented Oct 28, 2024

> Hi @syuoni
>
> Thanks for the response. Yes, I have a rank0.engine file and a config. My question now is: when I deploy to a container, say NVIDIA Triton, do I have to include the LoRA weights, or have they been baked into the rank0.engine?

Yes, you have to include the LoRA weights. They are not baked into the engine: because TRT-LLM supports multi-LoRA, it has to load the LoRA weights dynamically at runtime.
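On the Triton side (tensorrtllm_backend), the usual multi-LoRA flow is to convert each Hugging Face adapter into numpy tensors and send them along with the first request for a given LoRA task id; the backend caches them, so later requests can pass just the id. The sketch below follows the multi-LoRA example in the tensorrtllm_backend repo; the script and client flag names (hf_lora_convert.py, --lora-path, --lora-task-id) may differ between versions, so treat it as a pointer rather than exact commands:

# convert each HF adapter to model.lora_weights.npy / model.lora_config.npy
python3 tensorrt_llm/examples/hf_lora_convert.py -i /opt/lora_1 -o /opt/lora_1_converted --storage-type float16
python3 tensorrt_llm/examples/hf_lora_convert.py -i /opt/lora_2 -o /opt/lora_2_converted --storage-type float16

# first request for an adapter uploads its weights under a task id
python3 inflight_batcher_llm/client/inflight_batcher_llm_client.py \
    --text "example prompt" --request-output-len 50 \
    --tokenizer-dir <path_to_base_llama3.1> \
    --lora-path /opt/lora_1_converted --lora-task-id 1

# later requests can reference the cached adapter by id alone
python3 inflight_batcher_llm/client/inflight_batcher_llm_client.py \
    --text "another prompt" --request-output-len 50 \
    --tokenizer-dir <path_to_base_llama3.1> \
    --lora-task-id 1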


JoJoLev commented Oct 28, 2024

@syuoni got it!

Thank you! So, after running my engine build I have the aforementioned folder structure. If I deploy on NVIDIA Triton, would I include the LoRA weights in the 1/ subfolder where my rank0.engine file and config.json are, or would they be placed on a different path?
I believe this is the container we are going with on deployment.
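For context, the example model repository in tensorrtllm_backend is laid out roughly like this (an assumption from the backend's templates; the LoRA weights are usually not stored in the repository at all, but kept wherever the client can read them and sent per request, as sketched above):

model_repo/
|-- ensemble/
|-- preprocessing/
|-- postprocessing/
`-- tensorrt_llm/
    |-- config.pbtxt      # gpt_model_path points at the engine directory
    `-- 1/
        |-- rank0.engine
        `-- config.json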
