Encountered an error in forwardAsync function: [TensorRT-LLM][ERROR] CUDA runtime error in cudaMemcpyAsync(dst, src.data(), src.getSizeInBytes(), cudaMemcpyDefault, mStream->get()): invalid argument #2358
Comments
@symphonylyh, could you please take a look at this?
Hi @zhaocc1106, where is this
Yes,
I encountered the same problem.
With a TP=4 LLaMA-like model and the executor API, I'm passing a pinned tensor as the embeddings. If I pass a kCpu tensor instead, everything is fine. I guess a synchronization point is missing in the batch_manager code, because a transfer from kCpu to kGpu implies an implicit sync, but a transfer from kPinned to kGpu (and from kGpu to kGpu) doesn't.
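For illustration, here is a minimal sketch in plain CUDA C++ (not TensorRT-LLM code; it only demonstrates the assumed root cause) of why a pageable-host source hides the race while a pinned or device source does not:

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

#define CHECK_CUDA(call)                                                        \
    do {                                                                        \
        cudaError_t err_ = (call);                                              \
        if (err_ != cudaSuccess)                                                \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
    } while (0)

int main()
{
    constexpr size_t numElems = 1024;
    constexpr size_t numBytes = numElems * sizeof(float);

    cudaStream_t stream;
    CHECK_CUDA(cudaStreamCreate(&stream));

    float* dst = nullptr;
    CHECK_CUDA(cudaMalloc(reinterpret_cast<void**>(&dst), numBytes));

    // Pageable (kCpu-like) source: cudaMemcpyAsync stages the data and
    // synchronizes the stream before returning, so the source buffer may be
    // reused or freed right away -- this masks any ordering bug downstream.
    std::vector<float> pageableSrc(numElems, 1.0f);
    CHECK_CUDA(cudaMemcpyAsync(dst, pageableSrc.data(), numBytes,
                               cudaMemcpyDefault, stream));

    // Pinned (kPinned-like) source: the copy is truly asynchronous. Without an
    // explicit synchronization the source can be freed or overwritten while the
    // transfer is still in flight. Device-to-device copies behave the same way
    // with respect to the producing stream.
    float* pinnedSrc = nullptr;
    CHECK_CUDA(cudaMallocHost(reinterpret_cast<void**>(&pinnedSrc), numBytes));
    CHECK_CUDA(cudaMemcpyAsync(dst, pinnedSrc, numBytes,
                               cudaMemcpyDefault, stream));
    CHECK_CUDA(cudaStreamSynchronize(stream));  // explicit sync point required here

    CHECK_CUDA(cudaFreeHost(pinnedSrc));
    CHECK_CUDA(cudaFree(dst));
    CHECK_CUDA(cudaStreamDestroy(stream));
    return 0;
}
```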
@zhaocc1106 try passing a kCPU tensor instead of a kGPU one as a workaround
But my vit_embedding is the output of a TensorRT engine. It's in GPU device memory, and I copy it into the TRT-LLM GPU memory with a D2D copy; copying it to the CPU would waste time. It's strange that if the first request has no image, the following image requests are OK.
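If the embedding has to stay on the GPU to avoid the host round-trip, one hedged option (plain CUDA; names like vitEmbeddingDev, promptTableDev, vitStream and llmStream are hypothetical, and the stream TRT-LLM uses internally may not be accessible from user code) is to order the D2D copy against the consuming stream with an event instead of blocking the host:

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Order a device-to-device copy of the ViT embedding against the stream that
// later consumes the prompt-tuning table, without a host-side synchronization.
// All names here are illustrative, not actual TensorRT-LLM symbols.
void copyEmbeddingD2D(const float* vitEmbeddingDev, float* promptTableDev,
                      std::size_t numBytes, cudaStream_t vitStream,
                      cudaStream_t llmStream)
{
    cudaEvent_t copyDone;
    cudaEventCreateWithFlags(&copyDone, cudaEventDisableTiming);

    // Enqueue the D2D copy on the producer stream; note there is no implicit
    // host synchronization for device-to-device transfers.
    cudaMemcpyAsync(promptTableDev, vitEmbeddingDev, numBytes,
                    cudaMemcpyDeviceToDevice, vitStream);

    // Make the consumer stream wait for the copy to complete.
    cudaEventRecord(copyDone, vitStream);
    cudaStreamWaitEvent(llmStream, copyDone, 0);

    cudaEventDestroy(copyDone);
}
```

If the consuming stream is internal to the batch manager and not exposed, a conservative fallback is a plain cudaStreamSynchronize(vitStream) after the copy, before the request is enqueued.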
I know, but it works for me )
Thanks, I will try.
ENV:
ISSUE:
I use the C++ API of "tensorrt_llm/batch_manager/" to deploy a multi-modal LLM. I build the TensorRT-LLM engine with --tp 4 and deploy the service on 4 GPUs. If the first request is an image request, the cudaMemcpyAsync "invalid argument" error from the title occurs; but if the first request has no image, the following image requests are OK. Moreover, if I deploy on 1 GPU, the first image request is also OK. An image request means the request has a prompt-tuning table input (a ViT embedding copied device-to-device into GPU memory), as sketched below.
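A minimal hypothetical sketch of such an image-request path, in plain CUDA C++ (vitOutputDev, promptTableDev and vitStream are made-up names, not the issue author's actual code):

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical illustration of an "image request" path: the ViT engine has
// written its embedding to vitOutputDev; we copy it into the device buffer
// that will be passed to the batch manager as the prompt-tuning table.
void prepareImageRequest(const void* vitOutputDev, void* promptTableDev,
                         std::size_t numBytes, cudaStream_t vitStream)
{
    // Device-to-device copy of the ViT embedding into the prompt table buffer.
    cudaMemcpyAsync(promptTableDev, vitOutputDev, numBytes,
                    cudaMemcpyDeviceToDevice, vitStream);

    // Conservative explicit sync before handing promptTableDev to the
    // batch manager / executor request, which runs on its own stream.
    cudaStreamSynchronize(vitStream);
}
```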