Issues: NVIDIA/TensorRT-LLM
Pinned:
#783  [Issue Template] Short one-line summary of the issue #270
  Open · opened Jan 1, 2024 by juney-nvidia
#2383  What is the difference between stop_words_list and end_id
  Labels: question (further information is requested), triaged (triaged by maintainers) · opened Oct 27, 2024 by tonylek
#2381  CUDA Out of Memory Error when Running Nemotron-51B with TensorRT-LLM on 4xA100
  Labels: Investigating · opened Oct 26, 2024 by ShivamSphn
#2380  Error while importing tensorrt_llm
  Labels: installation, question, triaged · opened Oct 26, 2024 by Aaryanverma
#2379  build bert: build does not load model
  Labels: bug (something isn't working), build, triaged · opened Oct 26, 2024 by Alireza3242 · 2 of 4 tasks
#2377  FP8 Conversion failure when using Mixtral 8x7B with use_fp8_rowwise
  Labels: bug, build, triaged · opened Oct 25, 2024 by ValeGian · 2 of 4 tasks
#2376  ModelRunner cannot start engine with "multi-rank nemo LoRA" checkpoints
  Labels: bug, build, triaged · opened Oct 25, 2024 by jolyons123
#2374  TPOT=0 without In-flight Batching in benchmark
  Labels: benchmark, performance issue (issue about performance number), question, triaged · opened Oct 25, 2024 by mltloveyy
#2373  Bug in build bert
  Labels: bug, build, triaged · opened Oct 24, 2024 by Alireza3242
#2372  XQA kernel works slower with fp8 kv than with fp16 kv on H100
  Labels: performance issue, question, triaged · opened Oct 24, 2024 by ttim · 4 tasks
#2369  UnsupportedOperatorError: ONNX export failed on an operator with unrecognized namespace flash_attn::_flash_attn_forward
  Labels: Investigating, triaged · opened Oct 24, 2024 by scuizhibin
#2367  return_log_probs slows down generation
  Labels: bug, Investigating, performance issue · opened Oct 24, 2024 by Desmond819 · 4 tasks
#2365  fast-forward tokens in logits post processor
  Labels: feature request (new feature or request), runtime, triaged · opened Oct 23, 2024 by mmoskal
#2361  C++ inference example
  Labels: question, runtime · opened Oct 22, 2024 by scuizhibin
#2360  Error when running 'sudo make -C docker release_build'
  Labels: build, Investigating, question · opened Oct 21, 2024 by SouthWest7
#2358  Encountered an error in forwardAsync function: [TensorRT-LLM][ERROR] CUDA runtime error in cudaMemcpyAsync(dst, src.data(), src.getSizeInBytes(), cudaMemcpyDefault, mStream->get()): invalid argument
  Labels: bug, triaged · opened Oct 20, 2024 by zhaocc1106
#2357  openai_server error
  Labels: question, triaged · opened Oct 19, 2024 by imilli
#2355  Build and run nvidia/Llama-3_1-Nemotron-51B-Instruct on a single A100 80Gb
  Labels: quantization (lower-bit quantization, including int8, int4, fp8), question, triaged · opened Oct 19, 2024 by edesalve
#2353  qwen, tensorrt-llm=0.12.0
  Labels: question, runtime · opened Oct 18, 2024 by yanglongbiao · 2 of 4 tasks
#2351  [Question] Int8 Gemm's perf degraded in real models
  Labels: quantization, question, triaged · opened Oct 18, 2024 by foreverlms
#2350  free_gpu_memory_fraction not working for examples/apps/openai_server.py
  Labels: bug, triaged · opened Oct 18, 2024 by anaivebird · 2 of 4 tasks
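On the stop_words_list vs end_id question (#2383): conceptually, end_id names a single end-of-sequence token id that terminates decoding when generated, while stop_words_list holds whole token sequences that terminate decoding when the tail of the output matches one of them. A minimal plain-Python sketch of that distinction (illustrative only, with a hypothetical should_stop helper; this is not the TensorRT-LLM implementation or API):

```python
def should_stop(generated_ids, end_id, stop_words_list):
    """Decide whether decoding should stop after the latest token.

    end_id: a single token id; decoding stops as soon as it is produced.
    stop_words_list: lists of token ids; decoding stops when the tail of
    the output matches any one of them.
    """
    if generated_ids and generated_ids[-1] == end_id:
        return True  # the EOS token itself was generated
    for stop_seq in stop_words_list:
        if stop_seq and generated_ids[-len(stop_seq):] == stop_seq:
            return True  # output now ends with a stop-word sequence
    return False

# end_id fires on one token; a stop word can span several tokens.
print(should_stop([5, 2], end_id=2, stop_words_list=[]))        # True
print(should_stop([5, 7, 8], end_id=2, stop_words_list=[[7, 8]]))  # True
print(should_stop([5, 7], end_id=2, stop_words_list=[[7, 8]]))  # False
```

In short, end_id is a special case of stopping on a single token, whereas stop_words_list generalizes stopping to arbitrary token sequences.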