Runtime using ghcr.io/tenstorrent/tt-inference-server/tt-metal-llama3-70b-src-base-vllm:v0.0.1-tt-metal-385904186f81-384f1790c3be
Model Input Dumps
No response
🐛 Describe the bug
The vLLM model backend crashed while serving a single-user prompt. Several prompts completed successfully; then, after roughly 2 hours of idle time, the backend failed on the first prompt sent afterward.
The initial prompts ranged from 1 to 2048 tokens of context. The prompt that triggered the failure was "Can you tell me a joke?", on the order of 10 tokens.
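A minimal reproduction sketch of the request pattern described above, against the server's OpenAI-compatible `/v1/completions` endpoint. The host/port, model name, and exact warm-up prompts are assumptions for illustration; the sampling parameters mirror the ones visible in the log below.

```python
# Hypothetical repro sketch: a few warm-up prompts, ~2 h idle, then one short
# prompt. BASE_URL and MODEL are assumptions, not from the original report.
import json
import os
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed server address
MODEL = "meta-llama/Llama-3.1-70B-Instruct"  # assumed model name

def build_payload(prompt: str) -> dict:
    """Mirror the SamplingParams shown in the server log below."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 1.0,
        "top_p": 0.9,
        "top_k": 20,
        "stop": ["<|eot_id|>"],
        "stream": True,
    }

def send(payload: dict) -> None:
    """POST the completion request and drain the streamed response."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        for _line in resp:
            pass  # consume the stream; content is irrelevant to the repro

if __name__ == "__main__" and os.environ.get("RUN_REPRO"):
    # Warm-up prompts of varying length, as in the report.
    for prompt in ["Hi", "Tell me a fun fact. " * 100]:
        send(build_payload(prompt))
    time.sleep(2 * 60 * 60)  # ~2 hours idle, per the report
    send(build_payload("Can you tell me a joke?"))  # prompt that crashed
```

Gated behind `RUN_REPRO=1` so it does not fire requests accidentally; whether the idle period itself is the trigger (vs. total uptime) is not established by the report.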
Logs below:
INFO 11-17 13:49:12 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 11-17 13:49:22 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 11-17 13:49:32 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 11-17 13:49:40 logger.py:37] Received request cmpl-5a89b06405434a5ba0e8845972bbfc22-0: prompt: '\n<|begin_of_text|>\n\n<|start_header_id|>user<|end_header_id|>\n\n\nTell me a fun fact.\n\n<|start_header_id|>assistant<|end_header_id|>\n\n\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=0.9, top_k=20, min_p=0.0, seed=None, stop=['<|eot_id|>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), guided_decoding=GuidedDecodingParams(json=None, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None), prompt_token_ids: [128000, 198, 128000, 271, 128006, 882, 128007, 1432, 41551, 757, 264, 2523, 2144, 382, 128006, 78191, 128007, 1432], lora_request: None, prompt_adapter_request: None.
INFO: 172.18.0.4:58712 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 11-17 13:49:40 engine.py:291] Added request cmpl-5a89b06405434a5ba0e8845972bbfc22-0.
2024-11-17 13:49:40.071 | INFO | models.demos.t3000.llama2_70b.tt.llama_generation:prefill_forward:316 - Filling kv cache for user 1
ERROR 11-17 13:49:48 client.py:250] TimeoutError('No heartbeat received from MQLLMEngine')
ERROR 11-17 13:49:48 client.py:250] NoneType: None
ERROR: Exception in ASGI application
+ Exception Group Traceback (most recent call last):
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 188, in __call__
| await response(scope, wrapped_receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 222, in __call__
| async for chunk in self.body_iterator:
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 179, in body_stream
| raise app_exc
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 149, in coro
| await self.app(scope, receive_or_disconnect, send_no_error)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in __call__
| await self.app(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
| raise exc
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
| await app(scope, receive, sender)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 715, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 735, in app
| await route.handle(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 288, in handle
| await self.app(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 76, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
| raise exc
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
| await app(scope, receive, sender)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 74, in app
| await response(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 257, in __call__
| await wrap(partial(self.listen_for_disconnect, receive))
| File "/tt-metal/python_env/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 685, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
| return await self.app(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/applications.py", line 113, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 187, in __call__
| raise exc
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 165, in __call__
| await self.app(scope, receive, _send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
| response_sent.set()
| File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
| self.gen.throw(type, value, traceback)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 83, in collapse_excgroups
| raise exc
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
| await func()
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
| async for chunk in self.body_iterator:
| File "/home/user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
| async for prompt_idx, res in result_generator:
| File "/home/user/vllm/vllm/utils.py", line 506, in merge_async_iterators
| item = await d
| File "/home/user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
| raise request_output
| vllm.engine.multiprocessing.MQEngineDeadError: Engine loop is not running. Inspect the stacktrace to find the original error: TimeoutError('No heartbeat received from MQLLMEngine').
+------------------------------------
During handling of the above exception, another exception occurred:
+ Exception Group Traceback (most recent call last):
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 77, in collapse_excgroups
| yield
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
| response_sent.set()
| File "/tt-metal/python_env/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 685, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 257, in __call__
| await wrap(partial(self.listen_for_disconnect, receive))
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
| await func()
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
| message = await receive()
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 126, in receive_or_disconnect
| message = await wrap(wrapped_receive)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 121, in wrap
| result = await func()
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 51, in wrapped_receive
| msg = await self.receive()
| File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
| await self.message_event.wait()
| File "/usr/lib/python3.8/asyncio/locks.py", line 309, in wait
| await fut
| asyncio.exceptions.CancelledError
|
| During handling of the above exception, another exception occurred:
|
| Exception Group Traceback (most recent call last):
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 188, in __call__
| await response(scope, wrapped_receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 222, in __call__
| async for chunk in self.body_iterator:
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 179, in body_stream
| raise app_exc
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 149, in coro
| await self.app(scope, receive_or_disconnect, send_no_error)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in __call__
| await self.app(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
| raise exc
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
| await app(scope, receive, sender)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 715, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 735, in app
| await route.handle(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 288, in handle
| await self.app(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 76, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
| raise exc
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
| await app(scope, receive, sender)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 74, in app
| await response(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 257, in __call__
| await wrap(partial(self.listen_for_disconnect, receive))
| File "/tt-metal/python_env/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 685, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
| return await self.app(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/applications.py", line 113, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 187, in __call__
| raise exc
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 165, in __call__
| await self.app(scope, receive, _send)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
| response_sent.set()
| File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
| self.gen.throw(type, value, traceback)
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 83, in collapse_excgroups
| raise exc
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
| await func()
| File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
| async for chunk in self.body_iterator:
| File "/home/user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
| async for prompt_idx, res in result_generator:
| File "/home/user/vllm/vllm/utils.py", line 506, in merge_async_iterators
| item = await d
| File "/home/user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
| raise request_output
| vllm.engine.multiprocessing.MQEngineDeadError: Engine loop is not running. Inspect the stacktrace to find the original error: TimeoutError('No heartbeat received from MQLLMEngine').
+------------------------------------
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
File "/tt-metal/python_env/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
response_sent.set()
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 83, in collapse_excgroups
raise exc
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
await func()
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
async for chunk in self.body_iterator:
File "/home/user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
async for prompt_idx, res in result_generator:
File "/home/user/vllm/vllm/utils.py", line 506, in merge_async_iterators
item = await d
File "/home/user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
raise request_output
vllm.engine.multiprocessing.MQEngineDeadError: Engine loop is not running. Inspect the stacktrace to find the original error: TimeoutError('No heartbeat received from MQLLMEngine').
CRITICAL 11-17 13:51:22 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO: 172.18.0.4:40756 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1]
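Since the launcher already terminates the server process once the MQLLMEngine heartbeat is lost (the CRITICAL line above), one workaround until the root cause is fixed is an external health check plus a container restart policy. A minimal sketch, assuming the vLLM OpenAI server's `/health` endpoint is exposed at the default address:

```python
# Hedged sidecar sketch: poll the (assumed) /health endpoint of the vLLM
# OpenAI-compatible server; a non-200 or unreachable server signals the
# dead-engine state seen in the logs above.
import urllib.error
import urllib.request

def engine_is_healthy(base_url: str = "http://localhost:8000",
                      timeout: float = 5.0) -> bool:
    """Return True only if GET {base_url}/health answers 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused / timed out: treat the engine as dead.
        return False
```

A supervisor (or simply `docker run --restart unless-stopped`) can then restart the container whenever this returns False, since the server process exits on its own after the heartbeat timeout.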
Your current environment
tt-metal: tenstorrent/tt-metal@3859041
vllm: 384f179
Possibly related to #35