
[Bug]: vLLM model backend crashed when running single user prompts, less than 128 token input context #36

Open
tstescoTT opened this issue Nov 17, 2024 · 0 comments
Labels: bug (Something isn't working)
Your current environment

tt-metal: tenstorrent/tt-metal@3859041
vllm: 384f179

Runtime using ghcr.io/tenstorrent/tt-inference-server/tt-metal-llama3-70b-src-base-vllm:v0.0.1-tt-metal-385904186f81-384f1790c3be
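
For triage, server liveness and the deployed vLLM build can be confirmed from the serving endpoints; a minimal sketch, assuming the server listens on localhost:8000 (vLLM's OpenAI-compatible server exposes /health and /version):

```python
import requests

BASE = "http://localhost:8000"  # assumed host/port for this deployment

# /health returns 200 while the engine loop is alive;
# /version reports the running vLLM version.
print(requests.get(f"{BASE}/health", timeout=5).status_code)
print(requests.get(f"{BASE}/version", timeout=5).json())
```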

Model Input Dumps

No response

🐛 Describe the bug

The vLLM model backend crashed while serving single-user prompts. Several prompts completed successfully, then roughly two hours passed with no traffic, and the first prompt sent after that idle period failed.

The initial prompts ranged from 1 to 2048 tokens of input context. The failing prompt was "Can you tell me a joke?", on the order of 10 tokens.
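
A minimal reproduction sketch of the traffic pattern described above, against the OpenAI-compatible /v1/completions endpoint seen in the logs. The host, port, and served model name are assumptions; the sampling parameters mirror the SamplingParams recorded in the request log:

```python
import time
import requests

BASE = "http://localhost:8000"  # assumed host/port for the vLLM server

def send_prompt(prompt: str) -> None:
    # Sampling parameters mirror those in the vLLM request log;
    # top_k is a vLLM extension to the OpenAI completions schema.
    payload = {
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",  # assumed served model name
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 1.0,
        "top_p": 0.9,
        "top_k": 20,
        "stop": ["<|eot_id|>"],
        "stream": True,
    }
    with requests.post(f"{BASE}/v1/completions", json=payload,
                       stream=True, timeout=300) as resp:
        resp.raise_for_status()
        for _ in resp.iter_lines():
            pass  # drain the streamed chunks

# Several short prompts complete successfully...
send_prompt("Tell me a fun fact.")

# ...then a long idle period (~2 hours in this report)...
time.sleep(2 * 60 * 60)

# ...and the first prompt sent afterwards hits the heartbeat timeout.
send_prompt("Can you tell me a joke?")
```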

Logs below:

INFO 11-17 13:49:12 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 11-17 13:49:22 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 11-17 13:49:32 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 11-17 13:49:40 logger.py:37] Received request cmpl-5a89b06405434a5ba0e8845972bbfc22-0: prompt: '\n<|begin_of_text|>\n\n<|start_header_id|>user<|end_header_id|>\n\n\nTell me a fun fact.\n\n<|start_header_id|>assistant<|end_header_id|>\n\n\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=0.9, top_k=20, min_p=0.0, seed=None, stop=['<|eot_id|>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), guided_decoding=GuidedDecodingParams(json=None, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None), prompt_token_ids: [128000, 198, 128000, 271, 128006, 882, 128007, 1432, 41551, 757, 264, 2523, 2144, 382, 128006, 78191, 128007, 1432], lora_request: None, prompt_adapter_request: None.
INFO:     172.18.0.4:58712 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 11-17 13:49:40 engine.py:291] Added request cmpl-5a89b06405434a5ba0e8845972bbfc22-0.
2024-11-17 13:49:40.071 | INFO     | models.demos.t3000.llama2_70b.tt.llama_generation:prefill_forward:316 - Filling kv cache for user 1
ERROR 11-17 13:49:48 client.py:250] TimeoutError('No heartbeat received from MQLLMEngine')
ERROR 11-17 13:49:48 client.py:250] NoneType: None
ERROR:    Exception in ASGI application
  + Exception Group Traceback (most recent call last):
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 188, in __call__
  |     await response(scope, wrapped_receive, send)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 222, in __call__
  |     async for chunk in self.body_iterator:
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 179, in body_stream
  |     raise app_exc
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 149, in coro
  |     await self.app(scope, receive_or_disconnect, send_no_error)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in __call__
  |     await self.app(scope, receive, send)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
  |     raise exc
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 715, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 735, in app
  |     await route.handle(scope, receive, send)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 288, in handle
  |     await self.app(scope, receive, send)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 76, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
  |     raise exc
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 74, in app
  |     await response(scope, receive, send)
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 257, in __call__
  |     await wrap(partial(self.listen_for_disconnect, receive))
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 685, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    |     result = await app(  # type: ignore[func-returns-value]
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    |     return await self.app(scope, receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
    |     await super().__call__(scope, receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/applications.py", line 113, in __call__
    |     await self.middleware_stack(scope, receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 187, in __call__
    |     raise exc
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 165, in __call__
    |     await self.app(scope, receive, _send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
    |     response_sent.set()
    |   File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    |     self.gen.throw(type, value, traceback)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 83, in collapse_excgroups
    |     raise exc
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
    |     await func()
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/home/user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
    |     async for prompt_idx, res in result_generator:
    |   File "/home/user/vllm/vllm/utils.py", line 506, in merge_async_iterators
    |     item = await d
    |   File "/home/user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    |     raise request_output
    | vllm.engine.multiprocessing.MQEngineDeadError: Engine loop is not running. Inspect the stacktrace to find the original error: TimeoutError('No heartbeat received from MQLLMEngine').
    +------------------------------------

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 77, in collapse_excgroups
  |     yield
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
  |     response_sent.set()
  |   File "/tt-metal/python_env/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 685, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 257, in __call__
    |     await wrap(partial(self.listen_for_disconnect, receive))
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
    |     await func()
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
    |     message = await receive()
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 126, in receive_or_disconnect
    |     message = await wrap(wrapped_receive)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 121, in wrap
    |     result = await func()
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 51, in wrapped_receive
    |     msg = await self.receive()
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    |     await self.message_event.wait()
    |   File "/usr/lib/python3.8/asyncio/locks.py", line 309, in wait
    |     await fut
    | asyncio.exceptions.CancelledError
    |
    | During handling of the above exception, another exception occurred:
    |
    | Exception Group Traceback (most recent call last):
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 188, in __call__
    |     await response(scope, wrapped_receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 222, in __call__
    |     async for chunk in self.body_iterator:
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 179, in body_stream
    |     raise app_exc
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 149, in coro
    |     await self.app(scope, receive_or_disconnect, send_no_error)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in __call__
    |     await self.app(scope, receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    |     raise exc
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 715, in __call__
    |     await self.middleware_stack(scope, receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 735, in app
    |     await route.handle(scope, receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 288, in handle
    |     await self.app(scope, receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 76, in app
    |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    |     raise exc
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/routing.py", line 74, in app
    |     await response(scope, receive, send)
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 257, in __call__
    |     await wrap(partial(self.listen_for_disconnect, receive))
    |   File "/tt-metal/python_env/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 685, in __aexit__
    |     raise BaseExceptionGroup(
    | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
      | Traceback (most recent call last):
      |   File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
      |     result = await app(  # type: ignore[func-returns-value]
      |   File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
      |     return await self.app(scope, receive, send)
      |   File "/tt-metal/python_env/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
      |     await super().__call__(scope, receive, send)
      |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/applications.py", line 113, in __call__
      |     await self.middleware_stack(scope, receive, send)
      |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 187, in __call__
      |     raise exc
      |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 165, in __call__
      |     await self.app(scope, receive, _send)
      |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
      |     response_sent.set()
      |   File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
      |     self.gen.throw(type, value, traceback)
      |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 83, in collapse_excgroups
      |     raise exc
      |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
      |     await func()
      |   File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
      |     async for chunk in self.body_iterator:
      |   File "/home/user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
      |     async for prompt_idx, res in result_generator:
      |   File "/home/user/vllm/vllm/utils.py", line 506, in merge_async_iterators
      |     item = await d
      |   File "/home/user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
      |     raise request_output
      | vllm.engine.multiprocessing.MQEngineDeadError: Engine loop is not running. Inspect the stacktrace to find the original error: TimeoutError('No heartbeat received from MQLLMEngine').
      +------------------------------------

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
  File "/tt-metal/python_env/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
    response_sent.set()
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 83, in collapse_excgroups
    raise exc
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
    await func()
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
    async for chunk in self.body_iterator:
  File "/home/user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
    async for prompt_idx, res in result_generator:
  File "/home/user/vllm/vllm/utils.py", line 506, in merge_async_iterators
    item = await d
  File "/home/user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
vllm.engine.multiprocessing.MQEngineDeadError: Engine loop is not running. Inspect the stacktrace to find the original error: TimeoutError('No heartbeat received from MQLLMEngine').
CRITICAL 11-17 13:51:22 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     172.18.0.4:40756 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1]

Possibly related to #35
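
For context on the failure mode: the MQLLMEngine runs in a separate process from the API server and reports liveness via periodic heartbeat messages; if the frontend client sees no heartbeat within its timeout window, it declares the engine dead and fails every in-flight request with MQEngineDeadError. The sketch below is an illustrative simplification of that watchdog pattern, not vLLM's actual implementation; the class name and timeout value are assumptions:

```python
import time

HEARTBEAT_TIMEOUT_S = 30.0  # illustrative; vLLM's actual constant may differ

class HeartbeatWatchdog:
    """Simplified model of the client-side check that produced
    TimeoutError('No heartbeat received from MQLLMEngine')."""

    def __init__(self) -> None:
        self._last_beat = time.monotonic()

    def record_heartbeat(self) -> None:
        # Invoked whenever a heartbeat message arrives from the engine process.
        self._last_beat = time.monotonic()

    def check(self) -> None:
        # Polled periodically by the frontend; a hung engine (e.g. a device
        # stall during prefill) stops heartbeating and trips this check.
        if time.monotonic() - self._last_beat > HEARTBEAT_TIMEOUT_S:
            raise TimeoutError("No heartbeat received from MQLLMEngine")
```

In this trace the last engine-side message is the kv-cache prefill for user 1 at 13:49:40 and the heartbeat timeout fires at 13:49:48, which suggests the engine process hung inside prefill on the device rather than exiting cleanly.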

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
tstescoTT added the bug label on Nov 17, 2024
tstescoTT changed the title from "[Bug]:" to "[Bug]: vLLM model backend crashed when running single user prompts, less than 128 token input context" on Nov 17, 2024