
Triton warmup does not work for models with batch dimension > 1 #459

Open
Apich238 opened this issue Oct 3, 2024 · 0 comments

Apich238 commented Oct 3, 2024

I exported a YOLOv10 model to ONNX format with batch size > 1:

from ultralytics import YOLOv10

yolov10_detector = YOLOv10.from_pretrained('jameslahm/yolov10x')
yolov10_detector.export(format="onnx", dynamic=False, opset=15, batch=16, simplify=False)
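As a sanity check, the exported graph can be inspected to confirm the fixed batch dimension (this check is mine; it assumes the onnx package is installed and that export() wrote yolov10x.onnx, which I believe is the default name for these weights):

# Sanity check (assumption: the export produced yolov10x.onnx in the working
# directory). Reads the input shape recorded in the ONNX graph.
import onnx

model = onnx.load("yolov10x.onnx")
shape = model.graph.input[0].type.tensor_type.shape
print([d.dim_value for d in shape.dim])  # expected: [16, 3, 640, 640]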

I then run it in Triton Inference Server with the following config:

name: "yolov10"
platform: "onnxruntime_onnx"

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

input [
  {
    name: "images"
    data_type: TYPE_FP32
    dims: [ 16, 3, 640, 640 ]
  }
]

output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ 16, 300, 6 ]
  }
]

optimization { execution_accelerators {
  gpu_execution_accelerator : [ { name : "tensorrt" } ]
}}

So far, tritonserver runs without any trouble.
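For reference, the server also reports the expected shapes; this is a quick metadata check I ran with tritonclient (it assumes the HTTP endpoint is on port 8085, matching the client code below):

# Quick metadata check (assumption: HTTP endpoint on 127.0.0.1:8085).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="127.0.0.1:8085")
meta = client.get_model_metadata("yolov10")
print(meta["inputs"])   # expected: name "images", shape [16, 3, 640, 640]
print(meta["outputs"])  # expected: name "output0", shape [16, 300, 6]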

However, the code I use to make requests to the model in Triton raises an exception.
Code:

import numpy as np

from ultralytics import YOLOv10

yolov10_batchsz = 16

yolov10_detector = YOLOv10("http://127.0.0.1:8085/yolov10", task="detect")

c = 3
h = 1080
w = 1920

imgs = [np.random.uniform(0., 255., (h, w, c)).astype(np.uint8) for _ in range(yolov10_batchsz)]

objss = yolov10_detector.predict(imgs, verbose=True)

Exception:

Exception has occurred: InferenceServerException       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
[400] unexpected shape for input 'images' for model 'yolov10'. Expected [16,3,640,640], got [1,3,640,640]
  File "/home/alex/.local/lib/python3.12/site-packages/tritonclient/http/_utils.py", line 69, in _raise_if_error
    raise error
  File "/home/alex/.local/lib/python3.12/site-packages/tritonclient/http/_client.py", line 1482, in infer
    _raise_if_error(response)
  File "/home/alex/.local/lib/python3.12/site-packages/ultralytics/utils/triton.py", line 90, in __call__
    outputs = self.triton_client.infer(model_name=self.endpoint, inputs=infer_inputs, outputs=infer_outputs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/.local/lib/python3.12/site-packages/ultralytics/nn/autobackend.py", line 516, in forward
    y = self.model(im)
        ^^^^^^^^^^^^^^
  File "/home/alex/.local/lib/python3.12/site-packages/ultralytics/nn/autobackend.py", line 588, in warmup
    self.forward(im)  # warmup
    ^^^^^^^^^^^^^^^^
  File "/home/alex/.local/lib/python3.12/site-packages/ultralytics/engine/predictor.py", line 228, in stream_inference
    self.model.warmup(imgsz=(1 if self.model.pt or self.model.triton else self.dataset.bs, 3, *self.imgsz))
  File "/home/alex/.local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/home/alex/.local/lib/python3.12/site-packages/ultralytics/engine/predictor.py", line 168, in __call__
    return list(self.stream_inference(source, model, *args, **kwargs))  # merge list of Result into one
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/.local/lib/python3.12/site-packages/ultralytics/engine/model.py", line 441, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/sdb/faces/video_proto_2/FacesAPI/issue.py", line 22, in <module>
    objss=yolov10_detector.predict(imgs,verbose=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.12/runpy.py", line 198, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tritonclient.utils.InferenceServerException: [400] unexpected shape for input 'images' for model 'yolov10'. Expected [16,3,640,640], got [1,3,640,640]

The problem is that the Ultralytics API assumes models served by Triton always have batch size 1, in
/home/alex/.local/lib/python3.12/site-packages/ultralytics/engine/predictor.py, line 228:

self.model.warmup(imgsz=(1 if self.model.pt or self.model.triton else self.dataset.bs, 3, *self.imgsz))

which, of course, is not always true.
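A minimal sketch of a possible fix (my suggestion only; it assumes self.dataset.bs is the right warmup size for non-PyTorch backends, and whether it always matches a Triton model's fixed batch dimension is an open question):

# Sketch of a possible fix for predictor.py line 228 (assumption: the dataset
# batch size matches the Triton model's fixed batch dimension). Only hardcode
# batch size 1 for PyTorch models:
self.model.warmup(imgsz=(1 if self.model.pt else self.dataset.bs, 3, *self.imgsz))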

I sincerely ask you to fix this bug.
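In the meantime, a temporary client-side workaround is possible (my own monkey patch, not a supported API; it relies on the AutoBackend.warmup method and the triton attribute visible in the traceback above): skip warmup for Triton backends so the batch-1 request is never sent.

# Temporary workaround (unsupported; relies on ultralytics internals seen in
# the traceback): skip AutoBackend.warmup for Triton models, since the model
# is hosted server-side and needs no client warmup. Apply before predict().
from ultralytics.nn.autobackend import AutoBackend

_orig_warmup = AutoBackend.warmup

def _warmup_skip_triton(self, imgsz=(1, 3, 640, 640)):
    if getattr(self, "triton", False):
        return  # Triton already hosts the model; nothing to warm up client-side
    return _orig_warmup(self, imgsz=imgsz)

AutoBackend.warmup = _warmup_skip_triton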
