AMD GPU machine learning? #4883
-
cc @mertalev, I think we already support AMD, right?
-
Thanks for the tips. Here is my change to the ML Dockerfile:

```diff
 RUN poetry config installer.max-workers 10 && \
     poetry config virtualenvs.create false
+RUN apt purge -y libmimalloc2.0
+RUN apt autoremove -y
 COPY poetry.lock pyproject.toml ./
 RUN poetry install --no-interaction --no-ansi --no-root --with rocm --without dev
 RUN rm -rf /var/lib/apt/lists/*
```

For the moment, my image is still big, but I will dig into this later, or someone interested could merge this into the ML Dockerfile.
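If it helps anyone reproducing this: a quick way to confirm the resulting image can actually use the GPU is to ask ONNX Runtime which execution providers it registered. A minimal sketch, assuming the `rocm` poetry group installs a ROCm-enabled onnxruntime build:

```python
# Sanity check inside the built image: list the execution providers this
# ONNX Runtime build supports. If the ROCm build installed correctly,
# ROCMExecutionProvider should appear alongside CPUExecutionProvider.
import onnxruntime as ort

providers = ort.get_available_providers()
print(providers)
assert "ROCMExecutionProvider" in providers, "ROCm EP missing - check the image build"
```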
-
Awesome! Someone tried this a few months ago but ran into this issue, which led to this PR, so I'm glad to hear you aren't hitting it. Regarding concurrency, could you open an ONNX Runtime issue for that? I think it's a thread-safety bug. It's not a blocker for getting this into immich, of course, but it would be good to solve.
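One way to narrow it down before filing the issue: serialize the `run()` calls behind a lock and see whether the crash disappears. A rough sketch under that assumption ("model.onnx" and the input name are placeholders, not immich's actual models):

```python
# If inference only crashes when run() is called concurrently, but works
# when serialized behind a lock, that points at a thread-safety bug in
# the execution provider rather than in the calling code.
import threading

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["ROCMExecutionProvider"])
lock = threading.Lock()

def infer(x: np.ndarray) -> list:
    with lock:  # drop the lock to try to reproduce the concurrent crash
        return session.run(None, {"input": x})

threads = [
    threading.Thread(target=infer, args=(np.zeros((1, 3, 224, 224), np.float32),))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```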
-
Oh, I didn't see the first log entry. It looks like it's actually the same issue. You can add a comment about it to the issue I linked. You can also try building with that PR. Edit: but merge/rebase the PR onto main first.
-
Wow, I'm so grateful for your tips. I managed to build ONNX Runtime using the PR, plus a small patch that was needed after the rebase, and now I can run in parallel (I'm currently running both …). I'll keep you up to date in this channel!
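For anyone who wants to replicate the parallel test, this is roughly its shape: two independent sessions driven concurrently from a thread pool. The model file names and input shapes below are made up for illustration:

```python
# Two independent sessions exercised concurrently on the ROCm EP,
# roughly the workload described above. File names and shapes are
# illustrative only.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import onnxruntime as ort

clip = ort.InferenceSession("clip.onnx", providers=["ROCMExecutionProvider"])
faces = ort.InferenceSession("faces.onnx", providers=["ROCMExecutionProvider"])

def run(sess: ort.InferenceSession, shape: tuple) -> list:
    x = np.random.rand(*shape).astype(np.float32)
    return sess.run(None, {sess.get_inputs()[0].name: x})

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run, clip, (1, 3, 224, 224)) for _ in range(4)]
    futures += [pool.submit(run, faces, (1, 3, 640, 640)) for _ in range(4)]
    for f in futures:
        f.result()  # re-raises any inference error from the worker threads
```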
-
I'm finally back, with great news!
-
So, AMD is producing Docker images that contain the ROCm core, Python, and a few other things along the way. How hard would it be to take what is being done in the current ML container and shift its base functionality onto a more AI-worthy compute base?
Happy to be a test monkey and help out where needed.
Let me know what the gaps are and whether it would be hard to transition to an 'immich-ml-accel-cuda' / 'immich-ml-accel-rocm' style of container :)