
jinaai/jina-embeddings-v2-base-* not working #11

Open
TimPietrusky opened this issue Aug 5, 2024 · 9 comments

Comments


TimPietrusky commented Aug 5, 2024

When using the worker image runpod/worker-infinity-embedding:stable-cuda12.1.0 with the env var MODEL_NAMES: jinaai/jina-embeddings-v2-base-de, we see this error:

The transformation of the model "JinaBertModel" to BetterTransformer failed

According to michaelfeil/infinity#115 (comment) we should be able to solve this by setting these env variables:

  • INFINITY_DISABLE_OPTIMUM: TRUE
  • INFINITY_DISABLE_COMPILE: TRUE
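
For reference, the setup above can be sketched as a local docker run (in practice the env vars are set in the RunPod endpoint template UI; the `docker run -e` form here is illustration only):

```shell
# Sketch of the endpoint configuration expressed as docker flags.
docker run --gpus all \
  -e MODEL_NAMES="jinaai/jina-embeddings-v2-base-de" \
  -e INFINITY_DISABLE_OPTIMUM=TRUE \
  -e INFINITY_DISABLE_COMPILE=TRUE \
  runpod/worker-infinity-embedding:stable-cuda12.1.0
```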

But this does not work; we still see an error:

2024-08-05T13:35:35.781469998Z {"requestId": "6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1", "message": "Started.", "level": "INFO"}
2024-08-05T13:35:36.794152731Z {"requestId": "6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1", "message": "Failed to return job results. | 400, message='Bad Request', url=URL('https://api.runpod.ai/v2/4nv0a16wv8ef1p/job-done/1az42tjeq0sk40/6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1?gpu=NVIDIA+RTX+A4500&isStream=false')", "level": "ERROR"}
2024-08-05T13:35:36.794207183Z {"requestId": "6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1", "message": "Finished.", "level": "INFO"}

Request

{
  "input": {
    "model": "jina-embeddings-v2-base-de",
    "input": "Hello World"
  }
}

Output

{
  "delayTime": 1125,
  "executionTime": 1049,
  "id": "8a3de0e1-6b43-41e4-a4af-5ad1473463a1-e1",
  "status": "COMPLETED"
}

So it looks like everything completed, but the expected output (the embeddings) is missing.
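
Since the job reports COMPLETED but carries no output field, a client-side check along these lines (a hypothetical helper, not part of the worker) would make the failure explicit instead of silently returning nothing:

```python
# Hypothetical validation of a RunPod /runsync response: the worker should
# return embeddings under "output"; fail loudly when that key is missing
# instead of treating status == COMPLETED as success.

def extract_embeddings(response: dict) -> list:
    if response.get("status") != "COMPLETED":
        raise RuntimeError(f"job not completed: {response.get('status')}")
    if "output" not in response:
        raise RuntimeError(
            f"job {response.get('id')} completed but returned no output "
            "(results may have been dropped, e.g. the 400 on job-done above)"
        )
    return response["output"]

# The response from this issue triggers the error path:
resp = {
    "delayTime": 1125,
    "executionTime": 1049,
    "id": "8a3de0e1-6b43-41e4-a4af-5ad1473463a1-e1",
    "status": "COMPLETED",
}
try:
    extract_embeddings(resp)
except RuntimeError as err:
    print("missing output:", err)
```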

OpenAI-compatible API

The behavior is the same when using the OpenAI-compatible API: it doesn't work and just returns the same output as above.
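
For reference, a request against the OpenAI-compatible route can be sketched like this. The endpoint ID and API key are placeholders, and the `/openai/v1/embeddings` path is an assumption based on RunPod's OpenAI-compatibility convention, not something confirmed in this thread:

```python
import json
import urllib.request

def build_embeddings_request(model: str, texts: list[str]) -> dict:
    # OpenAI-style /v1/embeddings payload: "input" may be a string or a list.
    return {"model": model, "input": texts}

payload = build_embeddings_request("jina-embeddings-v2-base-de", ["Hello World"])

# Placeholders -- substitute your own endpoint ID and API key.
URL = "https://api.runpod.ai/v2/<endpoint_id>/openai/v1/embeddings"  # path is an assumption
API_KEY = "<runpod_api_key>"

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # not executed here; requires a live endpoint
```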

@TimPietrusky TimPietrusky changed the title jinaai/jina-embeddings-v2-base-es not working jinaai/jina-embeddings-v2-base-* not working Aug 5, 2024
@TimPietrusky
Author

@michaelfeil do you maybe have another idea on how to get this sorted?

@michaelfeil
Contributor

michaelfeil commented Aug 5, 2024

@TimPietrusky The output you posted is not really descriptive of the problem that occurs.

The currently usable environment variables are not up to date. Here are all the functions that generate env variables in infinity. However, they just generate the defaults, so I think there is nothing to do here.

https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/env.py

Runpod-infinity currently uses:
infinity-emb[all,onnxruntime-gpu]==0.0.35 (see: https://github.com/runpod-workers/worker-infinity-embedding/blob/main/builder/requirements.txt)
Does your model run with this version?

@TimPietrusky
Author

TimPietrusky commented Aug 6, 2024

@michaelfeil thanks for your quick response.

Sorry for the unusable error messages; this is what we get in our UI. Hopefully I can get access to more in-depth logging or find someone who can actually help out here.

infinity-emb[all,onnxruntime-gpu]==0.0.35

Which version do you recommend? Should we try 0.0.53?
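
If 0.0.53 is the target, the pin in builder/requirements.txt would presumably become something like the following (extras copied from the current pin; whether they all resolve under 0.0.53 is an assumption):

```
# builder/requirements.txt (sketch)
infinity-emb[all,onnxruntime-gpu]==0.0.53
```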

@michaelfeil
Contributor

Yeah, maybe 0.0.53 fixes this? Can you try running infinity with both this version and 0.0.35 and see if it works?
Then try to run the image from runpod locally.
If both work, check the UI for additional messages, not the other way around. :)

@TimPietrusky
Author

@michaelfeil awesome, thank you! Will do.

@TimPietrusky
Author

@pandyamarut please let us know when you've had time to update this 🙏 Then I can do the testing.

@TimPietrusky
Author

Thanks to @pandyamarut we have updated the version in the worker runpod/worker-infinity-text-embedding:0.0.1-cuda12.1.0, but it still does not produce the desired outcome when using the same request / env vars as before:

2024-08-15T10:07:37.654523432Z INFO     2024-08-15 10:07:37,650 datasets INFO: PyTorch version     config.py:59
2024-08-15T10:07:37.654545312Z          2.5.0.dev20240618+cu121 available.                                     
2024-08-15T10:07:38.125792094Z INFO     2024-08-15 10:07:38,124 infinity_emb INFO:           select_model.py:57
2024-08-15T10:07:38.125805034Z          model=`jinaai/jina-embeddings-v2-base-de` selected,                    
2024-08-15T10:07:38.125806844Z          using engine=`torch` and device=`None`                                 
2024-08-15T10:07:38.384517276Z INFO     2024-08-15 10:07:38,381                      SentenceTransformer.py:189
2024-08-15T10:07:38.384544466Z          sentence_transformers.SentenceTransformer                              
2024-08-15T10:07:38.384548407Z          INFO: Use pytorch device_name: cuda                                    
2024-08-15T10:07:38.387090089Z INFO     2024-08-15 10:07:38,384                      SentenceTransformer.py:197
2024-08-15T10:07:38.387112810Z          sentence_transformers.SentenceTransformer                              
2024-08-15T10:07:38.387116550Z          INFO: Load pretrained SentenceTransformer:                             
2024-08-15T10:07:38.387119610Z          jinaai/jina-embeddings-v2-base-de                                      
2024-08-15T10:07:44.040260125Z WARNING  2024-08-15 10:07:44,038 infinity_emb WARNING:        acceleration.py:35
2024-08-15T10:07:44.040299926Z          DEPRECATED `INFINITY_DISABLE_OPTIMUM` - setting                        
2024-08-15T10:07:44.040303436Z          optimizations via                                                      
2024-08-15T10:07:44.040305626Z          BetterTransformer,INFINITY_DISABLE_OPTIMUM is no                       
2024-08-15T10:07:44.040307746Z          longer supported, please use the CLI / ENV for that.                   
2024-08-15T10:07:44.041460234Z INFO     2024-08-15 10:07:44,040 infinity_emb INFO:   sentence_transformer.py:81
2024-08-15T10:07:44.041485354Z          Switching to half() precision (cuda: fp16).                            
2024-08-15T10:07:44.059055655Z --- Starting Serverless Worker |  Version 1.7.0 ---
2024-08-15T10:07:45.357533272Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Started.", "level": "INFO"}
2024-08-15T10:07:45.360670254Z INFO     2024-08-15 10:07:45,358 infinity_emb INFO:         batch_handler.py:321
2024-08-15T10:07:45.360695094Z          creating batching engine                                               
2024-08-15T10:07:45.363612029Z INFO     2024-08-15 10:07:45,360 infinity_emb INFO: ready   batch_handler.py:384
2024-08-15T10:07:45.363629230Z          to batch requests.                                                     
2024-08-15T10:07:46.110630606Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/wsdp02o8uipf7h/job-done/4vmqpadg317i92/sync-9ba78228-2771-4781-a1b8-16aac1718974-e1?gpu=NVIDIA+RTX+4000+Ada+Generation&isStream=false'", "level": "ERROR"}
2024-08-15T10:07:46.110678748Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Finished.", "level": "INFO"}
2024-08-15T10:07:46.634466067Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Started.", "level": "INFO"}
2024-08-15T10:07:46.735420699Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/wsdp02o8uipf7h/job-done/4vmqpadg317i92/79345661-90dd-42ef-a3bc-5a2b82f70d66-e1?gpu=NVIDIA+RTX+4000+Ada+Generation&isStream=false'", "level": "ERROR"}
2024-08-15T10:07:46.735438659Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Finished.", "level": "INFO"}

I guess this also doesn't help with finding anything. What could be the next steps to debug what is going on, @pandyamarut? Or maybe you have another idea, @michaelfeil?

@TimPietrusky
Author

Talked with @pandyamarut: We will try to debug what is going on here!

@madhiemw

madhiemw commented Sep 23, 2024

I also got the same problem here. It seems like the error is related to the size of the embedding data that the server tries to send back to the client: when I hit the endpoint with lists of sentences of different lengths, it works fine as long as I limit the length of the list. I've tested this with the intfloat/multilingual-e5-large and nvidia/NV-Embed-v2 embedding models.

And here is the data I used for testing: https://www.kaggle.com/datasets/rtatman/questionanswer-dataset (I use only the question data).
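
If the 400 on job-done is indeed caused by oversized result payloads, a client-side workaround along the lines described above is to chunk the input list and send several smaller requests. A generic sketch (the actual size limit is unknown, and `call_endpoint` is a placeholder for whatever request function you use):

```python
from typing import Iterator

def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_in_batches(texts: list[str], batch_size: int = 32) -> list:
    """Send the input in several smaller requests instead of one big one."""
    embeddings: list = []
    for batch in chunked(texts, batch_size):
        # In a real client, each batch would go out as its own request here,
        # e.g. resp = call_endpoint(batch)  # call_endpoint is a placeholder
        embeddings.extend([None] * len(batch))  # stand-in so the sketch runs
    return embeddings
```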
