
jinaai/jina-embeddings-v2-base-* not working #11

Open
TimPietrusky opened this issue Aug 5, 2024 · 9 comments

Comments


TimPietrusky commented Aug 5, 2024

When using the worker image runpod/worker-infinity-embedding:stable-cuda12.1.0 with the env var MODEL_NAMES: jinaai/jina-embeddings-v2-base-de, we see this error:

The transformation of the model "JinaBertModel" to BetterTransformer failed

According to michaelfeil/infinity#115 (comment) we should be able to solve this by setting these env variables:

  • INFINITY_DISABLE_OPTIMUM: TRUE
  • INFINITY_DISABLE_COMPILE: TRUE
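
For reference, the setup above can be sketched as a local docker run (in practice the env vars are set in the RunPod endpoint template UI; the `docker run -e` form here is illustration only):

```shell
# Sketch of the endpoint configuration expressed as docker flags.
docker run --gpus all \
  -e MODEL_NAMES="jinaai/jina-embeddings-v2-base-de" \
  -e INFINITY_DISABLE_OPTIMUM=TRUE \
  -e INFINITY_DISABLE_COMPILE=TRUE \
  runpod/worker-infinity-embedding:stable-cuda12.1.0
```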

But this does not work; we still see an error:

2024-08-05T13:35:35.781469998Z {"requestId": "6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1", "message": "Started.", "level": "INFO"}
2024-08-05T13:35:36.794152731Z {"requestId": "6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1", "message": "Failed to return job results. | 400, message='Bad Request', url=URL('https://api.runpod.ai/v2/4nv0a16wv8ef1p/job-done/1az42tjeq0sk40/6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1?gpu=NVIDIA+RTX+A4500&isStream=false')", "level": "ERROR"}
2024-08-05T13:35:36.794207183Z {"requestId": "6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1", "message": "Finished.", "level": "INFO"}

Request

{
  "input": {
    "model": "jina-embeddings-v2-base-de",
    "input": "Hello World"
  }
}

Output

{
  "delayTime": 1125,
  "executionTime": 1049,
  "id": "8a3de0e1-6b43-41e4-a4af-5ad1473463a1-e1",
  "status": "COMPLETED"
}

So it looks like everything completed, but the expected output (the embeddings) is missing.
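
Since the job reports COMPLETED but carries no output field, a client-side check along these lines (a hypothetical helper, not part of the worker) would make the failure explicit instead of silently returning nothing:

```python
# Hypothetical validation of a RunPod /runsync response: the worker should
# return embeddings under "output"; fail loudly when that key is missing
# instead of treating status == COMPLETED as success.

def extract_embeddings(response: dict) -> list:
    if response.get("status") != "COMPLETED":
        raise RuntimeError(f"job not completed: {response.get('status')}")
    if "output" not in response:
        raise RuntimeError(
            f"job {response.get('id')} completed but returned no output "
            "(results may have been dropped, e.g. the 400 on job-done above)"
        )
    return response["output"]

# The response from this issue triggers the error path:
resp = {
    "delayTime": 1125,
    "executionTime": 1049,
    "id": "8a3de0e1-6b43-41e4-a4af-5ad1473463a1-e1",
    "status": "COMPLETED",
}
try:
    extract_embeddings(resp)
except RuntimeError as err:
    print("missing output:", err)
```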

OpenAI-compatible API

The behavior is the same when using the OpenAI-compatible API: it doesn't work and just returns the same output as above.
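
For reference, a request against the OpenAI-compatible route can be sketched like this. The endpoint ID and API key are placeholders, and the `/openai/v1/embeddings` path is an assumption based on RunPod's OpenAI-compatibility convention, not something confirmed in this thread:

```python
import json
import urllib.request

def build_embeddings_request(model: str, texts: list[str]) -> dict:
    # OpenAI-style /v1/embeddings payload: "input" may be a string or a list.
    return {"model": model, "input": texts}

payload = build_embeddings_request("jina-embeddings-v2-base-de", ["Hello World"])

# Placeholders -- substitute your own endpoint ID and API key.
URL = "https://api.runpod.ai/v2/<endpoint_id>/openai/v1/embeddings"  # path is an assumption
API_KEY = "<runpod_api_key>"

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # not executed here; requires a live endpoint
```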

@TimPietrusky TimPietrusky changed the title jinaai/jina-embeddings-v2-base-es not working jinaai/jina-embeddings-v2-base-* not working Aug 5, 2024
@TimPietrusky
Author

@michaelfeil do you maybe have another idea on how to get this sorted?

@michaelfeil
Contributor

michaelfeil commented Aug 5, 2024

@TimPietrusky The output you posted is not really descriptive of the problem that occurs.

The currently usable environment variables are not up to date. Here are all the functions that generate env variables in infinity. However, they just generate the defaults, so I think there is nothing to do here.

https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/env.py

Runpod-infinity currently uses:
infinity-emb[all,onnxruntime-gpu]==0.0.35 (see: https://github.com/runpod-workers/worker-infinity-embedding/blob/main/builder/requirements.txt)
Does your model run with this version?

@TimPietrusky
Author

TimPietrusky commented Aug 6, 2024

@michaelfeil thanks for your quick response.

Sorry for the unusable error messages; this is what we get in our UI. Hopefully I can get access to more in-depth logging or find someone who can actually help out here.

infinity-emb[all,onnxruntime-gpu]==0.0.35

Which version do you recommend? Should we try 0.0.53?
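
If 0.0.53 is the target, the pin in builder/requirements.txt would presumably become something like the following (extras copied from the current pin; whether they all resolve under 0.0.53 is an assumption):

```
# builder/requirements.txt (sketch)
infinity-emb[all,onnxruntime-gpu]==0.0.53
```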

@michaelfeil
Contributor

Yeah, maybe 0.0.53 fixes this? Can you try running infinity with both this version and 0.0.35 and see if it works?
Then try to run the image from runpod locally.
If both work, check the UI for additional messages, not the other way around. :)

@TimPietrusky
Author

@michaelfeil awesome, thank you! Will do.

@TimPietrusky
Author

@pandyamarut please let us know when you've had time to update this 🙏 Then I can do the testing.

@TimPietrusky
Author

Thanks to @pandyamarut we have updated the version in the worker runpod/worker-infinity-text-embedding:0.0.1-cuda12.1.0, but it still does not produce the desired outcome when using the same request / env vars as before:

2024-08-15T10:07:37.654523432Z INFO     2024-08-15 10:07:37,650 datasets INFO: PyTorch version     config.py:59
2024-08-15T10:07:37.654545312Z          2.5.0.dev20240618+cu121 available.                                     
2024-08-15T10:07:38.125792094Z INFO     2024-08-15 10:07:38,124 infinity_emb INFO:           select_model.py:57
2024-08-15T10:07:38.125805034Z          model=`jinaai/jina-embeddings-v2-base-de` selected,                    
2024-08-15T10:07:38.125806844Z          using engine=`torch` and device=`None`                                 
2024-08-15T10:07:38.384517276Z INFO     2024-08-15 10:07:38,381                      SentenceTransformer.py:189
2024-08-15T10:07:38.384544466Z          sentence_transformers.SentenceTransformer                              
2024-08-15T10:07:38.384548407Z          INFO: Use pytorch device_name: cuda                                    
2024-08-15T10:07:38.387090089Z INFO     2024-08-15 10:07:38,384                      SentenceTransformer.py:197
2024-08-15T10:07:38.387112810Z          sentence_transformers.SentenceTransformer                              
2024-08-15T10:07:38.387116550Z          INFO: Load pretrained SentenceTransformer:                             
2024-08-15T10:07:38.387119610Z          jinaai/jina-embeddings-v2-base-de                                      
2024-08-15T10:07:44.040260125Z WARNING  2024-08-15 10:07:44,038 infinity_emb WARNING:        acceleration.py:35
2024-08-15T10:07:44.040299926Z          DEPRECATED `INFINITY_DISABLE_OPTIMUM` - setting                        
2024-08-15T10:07:44.040303436Z          optimizations via                                                      
2024-08-15T10:07:44.040305626Z          BetterTransformer,INFINITY_DISABLE_OPTIMUM is no                       
2024-08-15T10:07:44.040307746Z          longer supported, please use the CLI / ENV for that.                   
2024-08-15T10:07:44.041460234Z INFO     2024-08-15 10:07:44,040 infinity_emb INFO:   sentence_transformer.py:81
2024-08-15T10:07:44.041485354Z          Switching to half() precision (cuda: fp16).                            
2024-08-15T10:07:44.059055655Z --- Starting Serverless Worker |  Version 1.7.0 ---
2024-08-15T10:07:45.357533272Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Started.", "level": "INFO"}
2024-08-15T10:07:45.360670254Z INFO     2024-08-15 10:07:45,358 infinity_emb INFO:         batch_handler.py:321
2024-08-15T10:07:45.360695094Z          creating batching engine                                               
2024-08-15T10:07:45.363612029Z INFO     2024-08-15 10:07:45,360 infinity_emb INFO: ready   batch_handler.py:384
2024-08-15T10:07:45.363629230Z          to batch requests.                                                     
2024-08-15T10:07:46.110630606Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/wsdp02o8uipf7h/job-done/4vmqpadg317i92/sync-9ba78228-2771-4781-a1b8-16aac1718974-e1?gpu=NVIDIA+RTX+4000+Ada+Generation&isStream=false'", "level": "ERROR"}
2024-08-15T10:07:46.110678748Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Finished.", "level": "INFO"}
2024-08-15T10:07:46.634466067Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Started.", "level": "INFO"}
2024-08-15T10:07:46.735420699Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/wsdp02o8uipf7h/job-done/4vmqpadg317i92/79345661-90dd-42ef-a3bc-5a2b82f70d66-e1?gpu=NVIDIA+RTX+4000+Ada+Generation&isStream=false'", "level": "ERROR"}
2024-08-15T10:07:46.735438659Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Finished.", "level": "INFO"}

I guess this also doesn't help with finding anything. What could be the next steps to debug what is going on, @pandyamarut? Or maybe you have another idea, @michaelfeil?

@TimPietrusky
Author

Talked with @pandyamarut: We will try to debug what is going on here!

@madhiemw

madhiemw commented Sep 23, 2024

I also got the same problem here. It seems like the error is related to the size of the embedding data that the server tries to send back to the client: when I hit the endpoint with lists of sentences of different lengths, it works fine as long as I limit the length of the list. I've tested this with the intfloat/multilingual-e5-large and nvidia/NV-Embed-v2 embedding models.

And here is the data I used for testing: https://www.kaggle.com/datasets/rtatman/questionanswer-dataset (I use only the question data).
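
If the 400 on job-done is indeed caused by oversized result payloads, a client-side workaround along the lines described above is to chunk the input list and send several smaller requests. A generic sketch (the actual size limit is unknown, and `call_endpoint` is a placeholder for whatever request function you use):

```python
from typing import Iterator

def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_in_batches(texts: list[str], batch_size: int = 32) -> list:
    """Send the input in several smaller requests instead of one big one."""
    embeddings: list = []
    for batch in chunked(texts, batch_size):
        # In a real client, each batch would go out as its own request here,
        # e.g. resp = call_endpoint(batch)  # call_endpoint is a placeholder
        embeddings.extend([None] * len(batch))  # stand-in so the sketch runs
    return embeddings
```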
