Add configurable timeout for LLM task #224

Closed · jbvioix opened this issue Jun 14, 2024 · 5 comments
Labels: feature request (New feature or request)

jbvioix commented Jun 14, 2024

I've successfully tried Ollama on a GPU to generate keywords. However, when I use it on a CPU, I get no results. I've done a few tests in Python: the computation takes much longer on the CPU, but the results are correct. I think there's a timeout somewhere that stops the Ollama task. Is it possible to make it configurable so that the CPU can be used for labelling (on a single-user lightweight server)?

jbvioix changed the title from "[feature request] Add configurable timeout for LLM task" to "Add configurable timeout for LLM task" on Jun 14, 2024
MohamedBassem (Collaborator) commented
This makes a lot of sense. Can you share the "timeout" logs that you're getting? I want to know where exactly we're timing out so I can make it configurable. Is it the background job itself that times out, or the call to Ollama?

MohamedBassem added the feature request label on Jun 14, 2024
jbvioix (Author) commented Jun 15, 2024

With the GPU enabled, I've got these logs:

workers-1      | 2024-06-15T07:16:31.518Z info: [inference][99] Starting an inference job for bookmark with id "ouhm96clwfw25pkbmdcrlj3o"
ollama-1       | [GIN] 2024/06/15 - 07:17:16 | 200 | 44.993621295s |      172.25.0.7 | POST     "/api/chat"
workers-1      | 2024-06-15T07:17:16.537Z info: [inference][99] Inferring tag for bookmark "ouhm96clwfw25pkbmdcrlj3o" used 1656 tokens and inferred: Python,History,ProgrammingLanguage,ComputerScience,DevelopmentEnvironment
workers-1      | 2024-06-15T07:17:16.584Z info: [inference][99] Completed successfully

Perfect job, no problem. If I disable the GPU, I get this:

workers-1      | 2024-06-15T07:19:46.715Z info: [inference][100] Starting an inference job for bookmark with id "uxr3yjtfke0tu2u800jbh9rj"
ollama-1       | [GIN] 2024/06/15 - 07:24:47 | 200 |          5m1s |      172.25.0.7 | POST     "/api/chat"
workers-1      | 2024-06-15T07:24:47.971Z error: [inference][100] inference job failed: TypeError: fetch failed
workers-1      | 2024-06-15T07:29:50.926Z info: [inference][100] Starting an inference job for bookmark with id "uxr3yjtfke0tu2u800jbh9rj"
...
ollama-1       | [GIN] 2024/06/15 - 07:29:49 | 200 |          5m1s |      172.25.0.7 | POST     "/api/chat"
workers-1      | 2024-06-15T07:29:49.832Z error: [inference][100] inference job failed: TypeError: fetch failed
workers-1      | 2024-06-15T07:29:50.926Z info: [inference][100] Starting an inference job for bookmark with id "uxr3yjtfke0tu2u800jbh9rj"
...
ollama-1       | [GIN] 2024/06/15 - 07:34:52 | 200 |          5m1s |      172.25.0.7 | POST     "/api/chat"
workers-1      | 2024-06-15T07:34:52.254Z error: [inference][100] inference job failed: TypeError: fetch failed
...

After the first failure, a new inference job is launched automatically, and there are 5 minutes between job events. I think it's a timeout somewhere...
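
The consistent 5m1s before each "TypeError: fetch failed" looks like Node's built-in fetch, which is backed by undici and defaults to a 300-second headersTimeout/bodyTimeout. Below is a minimal sketch of how a worker could raise that limit; the `callOllamaChat` helper, the 30-minute value, and the custom undici Agent are illustrative assumptions, not Hoarder's actual code.

```ts
// Sketch only, not Hoarder's code. Node's global fetch (undici) gives up
// after 300 s by default, which matches the ~5m1s failures in the logs
// above. A custom Agent with longer timeouts, passed via the
// undici-specific `dispatcher` option, would let slow CPU inference finish.
import { Agent } from "undici";

const THIRTY_MINUTES = 30 * 60 * 1000; // illustrative limit, not a real default

const patientAgent = new Agent({
  headersTimeout: THIRTY_MINUTES, // max wait for response headers
  bodyTimeout: THIRTY_MINUTES,    // max wait between body chunks
});

// Hypothetical helper; the real call site in Hoarder may differ.
async function callOllamaChat(baseUrl: string, payload: unknown) {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    dispatcher: patientAgent, // undici extension to RequestInit
  } as RequestInit & { dispatcher: Agent });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  return res.json();
}
```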

kirincorleone commented Aug 23, 2024

Hi,

Loving Hoarder, thanks for this app!

I am using Ollama on my Synology DS920+
CPU: J4125
GPU: None
Text Model: tinydolphin
Image Model: None

Workers Docker Environment Variables:

OLLAMA_BASE_URL: http://[address]
INFERENCE_TEXT_MODEL: tinydolphin

#INFERENCE_IMAGE_MODEL: llava

Just saying "Hi" in Open WebUI takes many minutes to get a reply, so I definitely need more time for Hoarder's inference to do its job.

Here are my logs, I hope they help:

stdout 2024-08-22T19:37:34.312Z error: [inference][197] inference job failed: Error: Timeout
stdout 2024-08-22T19:37:35.033Z info: [inference][199] Starting an inference job for bookmark with id "hf59xhijya0mxxka9jxvr9rf"
stdout Getting text from response
stdout 2024-08-22T19:38:05.913Z info: [inference][199] Starting an inference job for bookmark with id "hf59xhijya0mxxka9jxvr9rf"
stdout Getting text from response
stdout 2024-08-22T19:38:36.580Z info: [inference][199] Starting an inference job for bookmark with id "hf59xhijya0mxxka9jxvr9rf"
stdout Getting text from response
stdout 2024-08-22T19:39:07.237Z info: [inference][199] Starting an inference job for bookmark with id "hf59xhijya0mxxka9jxvr9rf"
stdout Getting text from response
stdout 2024-08-22T19:39:37.233Z error: [inference][199] inference job failed: Error: Timeout
stdout 2024-08-22T19:39:37.945Z info: [inference][201] Starting an inference job for bookmark with id "qcgxclnnymm0n2lb8eks5u2x"
stdout Getting text from response
stdout 2024-08-22T19:40:08.519Z info: [inference][201] Starting an inference job for bookmark with id "qcgxclnnymm0n2lb8eks5u2x"
stdout Getting text from response
stdout 2024-08-22T19:40:38.922Z info: [inference][201] Starting an inference job for bookmark with id "qcgxclnnymm0n2lb8eks5u2x"
stdout Getting text from response
stdout 2024-08-22T19:41:09.717Z info: [inference][201] Starting an inference job for bookmark with id "qcgxclnnymm0n2lb8eks5u2x"
stdout Getting text from response
stdout 2024-08-22T19:41:39.712Z error: [inference][201] inference job failed: Error: Timeout
stdout 2024-08-22T19:41:40.875Z info: [inference][203] Starting an inference job for bookmark with id "smevi67ztewic14e3qv1aime"
stdout Getting text from response
stdout 2024-08-22T19:42:11.337Z info: [inference][203] Starting an inference job for bookmark with id "smevi67ztewic14e3qv1aime"
stdout Getting text from response
stdout 2024-08-22T19:42:42.016Z info: [inference][203] Starting an inference job for bookmark with id "smevi67ztewic14e3qv1aime"
stdout Getting text from response
stdout 2024-08-22T19:43:12.583Z info: [inference][203] Starting an inference job for bookmark with id "smevi67ztewic14e3qv1aime"
stdout Getting text from response
stdout 2024-08-22T19:43:42.581Z error: [inference][203] inference job failed: Error: Timeout
stdout 2024-08-22T19:43:44.484Z info: [inference][205] Starting an inference job for bookmark with id "vgynzse8zf56agjkbywu1d38"
stdout Getting text from response
stdout 2024-08-22T19:44:15.008Z info: [inference][205] Starting an inference job for bookmark with id "vgynzse8zf56agjkbywu1d38"
stdout Getting text from response
stdout 2024-08-22T19:44:46.119Z info: [inference][205] Starting an inference job for bookmark with id "vgynzse8zf56agjkbywu1d38"
stdout Getting text from response
stdout 2024-08-22T19:45:16.759Z info: [inference][205] Starting an inference job for bookmark with id "vgynzse8zf56agjkbywu1d38"
stdout Getting text from response
stdout 2024-08-22T19:45:46.757Z error: [inference][205] inference job failed: Error: Timeout
stdout 2024-08-22T19:45:47.462Z info: [inference][207] Starting an inference job for bookmark with id "ziwptxbaqofot2yud3qth5e7"
stdout Getting text from response
stdout 2024-08-22T19:46:18.110Z info: [inference][207] Starting an inference job for bookmark with id "ziwptxbaqofot2yud3qth5e7"
stdout Getting text from response
stdout 2024-08-22T19:46:48.969Z info: [inference][207] Starting an inference job for bookmark with id "ziwptxbaqofot2yud3qth5e7"
stdout Getting text from response
stdout 2024-08-22T19:47:19.313Z info: [inference][207] Starting an inference job for bookmark with id "ziwptxbaqofot2yud3qth5e7"
stdout Getting text from response
stdout 2024-08-22T19:47:49.305Z error: [inference][207] inference job failed: Error: Timeout
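
The pattern above, a retry roughly every 30 seconds followed by "Error: Timeout" after a few attempts, suggests a fixed per-job timeout wrapped around the inference call. Here is a sketch of the configurable knob this issue asks for; the env var name `INFERENCE_JOB_TIMEOUT_SEC`, its 30-second default, and the `withTimeout` helper are assumptions for illustration, not confirmed Hoarder configuration.

```ts
// Hypothetical sketch of a configurable per-job timeout. The variable
// name INFERENCE_JOB_TIMEOUT_SEC and the 30 s default (chosen to match
// the ~30 s gaps in the logs above) are assumptions, not Hoarder's API.
const timeoutSec = Number(process.env.INFERENCE_JOB_TIMEOUT_SEC ?? "30");

function withTimeout<T>(work: Promise<T>, seconds: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Reject with the same message the worker logs print on expiry.
    const timer = setTimeout(() => reject(new Error("Timeout")), seconds * 1000);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Usage sketch: wrap the inference call so slow CPU machines can raise
// the limit via the environment instead of editing code, e.g.
//   await withTimeout(inferTags(bookmark), timeoutSec);
```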

wbste commented Sep 13, 2024

Yeah, I'd like an adjustable timeout too. I don't really need "instant" replies for tags or images... just get there someday 😄

2024-09-12 22:33:38 web-1 | 2024-09-13T05:33:38.704Z error: [inference][3] inference job failed: Error: Timeout

Edit: Actually, after looking a bit more, maybe it's on the Ollama side.

time=2024-09-12T22:43:48.279-07:00 level=WARN source=sched.go:137 msg="multimodal models don't support parallel requests yet"

I think there's an issue on my mini PC with swapping the models, or maybe with how the tag and image info are requested. I'll dig in more...

MohamedBassem (Collaborator) commented
This is going to be available in the next release. Sorry for how long it took me to get to this :)
