
less runner: actually allow concurrent predictions and refactor runner #1499

Closed
wants to merge 23 commits into async from syl/kill-runner

Conversation

@technillogue (Contributor) commented Jan 26, 2024

  • add concurrency to config (a hypothetical cog.yaml sketch follows this list)
  • this basically works!
  • more descriptive names for predict functions
  • maybe pass through prediction id and try to make cancelation do both?
  • don't cancel from signal handler if a loop is running. expose worker busy state to runner
  • move handle_event_stream to PredictionEventHandler
  • make setup and canceling work
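
For orientation, the concurrency setting might look roughly like the following in cog.yaml. This is a hypothetical sketch for illustration; the key names are assumptions, not taken from this PR:

```yaml
# hypothetical cog.yaml snippet; key names are illustrative
concurrency:
  max: 4  # upper bound on predictions one worker runs at once
```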

@technillogue changed the base branch from main to async January 26, 2024 21:55
@technillogue marked this pull request as draft January 26, 2024 21:55
@technillogue force-pushed the syl/kill-runner branch 2 times, most recently from 8c494a8 to 0b1ad74 January 29, 2024 18:50
@technillogue changed the title from "less runner" to "less runner: actually allow concurrent predictions and refactor runner" January 29, 2024
@technillogue changed the base branch from async to syl/mux January 30, 2024 06:02
@yorickvP added the async label February 8, 2024
Base automatically changed from syl/mux to async February 12, 2024 21:09
* have runner return asyncio.Task instead of AsyncFuture
* make tests async and fix them
* delete remaining runner thread code :)
* review changes to tests and server

(reverts commit 828eee9)
this is the first step towards supporting continuous batching and concurrent predictions. eventually, you will be able to configure it so that your predict function is called concurrently
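
For illustration, the kind of predictor this enables is one whose predict is a coroutine. A minimal sketch, where the body is a stand-in rather than code from this PR:

```python
import asyncio

from cog import BasePredictor


class Predictor(BasePredictor):
    async def predict(self, prompt: str) -> str:
        # a coroutine predict lets the worker's event loop interleave many
        # in-flight predictions instead of running them one at a time
        await asyncio.sleep(1)  # stands in for real async model work
        return prompt.upper()
```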

* bare minimum to support async predict
* add async tests

Signed-off-by: technillogue <technillogue@gmail.com>
technillogue and others added 5 commits February 13, 2024 02:38
* conditionally create the event loop if the predictor is async (a rough version of that check is sketched after this list), and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup
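
Roughly, the "is the predictor async" decision comes down to an inspect check like the following sketch; the helper name and shape are assumptions, not the code in this PR:

```python
import inspect


def needs_event_loop(predictor) -> bool:
    # async def predict is a coroutine function; async generator functions
    # cover streaming predictors that yield output incrementally
    predict = getattr(predictor, "predict", predictor)
    return inspect.iscoroutinefunction(predict) or inspect.isasyncgenfunction(predict)
```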

(reverts commit b533c6b)
* async Worker._wait and its consequences
* AsyncPipe (sketched after this list) so that we can serve the idempotent endpoint and cancellation requests instead of _wait blocking the event loop
* test_prediction_cancel can be flaky on some machines
* separate _process_list to be less surprising than isasyncgen
* sleep wasn't needed
* suggestions from review
* suggestions from review
* even more suggestions from review
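
A minimal sketch of the AsyncPipe idea, assuming it wraps a multiprocessing Connection; the class in this PR may be shaped differently:

```python
import asyncio
from multiprocessing.connection import Connection


class AsyncPipe:
    """Sketch: receive from a child-process pipe without blocking the loop."""

    def __init__(self, conn: Connection) -> None:
        self.conn = conn

    async def recv(self):
        # offload the blocking Connection.recv to the default thread pool so
        # the event loop stays free for the idempotent endpoint and cancelation
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.conn.recv)
```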

---------

Signed-off-by: technillogue <technillogue@gmail.com>
Co-authored-by: Nick Stenning <nick@whiteink.com>
* race utility for racing awaitables (sketched at the end of this commit message)
* start mux, tag events with id, read pipe in a task, get events from mux (also sketched below)
* use async pipe for async child loop
* _shutting_down vs _terminating
* race with shutdown event
* keep reading events during shutdown, but call terminate after the last Done
* emit heartbeats from mux.read
* don't use _wait. instead, setup reads event from the mux too
* worker semaphore and prediction ctx
* where _wait used to raise a fatal error, have _read_events set an error on Mux, and then Mux.read can raise the error in the right context. otherwise, the exception is stuck in a task and doesn't propagate correctly
* fix event loop errors for Python < 3.9
* keep track of predictions in flight explicitly and use that to route logs
* don't wait for executor shutdown
* progress: check for cancelation in task done_handler
* let mux check if child is alive and set mux shutdown after leaving read event loop
* close pipe when exiting
* predict requires IDLE or PROCESSING
* try adding a BUSY state distinct from PROCESSING when we no longer have capacity
* move resetting events to setup() instead of _read_events()

previously this was in _read_events because it's a coroutine that will have the correct event loop. however, _read_events actually gets created in a task, which can run *after* the first mux.read call by setup. since setup is now the first async entrypoint in worker and in tests, we can safely move it there

* state_from_predictions_in_flight instead of checking the value of semaphore
* make prediction_ctx "private"
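
As illustrative sketches of three pieces named above (signatures and bodies are assumptions, not this PR's code): a race utility that returns the first finished awaitable, a mux that routes id-tagged events to per-prediction readers, and deriving worker state from the predictions in flight:

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, TypeVar

T = TypeVar("T")


async def race(*aws: Awaitable[T]) -> T:
    # finish as soon as any awaitable does (e.g. an event vs. a shutdown
    # signal) and cancel the losers so no tasks leak
    tasks = [asyncio.ensure_future(aw) for aw in aws]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return done.pop().result()


class Mux:
    """Sketch: fan id-tagged child events out to per-prediction queues."""

    def __init__(self) -> None:
        self.queues: "defaultdict[str, asyncio.Queue[Any]]" = defaultdict(asyncio.Queue)

    def write(self, id: str, event: Any) -> None:
        # called by the _read_events task for each event read from the child
        self.queues[id].put_nowait(event)

    async def read(self, id: str):
        # each prediction handler awaits only its own events; in the real
        # code this is also where a fatal error from _read_events would be
        # re-raised so it propagates in the caller's context
        while True:
            yield await self.queues[id].get()


def state_from_predictions_in_flight(in_flight: set, capacity: int) -> str:
    # IDLE with nothing running, PROCESSING while under capacity, and a
    # distinct BUSY state once no capacity remains for new predictions
    if not in_flight:
        return "IDLE"
    return "PROCESSING" if len(in_flight) < capacity else "BUSY"
```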

don't cancel from signal handler if a loop is running. expose worker busy state to runner

Signed-off-by: technillogue <technillogue@gmail.com>
…point return the same result and fix tests somewhat

Signed-off-by: technillogue <technillogue@gmail.com>