Ability to execute prompts against an asyncio (non-blocking) API #507

Open
simonw opened this issue Jun 10, 2024 · 23 comments
Labels
enhancement New feature or request

Comments

simonw commented Jun 10, 2024

Datasette in particular needs a neat way to run different models via an await ... async method. Having a way for model plugins to provide this - especially the API-based plugins like OpenAI and Claude - would be really helpful, and would unblock a bunch of Datasette plugins that could benefit from an LLM abstraction layer.

@simonw simonw added the enhancement New feature or request label Jun 10, 2024

simonw commented Nov 6, 2024

Two options:

  1. Plugins have to specifically upgrade to offer this
  2. Works for all plugins - those that have not yet upgraded (like most of the local model ones) can run in a thread and LLM will provide an async interface to them anyway

I'm going to build option 1 and treat option 2 as an optional nice-to-have for the future.
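
For reference, option 2 could eventually be built by pushing the blocking call into a thread. A minimal sketch of that approach using asyncio.to_thread - not something I'm implementing right now, and the function names are just illustrative:

import asyncio

def blocking_prompt(prompt):
    # Stand-in for a blocking call into a local model plugin
    return "echo: " + prompt

async def async_prompt(prompt):
    # Run the blocking call in a worker thread so the event loop stays free
    return await asyncio.to_thread(blocking_prompt, prompt)

print(asyncio.run(async_prompt("hi")))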

I expect that any of the API-driven plugins that don't use a client library (llm-mistral, for example) will want a mechanism where they define how the HTTP requests are constructed, and some abstraction then makes that definition work with both blocking and asyncio httpx.
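
One possible shape for that abstraction - purely a sketch with hypothetical names, not something llm provides today - is for the plugin to describe each request as data and let shared helpers execute it with either httpx client:

import httpx

def build_request(prompt):
    # Describe the request declaratively instead of performing it
    return {
        "method": "POST",
        "url": "https://api.example.com/v1/chat",  # placeholder endpoint
        "json": {"prompt": prompt},
    }

def execute(prompt):
    # Blocking execution
    with httpx.Client() as client:
        response = client.request(**build_request(prompt))
        response.raise_for_status()
        return response.text

async def execute_async(prompt):
    # Non-blocking execution over the exact same request definition
    async with httpx.AsyncClient() as client:
        response = await client.request(**build_request(prompt))
        response.raise_for_status()
        return response.text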


simonw commented Nov 6, 2024

Two options for the API design:

  1. Keep the current model classes, but add await model.aprompt(...) style methods, similar to the Django ORM's await author.books.afirst() style of API design
  2. Offer a separate async class - so model = llm.get_async_model("gpt-4o") and then async for token in model.prompt(...)

I'm torn on these two. Not sure how to decide.


simonw commented Nov 6, 2024

I used Claude as an electric bicycle for the mind and I've decided to go with the separate async class. I think having a separate class will make things like type hints more obvious. https://gist.github.com/simonw/f69e6a1fe21df5007ce038dfe91c62f4


simonw commented Nov 6, 2024

I'm going to add an llm prompt ... --async option, purely for my own testing, which switches to the async version of a model.


simonw commented Nov 6, 2024

I'm going to have to do this for embedding models, too.


simonw commented Nov 6, 2024

This may be tricky:

llm/llm/cli.py

Lines 361 to 363 in fe1e097

prompt_method = model.prompt
if conversation:
    prompt_method = conversation.prompt

That's a reminder that sometimes we do model.prompt() and sometimes we do conversation.prompt() - which means we'll need an async mechanism for conversations, too - or some kind of thing where we can get an async alternative version of a conversation.

One option: instead of conversation.prompt() we go back to model.prompt() and add conversation= as a keyword argument.

Reminder though: the reason we have conversation.prompt() is that the idea was to add tool support as a feature of conversations - when you run a tool function, a single prompt can end up executing multiple rounds against the LLM as it calls functions and sends their outputs back in.
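
From calling code, that keyword-argument alternative would look something like this (the conversation= argument is a sketch, not the current API):

import llm

model = llm.get_model("gpt-4o-mini")
conversation = model.conversation()

# Current API: the conversation owns the prompt method
response = conversation.prompt("say hi in spanish")

# Sketched alternative: the model owns it and the conversation is passed in
# response = model.prompt("and in german?", conversation=conversation)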


simonw commented Nov 6, 2024

From https://community.openai.com/t/asynchronous-use-of-the-library/479414/4 it looks like OpenAI used to have a .acreate() method and switched to an AsyncOpenAI() client instead.


simonw commented Nov 6, 2024

OpenAI async docs are here: https://github.com/openai/openai-python?tab=readme-ov-file#async-usage

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()


async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model="gpt-3.5-turbo",
    )


asyncio.run(main())


simonw commented Nov 6, 2024

I got the first prototype of this working (minus logging to the database):

llm hi --async                                                 
Hello! How can I assist you today?

The --no-stream option is currently broken though:

llm hi --async --no-stream
Error: object AsyncResponse can't be used in 'await' expression


simonw commented Nov 6, 2024

Claude helped with the refactoring of Model into AsyncModel: https://gist.github.com/simonw/3554b00117b16eb65889a747e32a96ae


simonw commented Nov 6, 2024

Before I commit further to this API, let's see how this feels from Python library code.


simonw commented Nov 6, 2024

This is interesting:

>>> import asyncio
>>> import llm
>>> model = llm.get_async_model("gpt-4o-mini")
>>> model.prompt("say hi in spanish")
/Users/simon/Dropbox/Development/llm/llm/models.py:390: RuntimeWarning: coroutine 'AsyncResponse.text' was never awaited
  return "<Response prompt='{}' text='{}'>".format(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
<Response prompt='say hi in spanish' text='<coroutine object AsyncResponse.text at 0x3133d84a0>'>

That's because the default __repr__() method for a response tries to use self.text(), which is now an awaitable.
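
One way around that - a minimal standalone sketch, not the actual llm implementation - is for the repr to show a placeholder until the response has actually been awaited:

class AsyncResponse:
    def __init__(self, prompt):
        self.prompt = prompt
        self._done = False
        self._text = None

    async def _fetch(self):
        # Stand-in for the real model call
        return "¡Hola!"

    async def text(self):
        if not self._done:
            self._text = await self._fetch()
            self._done = True
        return self._text

    def __repr__(self):
        # Never call the awaitable text() here - just show a placeholder
        # until the coroutine has been awaited
        text = self._text if self._done else "... not yet awaited ..."
        return "<Response prompt='{}' text='{}'>".format(self.prompt, text)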

simonw added a commit that referenced this issue Nov 6, 2024

simonw commented Nov 6, 2024

Using python -m asyncio again:

>>> import asyncio
>>> import llm
>>> model = llm.get_async_model("gpt-4o-mini")
>>> response = model.prompt("say hi in spanish")
>>> response
<Response prompt='say hi in spanish' text='... not yet awaited ...'>
>>> await response.text()
'¡Hola!'
>>> response
<Response prompt='say hi in spanish' text='¡Hola!'>


simonw commented Nov 6, 2024

This outputs tokens one at a time:

>>> import asyncio
>>> import llm
>>> model = llm.get_async_model("gpt-4o-mini")
>>> async for token in model.prompt("describe a good dog in french"):
...     print(token, end="", flush=True)
... 
Un bon chien est fidèle, affectueux et protecteur. Il aime passer du temps avec sa famille et est toujours prêt à jouer. Sa loyauté est sans égale, et il fait tout pour garder ses maîtres heureux. Un bon chien est aussi intelligent et facile à dresser, ce qui lui permet d'apprendre rapidement des commandes et des tours. Sa présence apporte réconfort et joie, et il sait être un excellent compagnon, toujours là dans les moments de bonheur comme dans les moments difficiles. En somme, un bon chien est un véritable ami pour la vie.


simonw commented Nov 6, 2024

This async API feels pretty good to me. I think this may be the right design.

Still to figure out:

  • How do plugins register their async versions? Might need a register_async_models() hook.
  • Get another plugin working - Anthropic or Gemini would be good
  • How does logging to the DB work? That's currently a method on Model; it needs to move out, and an async version needs to be considered.
  • How about loading FROM the database? Will that work for things like attachments too?
  • Testing for all of this
  • Maybe try getting a llm-gguf plugin to work with this? Might help confirm the API design further by showing it working with a blocking plugin (maybe via a run-in-thread hack)
  • Are there any other weird knock-on impacts of this I haven't considered yet?
  • How should embeddings work? Can I punt on that for the first release, maybe for the first alpha?


simonw commented Nov 6, 2024

Reminder: model registration currently looks like this:

@hookimpl
def register_models(register):
    register(Chat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))
    register(Chat("gpt-3.5-turbo-16k"), aliases=("chatgpt-16k", "3.5-16k"))
    register(Chat("gpt-4"), aliases=("4", "gpt4"))
    register(Chat("gpt-4-32k"), aliases=("4-32k",))
    # GPT-4 Turbo models
    register(Chat("gpt-4-1106-preview"))
    register(Chat("gpt-4-0125-preview"))
    register(Chat("gpt-4-turbo-2024-04-09"))
    register(Chat("gpt-4-turbo"), aliases=("gpt-4-turbo-preview", "4-turbo", "4t"))
    # GPT-4o
    register(Chat("gpt-4o", vision=True), aliases=("4o",))
    register(Chat("gpt-4o-mini", vision=True), aliases=("4o-mini",))
    # o1
    register(Chat("o1-preview", can_stream=False, allows_system_prompt=False))
    register(Chat("o1-mini", can_stream=False, allows_system_prompt=False))
    # The -instruct completion model
    register(
        Completion("gpt-3.5-turbo-instruct", default_max_tokens=256),
        aliases=("3.5-instruct", "chatgpt-instruct"),
    )

If I add a new separate register_async_models() hook I need to think about things like aliases - it wouldn't make sense for a sync model and an async model to have different aliases from each other.

Given that, I think I want the existing register_models() hook and register() callback function to handle both types of model.

So what could that look like? One option:

def register_models(register):
    register(Chat("gpt-3.5-turbo"), AsyncChat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))

So register() takes an optional second argument.

Problem there is that I may eventually have models which are available only in their async variant.

So I could do this:

def register_models(register):
    register(model=Chat("gpt-3.5-turbo"), async_model=AsyncChat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))

That would free me up to support this:

def register_models(register):
    register(async_model=AsyncChat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))

That's not terrible; it may be the way to go.


simonw commented Nov 6, 2024

I think the llm.get_model() and llm.get_async_model() methods should be able to spot when a model is available in sync but not async variants (and vice versa) and return a more useful error message than just "model not found".
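
A sketch of what that lookup could look like, assuming separate registries for sync and async models (the dictionaries and exception here are illustrative, not the actual llm internals):

class UnknownModelError(KeyError):
    pass

sync_models = {"claude-3.5-sonnet": "<sync model>"}
async_models = {"gpt-4o-mini": "<async model>"}

def get_async_model(model_id):
    if model_id in async_models:
        return async_models[model_id]
    if model_id in sync_models:
        # More useful than a plain "unknown model" error
        raise UnknownModelError(
            "Unknown async model (sync model exists): " + model_id
        )
    raise UnknownModelError("Unknown model: " + model_id)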


simonw commented Nov 6, 2024

The only place that calls that plugin hook (and hence the only place that defines the register() method right now) is here:

llm/llm/__init__.py

Lines 66 to 85 in fe1e097

def get_models_with_aliases() -> List["ModelWithAliases"]:
    model_aliases = []

    # Include aliases from aliases.json
    aliases_path = user_dir() / "aliases.json"
    extra_model_aliases: Dict[str, list] = {}
    if aliases_path.exists():
        configured_aliases = json.loads(aliases_path.read_text())
        for alias, model_id in configured_aliases.items():
            extra_model_aliases.setdefault(model_id, []).append(alias)

    def register(model, aliases=None):
        alias_list = list(aliases or [])
        if model.model_id in extra_model_aliases:
            alias_list.extend(extra_model_aliases[model.model_id])
        model_aliases.append(ModelWithAliases(model, alias_list))

    pm.hook.register_models(register=register)

    return model_aliases
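
One way that register() callback might grow to accept an optional async model - a sketch only, extending the code above and assuming ModelWithAliases gains an async_model field:

    def register(model=None, async_model=None, aliases=None):
        alias_list = list(aliases or [])
        # Either variant can be used to look up configured aliases
        model_id = (model or async_model).model_id
        if model_id in extra_model_aliases:
            alias_list.extend(extra_model_aliases[model_id])
        model_aliases.append(ModelWithAliases(model, async_model, alias_list))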


simonw commented Nov 6, 2024

Got this working:

% llm -m claude-3.5-sonnet hi     
Hello! How can I help you today?
% llm -m claude-3.5-sonnet hi --async
Error: 'Unknown async model (sync model exists): claude-3.5-sonnet'

simonw added a commit that referenced this issue Nov 6, 2024

simonw commented Nov 6, 2024

I have a hunch the saving-to and loading-from the database bit may end up producing new plugin hooks. I'll need those for Datasette at any rate since it has its own mechanism for executing database writes.

Although I may not need plugin hooks for that if I stick to keeping the database persistence code in the CLI layer.

But... I already have a need for plugins like llm-cmd and llm-jq to implement their own database logging, which they currently skip out on entirely.

So it may be that it's not so much a plugin as a documented interface for blocking database logging (for use in other blocking plugins) and a documented "roll your own" policy for async database logging.

Which may grow into plugin hooks for database persistence that can be both sync and async friendly at a later date.

@simonw simonw changed the title asyncio support Ability to execute prompts against an asyncio (non-blocking) API Nov 6, 2024

simonw commented Nov 6, 2024

I'm doing this in the prototype right now and I really don't like it:

class AsyncChat(AsyncModel, Chat):

That's multiple inheritance AND I'm inheriting from the non-async model class!

I'm going to refactor those to a common base class instead.
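
Roughly what I have in mind - a sketch only, with illustrative method bodies:

class _Shared:
    # Prompt building, options handling etc. shared by both variants
    def __init__(self, model_id):
        self.model_id = model_id

class Chat(_Shared):
    def execute(self, prompt):
        # Blocking implementation
        ...

class AsyncChat(_Shared):
    async def execute(self, prompt):
        # Async implementation - no longer inherits from the sync Chat class
        ...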


simonw commented Nov 6, 2024

Got Claude to refactor that for me: https://gist.github.com/simonw/3872e758c129917980605eed49cbce7f


simonw commented Nov 13, 2024

simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 14, 2024
simonw added a commit that referenced this issue Nov 14, 2024
…els (#613)

- #507 (comment)

* register_model is now async aware

Refs #507 (comment)

* Refactor Chat and AsyncChat to use _Shared base class

Refs #507 (comment)

* fixed function name

* Fix for infinite loop

* Applied Black

* Ran cog

* Applied Black

* Add Response.from_row() classmethod back again

It does not matter that this is a blocking call, since it is a classmethod

* Made mypy happy with llm/models.py

* mypy fixes for openai_models.py

I am unhappy with this, had to duplicate some code.

* First test for AsyncModel

* Still have not quite got this working

* Fix for not loading plugins during tests, refs #626

* audio/wav not audio/wave, refs #603

* Black and mypy and ruff all happy

* Refactor to avoid generics

* Removed obsolete response() method

* Support text = await async_mock_model.prompt("hello")

* Initial docs for llm.get_async_model() and await model.prompt()

Refs #507

* Initial async model plugin creation docs

* duration_ms ANY to pass test

* llm models --async option

Refs #613 (comment)

* Removed obsolete TypeVars

* Expanded register_models() docs for async

* await model.prompt() now returns AsyncResponse

Refs #613 (comment)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
simonw added a commit that referenced this issue Nov 14, 2024
simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 14, 2024
* Tip about pytest --record-mode once

Plus mechanism for setting API key during tests with PYTEST_ANTHROPIC_API_KEY

* Async support for Claude models

Closes #25
Refs simonw/llm#507
Refs simonw/llm#613

* Depend on llm>=0.18a0, refs #25
simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 14, 2024
@simonw simonw unpinned this issue Nov 17, 2024
simonw added a commit that referenced this issue Nov 17, 2024
simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 17, 2024
simonw added a commit that referenced this issue Nov 18, 2024
simonw added a commit to simonw/llm-mistral that referenced this issue Nov 19, 2024