register_models() plugin hook #65

Merged
merged 66 commits into main from register-models on Jul 10, 2023

Conversation

simonw commented Jun 26, 2023

TODO:

simonw commented Jun 26, 2023

For continue mode, it's all about building the prompt.

Here's the current implementation:

llm/llm/cli.py

Lines 176 to 184 in 9190051

chat_id, history = get_history(_continue)
history_model = None
if history:
    for entry in history:
        if entry.get("system"):
            messages.append({"role": "system", "content": entry["system"]})
        messages.append({"role": "user", "content": entry["prompt"]})
        messages.append({"role": "assistant", "content": entry["response"]})
        history_model = entry["model"]

The key bit of get_history() is this:

llm/llm/cli.py

Lines 552 to 555 in 9190051

rows = db["log"].rows_where(
    "id = ? or chat_id = ?", [chat_id, chat_id], order_by="id"
)
return chat_id, rows

So maybe this is all about building a prompt in an alternative way, perhaps like this:

# History is a list of logged messages:
prompt = Prompt.from_history(history, prompt, system)
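Here's a rough sketch of what that classmethod could look like, assuming history rows shaped like the log entries above - the Prompt constructor and field names here are assumptions, not the final API:

# Hypothetical sketch only - the real Prompt class may differ
class Prompt:
    def __init__(self, prompt, system=None, messages=None):
        self.prompt = prompt
        self.system = system
        self.messages = messages or []

    @classmethod
    def from_history(cls, history, prompt, system=None):
        # history: logged rows with "system", "prompt" and "response" keys
        messages = []
        for entry in history:
            if entry.get("system"):
                messages.append({"role": "system", "content": entry["system"]})
            messages.append({"role": "user", "content": entry["prompt"]})
            messages.append({"role": "assistant", "content": entry["response"]})
        return cls(prompt, system=system, messages=messages)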

simonw commented Jun 26, 2023

OpenAI chat models work by building up that messages= array, but other models could potentially also support conversations through custom prompt assembly - by putting together a User: X\nAssistant: Y\nUser: Z\nAssistant: string for example.
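For example, a non-chat model could flatten the history into a single string with something like this (an illustrative sketch, not part of the library):

# Illustrative helper, not part of llm
def flatten_history(history, prompt):
    # Build a User:/Assistant: transcript ending with the new prompt
    lines = []
    for entry in history:
        lines.append("User: " + entry["prompt"])
        lines.append("Assistant: " + entry["response"])
    lines.append("User: " + prompt)
    lines.append("Assistant:")
    return "\n".join(lines)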

simonw commented Jun 26, 2023

Untangling chat mode for PaLM 2 is a bigger job than I want to take on right now - more notes here: #20 (comment)

simonw commented Jun 26, 2023

I'm not happy with how streaming works yet.

I think models should have the following methods (a rough sketch in code follows this list):

  • .prompt(...) - run a prompt and return the response
  • .stream(...) - run a prompt and return an object that can be used to iterate over a streaming response
  • .chain(...) - run a prompt with the ability for it to trigger additional prompt executions (needed for things like OpenAI functions). Returns an iterator over those prompt responses.
  • .chain_stream(...) - same again, but each returned object can also be iterated over to obtain tokens from a stream.
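Here's what that interface could look like - the method names come from the list above, but the signatures and types are assumptions:

# Hypothetical interface sketch; signatures are assumptions
from abc import ABC, abstractmethod
from typing import Iterator


class Model(ABC):
    @abstractmethod
    def prompt(self, prompt: str, **options) -> "Response":
        "Run a prompt and return the completed response."

    @abstractmethod
    def stream(self, prompt: str, **options) -> Iterator[str]:
        "Run a prompt, yielding tokens as they arrive."

    def chain(self, prompt: str, **options) -> Iterator["Response"]:
        "Run a prompt that may trigger follow-up prompts (e.g. OpenAI functions)."
        raise NotImplementedError

    def chain_stream(self, prompt: str, **options) -> Iterator[Iterator[str]]:
        "Like chain(), but each yielded response can itself be iterated for tokens."
        raise NotImplementedError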

simonw commented Jul 1, 2023

I was contemplating whether a Model even needs to be a class - could I instead define it purely in terms of functions?

Then I remembered that there are some LLMs that run locally that have significant startup costs (loading data into memory/the GPU) - which is a good fit for a class, because that way they can be loaded once and then reused for subsequent calls to .prompt() and suchlike.

simonw commented Jul 1, 2023

Copying in some miscellaneous notes I made on the API design:

The streaming vs not streaming thing still feels a bit ugly.

The key problem I'm trying to solve is providing a generic interface to an LLM that will enable the following:

  • CLI access through the same tool
  • Comparison prompts across multiple LLMs
  • Unified storage of prompts in SQLite
  • Web interface for all of them
  • Unified prompt template interface
  • The continue conversation feature
  • Basic secret key management

I want the simplest possible API to implement in order to solve all of these - adding new models should be as easy as possible.

So maybe two methods: prompt(...) and stream(...)

Perhaps have a Model and StreamingModel(Model) class - models can subclass either, if you subclass StreamingModel then you have to implement both methods.

If there is ever a model that only does streaming, the subclass can still implement prompt() and have that call stream().
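A minimal sketch of that split, assuming the two-method design described above (not the eventual class hierarchy):

# Sketch of the Model / StreamingModel split - not the actual hierarchy
class Model:
    def prompt(self, text, **options):
        raise NotImplementedError


class StreamingModel(Model):
    def stream(self, text, **options):
        raise NotImplementedError

    def prompt(self, text, **options):
        # A streaming-only model can satisfy prompt() by draining its own stream
        return "".join(self.stream(text, **options))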

Maybe Prompt and Response could be inner classes on the Model subclass:

class VertexModel(Model):
    class Options(Model.Options):
        temperature: float
    class Response:
        ...

Prompt and Response could be responsible for turning themselves into loggable records.

It would be good to have a Python library utility mechanism for this logging, since then users could easily log the prompts their app is using.

Maybe it's the Model that manages logging since that can see both prompt and response?

Logging should be off by default but easy to opt into, maybe by providing a Log instance or callback?

A Record dataclass might be good for representing that, along with an llm.log.Database class that can write those records to SQLite.
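A hedged sketch of that Record / Database idea - the field names and the sqlite-utils usage here are assumptions, not the implementation:

# Hypothetical Record/Database sketch; field names are assumptions
import datetime
from dataclasses import asdict, dataclass, field

import sqlite_utils


@dataclass
class Record:
    model: str
    prompt: str
    system: str
    response: str
    timestamp: str = field(
        default_factory=lambda: datetime.datetime.utcnow().isoformat()
    )


class Database:
    def __init__(self, path):
        self.db = sqlite_utils.Database(path)

    def log(self, record: Record):
        # sqlite-utils creates the "log" table on first insert
        self.db["log"].insert(asdict(record))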

So the internals documentation ends up covering these concepts:

  • Model
  • Template
  • Prompt
  • Response
  • Result
  • Logger?
  • Key
  • alias

For OpenAI functions I could introduce a .sequence() method which can run more than one prompt (or .multi() or .chain()).

So the API for a model is:

response = model.prompt(...)

Or:

for chunk in model.stream(...):
    print(chunk, end="")

for response in model.chain(...):
    print(response.text())

Maybe this:

for response in model.chain_stream(...):
    for token in response:
        print(token, end="")

I'm going with response.text() and not response.text because you cannot await a property.

What abstraction could I build so other chains can be easily constructed? For example, the user could define a function that takes a response and decides whether there should be another prompt (the functions support could then be built on top of that):

def next(response, prompt):
    return None # or str or Prompt

model.chain(next)

model.chain(lambda response: "fact check this: $input", once=True)

So once=True means only run the chain function once. Or should that be the default, with repeat=True to keep it going?
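To make the control flow concrete, here's a rough sketch of a chain driver built on that user-supplied next() callback - the names and the once= behaviour are illustrative assumptions:

# Illustrative chain driver; the once= semantics are an assumption
def run_chain(model, first_prompt, next, once=False):
    # Run the first prompt, then keep asking next() for follow-ups
    response = model.prompt(first_prompt)
    yield response
    while True:
        follow_up = next(response, first_prompt)
        if follow_up is None:
            break
        response = model.prompt(follow_up)
        yield response
        if once:
            break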

simonw commented Jul 1, 2023

I'm going to try a bit of documentation-driven development here.

simonw added a commit that referenced this pull request Jul 1, 2023
simonw commented Jul 1, 2023

Here's that first draft of the internals documentation - next step, actually implement what's described there: https://github.com/simonw/llm/blob/c2ec8a9c60ac38d152ed48ba8c7c067c8d2c9859/docs/python-api.md

simonw mentioned this pull request Jul 1, 2023
simonw commented Jul 1, 2023

The default implementation of continue mode can be really basic: it just collects the text of the previous prompts and responses into a list and joins them with newlines.
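A minimal sketch of that default behaviour, assuming responses expose their original prompt text and a .text() method as discussed earlier in this thread:

# Sketch; assumes response.prompt.prompt and response.text() exist
def build_continue_prompt(previous_responses, new_prompt):
    # Interleave earlier prompts and responses, then append the new prompt
    parts = []
    for response in previous_responses:
        parts.append(response.prompt.prompt)
        parts.append(response.text())
    parts.append(new_prompt)
    return "\n".join(parts)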

simonw commented Jul 1, 2023

I'm still not clear on the best way to truncate messages in continue mode. For now I'm going to leave that and allow the model to return an error, but it would be good to have a strategy for automatic truncation later on.

simonw commented Jul 1, 2023

I don't think I should solve continuation mode until I've re-implemented logging to the database.

Although... I do want continue mode to be possible even without having database logging, at least at the Python API layer.

simonw commented Jul 1, 2023

Got all the tests passing, partly by disabling the DB logging test.

simonw commented Jul 1, 2023

E       AssertionError: assert 'id          ...03T20:26:40\n' == 'id          ...03T13:26:40\n'
E         Skipping 89 identical leading characters in diff, use -v to show
E         - 020-05-03T13:26:40
E         ?           ^^
E         + 020-05-03T20:26:40
E         ?           ^^
E         - babbage:2020-05-03    openai      2020-05-03T13:26:40
E         ?                                              ^^
E         + babbage:2020-05-03    openai      2020-05-03T20:26:40
E         ?                                              ^^

I think that test fails because of timezone differences between GitHub Actions and my laptop.

simonw commented Jul 1, 2023

Docs can also be previewed here: https://llm--65.org.readthedocs.build/en/65/python-api.html

simonw commented Jul 1, 2023

I think a Response should have a .prompt property that leads back to the prompt used for that response.

Maybe there should be a way in which these chain together, for the purposes of modeling conversations in situations where the SQLite log isn't being used?

Then perhaps you could do this:

response = model.prompt("Ten names for a pet pelican")
print(response.text())
response2 = response.reply("Make them more grandiose")
print(response2.text())

The problem with .reply() is: is it the same thing as .prompt() or .stream(), or something else?

Does it make sense for that reply() method to exist on Response as opposed to on Model or even on Prompt?

simonw commented Jul 1, 2023

Another option: the model itself could keep an in-memory cache of its previous prompts, such that you can then reply via the model.

I'm not keen on this though, because the conversation state shouldn't be shared by every user of the model instance in a situation like llm web where the model may be serving multiple conversations at once.

simonw commented Jul 1, 2023

The current signature for Response:

llm/llm/models.py

Lines 28 to 33 in 69ce584

class Response(ABC):
    def __init__(self, prompt: Prompt):
        self.prompt = prompt
        self._chunks = []
        self._debug = {}
        self._done = False

If it also grew a .model property for tracking the model instance that created it, it would have enough to execute replies to itself.
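A sketch of how that could look - the model= constructor argument and the previous= parameter passed back to the model are assumptions, building on the signature above:

# Hypothetical sketch; previous= is an assumed parameter
class Response:
    def __init__(self, prompt, model):
        self.prompt = prompt
        self.model = model  # the model instance that created this response
        self._chunks = []
        self._done = False

    def text(self):
        return "".join(self._chunks)

    def reply(self, new_prompt):
        # Delegate back to the owning model, carrying this response as history
        return self.model.prompt(new_prompt, previous=[self])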

simonw commented Jul 1, 2023

What would this look like if everything was represented in terms of chains of Prompts and Responses? Those chains could then be serialized and deserialized to SQLite, or to JSON or other formats too.

Especially when functions start coming into play, there's something very interesting about storing a high-fidelity representation of the full sequence of prompts and responses that led to the most recent state.
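For example, a chain of responses could be serialized to JSON with something like this sketch - the field names are assumptions, mirroring the Response signature above:

# Sketch only; field names are assumptions
import json


def serialize_chain(responses):
    return json.dumps(
        [
            {
                "prompt": response.prompt.prompt,
                "system": getattr(response.prompt, "system", None),
                "response": response.text(),
            }
            for response in responses
        ],
        indent=2,
    )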

simonw commented Jul 1, 2023

Whether or not something should stream is currently a property of the Response. That works: the .reply() method could pass on the streaming state to the next prompt that is executed.

simonw commented Jul 1, 2023

_______________ ERROR collecting tests/test_cli_openai_models.py _______________
ImportError while importing test module '/home/runner/work/llm/llm/tests/test_cli_openai_models.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.8.17/x64/lib/python3.8/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_cli_openai_models.py:2: in <module>
    from llm.cli import cli
/opt/hostedtoolcache/Python/3.8.17/x64/lib/python3.8/site-packages/llm/cli.py:7: in <module>
    from .plugins import pm, get_plugins, get_model_aliases, get_models_with_aliases
/opt/hostedtoolcache/Python/3.8.17/x64/lib/python3.8/site-packages/llm/plugins.py:21: in <module>
    mod = importlib.import_module(plugin)
/opt/hostedtoolcache/Python/3.8.17/x64/lib/python3.8/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'llm.default_plugins'

simonw commented Jul 1, 2023

Something very broken there. I tried python setup.py sdist and then installing llm into a fresh virtualenv:

pip install ~/Dropbox/Development/llm/dist/llm-0.4.1.tar.gz

And I get this error when I run it:

% llm --help
Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/foop-q39PfZK-/bin/llm", line 5, in <module>
    from llm.cli import cli
  File "/Users/simon/.local/share/virtualenvs/foop-q39PfZK-/lib/python3.11/site-packages/llm/cli.py", line 7, in <module>
    from .plugins import pm, get_plugins, get_model_aliases, get_models_with_aliases
  File "/Users/simon/.local/share/virtualenvs/foop-q39PfZK-/lib/python3.11/site-packages/llm/plugins.py", line 21, in <module>
    mod = importlib.import_module(plugin)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'llm.default_plugins'

simonw commented Jul 1, 2023

Fixed the setup.py bundling issue. Tests all pass now!

simonw commented Jul 1, 2023

Now that some models live in llm.default_plugins.openai_models, they are pretty inconvenient to import.

I'm tempted to add this helper:

from llm import get_model
gpt4 = get_model("gpt-4")

This will provide Python API level access to both the model plugins mechanism and the aliases mechanism.
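A hypothetical sketch of how that helper could resolve names and aliases through the plugin registry - get_models_with_aliases() appears in the import tracebacks above, but the attribute names used here are assumptions:

# Sketch; attribute names (.model, .model_id, .aliases) are assumptions
class UnknownModelError(KeyError):
    pass


def get_model(name):
    from llm.plugins import get_models_with_aliases

    for entry in get_models_with_aliases():
        if name == entry.model.model_id or name in entry.aliases:
            return entry.model
    raise UnknownModelError("Unknown model: " + name)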

simonw mentioned this pull request Jul 1, 2023
simonw commented Jul 10, 2023

I wanted to rebase this branch, but GitHub said there were conflicts.

Following https://stackoverflow.com/a/50012219/6083 I ran these commands locally:

git checkout register-models
git rebase main
# There was a conflict in setup.py which I fixed
git add setup.py
git rebase --continue
git push --force-with-lease

simonw merged commit 845d0f0 into main Jul 10, 2023
10 checks passed
simonw added a commit that referenced this pull request Jul 10, 2023
simonw deleted the register-models branch July 10, 2023 15:39
simonw commented Jul 10, 2023

I don't like how that git push --force-with-lease seems to have reset the commit date on all of my commits though.

simonw commented Jul 10, 2023

Looks like I can fix that with:

git filter-repo --commit-callback '
    if commit.committer_date == b"Mon Jul 10 08:39:00 2023 -0700":
        commit.committer_date = commit.author_date
' --force
Parsed 172 commits
New history written in 0.03 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 7ac007a Show error for --continue mode, remove deleted code
Enumerating objects: 907, done.
Counting objects: 100% (907/907), done.
Delta compression using up to 12 threads
Compressing objects: 100% (271/271), done.
Writing objects: 100% (907/907), done.
Total 907 (delta 621), reused 907 (delta 621), pack-reused 0
Completely finished after 0.16 seconds.

simonw added a commit that referenced this pull request Jul 10, 2023
simonw commented Jul 10, 2023

That didn't quite work - it didn't actually update the commit dates.

This helped debug it:

git filter-repo --commit-callback '
    print(repr(commit.committer_date))
' --force

Turns out those dates look like this:

b'1689001152 -0700'
b'1689002848 -0700'
b'1689003110 -0700'
b'1689003276 -0700'

So I ran this command:

git filter-repo --commit-callback '
    if commit.committer_date == b"1689003540 -0700":
        commit.committer_date = commit.author_date
' --force

And then force pushed it all to main:

git push --force-with-lease origin main

simonw commented Jul 10, 2023

Turned that into a TIL: https://til.simonwillison.net/git/git-filter-repo
