Goose extends the capabilities of high-performing LLMs through a small collection of tools. This lets you instruct goose, currently via a CLI interface, to automatically solve problems on your behalf. Rather than just telling you how to do something, it attempts to actually do it for you.
The primary mode of goose (the "developer" toolkit) has access to tools to:
- maintain a plan
- run shell commands
- read, create, and edit files
Together these can solve all kinds of problems, and we emphasize performance on tasks like fully automating ad hoc scripts and tasks, interacting with existing code bases, and teaching how to use new technology.
Here are some of the key design decisions that drive goose's performance on these tasks, which you should be able to observe by using it:
- Encouraging it to write and maintain a plan, to allow it to accomplish longer sequences of automation
- Using tool usage as a generalizable and increasingly tuned approach to adding new capabilities (including plugins)
- Relying on reflection at every possible part of the stack
- Showing it clear output of each tool use
- Surfacing all possible errors to the model to give it a chance to correct them
- Surfacing the plan to document what has been accomplished
Tip: In addition, there are some implementation choices that we've found drive performance significantly. They share a theme of working well by default without constraining the model.

- Encouraging the model to use `ripgrep` via the shell performs very well for navigating filesystems. It mostly just works, but lets the model get clever with regexes or even additional shell operations as needed.
- Using a replace operation for editing files requires fewer tokens to be generated and avoids laziness on large files, but we allow a fallback to whole-file overwrites so the model can more coherently handle major refactors (a simplified sketch of the replace approach follows this list).
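To illustrate the second point, a replace-style edit can be sketched in a few lines. This is a simplified illustration rather than goose's actual editing tool; the function name and error message here are made up.

```python
# Simplified sketch of a replace-style edit: the model only generates the span
# that changes, rather than re-emitting the whole file.
def patch_file(path: str, before: str, after: str) -> None:
    with open(path) as f:
        text = f.read()
    if before not in text:
        # raising here lets the failure be surfaced back to the model
        raise ValueError(f"could not find the text to replace in {path}")
    with open(path, "w") as f:
        f.write(text.replace(before, after, 1))
```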
The core execution logic for generation and tool calling is handled by `exchange`. It hooks Python functions into the model's tool-use loop, while defining very careful error handling so that any failures in tools are surfaced to the model.
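The error-surfacing idea can be sketched as follows; this is a simplified illustration, not exchange's actual implementation.

```python
# Simplified illustration: a failed tool call is converted into a result the
# model can read and react to, instead of crashing the loop.
def call_tool(fn, **kwargs):
    try:
        return {"ok": True, "output": fn(**kwargs)}
    except Exception as e:
        return {"ok": False, "error": f"{type(e).__name__}: {e}"}
```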
Once we've created an exchange object, running the process is effectively just calling `exchange.reply()`.
The key is setting up an exchange with the capabilities we need.
To build that exchange, Goose:
- allows users to configure a profile to customize capabilities
- provides a pluggable system for adding tools and prompts
- sets up the tools to interact with state
We expect that goose will have multiple UXs over time, and be run in different environments. The UX is expected to be able to load a `Profile` (e.g. in the CLI we read profiles out of a config) and to provide a `Notifier` (e.g. in the CLI we put notifications on stdout).
Goose then constructs the exchange for the UX; the UX only interacts with that exchange.
```python
def build_exchange(profile: Profile, notifier: Notifier) -> Exchange:
    ...
```
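For example, a CLI front end might wire things together roughly like this; `load_profile` and `CLINotifier` are illustrative stand-ins, not goose's actual helpers.

```python
# Hypothetical CLI wiring: the UX supplies a profile and a notifier,
# then only interacts with the exchange it gets back.
profile = load_profile("default")    # illustrative: read a Profile from config
notifier = CLINotifier()             # illustrative: a UX-specific Notifier
exchange = build_exchange(profile, notifier)
exchange.reply()                     # run generation and tool calling
```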
But to set up a configurable system, Goose uses `Toolkit`s:

```
(Profile, Notifier) -> [Toolkits] -> Exchange
```
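To make the diagram concrete, here is a rough sketch of how `build_exchange` might assemble those pieces; `instantiate_toolkits`, the `.tools()` accessor, and the `Exchange` keyword arguments are assumptions for illustration, not goose's actual code.

```python
# Rough sketch only: toolkits contribute prompt text and tools,
# which are combined into the exchange that drives the loop.
def build_exchange(profile: Profile, notifier: Notifier) -> Exchange:
    toolkits = instantiate_toolkits(profile.toolkits, notifier)  # illustrative helper
    system = "\n\n".join(toolkit.prompt() for toolkit in toolkits)
    tools = [tool for toolkit in toolkits for tool in toolkit.tools()]  # assumed accessor
    return Exchange(model=profile.processor, system=system, tools=tools)  # assumed kwargs
```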
A profile specifies some basic configuration in Goose, such as which models it should use, as well as which toolkits it should include.
```yaml
processor: openai:gpt-4o
accelerator: openai:gpt-4o-mini
moderator: passive
toolkits:
  - assistant
  - calendar
  - contacts
  - name: scheduling
    requires:
      assistant: assistant
      calendar: calendar
      contacts: contacts
```
The notifier is a concrete implementation of the `Notifier` base class, provided by each UX. It needs to support two methods:
```python
class Notifier:
    def log(self, content: RichRenderable):
        ...

    def status(self, status: str):
        ...
```
Log is meant to record something concrete that happened, such as a tool being called, and status is intended for transient displays of the current state. For example, while a shell command is running, a tool might use `.log` to record the command that started, and then update the status to "shell command running". Log is durable while Status is ephemeral.
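As a concrete illustration, a CLI notifier might look roughly like this, using rich's `Console` for output. This is a sketch, not goose's actual CLI implementation.

```python
from rich.console import Console


class CLINotifier(Notifier):
    """Sketch of a CLI implementation of the Notifier interface."""

    def __init__(self):
        self.console = Console()

    def log(self, content):
        # durable: record something that happened, e.g. a tool call
        self.console.print(content)

    def status(self, status: str):
        # ephemeral: show the current state, e.g. "shell command running"
        self.console.print(f"[dim]{status}[/dim]")
```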
A toolkit is a collection of tools, along with the state and prompting they require. Toolkits are what give Goose its capabilities.
Tools need a way to report what's happening back to the user, which we treat similarly to logging. To make that possible, toolkits get a reference to the interface described above.
```python
class ScheduleToolkit(Toolkit):
    def __init__(self, notifier: Notifier, requires: Requirements, **kwargs):
        super().__init__(notifier, requires, **kwargs)  # handles the interface, exchangeview
        # for a class that has requirements, you can get them like this
        self.calendar = requires.get("calendar")
        self.assistant = requires.get("assistant")
        self.contacts = requires.get("contacts")
        self.appointments_state = []

    def prompt(self) -> str:
        return "Try out the example tool."

    @tool
    def example(self):
        self.interface.log(f"An example tool was called, current state is {self.appointments_state}")
```
Dependencies: Toolkits can depend on each other, making it easier for plugins to extend or modify existing capabilities. In the config above, you can see this used for the scheduling toolkit. You can refer to those requirements in code through:
```python
@tool
def example_dependency(self):
    appointments = self.dependencies["calendar"].appointments
    ...
```
ExchangeView: It can also be useful for tools to have a read-only copy of the history of the loop so far. So for advanced use cases, toolkits also have access to an `ExchangeView` object.
```python
@tool
def example_history(self):
    last_message = self.exchange_view.processor.messages[-1]
    ...
```