Framework for building actionable agents to achieve goals specified by user, powered by OpenAI GPT-4 and GPT-3.5
The optimal way to run Golem-GPT is to use the Docker image or Docker Compose.
- Docker or Python 3.8+ environment
- OpenAI API key
Put credentials to .env
:
OPENAI_API_KEY=...
OPENAI_ORG_ID=...
OPENAI_MODEL=gpt-4
(For gpt-4
model you should have an early access enabled, it's not publicly available yet).
Run it via Docker Compose:
docker compose build && docker compose run app
or via Python:
pip install --upgrade golem-gpt
python -m golemgpt
❗️ It's safer to run it inside Docker to have it isolated. Because Golem can access an environment and filesystem, so it's better to keep it inside a container.
We introduce a novel framework for building Golems (actionable agents) which is based on the following high-level concepts:
-
Goals: a set of goals, initially defined by user's input. Goals could be high-level definitions, like "I want to build a web app", or low-level definitions, like "I want to create a new file with content 'Hello, world!'"
-
Cognitron: a language model, which interprets input text and produces an action plan, or other kind of structured output. It runs on top of OpenAI models, which could be potentially replaced with any other language model.
-
Lexicon: a set of rules and dictionary to generate prompts for Cognitron and interpret its structured output.
-
Action plan: a structured output of Cognitron, which is a set of actions to be executed for achieving goals.
-
Actions: a predefined executables or functions, which can interact with the environment to achieve goals. Actions could also be recursive or delegate their execution to other Golems.
-
Memory: a storage for the Golem's current state, which can also be saved and loaded to continue the job later.
-
Codex: a built-in Golem's moderator, which is responsible for checking the agent's actions and preventing it from doing something unexpected. Codex has its own Cognitrion and Lexicon.
NOTE: In our implementation, Actions are implemented as Python functions
How is it different from AutoGPT?
We build it because we like it. Our implementation is not as advanced as AutoGPT to the moment, but it has some unique focus:
- Keeping it simple and easy to modify, also with minimal dependencies
- While keeping the core simple, we aim to make it extensible via custom actions, roles, policies, and other components
- We think of it as interactive tool, not necessarily to be fully autonomous
- Also we test it with GPT-3.5, which porbably sounds not super-hyped, but it's waaay cheaper and delivers results good enough for many use cases
- We are going to utilize it in our own development cycle, and refine it to fit real needs in software development
The pipeline consists of the following actions:
- ask_human_input(query)
- get_os_details()
- get_local_date()
- read_file(filename)
- write_file(filename, content)
- summarize_file(filename, hint, to_filename)
- http_download(url, method, headers, body, to_filename)
- create_python_script(name, description, in_files, out_files)
- create_shell_script(name, description, in_files, out_files)
- run_script(name)
- ask_google(query, to_filename)
- delegate_job(goal, role, in_files, out_files)
- explain(comment)
- reject_job(message)
- finish_job(message)
Setup virtualenv:
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip pip-tools
pip-compile
pip install -r requirements.txt
Put variables to .env
:
OPENAI_API_KEY=...
OPENAI_ORG_ID=...
Start a new job:
python -m golemgpt
Continue saved job:
python -m golemgpt -j <job key>
Terminate simply with ^C or empty input.
Golem-GPT is licensed under the Apache-2.0.
Authors:
- Alexey Kinev rudy@05bit.com