Aider with rich terminal display #56

Closed
wants to merge 18 commits into from
86 changes: 51 additions & 35 deletions agent/README.md
@@ -1,39 +1,55 @@
# How to run baseline

Step 1: Go to `config/aider.yaml` and change the config

Step 2: Run the following command
# Agent for Commit0
- `agent config [OPTIONS] AGENT_NAME`: Set up the config you want the agent to run with.
- `agent run [OPTIONS] BRANCH`: Run the agent on a specific branch.

You can also run the following commands for more information:
```bash
python baselines/run_aider.py
agent -h
agent config -h
agent run -h
```
## Configure Agent
Here are all the options available for `agent config [OPTIONS] AGENT_NAME`:

- `--agent_name: str`: Agent to use; only [aider](https://aider.chat/) is supported for now. [Default: `aider`]
- `--model-name: str`: Model to use; see [here](https://aider.chat/docs/llms.html) for more information. [Default: `claude-3-5-sonnet-20240620`]
- `--use-user-prompt: bool`: Use the user prompt instead of the default prompt. [Default: `False`]
- `--user-prompt: str`: The prompt sent to the agent. [Default: refer to the code.]
- `--run-tests: bool`: Run the tests after the agent modifies the code, to get feedback. [Default: `False`]
- `--max-iteration: int`: Maximum number of iterations for the agent to run. [Default: `3`]
- `--use-repo-info: bool`: Use the repository information. [Default: `False`]
- `--max-repo-info-length: int`: Maximum length of the repository information to use. [Default: `10000`]
- `--use-unit-tests-info: bool`: Use the unit tests information. [Default: `False`]
- `--max-unit-tests-info-length: int`: Maximum length of the unit tests information to use. [Default: `10000`]
- `--use-spec-info: bool`: Use the spec information. [Default: `False`]
- `--max-spec-info-length: int`: Maximum length of the spec information to use. [Default: `10000`]
- `--use-lint-info: bool`: Use the lint information. [Default: `False`]
- `--max-lint-info-length: int`: Maximum length of the lint information to use. [Default: `10000`]
- `--pre-commit-config-path: str`: Path to the pre-commit config file. [Default: `.pre-commit-config.yaml`]
- `--agent-config-file: str`: Path to write the agent config to. [Default: `.agent.yaml`]

## Running Agent
Here are all the options available for `agent run [OPTIONS] BRANCH`:

- `--branch: str`: Branch to run the agent on; you can specify the name of the branch.
- `--backend: str`: Test backend to run the agent on. Ignore this option if you did not enable `run_tests` in the agent config. [Default: `modal`]
- `--log-dir: str`: Directory to store the logs in. [Default: `logs/aider`]
- `--max-parallel-repos: int`: Maximum number of repositories for the agent to run in parallel. Repositories run sequentially if set to 1. [Default: `1`]
- `--display-repo-progress-num: int`: Number of repositories whose progress is displayed while running. [Default: `5`]
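The parallel-run logic itself is not part of this diff, so the following is only a minimal sketch of the semantics documented for `--max-parallel-repos`: a bounded pool when the value is above 1, a plain loop when it is 1. `run_all_repos` and the per-repo callable `run_one` are hypothetical names, and threads are just one possible choice (the real CLI may use processes instead).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_all_repos(
    repos: Sequence[str],
    run_one: Callable[[str], float],
    max_parallel_repos: int = 1,
) -> List[float]:
    """Run `run_one` on every repo, sequentially or with a bounded pool (hypothetical helper)."""
    if max_parallel_repos <= 1:
        # Sequential mode, matching the documented behaviour for a value of 1.
        return [run_one(repo) for repo in repos]
    # At most `max_parallel_repos` repos are in flight at the same time.
    with ThreadPoolExecutor(max_workers=max_parallel_repos) as pool:
        # pool.map returns results in input order, not completion order.
        return list(pool.map(run_one, repos))


if __name__ == "__main__":
    # Toy stand-in for a per-repo run that returns the money spent.
    fake_costs = {"simpy": 0.12, "jinja": 0.30, "tinydb": 0.05}
    print(run_all_repos(list(fake_costs), fake_costs.get, max_parallel_repos=2))
    # [0.12, 0.3, 0.05]
```

Because results come back in input order, per-repo values such as money cost can be matched to repo names regardless of which run finishes first.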


### Agent Example: aider
Step 1: `agent config aider`

Step 2: `agent run aider_branch`

### Other Agents
Refer to `class Agents` in `agent/agents.py`. You can design your own agent by inheriting from the `Agents` class and implementing the `run` method.
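As a hedged illustration of that extension point, the sketch below defines a toy `EchoAgent`. It re-declares minimal stand-ins for `Agents` and `AgentReturn` so that it runs on its own; in the repo you would import them from `agent/agents.py` instead, and the real `AgentReturn` also parses the money cost out of the log file. `EchoAgent` and its parameters are hypothetical.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class AgentReturn:
    """Stand-in for agent.agents.AgentReturn (the real one also reads the cost from the log)."""

    def __init__(self, log_file: Path):
        self.log_file = log_file


class Agents(ABC):
    """Stand-in mirroring the abstract base class in agent/agents.py."""

    def __init__(self, max_iteration: int):
        self.max_iteration = max_iteration

    @abstractmethod
    def run(self) -> AgentReturn:
        """Start agent"""
        raise NotImplementedError


class EchoAgent(Agents):
    """Hypothetical agent that only records the prompt it receives."""

    def run(self, message: str, log_dir: Path) -> AgentReturn:
        log_dir.mkdir(parents=True, exist_ok=True)
        log_file = log_dir / "echo.log"
        # A real agent would call an LLM here, up to self.max_iteration times.
        log_file.write_text(f"max_iteration={self.max_iteration}\n{message}\n")
        return AgentReturn(log_file)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = EchoAgent(max_iteration=3).run("Implement foo()", Path(tmp) / "logs")
        print(result.log_file.read_text())
```

Like `AiderAgents` in this diff, the subclass is free to give `run` extra parameters; what matters is that it returns an `AgentReturn`.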

## Notes

### Automatic retries
See [aider's `sendchat.py`](https://github.com/paul-gauthier/aider/blob/75e1d519da9b328b0eca8a73ee27278f1289eadb/aider/sendchat.py#L17) for the types of API errors that aider retries automatically.

### Large files in repo
Currently, the agent skips files with more than 1500 lines (see `agent/agent_utils.py#L199`).

## Config

`commit0_config`:

- `base_dir`: Repos dir. Default `repos`.
- `dataset_name`: commit0 HF dataset name. Default: `wentingzhao/commit0_docstring`.
- `dataset_split`: commit0 dataset split. Default: `test`.
- `repo_split`: commit0 repo split. Default: `simpy`.
- `num_workers`: number of workers to run in parallel. Default: `10`.

`aider_config`:

- `llm_name`: LLM model name. Default: `claude-3-5-sonnet-20240620`.
- `use_user_prompt`: Whether to use user prompt. Default: `false`.
- `user_prompt`: User prompt. Default: `""`.
- `use_repo_info`: Whether to use repo info. Default: `false`.
  - Repo info consists of:
    - the skeleton of the repo (filenames under each dir)
    - function stubs

- `use_unit_tests_info`: Whether to use unit-test info, i.e. the unit tests the target will be tested with. Default: `false`.
- `use_reference_info`: Whether to use reference info (reference docs, PDFs, or websites). Default: `false`.
- `use_lint_info`: Whether to use lint info. Default: `false`.
- `pre_commit_config_path`: Path to pre-commit config. Default: `.pre-commit-config.yaml`.
- `run_tests`: Whether to run tests. Default: `true`.
- `max_repo_info_length`: Max length of repo info. Default: `10000`.
- `max_unit_tests_info_length`: Max length of unit tests info. Default: `10000`.
- `max_reference_info_length`: Max length of reference info. Default: `10000`.
- `max_lint_info_length`: Max length of lint info. Default: `10000`.
120 changes: 103 additions & 17 deletions agent/commit0_utils.py → agent/agent_utils.py
@@ -5,6 +5,7 @@
from pathlib import Path
from typing import List
import fitz
import yaml

from agent.class_types import AgentConfig

@@ -118,24 +119,95 @@ def get_file_info(file_path: Path, prefix: str = "") -> str:
    return "\n".join(filter(None, tree_string))


def get_target_edit_files(target_dir: str) -> list[str]:
def collect_test_files(directory: str) -> list[str]:
    """Collect all the test files in the directory."""
    test_files = []
    subdirs = []

    # Walk through the directory
    for root, dirs, files in os.walk(directory):
        if root.endswith("/"):
            root = root[:-1]
        # Check if 'test' is part of the folder name
        if (
            "test" in os.path.basename(root).lower()
            or os.path.basename(root) in subdirs
        ):
            for file in files:
                # Process only Python files
                if file.endswith(".py"):
                    file_path = os.path.join(root, file)
                    test_files.append(file_path)
            for d in dirs:
                subdirs.append(d)

    return test_files


def collect_python_files(directory: str) -> list[str]:
    """Collect all the Python files in the directory."""
    python_files = []

    # Walk through the directory recursively
    for root, _, files in os.walk(directory):
        for file in files:
            # Check if the file ends with '.py'
            if file.endswith(".py"):
                file_path = os.path.join(root, file)
                python_files.append(file_path)

    return python_files


def _find_files_to_edit(base_dir: str, src_dir: str, test_dir: str) -> list[str]:
    """Identify, by heuristics, the files whose content will be removed.
    We assume source code is under [lib]/[lib] or [lib]/src.
    We exclude test code. This function would not work
    if test code doesn't have its own directory.

    Args:
    ----
        base_dir (str): The path to local library.
        src_dir (str): The directory containing source code.
        test_dir (str): The directory containing test code.

    Returns:
    -------
        list[str]: A list of files to be edited.

    """
    files = collect_python_files(os.path.join(base_dir, src_dir))
    test_files = collect_test_files(os.path.join(base_dir, test_dir))
    files = list(set(files) - set(test_files))

    # don't edit __init__ files
    files = [f for f in files if "__init__" not in f]
    # don't edit __main__ files
    files = [f for f in files if "__main__" not in f]
    # don't edit conftest.py files
    files = [f for f in files if "conftest.py" not in f]
    return files
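The filtering rules above are easy to check on plain path lists. The sketch below applies the same set difference and the same `__init__` / `__main__` / `conftest.py` exclusions without touching the disk; `select_files_to_edit` is a hypothetical name, and the example layout assumes the `[lib]/[lib]` convention mentioned in the docstring.

```python
from typing import Iterable, List

# Substrings that _find_files_to_edit uses to drop files from the edit list.
EXCLUDED_MARKERS = ("__init__", "__main__", "conftest.py")


def select_files_to_edit(
    python_files: Iterable[str], test_files: Iterable[str]
) -> List[str]:
    """Apply _find_files_to_edit's filtering to in-memory path lists (no disk access)."""
    # Source files minus anything that was collected as a test file.
    candidates = set(python_files) - set(test_files)
    # Substring match, as in the diff: the marker may appear anywhere in the path.
    kept = [f for f in candidates if not any(m in f for m in EXCLUDED_MARKERS)]
    # The diff returns set order; sorting here only makes the result reproducible.
    return sorted(kept)


if __name__ == "__main__":
    src = [
        "repos/mylib/mylib/core.py",
        "repos/mylib/mylib/__init__.py",
        "repos/mylib/mylib/tests/test_core.py",
    ]
    print(select_files_to_edit(src, ["repos/mylib/mylib/tests/test_core.py"]))
    # ['repos/mylib/mylib/core.py']
```

Since the exclusions are substring checks on the whole path, a file inside a directory whose name contains `__init__` or `__main__` would also be dropped; the sketch keeps that behaviour on purpose.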


def get_target_edit_files(target_dir: str, src_dir: str, test_dir: str) -> list[str]:
    """Find the files with functions with the pass statement."""
    files = []
    for root, _, filenames in os.walk(target_dir):
        for filename in filenames:
            if filename.endswith(".py"):
                file_path = os.path.join(root, filename)
                with open(file_path, "r") as file:
                    if " pass" in file.read():
                        files.append(file_path)
    files = _find_files_to_edit(target_dir, src_dir, test_dir)
    filtered_files = []
    for file_path in files:
        with open(file_path, "r", encoding="utf-8", errors="ignore") as file:
            content = file.read()
            if len(content.splitlines()) > 1500:
                continue
            if " pass" in content:
                filtered_files.append(file_path)

    # Remove the base_dir prefix
    files = [file.replace(target_dir, "").lstrip("/") for file in files]

    filtered_files = [
        file.replace(target_dir, "").lstrip("/") for file in filtered_files
    ]
    # Only keep python files
    files = [file for file in files if file.endswith(".py")]

    return files
    return filtered_files
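The new version of `get_target_edit_files` keeps a file only if it has at most 1500 lines and still contains ` pass`, then strips the `target_dir` prefix. The sketch below reproduces those two checks on an in-memory `{path: text}` mapping so they can be tested without a checkout. `filter_stub_files` is a hypothetical name, and it deliberately swaps the diff's `str.replace(...).lstrip("/")` for `os.path.relpath` in the prefix step.

```python
import os
from typing import Dict, List

MAX_LINES = 1500  # files longer than this are skipped, as in the diff


def filter_stub_files(contents: Dict[str, str], target_dir: str) -> List[str]:
    """Keep short-enough files that still contain a ` pass` stub.

    `contents` maps paths to file text (a stand-in for reading from disk);
    returned paths are relative to `target_dir`.
    """
    kept = []
    for path, text in contents.items():
        if len(text.splitlines()) > MAX_LINES:
            continue  # too large to hand to the agent
        if " pass" in text:
            # relpath removes only the leading prefix, whereas str.replace
            # would also remove any later occurrence of target_dir in the path.
            kept.append(os.path.relpath(path, target_dir))
    return kept


if __name__ == "__main__":
    demo = {"/repos/simpy/simpy/core.py": "def f():\n    pass\n"}
    print(filter_stub_files(demo, "/repos/simpy"))
```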


def get_message(
@@ -288,12 +360,12 @@ def get_changed_files(repo: git.Repo) -> list[str]:
    return files_changed


def get_lint_cmd(repo: git.Repo, use_lint_info: bool) -> str:
    """Generate a linting command based on whether to include files changed in the latest commit.
def get_lint_cmd(repo_name: str, use_lint_info: bool) -> str:
    """Generate the commit0 lint command for a repository, or an empty string if linting is disabled.

    Args:
    ----
        repo (git.Repo): An instance of GitPython's Repo object representing the Git repository.
        repo_name (str): The name of the repository.
        use_lint_info (bool): A flag indicating whether to include changed files in the lint command.

    Returns:
@@ -304,7 +376,21 @@ def get_lint_cmd(repo: git.Repo, use_lint_info: bool) -> str:
    """
    lint_cmd = "python -m commit0 lint "
    if use_lint_info:
        lint_cmd += " ".join(get_changed_files(repo))
        lint_cmd += repo_name + " --files "
    else:
        lint_cmd = ""
    return lint_cmd
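With this change, `get_lint_cmd` no longer lists the changed files itself. The new version is reproduced below, logic unchanged, to show the exact strings it returns. The command ends in `--files ` with nothing after it; presumably the caller (aider, when it lints an edited file) appends the file path, but that call site is not part of this diff, so treat it as an assumption.

```python
def get_lint_cmd(repo_name: str, use_lint_info: bool) -> str:
    """Same logic as the new get_lint_cmd in this diff."""
    lint_cmd = "python -m commit0 lint "
    if use_lint_info:
        # Note the trailing space: file paths are expected to be appended later.
        lint_cmd += repo_name + " --files "
    else:
        # An empty command means "do not lint".
        lint_cmd = ""
    return lint_cmd


if __name__ == "__main__":
    print(repr(get_lint_cmd("simpy", True)))   # 'python -m commit0 lint simpy --files '
    print(repr(get_lint_cmd("simpy", False)))  # ''
```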


def write_agent_config(agent_config_file: str, agent_config: dict) -> None:
    """Write the agent config to the file."""
    with open(agent_config_file, "w") as f:
        yaml.dump(agent_config, f)


def read_yaml_config(config_file: str) -> dict:
    """Read the yaml config from the file."""
    if not os.path.exists(config_file):
        raise FileNotFoundError(f"The config file '{config_file}' does not exist.")
    with open(config_file, "r") as f:
        return yaml.load(f, Loader=yaml.FullLoader)
73 changes: 52 additions & 21 deletions agent/agents.py
@@ -1,21 +1,51 @@
import sys
import os
from abc import ABC, abstractmethod
from pathlib import Path
import logging

from aider.coders import Coder
from aider.models import Model
from aider.io import InputOutput
from tenacity import retry, wait_exponential
import re


def handle_logging(logging_name: str, log_file: Path) -> None:
    """Handle logging for agent"""
    logger = logging.getLogger(logging_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger_handler = logging.FileHandler(log_file)
    logger_handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(logger_handler)
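Named loggers such as `httpx` and `backoff` are process-wide, so each call to `handle_logging` adds one more `FileHandler` to the same logger. If `run` is called for several repositories in one process, records from later runs would also be written to earlier runs' log files. A minimal sketch of one way to avoid that, assuming the caller can detach the handler once a run ends: return the handler and remove it afterwards. `release_logging` is a hypothetical helper, not part of the diff.

```python
import logging
import tempfile
from pathlib import Path


def handle_logging(logging_name: str, log_file: Path) -> logging.Handler:
    """Send one named logger's records to log_file only; return the handler."""
    logger = logging.getLogger(logging_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not also emit through the root logger
    handler = logging.FileHandler(log_file)
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return handler


def release_logging(logging_name: str, handler: logging.Handler) -> None:
    """Detach and close a handler added by handle_logging (hypothetical helper)."""
    logging.getLogger(logging_name).removeHandler(handler)
    handler.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for repo in ("repo_a", "repo_b"):
            log = Path(tmp) / f"{repo}.log"
            h = handle_logging("demo_httpx", log)
            logging.getLogger("demo_httpx").info("request for %s", repo)
            release_logging("demo_httpx", h)
        # Each log holds only its own repo's record.
        print((Path(tmp) / "repo_b.log").read_text())
```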


class AgentReturn(ABC):
    def __init__(self, log_file: Path):
        self.log_file = log_file
        self.last_cost = self.get_money_cost()

    def get_money_cost(self) -> float:
        """Get accumulated money cost from log file"""
        last_cost = 0.0
        with open(self.log_file, "r") as file:
            for line in file:
                if "Tokens:" in line and "Cost:" in line:
                    match = re.search(
                        r"Cost: \$\d+\.\d+ message, \$(\d+\.\d+) session", line
                    )
                    if match:
                        last_cost = float(match.group(1))
        return last_cost
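`get_money_cost` keeps the last `... session` figure it sees; since the session figure is a running total, the last matching line holds the total for that log. The sketch below runs the same regular expression over a list of lines instead of a file. `last_session_cost` is a hypothetical name, and the sample lines follow the format used in the commented-out `#### TMP` block of this diff.

```python
import re
from typing import Iterable

# Same pattern as AgentReturn.get_money_cost in the diff.
SESSION_COST = re.compile(r"Cost: \$\d+\.\d+ message, \$(\d+\.\d+) session")


def last_session_cost(lines: Iterable[str]) -> float:
    """Return the session cost from the last matching 'Tokens:' line, else 0.0."""
    last_cost = 0.0
    for line in lines:
        if "Tokens:" in line and "Cost:" in line:
            match = SESSION_COST.search(line)
            if match:
                last_cost = float(match.group(1))
    return last_cost


if __name__ == "__main__":
    log = [
        "> Tokens: 33k sent, 1.3k received. Cost: $0.12 message, $0.12 session.",
        "> Tokens: 35k sent, 1.1k received. Cost: $0.13 message, $0.25 session.",
    ]
    print(last_session_cost(log))  # 0.25
```

One consequence of the pattern: a cost printed without a decimal point (for example `$1`) would not match and would be silently ignored.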


class Agents(ABC):
    def __init__(self, max_iteration: int):
        self.max_iteration = max_iteration

    @abstractmethod
    def run(self) -> None:
    def run(self) -> AgentReturn:
        """Start agent"""
        raise NotImplementedError

@@ -25,17 +55,14 @@ def __init__(self, max_iteration: int, model_name: str):
        super().__init__(max_iteration)
        self.model = Model(model_name)

    @retry(
        wait=wait_exponential(multiplier=1, min=4, max=10),
    )
    def run(
        self,
        message: str,
        test_cmd: str,
        lint_cmd: str,
        fnames: list[str],
        log_dir: Path,
    ) -> None:
    ) -> AgentReturn:
        """Start aider agent"""
        if test_cmd:
            auto_test = True
@@ -50,10 +77,6 @@ def run(
        input_history_file = log_dir / ".aider.input.history"
        chat_history_file = log_dir / ".aider.chat.history.md"

        print(
            f"check {os.path.abspath(chat_history_file)} for prompts and lm generations",
            file=sys.stderr,
        )
        # Set up logging
        log_file = log_dir / "aider.log"
        logging.basicConfig(
@@ -66,15 +89,9 @@ def run(
        sys.stdout = open(log_file, "a")
        sys.stderr = open(log_file, "a")

        # Configure httpx logging
        httpx_logger = logging.getLogger("httpx")
        httpx_logger.setLevel(logging.INFO)
        httpx_logger.propagate = False  # Prevent propagation to root logger
        httpx_handler = logging.FileHandler(log_file)
        httpx_handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        httpx_logger.addHandler(httpx_handler)
        # Configure httpx and backoff logging
        handle_logging("httpx", log_file)
        handle_logging("backoff", log_file)

        io = InputOutput(
            yes=True,
@@ -91,14 +108,28 @@ def run(
            io=io,
        )
        coder.max_reflection = self.max_iteration
        coder.stream = False
        coder.stream = True

        # Run the agent
        coder.run(message)

        # #### TMP
        # import time
        # import random

        # time.sleep(random.random() * 5)
        # n = random.random() / 10
        # with open(log_file, "a") as f:
        #     f.write(
        #         f"> Tokens: 33k sent, 1.3k received. Cost: $0.12 message, ${n} session. \n"
        #     )
        # #### TMP

        # Close redirected stdout and stderr
        sys.stdout.close()
        sys.stderr.close()
        # Restore original stdout and stderr
        sys.stdout = sys.__stdout__
        sys.stderr = sys.__stderr__

        return AgentReturn(log_file)
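`run` swaps `sys.stdout` / `sys.stderr` by hand and restores them to `sys.__stdout__` / `sys.__stderr__`. If `coder.run` raises, the streams stay redirected, and if something had wrapped `sys.stdout` beforehand (a test runner, or a live terminal display), that wrapper is replaced by the raw stream. A minimal sketch of the same redirection using `contextlib.redirect_stdout` and `contextlib.redirect_stderr`, which put back whichever streams were active before, even on error. `run_with_output_in_log` is a hypothetical name, and `work` stands in for `coder.run(message)`.

```python
import contextlib
from pathlib import Path
from typing import Callable


def run_with_output_in_log(log_file: Path, work: Callable[[], None]) -> None:
    """Append everything `work` prints to log_file, then restore the previous streams."""
    # One shared file object for both streams, closed after the redirects are undone.
    with open(log_file, "a") as log:
        with contextlib.redirect_stdout(log), contextlib.redirect_stderr(log):
            work()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "aider.log"
        run_with_output_in_log(log, lambda: print("goes to the log"))
        print(log.read_text(), end="")  # printed on the restored stdout
```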