Docs3 #59

Merged
merged 4 commits into from
Sep 25, 2024
44 changes: 44 additions & 0 deletions docs/agent.md
@@ -0,0 +1,44 @@
Commit0 provides a command-line tool, `agent`, for configuring and
running AI agents that assist with code development and testing.
In this example we use [Aider](https://aider.chat/) as the
baseline code completion agent.

```bash
pip install aider-chat
```

This guide assumes that an underlying `commit0`
project has already been configured. To create a new project,
run the commit0 `setup` command.

```bash
commit0 setup lite
```

Next we need to configure the backend for the agent.
Currently only the aider backend is supported. The `config`
command can also be used to pass in arguments.

```bash
export ANTHROPIC_API_KEY="..."
agent config aider
```

Finally we run the underlying agent. This will create a display
that shows the current progress of the agent.

```bash
agent run
```


### Extending
Refer to `class Agents` in `agent/agents.py`. You can design your own agent by inheriting from the `Agents` class and implementing the `run` method.
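
As a rough illustration of the subclassing pattern, the sketch below defines a stand-in `Agents` base class and a toy subclass. The constructor argument, the `run` signature, and the `EchoAgent` name are all assumptions made for this example; check `agent/agents.py` for the actual interface before adapting it.

```python
from abc import ABC, abstractmethod


class Agents(ABC):
    """Stand-in for the base class in agent/agents.py (assumed shape, not the real one)."""

    def __init__(self, max_iteration: int) -> None:
        self.max_iteration = max_iteration

    @abstractmethod
    def run(self, message: str, file_names: list[str]) -> str:
        """Work on the given files and return a summary (assumed contract)."""


class EchoAgent(Agents):
    """Hypothetical toy agent that only reports what it was asked to do."""

    def run(self, message: str, file_names: list[str]) -> str:
        files = ", ".join(file_names) or "<no files>"
        return f"[{self.max_iteration} iterations max] {message}: {files}"


agent = EchoAgent(max_iteration=3)
summary = agent.run("Implement the missing functions", ["simpy/events.py"])
print(summary)
```

A real agent would replace the body of `run` with calls to its own code generation backend.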

## Notes


* Aider automatically retries certain API errors. For details, see [aider's `sendchat.py`](https://github.com/paul-gauthier/aider/blob/75e1d519da9b328b0eca8a73ee27278f1289eadb/aider/sendchat.py#L17).
* When increasing `--max-parallel-repos`, be mindful of aider's [60-second retry timeout](https://github.com/paul-gauthier/aider/blob/75e1d519da9b328b0eca8a73ee27278f1289eadb/aider/sendchat.py#L39). Set this value according to your API tier so that a `RateLimitError` does not stop your processes.
* Currently, the agent skips files with more than 1500 lines. See `agent/agent_utils.py#L199` for details.
* Running the full `all` commit0 split costs approximately $100 with Claude 3.5 Sonnet.
135 changes: 135 additions & 0 deletions docs/api.md
@@ -0,0 +1,135 @@
## Commit0

Commit0 provides several commands to facilitate the process of cloning, building, testing, and evaluating repositories. Here's an overview of the available commands:

### Setup

Use `commit0 setup [OPTIONS] REPO_SPLIT` to clone a repository split.
Available options include:

| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `repo_split` | str | Split of repositories to clone | |
| `--dataset-name` | str | Name of the Hugging Face dataset | `wentingzhao/commit0_combined` |
| `--dataset-split` | str | Split of the Hugging Face dataset | `test` |
| `--base-dir` | str | Base directory to clone repos to | `repos/` |
| `--commit0-dot-file-path` | str | Path where the stateful commit0 config is stored | `.commit0.yaml` |

### Build

Use `commit0 build [OPTIONS]` to build the Commit0 split chosen in the Setup stage.
Available options include:

| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `--num-workers` | int | Number of workers | `8` |
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
| `--verbose` | int | Verbosity level (1 or 2) | `1` |

### Get Tests

Use `commit0 get-tests REPO_NAME` to get tests for a Commit0 repository.

| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `repo_name` | str | Name of the repository to get tests for | |

### Test

Use `commit0 test [OPTIONS] REPO_OR_REPO_PATH [TEST_IDS]` to run tests on a Commit0 repository.
Available options include:

| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `repo_or_repo_path` | str | Directory of the repository to test | |
| `test_ids` | str | Test IDs to run | |
| `--branch` | str | Branch to test | |
| `--backend` | str | Backend to use for testing | `modal` |
| `--timeout` | int | Timeout for tests in seconds | `1800` |
| `--num-cpus` | int | Number of CPUs to use | `1` |
| `--reference` | bool | Test the reference commit | `False` |
| `--coverage` | bool | Get coverage information | `False` |
| `--rebuild` | bool | Rebuild an image | `False` |
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
| `--verbose` | int | Verbosity level (1 or 2) | `1` |
| `--stdin` | bool | Read test names from stdin | `False` |
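
When driving `commit0 test` from a script, it can help to assemble the argument list in one place. The helper below is a hypothetical sketch that uses only options from the table above; it assumes `--reference` is a bare on/off flag and that options may follow the positional arguments.

```python
import shlex
from typing import Optional


def build_test_command(
    repo: str,
    test_ids: str = "",
    branch: Optional[str] = None,
    backend: str = "modal",
    timeout: int = 1800,
    reference: bool = False,
) -> list[str]:
    """Assemble an argv list for `commit0 test` (hypothetical helper)."""
    cmd = ["commit0", "test", repo]
    if test_ids:
        cmd.append(test_ids)
    if branch is not None:
        cmd += ["--branch", branch]
    cmd += ["--backend", backend, "--timeout", str(timeout)]
    if reference:
        cmd.append("--reference")  # assumed to be a bare on/off flag
    return cmd


cmd = build_test_command(
    "simpy", "tests/test_event.py::test_succeed", branch="my_branch"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once commit0 is installed.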

### Evaluate

Use `commit0 evaluate [OPTIONS]` to evaluate the Commit0 split chosen in the Setup stage.
Available options include:

| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `--branch` | str | Branch to evaluate | |
| `--backend` | str | Backend to use for evaluation | `modal` |
| `--timeout` | int | Timeout for evaluation in seconds | `1800` |
| `--num-cpus` | int | Number of CPUs to use | `1` |
| `--num-workers` | int | Number of workers to use | `8` |
| `--reference` | bool | Evaluate the reference commit | `False` |
| `--coverage` | bool | Get coverage information | `False` |
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
| `--rebuild` | bool | Rebuild images | `False` |

### Lint

Use `commit0 lint [OPTIONS] REPO_OR_REPO_DIR` to lint files in a repository.
Available options include:

| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `repo_or_repo_dir` | str | Directory of the repository to lint | |
| `--files` | List[Path] | Files to lint (optional) | |
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
| `--verbose` | int | Verbosity level (1 or 2) | `1` |

### Save

Use `commit0 save [OPTIONS] OWNER BRANCH` to save the Commit0 split to GitHub.
Available options include:

| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `owner` | str | Owner of the repository | |
| `branch` | str | Branch to save | |
| `--github-token` | str | GitHub token for authentication | |
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |

## Agent

### Config

Use `agent config [OPTIONS] AGENT_NAME` to set up the configuration for an agent.
Available options include:

| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `agent_name` | str | Agent to use. Only [aider](https://aider.chat/) is supported for now. | `aider` |
| `--model-name` | str | LLM to use. See the [aider documentation](https://aider.chat/docs/llms.html) for all supported models. | `claude-3-5-sonnet-20240620` |
| `--use-user-prompt` | bool | Use a custom prompt instead of the default prompt. | `False` |
| `--user-prompt` | str | The prompt sent to agent. | See code for details. |
| `--run-tests` | bool | Run tests after code modifications for feedback. You need to set up `docker` or `modal` before running tests; refer to the commit0 docs. | `False` |
| `--max-iteration` | int | Maximum number of agent iterations. | `3` |
| `--use-repo-info` | bool | Include the repository information. | `False` |
| `--max-repo-info-length` | int | Maximum length of the repository information to use. | `10000` |
| `--use-unit-tests-info` | bool | Include the unit tests information. | `False` |
| `--max-unit-tests-info-length` | int | Maximum length of the unit tests information to use. | `10000` |
| `--use-spec-info` | bool | Include the spec information. | `False` |
| `--max-spec-info-length` | int | Maximum length of the spec information to use. | `10000` |
| `--use-lint-info` | bool | Include the lint information. | `False` |
| `--max-lint-info-length` | int | Maximum length of the lint information to use. | `10000` |
| `--pre-commit-config-path` | str | Path to the pre-commit config file. This is needed for running `lint`. | `.pre-commit-config.yaml` |
| `--agent-config-file` | str | Path to write the agent config. | `.agent.yaml` |
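
The `--use-*-info` and `--max-*-info-length` options come in pairs. One way to picture how a pair might interact is sketched below; this is not the agent's actual code, the class and function names are hypothetical, and it assumes that lengths are counted in characters.

```python
from dataclasses import dataclass


@dataclass
class InfoOptions:
    """Subset of the `agent config` options, with the defaults from the table."""

    use_repo_info: bool = False
    max_repo_info_length: int = 10000
    use_spec_info: bool = False
    max_spec_info_length: int = 10000


def build_context(options: InfoOptions, repo_info: str, spec_info: str) -> str:
    """Concatenate the enabled sections, each cut to its maximum length."""
    parts = []
    if options.use_repo_info:
        parts.append(repo_info[: options.max_repo_info_length])
    if options.use_spec_info:
        parts.append(spec_info[: options.max_spec_info_length])
    return "\n\n".join(parts)
```

With the defaults, both sections are disabled and the context is empty; enabling a section adds at most its configured number of characters.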

### Running

Use `agent run [OPTIONS] BRANCH` to execute an agent on a specific branch.
Available options include:

| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `branch` | str | Name of the branch to run the agent on. | |
| `--backend` | str | Test backend to run the agent on. Ignore this option unless the agent is configured with `--run-tests`. | `modal` |
| `--log-dir` | str | Log directory to store the logs. | `logs/aider` |
| `--max-parallel-repos` | int | Maximum number of repositories for the agent to run in parallel. Repositories are processed sequentially if set to 1. | `1` |
| `--display-repo-progress-num` | int | Number of repositories whose progress is displayed while running. | `5` |
Binary file added docs/arch.png
7 changes: 7 additions & 0 deletions docs/baseline.md
@@ -0,0 +1,7 @@
# Baseline

Commit0 contains a baseline system based on
the [Aider](https://aider.chat/) code generation
system.

...
Binary file added docs/commit0.gif
24 changes: 18 additions & 6 deletions docs/index.md
@@ -3,19 +3,23 @@

#

Commit-0 is a real-world AI coding challenge.
Can your agent generate a working library from commit 0?
## Overview

Commit-0 is a from scratch AI coding challenge.
Can you create a library from commit 0?

The benchmark consists of 57 core Python libraries.
Libraries are selected based on:
The challenge is to rebuild these libraries and
pass their unit tests. All libraries have:

* Significant unit-test coverage
* Significant test coverage
* Detailed specification and documentation
* Lint and type checking

The [commit0 tool](setup) allows you to:
Commit-0 is an interactive environment that makes it easy
to design and test new agents. You can:

* Efficiently run interactive tests in isolated environments
* Efficiently run tests in isolated environments
* Distribute testing and development across cloud systems
* Track and log all changes made throughout.

@@ -25,6 +29,14 @@ To install run:
pip install commit0
```

## Architecture

![](arch.png)


![](commit0.gif)

## Libraries

| | Name | Repo | Commit0 | Tests | |
|--|--------|-------|----|----|------|
37 changes: 37 additions & 0 deletions docs/setupdist.md
@@ -44,3 +44,40 @@ you can commit to the branch and call with the --branch command.
```bash
commit0 test simpy tests/test_event.py::test_succeed --branch my_branch
```

## Local Mode

To run in local mode, first make sure that you have [docker tools](https://docs.docker.com/desktop/install/mac-install/)
installed. On Debian systems:

```bash
apt install docker.io
```

To get started, run the `setup` command with the dataset
split that you are interested in working with.
We'll start with the `lite` split.


```bash
commit0 setup lite
```

This will clone the code for a subset of libraries into your `repos/` directory.

Next, run the `build` command, which will configure Docker containers for
each of the libraries with isolated virtual environments. The command uses
[uv](https://github.com/astral-sh/uv) for efficient builds.

```bash
commit0 build
```

The main operation you can do with these environments is to run tests.
Here we run [a test](https://github.com/commit-0/simpy/blob/master/tests/test_event.py#L11) in the `simpy` library.

```bash
commit0 test simpy tests/test_event.py::test_succeed
```

See [distributed setup](/setupdist) for more commands.
2 changes: 1 addition & 1 deletion docs/setuplocal.md
@@ -33,4 +33,4 @@ Here we run [a test](https://github.com/commit-0/simpy/blob/master/tests/test_ev
commit0 test simpy tests/test_event.py::test_succeed
```

See [distributed setup](setupdist) for more commands.
See [distributed setup](/setupdist) for more commands.