Trajectories

The trajectories/ folder is the default location where experiment results (one per invocation of run.py) are written.

At a high level, the folder is organized as follows:

trajectories
├── <user 1> 👩‍💻
│ ├── <experiment 1> 🧪
│ │ ├── all_preds.jsonl
│ │ ├── args.yaml
│ │ ├── *.html (Webpage Files)
│ │ └── *.traj (Trajectories)
│ └── <experiment 2> 🧪
│   ├── all_preds.jsonl
│   ├── args.yaml
│   ├── *.html (Webpage Files)
│   └── *.traj (Trajectories)
├── <user 2> 👨‍💻
│ ├── <experiment 1> 🧪
│ │ └── ...
│ └── <experiment 2> 🧪
│   └── ...
...

Every experiment follows the pattern trajectories/<user name>/<experiment name>. The <user name> is automatically inferred from your system, and the <experiment name> is inferred from the arguments passed to run.py.
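The path pattern above can be sketched in a few lines of Python. This is only an illustration, not the code run.py uses: the helper name `experiment_dir` is hypothetical, `getpass.getuser()` is an assumed stand-in for however the user name is actually inferred, and the experiment name is taken as a plain argument because its derivation from the run.py arguments is not described here.

```python
import getpass
from pathlib import Path
from typing import Optional


def experiment_dir(experiment_name: str,
                   root: str = "trajectories",
                   user: Optional[str] = None) -> Path:
    """Build trajectories/<user name>/<experiment name> (hypothetical helper)."""
    # Assumption: the user name is the current OS account name.
    user_name = user if user is not None else getpass.getuser()
    return Path(root) / user_name / experiment_name


if __name__ == "__main__":
    # "my-experiment" is a placeholder, not a real experiment name.
    print(experiment_dir("my-experiment"))
```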

How an Experiment Folder is Generated

Each call to run.py produces a single trajectories/<user name>/<experiment name> folder containing the following assets:

  • all_preds.jsonl: A single file containing all of the predictions generated for the experiment (1 prediction per task instance), where each line is formatted as:
{
    "instance_id": "<Unique task instance ID>",
    "model_patch": "<.patch file content string>",
    "model_name_or_path": "<Model name here (Inferred from experiment configs)>"
}
  • args.yaml: A summary of the configurations for the experiment run.
  • <instance_id>.traj: A JSON-formatted file containing the (thought, action, observation) turns generated by SWE-agent while attempting to solve <instance_id>.
  • <instance_id>.html: A single-page HTML render of the trajectory, which can be opened directly in a browser for easier viewing.
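As a rough sketch of how these assets might be read back, the snippet below loads all_preds.jsonl into a dictionary keyed by `instance_id` and parses a .traj file as JSON. The function names are hypothetical. Only the three keys documented above are assumed for predictions, and nothing is assumed about the internal layout of a .traj file beyond it being valid JSON.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Return a dict mapping instance_id to its prediction record."""
    predictions = {}
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)  # one JSON object per line
            predictions[record["instance_id"]] = record
    return predictions


def load_trajectory(path):
    """Parse a .traj file; its internal structure is not assumed here."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For example, `load_predictions("trajectories/<user name>/<experiment name>/all_preds.jsonl")` would give access to each `model_patch` by instance ID, with the placeholders replaced by your own folder names.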

⚠️ Notes

  • Evaluation is not performed by run.py; it is a separate step.
  • all_preds.jsonl can be passed directly to evaluation/run_eval.sh to run evaluation.
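Because evaluation is a separate step, a quick consistency check of an experiment folder beforehand may be useful. The sketch below is hypothetical and not part of the repository: it compares the instance IDs taken from the .traj file names against those listed in all_preds.jsonl, on the assumption that every trajectory should have a matching prediction line.

```python
import json
from pathlib import Path


def missing_predictions(experiment_dir):
    """List instance IDs that have a .traj file but no line in all_preds.jsonl."""
    exp = Path(experiment_dir)
    # <instance_id>.traj -> instance_id
    traj_ids = {p.stem for p in exp.glob("*.traj")}
    pred_ids = set()
    preds_file = exp / "all_preds.jsonl"
    if preds_file.exists():
        with preds_file.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    pred_ids.add(json.loads(line)["instance_id"])
    return sorted(traj_ids - pred_ids)
```

An empty list suggests that every trajectory in the folder has a corresponding prediction.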