The code repair dataset is located in `data/code_repair_data.jsonl`. The fields of the data are explained below:
| Field | Explanation |
| --- | --- |
| id | The local ID in the task |
| src_uid | Unique identifier of the problem |
| description | The original problem description in natural language |
| input_specification | Description of the form of the input data |
| output_specification | Description of the form of the output data |
| sample_inputs | Sample inputs |
| sample_outputs | Sample outputs |
| notes | Additional notes for the problem |
| source_code | Buggy code submitted by a human |
| execute_outcome | The execution outcome of the buggy code |
| lang_cluster | The programming language the buggy code is written in |
| lang | The specific programming language version of the buggy code |
| difficulty | Difficulty of the problem |
| human_solution | Accepted human solution |
| testcases | List of test cases for the problem |
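The snippet below is a minimal sketch of loading the dataset and inspecting one record; the field names follow the table above, and the path assumes the default `data/code_repair_data.jsonl` location.

```python
import json

# Minimal sketch: load the code repair dataset and look at one record.
# Assumes the default location data/code_repair_data.jsonl described above.
records = []
with open("data/code_repair_data.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:
            records.append(json.loads(line))

example = records[0]
print(example["src_uid"], example["lang_cluster"], example["difficulty"])
print(example["source_code"][:200])  # first 200 characters of the buggy submission
```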
```
cd code_repair
```

- Install `python>=3.9` (we use `python==3.9`).
- Install `pytorch` (we use `pytorch==2.1.1`) based on your CUDA version.
- Install the remaining dependencies:

  ```
  pip install -r requirement.txt
  ```

- Install Perl:

  ```
  conda install -c conda-forge perl
  ```

  Validate the correctness of the installation:

  ```
  perl -v
  touch myscript.pl
  perl myscript.pl
  ```
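If you prefer to check the whole environment from Python, the sketch below is one informal way to do it; it assumes the versions listed above and a Perl interpreter on PATH, and only reports what it finds.

```python
import shutil
import subprocess
import sys

# Informal post-install sanity check (assumes python==3.9, pytorch==2.1.1, and perl on PATH).
print("python:", sys.version.split()[0])

try:
    import torch
    print("pytorch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
except ImportError:
    print("pytorch is not installed")

if shutil.which("perl"):
    out = subprocess.run(["perl", "-v"], capture_output=True, text=True).stdout
    print(next((line for line in out.splitlines() if line.strip()), "perl -v produced no output"))
else:
    print("perl is not installed")
```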
Programs written in D and Delphi need to be run on Windows and require the following dependencies to be installed:

- Download dmd 2.105.0 for Windows and unzip it to a suitable location. Replace `d_path` in `run.py` accordingly.
- Download Delphi 7 and install it to a suitable location. Replace `delphi_path` in `run.py` accordingly.

Programs written in other languages need to be run using ExecEval (under the project root directory), and the following dependencies need to be installed:

- Install docker-ce.
- Build the ExecEval image:

  ```
  cd ExecEval
  docker build . -t exec-eval:1.0
  ```
Run the inference scripts to get the inference results of the targeted LLMs. The inference results `code_repair_result_{model_name}.jsonl` will be saved under the `inference/results` folder. The inference logs `code_repair_log_{model_name}.log` will be saved under the `inference/logs` folder.
We provide the following closed-source LLM inference scripts for you:

| Model Name | Model Version | Script Name |
| --- | --- | --- |
| PaLM 2 | text-bison-001 | run_palm2.py |
| GPT-4 | gpt-4-0613 | run_gpt.py |
| GPT-3.5 | gpt-3.5-turbo-0613 | run_gpt.py |
For PaLM 2, you can run the following command, replacing `your_palm_api_key` with your own Google API key.

```
python run_palm.py \
  --api_key your_palm_api_key \
  --data_load_name code_repair_data.jsonl \
  --candidate_num 1 \
  --result_save_name code_repair_run_palm.jsonl \
  --log_file_name code_repair_run_palm.log
```
For GPT-4 and GPT-3.5, you can run the following command, replacing `your_openai_apikey` with your own OpenAI API key and `model_specific_version` with the specific model version.

```
python run_gpt.py \
  --api_key your_openai_apikey \
  --model model_specific_version \
  --data_load_name code_repair_data.jsonl \
  --candidate_num 1 \
  --result_save_name code_repair_run_{model_name}.jsonl \
  --log_file_name code_repair_run_{model_name}.log
```
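If you want to see what such an inference call boils down to, the sketch below sends one buggy program to the OpenAI chat completions API and asks for a fixed version. It is only an illustration, not the contents of `run_gpt.py`; the prompt wording and the `repair_code` helper are assumptions.

```python
import json
from openai import OpenAI

# Illustrative sketch only (not the actual run_gpt.py): ask a chat model to
# repair one buggy submission from the dataset. The prompt wording is an assumption.
client = OpenAI(api_key="your_openai_apikey")

def repair_code(record, model="gpt-3.5-turbo-0613"):
    prompt = (
        f"Problem description:\n{record['description']}\n\n"
        f"Buggy {record['lang_cluster']} code:\n{record['source_code']}\n\n"
        "Please return a corrected version of the code."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

with open("data/code_repair_data.jsonl", "r", encoding="utf-8") as f:
    record = json.loads(f.readline())
print(repair_code(record))
```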
We provide the following open-source LLM inference scripts for you:

| Model Name | Model Checkpoint | Script Name |
| --- | --- | --- |
| Code LLaMA | codellama/CodeLlama-34b-Instruct-hf | run_codellama.py |
| LLaMA 2 | meta-llama/Llama-2-70b-chat-hf | run_llama2.py |
| StarCoder | HuggingFaceH4/starchat-beta | run_starcoder.py |
| Vicuna | lmsys/vicuna-13b-v1.5-16k | run_vicuna.py |
| WizardCoder | WizardLM/WizardCoder-15B-V1.0 | run_wizardcoder.py |
For HuggingFace models, you can run the following command, replacing `access_token` with your own HuggingFace access token, `cache_dir` with the path to a directory in which the downloaded pretrained model and tokenizer should be cached, and `your_model_ckpt` with the specific model checkpoint.

```
python run_{model_name}.py \
  --access_token access_token \
  --cache_dir cache_dir \
  --checkpoint your_model_ckpt \
  --data_load_name code_repair_data.jsonl \
  --candidate_num 1 \
  --result_save_name code_repair_run_{model_name}.jsonl \
  --log_file_name code_repair_run_{model_name}.log
```
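For the open-source models, the sketch below shows roughly what such a script does with the `transformers` library: load a checkpoint, build a repair prompt, and generate. It is an illustration rather than the repository's `run_{model_name}.py`; the checkpoint, cache directory, token placeholder, and prompt wording are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch only (not one of the run_{model_name}.py scripts):
# load an open-source checkpoint and generate a repair for one buggy snippet.
# The checkpoint, cache_dir, access token placeholder, and prompt are assumptions.
checkpoint = "codellama/CodeLlama-34b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, token="your_hf_access_token", cache_dir="cache_dir")
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    token="your_hf_access_token",
    cache_dir="cache_dir",
    torch_dtype=torch.float16,
    device_map="auto",
)

buggy_code = "print(int(input()) + int(input())"  # missing closing parenthesis
prompt = f"Fix the following Python code and return only the corrected code:\n{buggy_code}\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```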
The code ready for testing should be stored line by line in `your_codes.jsonl`, and the file should be placed in `your_codes_dir`. A typical code record is shown below and should contain at least the following keys:

```
{
  "lang_cluster": "{lang_cluster}",
  "lang": "{lang}",
  "source_code": "{source_code}",
  "src_uid": "{src_uid}",
  "difficulty": 800,
  "testcases": "[{'input': 'input1', 'output': ['output1']}, {'input': 'input2', 'output': ['output2']}]"
}
```
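One way to produce `your_codes.jsonl` is to copy these keys over from your inference results and substitute the model's repaired code for `source_code`. The sketch below does that; the input file name and the `repaired_code` field it reads are assumptions, so adapt them to however your inference results store the generated code.

```python
import json

# Minimal sketch: build your_codes.jsonl from one inference results file.
# The input path and the "repaired_code" field name are assumptions.
with open("inference/results/code_repair_result_gpt4.jsonl", "r", encoding="utf-8") as fin, \
     open("your_codes_dir/your_codes.jsonl", "w", encoding="utf-8") as fout:
    for line in fin:
        result = json.loads(line)
        record = {
            "lang_cluster": result["lang_cluster"],
            "lang": result["lang"],
            "source_code": result["repaired_code"],  # the model's fixed code (assumed field name)
            "src_uid": result["src_uid"],
            "difficulty": result["difficulty"],
            "testcases": result["testcases"],
        }
        fout.write(json.dumps(record) + "\n")
```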
- For all programming languages except Perl, D, and Delphi, an example of the most typical usage is:

  - Start the ExecEval server:

    ```
    docker run -it -p x:y -e NUM_WORKERS=n exec-eval:1.0
    ```

    This exposes port y (default 5000), which is used within the docker container and can be set by the environment variable GUNICORN_PORT, as http://localhost:x on the local machine. It is recommended not to use all CPUs: if the CPUs reach 100% load, the execution speed of the submitted code may become unpredictable, and some CPUs should be kept free for the evaluation script. A valid example assuming fewer CPUs are available: `docker run -it -p 5000:5000 -e NUM_WORKERS=5 exec-eval:1.0`

  - Run the execution script:

    ```
    python run_execeval.py --codes_dir your_codes_dir --results_dir your_results_dir --code_filename your_codes.jsonl
    ```

    The results of the run are written to `your_results_dir` as a jsonl file that mirrors the input jsonl, with each entry augmented by the result of each test case run, stored in the `testcases` field (see the sketch after this list).

- For Perl, D, and Delphi, an example of the most typical usage is:

  ```
  python run.py --code_path your_codes_{program_language}.jsonl --output_path result/results.json --cmd_path your_cmd_path
  ```

  Please point `--code_path` at your Perl/D/Delphi code files. The execution results are saved to `--output_path`, which records `accepted`, `wrong`, and `error` results for each key, and each output records the possible error outputs and the type of error.
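The sketch below peeks at one record in `your_results_dir` to see how the test case entries are augmented after execution. The exact per-testcase verdict key (an ExecEval-style `exec_outcome` such as PASSED) is an assumption; inspect one record and adjust the key names if they differ.

```python
import ast
import json

# Peek at one execution output record. The "exec_outcome" verdict key is an
# assumption about the ExecEval result schema; adjust if your records differ.
with open("your_results_dir/your_codes.jsonl", "r", encoding="utf-8") as f:
    record = json.loads(f.readline())

testcases = record["testcases"]
if isinstance(testcases, str):  # the field may be stored as a string, as in the example record above
    testcases = ast.literal_eval(testcases)

for i, tc in enumerate(testcases):
    print(f"testcase {i}: input={tc['input']!r} -> {tc.get('exec_outcome', tc)}")
```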
After the execution, we provide a scorer script to count the number of correct solutions across different languages and difficulties. Please put all your execution results into `--result_dir`, including the D/Perl/Delphi results and the rest. Then run the following command to count the results generated by `{model_name}`:

```
python score_code_repair.py --result_dir your_result_dir --model_name model_name
```
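If you want a quick, informal breakdown before running `score_code_repair.py`, the sketch below groups accepted solutions by language and difficulty. It is not the scorer script; as above, the `exec_outcome` verdict key is an assumption about the result schema.

```python
import ast
import json
from collections import defaultdict

# Informal tally of accepted solutions per (language, difficulty).
# Not score_code_repair.py; the "exec_outcome" key is an assumed schema detail.
def is_accepted(record):
    testcases = record["testcases"]
    if isinstance(testcases, str):
        testcases = ast.literal_eval(testcases)
    return all(tc.get("exec_outcome") == "PASSED" for tc in testcases)

stats = defaultdict(lambda: [0, 0])  # (lang_cluster, difficulty) -> [accepted, total]
with open("your_results_dir/your_codes.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        key = (record["lang_cluster"], record["difficulty"])
        stats[key][1] += 1
        stats[key][0] += int(is_accepted(record))

for (lang, difficulty), (accepted, total) in sorted(stats.items()):
    print(f"{lang:10s} difficulty {difficulty}: {accepted}/{total} accepted")
```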