This is the repository for the paper "Constrained Decoding for Secure Code Generation".
🏠 Home Page • 🏆 Leaderboard • 📄 Paper
There is a disconnect between benchmarks for Code LLMs that evaluate security and those that assess correctness. Existing benchmarks such as HumanEval and MBPP only evaluate correctness, while others such as the Copilot dataset and SecurityEval only target security. To bridge this gap, we present CodeGuard+, along with two new metrics, to measure Code LLMs' ability to generate code that is both secure and correct. Currently, CodeGuard+ supports Python and C/C++, with 91 prompts covering 34 CWEs.
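As a rough illustration of what a joint security-and-correctness metric can look like, the sketch below applies a pass@k-style unbiased estimator that only credits generations passing both the unit tests and the security checks. This is an assumption about the metric's general form for illustration only; see the paper for the exact definitions of the two metrics.

```python
# Hedged sketch: a pass@k-style estimator that only counts generations that are
# BOTH functionally correct and free of security findings. Illustrative only;
# the paper defines the actual metrics.
from math import comb

def joint_pass_at_k(n: int, s: int, k: int) -> float:
    """n: generations per prompt, s: generations that are secure AND correct, k: selection budget."""
    if n - s < k:
        return 1.0
    return 1.0 - comb(n - s, k) / comb(n, k)

# Example: 10 generations per prompt, 4 of them both secure and correct, k = 1
print(joint_pass_at_k(10, 4, 1))  # 0.4
```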
The directory structure of this repository is as follows:
.
|-- data                    # All prompts
|   |-- base                # Basic prompts
|   |-- perturbed           # Perturbed prompts (pending)
|-- inference               # Code for inference
|-- unit_test               # Unit tests for each prompt
|   |-- CWE
|   |   |-- prompt
|   |   |   |-- functional.py   # Individual unit test
|-- requirements.txt        # Python packages needed by prompts and unit tests
Our benchmark CodeGuard+ is adapted from the Copilot dataset, SecurityEval, and the official CodeQL repository. It currently includes 91 prompts covering 34 CWEs, along with corresponding unit tests and CodeQL queries. You can find the prompts and CodeQL queries in data and the unit tests in unit_test.
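For example, here is a minimal sketch of how the unit tests could be run in bulk, assuming the unit_test/CWE/prompt/functional.py layout shown in the directory tree above (the loop below is illustrative and not part of the repository):

```python
# Hedged sketch: walk unit_test/ and run every functional.py with pytest.
# Assumes the unit_test/<CWE>/<prompt>/functional.py layout shown above.
import pathlib
import subprocess

for test_file in sorted(pathlib.Path("unit_test").rglob("functional.py")):
    print(f"Running {test_file} ...")
    subprocess.run(["python", "-m", "pytest", str(test_file)], check=False)
```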
pip install -r requirements.txt
bash setup_codeql.sh
- Verify Docker is installed;
- Run bash setup_sonar.sh;
- Once the Sonar server process starts, access the Sonar server GUI by opening http://{IP of your server}:9000 and log in (the default username and password are admin);
- Click the A near the top right corner of the webpage, and then click My Account;
- Click on Security and use the interface to create a new token of type User Token (see the sketch after this list for a command-line alternative);
- Modify sonar_eval.py with the token you just generated and your desired scan path.
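If you prefer to create the token from the command line instead of the web UI, a request like the following should work against a default local SonarQube instance (the server URL, credentials, and token name below are placeholders; adjust them to your setup):

```python
# Hedged sketch: generate a SonarQube user token via the Web API instead of the GUI.
# Assumes a default local server and the default admin credentials from the steps above.
import requests

SONAR_URL = "http://localhost:9000"   # replace with http://{IP of your server}:9000
resp = requests.post(
    f"{SONAR_URL}/api/user_tokens/generate",
    auth=("admin", "admin"),           # default credentials; change them after first login
    data={"name": "codeguard-token"},  # placeholder token name
)
resp.raise_for_status()
print(resp.json()["token"])            # paste this value into sonar_eval.py
```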
To run inference and evaluation, run the script eval.sh using the following command template:
bash eval.sh {model_dir} {output_name} {eval_type} {decoding_method}
One example of running inference and evaluation for Llama3-8B is:
bash eval.sh meta-llama/Meta-Llama-3-8B llama3-nucleus base nucleus
The default setting for inference is Nucleus Sampling with temperature=0.4 and top_p=0.95 to generate 10 programs. Please check generate.py, codeql_eval.py, sonar_eval.py, and correctness_eval.py for more details about arguments, customized inference, and customized evaluation.
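For reference, a minimal sketch of this default decoding configuration with Hugging Face Transformers is shown below. It illustrates the documented settings (temperature=0.4, top_p=0.95, 10 samples) but is not the exact code in generate.py; the prompt file path is a placeholder.

```python
# Hedged sketch of the default nucleus-sampling configuration described above.
# Not the exact code in generate.py; the prompt file path is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = open("data/base/example_prompt.py").read()  # placeholder prompt file
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,           # nucleus sampling
    temperature=0.4,
    top_p=0.95,
    num_return_sequences=10,  # 10 programs per prompt
    max_new_tokens=512,
    pad_token_id=tokenizer.eos_token_id,
)
completions = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
```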
This repository is still under construction. Thank you for your patience!
@article{fu2024constrained,
title={Constrained Decoding for Secure Code Generation},
author={Yanjun Fu and Ethan Baker and Yu Ding and Yizheng Chen},
year={2024},
journal={arXiv preprint arXiv:2405.00218}
}