VulRepair Replication Package
A T5-based Automated Software Vulnerability Repair
VulRepair Performance on Top-25 Most Dangerous CWEs in 2021 (%PP = percentage of perfect predictions)
Rank | CWE Type | Name | %PP | Proportion |
---|---|---|---|---|
1 | CWE-787 | Out-of-bounds Write | 30% | 16/53 |
2 | CWE-79 | Cross-site Scripting | 0% | 0/1 |
3 | CWE-125 | Out-of-bounds Read | 32% | 54/170 |
4 | CWE-20 | Improper Input Validation | 45% | 68/152 |
5 | CWE-78 | OS Command Injection | 33% | 1/3 |
6 | CWE-89 | SQL Injection | 20% | 1/5 |
7 | CWE-416 | Use After Free | 53% | 29/55 |
8 | CWE-22 | Path Traversal | 25% | 2/8 |
9 | CWE-352 | Cross-Site Request Forgery | 0% | 0/2 |
10 | CWE-434 | Dangerous File Type | - | - |
11 | CWE-306 | Missing Authentication for Critical Function | - | - |
12 | CWE-190 | Integer Overflow or Wraparound | 53% | 31/59 |
13 | CWE-502 | Deserialization of Untrusted Data | - | - |
14 | CWE-287 | Improper Authentication | 50% | 3/6 |
15 | CWE-476 | NULL Pointer Dereference | 66% | 46/70 |
16 | CWE-798 | Use of Hard-coded Credentials | - | - |
17 | CWE-119 | Improper Restriction of Operations | 37% | 141/386 |
18 | CWE-862 | Missing Authorization | 0% | 0/2 |
19 | CWE-276 | Incorrect Default Permissions | - | - |
20 | CWE-200 | Exposure of Sensitive Information | 61% | 39/64 |
21 | CWE-522 | Insufficiently Protected Credentials | 0% | 0/4 |
22 | CWE-732 | Incorrect Permission Assignment | 50% | 1/2 |
23 | CWE-611 | Improper Restriction of XML External Entity Reference | 0% | 0/3 |
24 | CWE-918 | Server-Side Request Forgery (SSRF) | 0% | 0/1 |
25 | CWE-77 | Command Injection | 100% | 2/2 |
TOTAL | | | 41% | 434/1048 |

VulRepair Performance on Top-10 CWEs with the Highest %PP

Rank | CWE Type | Name | %PP | Proportion |
---|---|---|---|---|
1 | CWE-755 | Improper Handling of Exceptional Conditions | 100% | 1/1 |
2 | CWE-706 | Use of Incorrectly-Resolved Name or Reference | 100% | 1/1 |
3 | CWE-326 | Inadequate Encryption Strength | 100% | 2/2 |
4 | CWE-667 | Improper Locking | 100% | 1/1 |
5 | CWE-369 | Divide By Zero | 100% | 5/5 |
6 | CWE-77 | Command Injection | 100% | 2/2 |
7 | CWE-388 | Error Handling | 100% | 1/1 |
8 | CWE-436 | Interpretation Conflict | 100% | 1/1 |
9 | CWE-191 | Integer Underflow | 100% | 2/2 |
10 | CWE-285 | Improper Access Control | 75% | 6/8 |
TOTAL | | | 92% | 22/24 |

VulRepair Performance on Top-10 Most Frequent CWEs

Rank | CWE Type | Name | %PP | Proportion |
---|---|---|---|---|
1 | CWE-119 | Improper Restriction of Operations | 37% | 141/386 |
2 | CWE-125 | Out-of-bounds Read | 32% | 54/170 |
3 | CWE-20 | Improper Input Validation | 45% | 68/152 |
4 | CWE-264 | Permissions, Privileges, and Access Controls | 51% | 36/71 |
5 | CWE-476 | NULL Pointer Dereference | 66% | 46/70 |
6 | CWE-200 | Exposure of Sensitive Information | 61% | 39/64 |
7 | CWE-399 | Resource Management Errors | 62% | 37/60 |
8 | CWE-190 | Integer Overflow or Wraparound | 53% | 31/59 |
9 | CWE-416 | Use After Free | 53% | 29/55 |
10 | CWE-362 | Race Condition | 43% | 23/54 |
TOTAL | | | 44% | 504/1141 |

The raw predictions of VulRepair can be accessed here
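The TOTAL rows above pool the counts across CWE types (e.g., 434/1048 ≈ 41%) rather than averaging the per-CWE percentages. A minimal sketch of how such a table could be recomputed from raw predictions follows. It assumes each record is a (CWE, beam candidates, ground truth) triple, that a sample counts as a perfect prediction when any candidate matches the ground truth exactly after collapsing whitespace, and that the helper names are hypothetical; the paper's exact matching rule may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(code: str) -> str:
    """Collapse all whitespace runs (an assumed normalisation)."""
    return " ".join(code.split())


def pp_by_cwe(
    records: Iterable[Tuple[str, List[str], str]]
) -> Dict[str, Tuple[int, int]]:
    """Map each CWE to (perfect predictions, total samples)."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for cwe, candidates, target in records:
        counts[cwe][1] += 1
        # Assumed rule: a hit if any beam candidate matches the ground truth.
        if any(normalize(c) == normalize(target) for c in candidates):
            counts[cwe][0] += 1
    return {cwe: (hit, total) for cwe, (hit, total) in counts.items()}


def total_row(per_cwe: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Pool hits and totals across CWEs, as in the TOTAL rows."""
    return (sum(h for h, _ in per_cwe.values()),
            sum(t for _, t in per_cwe.values()))


def format_pp(hit: int, total: int) -> str:
    """Render %PP rounded to a whole percent."""
    return f"{round(100 * hit / total)}%"
```

For instance, `format_pp(*total_row(per_cwe))` yields the TOTAL %PP, and `format_pp(434, 1048)` reproduces the 41% of the Top-25 table.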
[FSE 2022 Technical track] [Paper #152] [7 mins talk]
VulRepair: A T5-based Automated Software Vulnerability Repair
To appear in ESEC/FSE 2022 (14-18 November, 2022).
First, clone this repository to your local machine and enter the main directory:
```shell
git clone https://github.com/awsm-research/VulRepair.git
cd VulRepair
```
Then, install the Python dependencies:
```shell
pip install transformers
pip install torch
pip install numpy
pip install tqdm
pip install pandas
pip install tokenizers
pip install datasets
pip install gdown
pip install tensorboard
pip install scikit-learn
```
Alternatively, to ensure reproducibility, install the pinned package versions listed in the provided requirements.txt:
```shell
pip install -r requirements.txt
```
If you have issues with the gdown package, install it from source instead:
```shell
git clone https://github.com/wkentaro/gdown.git
cd gdown
pip install .
cd ..
```
- We highly recommend following this installation guide for the "torch" library to install the appropriate version for your device.
- To use a GPU (optional), you also need to install the CUDA library; you may want to check out this installation guide.
- Python 3.9.7 is recommended; it has been fully tested without issues.
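A quick sanity check of the environment can save a failed run later. The sketch below, whose helper names are hypothetical, checks the Python version against the tested 3.9.7, reports which of the packages listed above are not installed (via `importlib.metadata`), and, if torch is present, whether CUDA is visible.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Sequence, Tuple

# Distribution names taken from the pip commands above.
REQUIRED = ["transformers", "torch", "numpy", "tqdm", "pandas", "tokenizers",
            "datasets", "gdown", "tensorboard", "scikit-learn"]


def missing_packages(names: Sequence[str]) -> List[str]:
    """Return the distributions that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


def is_recommended_python(info: Tuple[int, ...]) -> bool:
    """True only for the fully tested Python 3.9.7."""
    return tuple(info[:3]) == (3, 9, 7)


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    print("Python", ".".join(map(str, current)),
          "(tested)" if is_recommended_python(current) else "(not the tested 3.9.7)")
    print("Missing packages:", missing_packages(REQUIRED) or "none")
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not installed")
```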
All of the data share the same 7 columns; our experiments focus on the following 2:
- source (str): The localized vulnerable function written in C (preprocessed by Chen et al.)
- target (str): The repair ground-truth (preprocessed by Chen et al.)
source | target |
---|---|
... | ... |
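To work with only these two columns, a sketch like the one below can be used. It assumes the data are stored as CSV files with a header row naming the `source` and `target` columns; the file layout and the helper names are hypothetical.

```python
import csv
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep only the `source` and `target` columns of a dataset CSV."""
    # Vulnerable functions can be long; raise the csv module's default field limit.
    csv.field_size_limit(2**31 - 1)
    return [(row["source"], row["target"]) for row in csv.DictReader(lines)]


def load_pairs(path: str) -> List[Tuple[str, str]]:
    """Read (source, target) pairs from a CSV file on disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_pairs(f)
```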

Metric | 1st Qt. | Median | 3rd Qt. | Avg. |
---|---|---|---|---|
Function Length | 138 | 280 | 593 | 586 |
Patch Length | 12 | 24 | 48 | 55 |
Cyclomatic Complexity of Functions | 3 | 8 | 19 | 23 |
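The quartile and average columns can be computed with the standard library. This sketch assumes length is counted in whitespace-separated tokens (the unit behind the table is not stated here) and uses `statistics.quantiles` with `method="inclusive"`, i.e., linear interpolation, which matches NumPy's default percentile computation. Helper names are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def token_length(code: str) -> int:
    """Assumed length measure: number of whitespace-separated tokens."""
    return len(code.split())


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Quartiles (linear interpolation) and mean, matching the table columns."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"1st Qt.": q1, "Median": median, "3rd Qt.": q3,
            "Avg.": statistics.mean(values)}
```

For a list of function strings `functions`, `summarize([token_length(f) for f in functions])` gives one row of the table.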
Note.
- This dataset was originally provided by Bhandari et al. and Fan et al., and was further preprocessed by Chen et al. For more information, please refer to this repository.
- We compute cyclomatic complexity (CC) using the Joern tool.
- The dataset labelled with CC can be found here.
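For reference, the standard McCabe definition of cyclomatic complexity over a function's control-flow graph is given below, where E is the number of edges, N the number of nodes, and P the number of connected components (P = 1 for a single function). That Joern computes CC exactly this way is an assumption. As a worked example, a function whose only branch is one if/else has a CFG with 4 nodes (condition, then, else, exit) and 4 edges:

```latex
\begin{aligned}
M &= E - N + 2P \\
M_{\text{if/else}} &= 4 - 4 + 2 \cdot 1 = 2
\end{aligned}
```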
Model Name | Model Specification | Related to RQ |
---|---|---|
M1 (VulRepair) | BPE Tokenizer + Pre-training (PL/NL) + T5 | RQ1, RQ2, RQ3, RQ4 |
M2 (CodeBERT) | BPE Tokenizer + Pre-training (PL/NL) + BERT | RQ1, RQ2, RQ3 |
M3 | BPE Tokenizer + No Pre-training + T5 | RQ2, RQ4 |
M4 | BPE Tokenizer + Pre-training (NL) + T5 | RQ2 |
M5 | BPE Tokenizer + No Pre-training + BERT | RQ2 |
M6 | BPE Tokenizer + Pre-training (NL) + BERT | RQ2 |
M7 | Word-level Tokenizer + Pre-training (PL/NL) + T5 | RQ3, RQ4 |
M8 | BPE Tokenizer + Vanilla XFMR | RQ3 |
M9 | Word-level Tokenizer + Pre-training (PL/NL) + BERT | RQ3 |
M10 | Word-level Tokenizer + No Pre-training + T5 | RQ4 |
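Inverting the "Related to RQ" column gives the set of models that each research question requires. The dictionary below is transcribed from the table above; the helper name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Model -> research questions, transcribed from the model table.
MODEL_RQS: Dict[str, List[str]] = {
    "M1": ["RQ1", "RQ2", "RQ3", "RQ4"],
    "M2": ["RQ1", "RQ2", "RQ3"],
    "M3": ["RQ2", "RQ4"],
    "M4": ["RQ2"],
    "M5": ["RQ2"],
    "M6": ["RQ2"],
    "M7": ["RQ3", "RQ4"],
    "M8": ["RQ3"],
    "M9": ["RQ3"],
    "M10": ["RQ4"],
}


def models_per_rq(model_rqs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert model -> RQs into RQ -> models, keeping table order."""
    index: Dict[str, List[str]] = defaultdict(list)
    for model, rqs in model_rqs.items():
        for rq in rqs:
            index[rq].append(model)
    return dict(index)
```

For example, `models_per_rq(MODEL_RQS)["RQ4"]` lists M1, M3, M7, and M10.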
- We host VulRepair on the Hugging Face Model Hub; it can be accessed here.
- All other models can be downloaded from this public Google Cloud Space.
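A minimal sketch of querying the hosted model directly from Python follows. It assumes that the `MickyMike/VulRepair` checkpoint loads with transformers' `RobertaTokenizer` and `T5ForConditionalGeneration` (the classes used for CodeT5), and that the input is a vulnerable function formatted like the dataset's `source` column. The helper names are hypothetical; `vulrepair_main.py` remains the reference for reproducing the results, and it may additionally load fine-tuned weights from `--output_dir`/`--model_name`.

```python
from typing import List


def clean_prediction(text: str) -> str:
    """Remove T5 padding/BOS/EOS tokens that remain after decoding."""
    for token in ("<pad>", "<s>", "</s>"):
        text = text.replace(token, "")
    return text.strip()


def repair(source: str, num_beams: int = 5) -> List[str]:
    """Return `num_beams` candidate repairs for one vulnerable C function."""
    # Imported lazily so the helper above works without torch/transformers.
    import torch
    from transformers import RobertaTokenizer, T5ForConditionalGeneration

    # Assumption: the Hub checkpoint loads directly with these classes.
    tokenizer = RobertaTokenizer.from_pretrained("MickyMike/VulRepair")
    model = T5ForConditionalGeneration.from_pretrained("MickyMike/VulRepair")
    model.eval()

    # Same block sizes as the inference command: 512 input, 256 output tokens.
    inputs = tokenizer(source, max_length=512, truncation=True,
                       padding="max_length", return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            num_beams=num_beams,
            num_return_sequences=num_beams,
            max_length=256,
        )
    # Decode without skipping special tokens, then strip only pad/BOS/EOS.
    return [clean_prediction(tokenizer.decode(seq, skip_special_tokens=False))
            for seq in outputs]
```

When repairing many functions, load the tokenizer and model once and reuse them instead of reloading them on every call.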
To reproduce the results of VulRepair (the M1 model) with inference only, run the following commands:
```shell
cd M1_VulRepair_PL-NL
python vulrepair_main.py \
--output_dir=./saved_models \
--model_name=model.bin \
--tokenizer_name=MickyMike/VulRepair \
--model_name_or_path=MickyMike/VulRepair \
--do_test \
--encoder_block_size 512 \
--decoder_block_size 256 \
--num_beams=50 \
--eval_batch_size 1
```
Note. Adjust the --num_beams parameter (i.e., num_beams = 1, 2, 3, 4, 5, 10) to obtain the results presented in the discussion section.
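Sweeping the beam sizes listed above can be scripted. This is a minimal sketch that rebuilds the inference command with a configurable `--num_beams`; the helper names and the per-run log file names (`test_beam{N}.log`) are hypothetical.

```python
import subprocess
import sys
from typing import Iterable, List


def inference_command(num_beams: int) -> List[str]:
    """The README's M1 inference command with a configurable --num_beams."""
    return [
        sys.executable, "vulrepair_main.py",
        "--output_dir=./saved_models",
        "--model_name=model.bin",
        "--tokenizer_name=MickyMike/VulRepair",
        "--model_name_or_path=MickyMike/VulRepair",
        "--do_test",
        "--encoder_block_size", "512",
        "--decoder_block_size", "256",
        f"--num_beams={num_beams}",
        "--eval_batch_size", "1",
    ]


def run_sweep(beam_sizes: Iterable[int] = (1, 2, 3, 4, 5, 10)) -> None:
    """Run inference once per beam size, logging each run to its own file."""
    for beams in beam_sizes:
        # Hypothetical log naming; stdout and stderr go to the same file.
        with open(f"test_beam{beams}.log", "w") as log:
            subprocess.run(inference_command(beams), stdout=log,
                           stderr=subprocess.STDOUT, check=True)
```

Call `run_sweep()` from inside `M1_VulRepair_PL-NL`. If the script writes its predictions to a fixed path, copy them aside between runs so that later beam sizes do not overwrite earlier ones.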
To retrain VulRepair (fine-tuning from the pre-trained Salesforce/codet5-base checkpoint), run the following commands (training + inference):
```shell
# training
cd M1_VulRepair_PL-NL
python vulrepair_main.py \
--model_name=model.bin \
--output_dir=./saved_models \
--tokenizer_name=Salesforce/codet5-base \
--model_name_or_path=Salesforce/codet5-base \
--do_train \
--epochs 75 \
--encoder_block_size 512 \
--decoder_block_size 256 \
--train_batch_size 4 \
--eval_batch_size 4 \
--learning_rate 2e-5 \
--max_grad_norm 1.0 \
--evaluate_during_training \
--seed 123456 2>&1 | tee train.log
# Inference
python vulrepair_main.py \
--output_dir=./saved_models \
--model_name=model.bin \
--tokenizer_name=Salesforce/codet5-base \
--model_name_or_path=Salesforce/codet5-base \
--do_test \
--encoder_block_size 512 \
--decoder_block_size 256 \
--num_beams=50 \
--eval_batch_size 1
```
We recommend using a GPU with at least 8 GB of memory for training, since the T5 and BERT architectures are computationally intensive.
Note. If the specified batch size is not suitable for your device, please modify --eval_batch_size and --train_batch_size to fit your GPU memory.
To replicate the results of RQ1, you need to replicate M1 (VulRepair) and M2 (CodeBERT):
- Click here for the instructions for replicating M1 (VulRepair)
- Click here for the instructions for replicating M2 (CodeBERT)
To replicate the results of RQ2, you need to replicate M1 (VulRepair), M2 (CodeBERT), M3, M4, M5, and M6:
- Click here for the instructions for replicating M1 (VulRepair)
- Click here for the instructions for replicating M2 (CodeBERT)
- Click here for the instructions for replicating M3
- Click here for the instructions for replicating M4
- Click here for the instructions for replicating M5
- Click here for the instructions for replicating M6
To replicate the results of RQ3, you need to replicate M1 (VulRepair), M2 (CodeBERT), M7, M8, and M9:
- Click here for the instructions for replicating M1 (VulRepair)
- Click here for the instructions for replicating M2 (CodeBERT)
- Click here for the instructions for replicating M7
- Click here for the instructions for replicating M8
- Click here for the instructions for replicating M9
To replicate the results of RQ4, you need to replicate M1 (VulRepair), M3, M7, and M10.

RQ1: Perfect predictions of VulRepair and the baseline approaches

Methods | % Perfect Prediction |
---|---|
VulRepair | 44% |
CodeBERT | 31% |
VRepair | 21% |

RQ2: Impact of pre-training (PL/NL = programming language and natural language)

T5 | % Perfect Prediction |
---|---|
PL/NL (VulRepair) | 44% |
No Pre-training | 30% |
NL | 6% |

BERT | % Perfect Prediction |
---|---|
PL/NL (CodeBERT) | 31% |
No Pre-training | 29% |
NL | 1% |

RQ3: Impact of the tokenizer

VulRepair | % Perfect Prediction |
---|---|
Subword Tokenizer | 44% |
Word-level Tokenizer | 35% |

VRepair | % Perfect Prediction |
---|---|
Subword Tokenizer | 34% |
Word-level Tokenizer | 23% |

CodeBERT | % Perfect Prediction |
---|---|
Subword Tokenizer | 31% |
Word-level Tokenizer | 17% |

RQ4: Contribution of each component of VulRepair

VulRepair | % Perfect Prediction |
---|---|
Pre-train + BPE + T5 | 44% |
Pre-train + Word-level + T5 | 35% |
No Pre-train + BPE + T5 | 30% |
No Pre-train + Word-level + T5 | 1% |

- Special thanks to the authors of VRepair (Chen et al.)
- Special thanks to the authors of CodeT5 (Wang et al.)
- Special thanks to the providers of the CVEfixes (Bhandari et al.) and Big-Vul (Fan et al.) datasets
```bibtex
@inproceedings{fu2022vulrepair,
  title={VulRepair: A T5-based Automated Software Vulnerability Repair},
  author={Fu, Michael and Tantithamthavorn, Chakkrit and Le, Trung and Nguyen, Van and Phung, Dinh},
  booktitle={Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)},
  year={2022}
}
```