Supplementary Materials for "DeepLineDP: Towards a Deep Learning Approach for Line-Level Defect Prediction"
The datasets are obtained from Wattanakriengkrai et al. The datasets contain 32 software releases across 9 software projects. The datasets used in our experiment can be found in this GitHub repository.
The file-level datasets (in the File-level directory) contain the following columns:
- File: A file name of source code
- Bug: A label indicating whether the source code is clean or defective
- SRC: The content of the source code file
The line-level datasets (in the Line-level directory) contain the following columns:
- File: A file name of source code
- Commit: A commit id of the bug-fixing commit of the file
- Line_number: A line number where source code is modified
- SRC: The actual source code that is modified
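A minimal sketch for inspecting the datasets with pandas is shown below, assuming the datasets are CSV files with the columns listed above and have been placed under ./datasets/original/ as described in the setup steps. The <SOME_RELEASE> placeholders stand for actual release file names, which are not fixed here.
# Minimal sketch: inspect one file-level and one line-level dataset with pandas.
# <SOME_RELEASE> is a placeholder for an actual release file found in the
# File-level and Line-level directories; the columns are those listed above.
import pandas as pd

file_level = pd.read_csv("./datasets/original/File-level/<SOME_RELEASE>.csv")
print(file_level[["File", "Bug"]].head())   # file name and clean/defective label
print(file_level["SRC"].iloc[0][:200])      # first 200 characters of one source file

line_level = pd.read_csv("./datasets/original/Line-level/<SOME_RELEASE>.csv")
print(line_level[["File", "Commit", "Line_number", "SRC"]].head())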
For each software project, we use the oldest release to train DeepLineDP models. The subsequent release is used as the validation set. The remaining releases are used as test sets.
For example, if there are 5 releases in ActiveMQ (e.g., R1, R2, R3, R4, R5), R1 is used as the training set, R2 is used as the validation set, and R3-R5 are used as test sets.
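The split can be sketched as a simple mapping (release names follow the illustrative example above, not real release ids):
# Illustrative within-project split for a project with five releases,
# ordered from oldest to newest (names follow the example above).
releases = ["R1", "R2", "R3", "R4", "R5"]
split = {
    "train": releases[0],       # oldest release
    "validation": releases[1],  # the subsequent release
    "test": releases[2:],       # all remaining releases
}
print(split)  # {'train': 'R1', 'validation': 'R2', 'test': ['R3', 'R4', 'R5']}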
Our repository contains the following directories:
- output: This directory contains the following sub-directories:
  - loss: This directory stores training and validation loss
  - model: This directory stores trained models
  - prediction: This directory stores predictions (in CSV files) obtained from the trained models
  - Word2Vec_model: This directory stores the word2vec models of each software project
- script: This directory contains the following directories and files:
  - preprocess_data.py: The source code used to preprocess datasets for file-level model training and evaluation
  - export_data_for_line_level_baseline.py: The source code used to prepare data for the line-level baselines
  - my_util.py: The source code that stores utility functions
  - train_word2vec.py: The source code used to train word2vec models
  - DeepLineDP_model.py: The source code that stores the DeepLineDP architecture
  - train_model.py: The source code used to train DeepLineDP models
  - generate_prediction.py: The source code used to generate predictions (for RQ1-RQ3)
  - generate_prediction_cross_projects.py: The source code used to generate predictions (for RQ4)
  - get_evaluation_result.R: The source code used to generate the figures for RQ1-RQ3 and show the RQ4 results
  - file-level-baseline: The directory that stores the implementation of the file-level baselines, and baseline_util.py, which stores utility functions of the baselines
  - line-level-baseline: The directory that stores the implementation of the line-level baselines
- Clone the GitHub repository using the following command:
git clone https://github.com/awsm-research/DeepLineDP.git
- Download the dataset from this GitHub repository and keep it in ./datasets/original/
- Use the following commands to install the required libraries in a conda environment:
conda env create -f requirements.yml
conda activate DeepLineDP_env
- Install the PyTorch library by following the instructions from this link (the installation instructions may vary based on OS and CUDA version)
Download the following R packages: tidyverse, gridExtra, ModelMetrics, caret, reshape2, pROC, effsize, and ScottKnottESD.
We use the following hyper-parameters to train our DeepLineDP model:
- batch_size = 32
- num_epochs = 10
- embed_dim (word embedding size) = 50
- word_gru_hidden_dim = 64
- sent_gru_hidden_dim = 64
- word_gru_num_layers = 1
- sent_gru_num_layers = 1
- dropout = 0.2
- lr (learning rate) = 0.001
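For reference, the same settings can be summarized as a Python dictionary; this is only a convenience sketch, not the configuration object used inside train_model.py.
# Hyper-parameter summary (values as listed above). The dictionary itself is
# illustrative; train_model.py defines and parses its own settings.
hyper_params = {
    "batch_size": 32,
    "num_epochs": 10,
    "embed_dim": 50,             # word embedding size
    "word_gru_hidden_dim": 64,
    "sent_gru_hidden_dim": 64,
    "word_gru_num_layers": 1,
    "sent_gru_num_layers": 1,
    "dropout": 0.2,
    "lr": 0.001,                 # learning rate
}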
- Run the following command to prepare data for file-level model training. The output will be stored in ./datasets/preprocessed_data:
python preprocess_data.py
- Run the following command to prepare data for the line-level baselines. The output will be stored in ./datasets/ErrorProne_data/ (for ErrorProne) and ./datasets/n_gram_data/ (for n-gram):
python export_data_for_line_level_baseline.py
To train Word2Vec models, run the following command:
python train_word2vec.py <DATASET_NAME>
Where <DATASET_NAME> is one of the following: activemq, camel, derby, groovy, hbase, hive, jruby, lucene, or wicket.
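To train the Word2Vec models for all nine projects in one go, a small driver script such as the sketch below (not part of the repository) can call train_word2vec.py for each dataset.
# Sketch of a driver script (not part of the repository): train the Word2Vec
# model for every project by invoking train_word2vec.py, run from ./script/.
import subprocess

DATASETS = ["activemq", "camel", "derby", "groovy", "hbase",
            "hive", "jruby", "lucene", "wicket"]

for dataset in DATASETS:
    # Equivalent to running: python train_word2vec.py <DATASET_NAME>
    subprocess.run(["python", "train_word2vec.py", dataset], check=True)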
To train DeepLineDP models, run the following command:
python train_model.py -dataset <DATASET_NAME>
The trained models will be saved in ./output/model/DeepLineDP/<DATASET_NAME>/, and the loss will be saved in ./output/loss/DeepLineDP/<DATASET_NAME>-loss_record.csv
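To get a quick look at the recorded loss, the loss CSV can be plotted with pandas and matplotlib. The sketch below assumes only that the file has one row per epoch, and prints the actual column names first since they are not documented here.
# Sketch: plot the training/validation loss recorded for one project.
# The column layout of the loss CSV is an assumption; print it and adjust.
import pandas as pd
import matplotlib.pyplot as plt

loss = pd.read_csv("./output/loss/DeepLineDP/activemq-loss_record.csv")
print(loss.columns.tolist())                            # check the actual column names

loss.plot(x=loss.columns[0], y=list(loss.columns[1:]))  # assumes the first column is the epoch
plt.xlabel("epoch")
plt.ylabel("loss")
plt.show()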
To make a prediction of each software release, run the following command:
python generate_prediction.py -dataset <DATASET_NAME>
The generated output is a CSV file which contains the following information:
- project: A software project, as specified by <DATASET_NAME>
- train: A software release that is used to train DeepLineDP models
- test: A software release that is used to make a prediction
- filename: A file name of source code
- file-level-ground-truth: A label indicating whether source code is clean or defective
- prediction-prob: A probability of being a defective file
- prediction-label: A prediction indicating whether source code is clean or defective
- line-number: A line number of a source code file
- line-level-ground-truth: A label indicating whether the line is modified
- is-comment-line: A flag indicating whether the line is a comment
- token: A token in a code line
- token-attention-score: An attention score of a token
The generated output is stored in ./output/prediction/DeepLineDP/within-release/
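The prediction CSV can be explored directly with pandas. The sketch below shows one way the columns above fit together, ranking the lines of a single source file by the maximum token attention score on each line; this is only an illustration, not the evaluation procedure implemented in get_evaluation_result.R.
# Sketch: rank the lines of one source file by the maximum token attention
# score on each line. The CSV file name is a placeholder; use any file found
# in the within-release directory.
import pandas as pd

df = pd.read_csv("./output/prediction/DeepLineDP/within-release/<SOME_RELEASE>.csv")

one_file = df[df["filename"] == df["filename"].iloc[0]]   # pick one source file
line_scores = (one_file.groupby("line-number")["token-attention-score"]
                       .max()
                       .sort_values(ascending=False))
print(line_scores.head(10))   # the 10 lines with the highest attention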
To make predictions across software projects, run the following command:
python generate_prediction_cross_projects.py -dataset <DATASET_NAME>
The generated output is a CSV file which contains the same information as above, and is stored in ./output/prediction/DeepLineDP/cross-project/
There are 4 file-level baselines in the experiment (i.e., Bi-LSTM, CNN, DBN, and BoW). To train the file-level baselines, go to ./script/file-level-baseline/ and run the following commands:
python Bi-LSTM-baseline.py -data <DATASET_NAME> -train
python CNN-baseline.py -data <DATASET_NAME> -train
python DBN-baseline.py -data <DATASET_NAME> -train
python BoW-baseline.py -data <DATASET_NAME> -train
The trained models will be saved in ./output/model/<BASELINE>/<DATASET_NAME>/, and the loss will be saved in ./output/loss/<BASELINE>/<DATASET_NAME>-loss_record.csv, where <BASELINE> is one of the following: Bi-LSTM, CNN, DBN, or BoW.
To make a prediction, run the following command:
python Bi-LSTM-baseline.py -data <DATASET_NAME> -predict -target_epochs 6
python CNN-baseline.py -data <DATASET_NAME> -predict -target_epochs 6
python DBN-baseline.py -data <DATASET_NAME> -predict
python BoW-baseline.py -data <DATASET_NAME> -predict
The generated output is a CSV file which contains the following information:
- project: A software project, as specified by <DATASET_NAME>
- train: A software release that is used to train the baseline models
- test: A software release that is used to make a prediction
- filename: A file name of source code
- file-level-ground-truth: A label indicating whether source code is clean or defective
- prediction-prob: A probability of being a defective file
- prediction-label: A prediction indicating whether source code is clean or defective
The generated output is stored in ./output/prediction/<BASELINE>/
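As a quick sanity check (the official evaluation is get_evaluation_result.R), the file-level columns above can be used to compute an AUC with scikit-learn. The sketch below hedges on how the ground-truth labels are encoded and uses a placeholder CSV file name.
# Sketch: file-level AUC for one baseline prediction CSV (sanity check only).
# <SOME_RELEASE>.csv is a placeholder; the label encoding (True/False strings
# or booleans) is an assumption handled by the string comparison below.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("./output/prediction/Bi-LSTM/<SOME_RELEASE>.csv")

y_true = df["file-level-ground-truth"].astype(str).str.lower().isin(["true", "1"])
y_score = df["prediction-prob"].astype(float)
print("file-level AUC:", roc_auc_score(y_true, y_score))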
There are 2 line-level baselines in this experiment (i.e., N-gram and ErrorProne).
To obtain the results from N-gram, go to ./script/line-level-baseline/ngram/ and run the code in n_gram.java. The results will be stored in the /n_gram_result/ directory. After all results are obtained, copy the /n_gram_result/ directory to the /output/ directory.
To obtain the results from ErrorProne, go to ./script/line-level-baseline/ErrorProne/ and run the code in run_ErrorProne.ipynb. The results will be stored in the /ErrorProne_result/ directory. After all results are obtained, copy the /ErrorProne_result/ directory to the /output/ directory.
Run get_evaluation_result.R to get the results of RQ1-RQ4 (it can be run in an IDE or with the following command):
Rscript get_evaluation_result.R
The resulting figures are stored in ./output/figures/