Research on identifying vulnerabilities and the commits that introduce them is ongoing. However, many current methods rely heavily on automation, which can lead to a high rate of false positives and requires significant error-checking. To address this issue, we developed a tool-assisted pipeline for manually reviewing and examining vulnerabilities and their corresponding commits. Additionally, we collected relevant metadata such as the modified lines of code and the mapping of CVE to CWE categories. This dataset can be used to validate automated methods such as machine learning approaches.
The complete dataset can be found here.
It is structured as a JSON file with the following fields:
Fieldname | Brief |
---|---|
cve | CVE ID of the vulnerability |
cwe | Common Weakness Enumeration ID |
repository | URL to the repository |
fixing | List of fixing commit SHA-1 hashes |
introducing | Commit hash that introduces the vulnerability |
intro_stats | Number of lines added/deleted in the introducing commit |
introducing_lines | Lines marked as vulnerable in the introducing commit |
fixing_stats | Number of lines added/deleted in the fixing commits |
fixing_lines | Lines marked as fixing the vulnerability in the fixing commits |
days_between | Days between the identified introducing and fixing commits |
```json
{
  "cve": "CVE-2019-11274",
  "cwe": "CWE-79",
  "repository": "https://github.com/cloudfoundry/uaa",
  "fixing": [
    "a34f55fc97a81966faf21e3ae404ec24f1f31cf7"
  ],
  "introducing": "bb8ff8f4e8969b46fdacffcd27781d223c8c7244",
  "intro_stats": {
    "bb8ff8f4e8969b46fdacffcd27781d223c8c7244": {
      "add": 320,
      "del": 7
    }
  },
  "fixing_stats": {
    "a34f55fc97a81966faf21e3ae404ec24f1f31cf7": {
      "add": 68,
      "del": 17
    }
  },
  "days_between": 1836,
  "fixing_lines": {
    "server/src/main/java/org/cloudfoundry/identity/uaa/scim/endpoints/ScimGroupEndpoints.java": "168"
  },
  "introducing_lines": {
    "scim/src/main/java/org/cloudfoundry/identity/uaa/scim/endpoints/ScimGroupEndpoints.java": "190"
  }
}
```
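Entries like the one above can be consumed with the standard `json` module. The following is a minimal sketch, assuming the dataset file contains a JSON array of such entries; the inline sample here stands in for the real file, which you would read with `json.load()` instead:

```python
import json

# Inline sample standing in for the dataset file; the fields mirror the
# example entry above (stats values trimmed for brevity).
sample_json = """
[
  {
    "cve": "CVE-2019-11274",
    "cwe": "CWE-79",
    "repository": "https://github.com/cloudfoundry/uaa",
    "fixing": ["a34f55fc97a81966faf21e3ae404ec24f1f31cf7"],
    "introducing": "bb8ff8f4e8969b46fdacffcd27781d223c8c7244",
    "intro_stats": {
      "bb8ff8f4e8969b46fdacffcd27781d223c8c7244": {"add": 320, "del": 7}
    },
    "days_between": 1836
  }
]
"""

entries = json.loads(sample_json)
for entry in entries:
    # Total lines added across all introducing commits of this entry.
    added = sum(stats["add"] for stats in entry["intro_stats"].values())
    print(entry["cve"], entry["cwe"], added, entry["days_between"])
```

With the actual dataset, replace `json.loads(sample_json)` with `json.load(open(path))`.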
Software | Used Version |
---|---|
Python3 | 3.10.8 |
pip3 | 22.3.1 |
git | 2.29.0 |
Web browser of choice | Safari 16.1 |
To install all required Python packages, run the following command inside the `review_pipeline` directory:

```
python3 -m pip install -r requirements.txt
```
The pipeline can then be executed with the following command inside the `review_pipeline` directory:

```
python3 manual_analysis_pipeline.py <path_to_input_dataset>
```
The input dataset is expected to be a JSON file with the following fields:
Fieldname | Brief |
---|---|
cve_id | CVE ID of the vulnerability |
repository | URL to the repository |
fixing_commits | List of fixing commit SHA-1 hashes |
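A hypothetical input entry with these fields might look as follows. The concrete values are illustrative (reused from the example entry above), and whether the file holds a single object or an array of entries is an assumption here:

```json
[
  {
    "cve_id": "CVE-2019-11274",
    "repository": "https://github.com/cloudfoundry/uaa",
    "fixing_commits": ["a34f55fc97a81966faf21e3ae404ec24f1f31cf7"]
  }
]
```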