Database of "VulDeePecker: A Deep Learning-Based System for Vulnerability Detection" (NDSS'18)
Code Gadget Database (CGD) focuses on two types of vulnerabilities in C/C++ programs: buffer error vulnerabilities (CWE-119) and resource management error vulnerabilities (CWE-399). Each code gadget is composed of a number of program statements (i.e., lines of code) that are related to each other through the data flow associated with the arguments of certain library/API function calls.
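To make the idea of a code gadget more concrete, here is a minimal Python sketch of a backward data-flow slice over straight-line code. It is a simplified illustration under stated assumptions, not the procedure actually used to build CGD. The `Statement` model, the `backward_gadget` helper, and the sample C lines are all hypothetical, and control flow, pointers, aliasing, and inter-procedural flow are ignored. Each statement records the variables it defines and uses. Starting from the arguments of a library/API call (here `strcpy`), the sketch walks backwards and keeps only the statements whose definitions flow into those arguments.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One line of code plus the variables it defines and uses (hypothetical model)."""
    line: int
    text: str
    defs: set[str] = field(default_factory=set)
    uses: set[str] = field(default_factory=set)


def backward_gadget(stmts: list[Statement], call_line: int, args: set[str]) -> list[Statement]:
    """Hypothetical helper: collect the statements that the arguments of the call on
    `call_line` depend on via data flow. Straight-line code only; no control flow,
    pointers, or aliasing are modeled."""
    idx = next(i for i, s in enumerate(stmts) if s.line == call_line)
    wanted = set(args)      # variables whose origin still has to be traced
    picked = [stmts[idx]]   # the library/API call itself is always part of the gadget
    for s in reversed(stmts[:idx]):
        if s.defs & wanted:
            picked.append(s)
            wanted -= s.defs  # this definition explains these variables...
            wanted |= s.uses  # ...but depends on the variables it reads
    return sorted(picked, key=lambda s: s.line)


# Hypothetical C fragment, annotated by hand with defs/uses.
program = [
    Statement(1, "char buf[10];", defs={"buf"}),
    Statement(2, "char *src = argv[1];", defs={"src"}, uses={"argv"}),
    Statement(3, "int n = 5;", defs={"n"}),
    Statement(4, "n = n * 2;", defs={"n"}, uses={"n"}),
    Statement(5, 'printf("%d", n);', uses={"n"}),
    Statement(6, "strcpy(buf, src);", uses={"buf", "src"}),
]

gadget = backward_gadget(program, call_line=6, args={"buf", "src"})
for s in gadget:
    print(s.line, s.text)
```

Lines 3 to 5 are dropped because `n` never reaches the arguments of `strcpy`, so the gadget consists of lines 1, 2, and 6. A realistic extractor would also need to follow data flow through branches, loops, and function calls, and may need to trace forward from the call as well.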
Drawing on the National Vulnerability Database (NVD) and the NIST Software Assurance Reference Dataset (SARD) project, we collected 520 open source software program files with their corresponding diff files and 8,122 test cases for buffer error vulnerabilities, and 320 open source software program files with their corresponding diff files and 1,729 test cases for resource management error vulnerabilities.
In total, the CGD database contains 61,638 code gadgets: 17,725 that are vulnerable and 43,913 that are not vulnerable. Of the 17,725 vulnerable code gadgets, 10,440 correspond to buffer error vulnerabilities and the remaining 7,285 correspond to resource management error vulnerabilities.
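As a quick consistency check, the short Python tally below restates only the figures reported above (it reads no dataset files) and confirms that the per-type counts add up to the stated totals. It also shows the resulting class balance, which can matter when training a classifier on CGD. The variable names are illustrative.

```python
# Figures as reported in this README; no dataset files are read.
program_files = {"CWE-119": 520, "CWE-399": 320}
test_cases = {"CWE-119": 8_122, "CWE-399": 1_729}
vulnerable_by_type = {"CWE-119": 10_440, "CWE-399": 7_285}
not_vulnerable = 43_913

vulnerable = sum(vulnerable_by_type.values())
total = vulnerable + not_vulnerable

assert vulnerable == 17_725
assert total == 61_638

share = vulnerable / total        # fraction of gadgets labeled vulnerable
ratio = not_vulnerable / vulnerable  # non-vulnerable gadgets per vulnerable one

print("program files:", sum(program_files.values()))
print("test cases:", sum(test_cases.values()))
print(f"vulnerable share: {share:.1%}")
print(f"not-vulnerable : vulnerable = {ratio:.2f} : 1")
```

Roughly 29% of the gadgets are labeled vulnerable, so there are about 2.5 non-vulnerable gadgets for every vulnerable one.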
An extension of VulDeePecker: “SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities” (https://arxiv.org/abs/1807.06756)
The SySeVR source code is available at https://github.com/SySeVR/SySeVR.