A deep-learning-based approach to detecting vulnerabilities in C/C++ source code.
- Pre-train a language model on a large corpus of C/C++ source code
- Fine-tune the model for the vulnerability classification task in C/C++
- Evaluate the model on the test dataset
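
The evaluation step above could be sketched as follows. This is a minimal illustration, assuming a binary labelling where 1 means "vulnerable" and 0 means "not vulnerable"; the function name `evaluate_binary`, the `BinaryMetrics` container, and the choice of metrics are assumptions for illustration, not taken from the project code.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    precision: float
    recall: float
    f1: float
    accuracy: float


def evaluate_binary(y_true, y_pred):
    """Compute metrics for labels where 1 = vulnerable, 0 = not vulnerable.

    Hypothetical helper: the labelling scheme and metric set are assumptions.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # vulnerable, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # vulnerable, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # safe, not flagged

    # Guard against division by zero when a class is absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return BinaryMetrics(precision, recall, f1, accuracy)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not project results.
    print(evaluate_binary([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

Vulnerable functions are typically a small minority of samples, so precision, recall, and F1 tend to be more informative than accuracy alone.
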
Pre-training Dataset: C/C++ files from 100 GitHub repositories (Link)
Fine-tuning Dataset: Draper VDISC Dataset
Model: DistilBERT
Code: Project code is available in this Colab notebook
Progress Tracker: Project progress is tracked in this GitHub Project
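
As a rough illustration of how the pre-training corpus might be assembled, the sketch below gathers C/C++ files from repositories that have already been cloned into a local directory. The extension list, the helper names `collect_c_cpp_files` and `read_corpus`, and the directory layout are all assumptions; the project's actual collection procedure may differ.

```python
from pathlib import Path

# Assumed set of extensions; the source does not say which ones were used.
C_CPP_EXTENSIONS = {".c", ".h", ".cc", ".cpp", ".cxx", ".hpp", ".hh", ".hxx"}


def collect_c_cpp_files(root):
    """Return sorted paths of C/C++ files under root, searched recursively."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in C_CPP_EXTENSIONS
    )


def read_corpus(paths):
    """Yield (path, text) pairs, replacing bytes that are not valid UTF-8."""
    for path in paths:
        yield path, path.read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    # "repos" is a hypothetical directory holding the cloned repositories.
    files = collect_c_cpp_files("repos")
    print(f"Found {len(files)} C/C++ files")
```

Reading with `errors="replace"` keeps the walk from failing on files with legacy encodings, at the cost of a few substituted characters in the corpus.
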