awsm-research/Awesome-AI4DevSecOps: This repository offers a detailed taxonomy of existing AI-driven security solutions tailored for DevSecOps, highlighting the current research challenges and suggesting future directions for the field. It serves as a resource for researchers, developers, and security professionals interested in the intersection of AI and DevSecOps.

Awesome AI (Machine Learning / Deep Learning) For DevSecOps

Recent advances in artificial intelligence (AI) have revolutionized automation in various software domains, including software security. AI-driven security approaches, particularly those leveraging machine learning or deep learning, hold promise for automating security workflows. Because they reduce manual effort, these approaches can be integrated into DevOps without slowing delivery, in line with the DevSecOps paradigm.

We identified 12 security tasks associated with the DevSecOps process and reviewed current AI-driven security approaches. Through this analysis, we uncovered 15 challenges faced by these approaches and outlined potential opportunities for future research.

🦉 Comprehensive resources for our AI for DevSecOps survey, authored by Michael Fu, Jirat Pasuksmit, and Chakkrit Tantithamthavorn

👩‍🔧 Please let us know if you notice any mistakes or have any suggestions!

🚀 If you find this resource helpful, please consider starring this repository and citing our survey paper:

@article{fu2024ai,
  title={AI for DevSecOps: A Landscape and Future Opportunities},
  author={Fu, Michael and Pasuksmit, Jirat and Tantithamthavorn, Chakkrit},
  journal={arXiv preprint arXiv:2404.04839},
  year={2024}
}

📢 News

📌 [December-11-2024] Our paper has been accepted for publication in the ACM Transactions on Software Engineering and Methodology (TOSEM)!
📌 [August-23-2024] First revision of our AI4DevSecOps survey is completed
📌 [April-07-2024] Our AI4DevSecOps survey (v1) is available on arXiv 📝

🤝 Contributing to Awesome-AI4DevSecOps

We welcome contributions from the community!

If you have a valuable resource, tool, or idea related to AI for DevSecOps, please submit your PR using the following template.

🔥 This is a great opportunity to share and promote your work, research, or projects with a wider audience!

# Description: [Briefly describe your contribution, its purpose, and relevance.]

# Type of Contribution
- [ ] Research Paper
- [ ] Dataset
- [ ] Tool/Library
- [ ] Tutorial/Guide
- [ ] Other (please specify):

# Paper/Resource Link: [Provide the link here]

We will review and merge your contribution if it is appropriate and relevant to our project.

Thank you for helping us improve Awesome-AI4DevSecOps!

Paper Collection

Current Landscape of AI-Driven Security Approaches in DevSecOps (Section 4 in our paper)

Plan

Threat Modeling
- No Relevant Publications Identified Using Our Defined Search Strategy

Impact Analysis
- No Relevant Publications Identified Using Our Defined Search Strategy

Development

Software Vulnerability Detection (SVD)
- Recurrent Neural Network (RNN)
  - Automatic feature learning for predicting vulnerable software components (TSE, 2018) 📝
  - Automated vulnerability detection in source code using deep representation learning (ICMLA, 2018) 📝
  - Vuldeepecker: A deep learning-based system for vulnerability detection (NDSS, 2018) 📝
  - Vuldeelocator: a deep learning-based fine-grained vulnerability detector (TDSC, 2021) 📝
  - VUDENC: vulnerability detection with deep learning on a natural codebase for Python (IST, 2022) 📝
- Text Convolutional Neural Network (TextCNN)
  - A software vulnerability detection method based on deep learning with complex network analysis and subgraph partition (IST, 2023) 📝
- Graph Neural Network (GNN)
  - Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks (NeurIPS, 2019) 📝
  - Bgnn4vd: Constructing bidirectional graph neural-network for vulnerability detection (IST, 2021) 📝
  - Deep learning based vulnerability detection: Are we there yet? (TSE, 2021) 📝
  - Vulnerability detection with fine-grained interpretations (FSE, 2021) 📝
  - LineVD: Statement-level vulnerability detection using graph neural networks (MSR, 2022) 📝
  - mVulPreter: A Multi-Granularity Vulnerability Detection System With Interpretations (TDSC, 2022) 📝
  - VulChecker: Graph-based Vulnerability Localization in Source Code (USENIX, 2022) 📝
  - CPVD: Cross Project Vulnerability Detection Based On Graph Attention Network And Domain Adaptation (TSE, 2023) 📝
  - DeepVD: Toward Class-Separation Features for Neural Network Vulnerability Detection (ICSE, 2023) 📝
  - Learning Program Semantics for Vulnerability Detection via Vulnerability-Specific Inter-procedural Slicing (FSE, 2023) 📝
  - SedSVD: Statement-level software vulnerability detection based on Relational Graph Convolutional Network with subgraph embedding (IST, 2023) 📝
- Node2Vec
  - Enhancing Deep Learning-based Vulnerability Detection by Building Behavior Graph Model (ICSE, 2023) 📝
- Pre-trained Code Language Model (CLM) (Transformers)
  - Linevul: A transformer-based line-level vulnerability prediction (MSR, 2022) 📝
  - Vulnerability Detection by Learning from Syntax-Based Execution Paths of Code (TSE, 2023) 📝
- LM + GNN
  - VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements (SANER, 2022) 📝
  - Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection (ICSE, 2023) 📝

Benchmarks used in evaluating AI-driven software vulnerability detection

Benchmark	Year	Granularity	Programming Language	Real-World	Synthesis
Firefox	2013	File	C, C++	✔
Android	2014	File	Java	✔
Draper	2018	Function	C, C++	✔	✔
Vuldeepecker	2018	Code Gadget	C, C++	✔	✔
Du et al.	2019	Function	C, C++	✔
Devign	2019	Function	C, C++	✔
FUNDED	2020	Function	C, Java, Swift, PHP	✔	✔
Big-Vul	2020	Function/Line	C, C++	✔
Reveal	2021	Function	C, C++	✔
Cao et al.	2021	Function	C, C++	✔
D2A	2021	Function	C, C++	✔
Deepwukong	2021	Function	C, C++	✔	✔
Vuldeelocator	2021	Line	C, C++	✔	✔
VulCNN	2022	Function	C, C++	✔	✔
VUDENC	2022	Token	Python	✔
DeepVD	2023	Function	C, C++	✔
VulChecker	2023	Instruction	C, C++	✔

Software Vulnerability Classification (SVC)
- Machine Learning (ML)
  - Automation of vulnerability classification from its description using machine learning (ISCC, 2020) 📝
  - A machine learning approach to classify security patches into vulnerability types (CNS, 2020) 📝
- RNN
  - Vuldeepecker: A deep learning-based system for vulnerability detection (NDSS, 2018) 📝
  - μVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection (TDSC, 2019) 📝
- Text Recurrent Convolutional Neural Network (TextRCNN)
  - DeKeDVer: A deep learning-based multi-type software vulnerability classification framework using vulnerability description and source code (IST, 2023) 📝
- Vanilla Transformer
  - Towards Vulnerability Types Classification Using Pure Self-Attention: A Common Weakness Enumeration Based Approach (CSE, 2021) 📝
- Pre-trained Language Model (LM) (Transformers)
  - V2w-bert: A framework for effective hierarchical multiclass classification of software vulnerabilities (DSAA, 2021) 📝
  - Prediction of Vulnerability Characteristics Based on Vulnerability Description and Prompt Learning (SANER, 2023) 📝
- CLM
  - VulExplainer: A Transformer-based Hierarchical Distillation for Explaining Vulnerability Types (TSE, 2023) 📝
  - AIBugHunter: A Practical tool for predicting, classifying and repairing software vulnerabilities (EMSE, 2023) 📝
- CLM + RNN
  - Fine-grained commit-level vulnerability type prediction by CWE tree structure (ICSE, 2023) 📝

Benchmarks used in evaluating AI-driven software vulnerability classification

Benchmark	Year	Granularity	Programming Language	Real-World	Synthesis
μVulDeePecker	2019	Code Gadget	C, C++	✔	✔
TreeVul	2023	Commit	C, C++, Java, and Python	✔

Automated Vulnerability Repair (AVR)
- ML
  - Sqlifix: Learning based approach to fix sql injection vulnerabilities in source code (SANER, 2021) 📝
- CNN
  - Coconut: combining context-aware neural translation models using ensemble for program repair (ISSTA, 2020) 📝
- RNN
  - Sequencer: Sequence-to-sequence learning for end-to-end program repair (TSE, 2019) 📝
  - A controlled experiment of different code representations for learning-based program repair (EMSE, 2022) 📝
- Tree-based RNN
  - Dlfix: Context-based code transformation learning for automated program repair (ICSE, 2020) 📝
- GNN
  - Hoppity: Learning graph transformations to detect and fix bugs in programs (ICLR, 2020) 📝
- Vanilla Transformer
  - A syntax-guided edit decoder for neural program repair (FSE, 2021) 📝
  - Neural transfer learning for repairing security vulnerabilities in c code (TSE, 2022) 📝
  - Seqtrans: automatic vulnerability fix via sequence to sequence learning (TSE, 2022) 📝
  - Tare: Type-aware neural program repair (ICSE, 2023) 📝
- CLM
  - Cure: Code-aware neural machine translation for automatic program repair (ICSE, 2021) 📝
  - Applying codebert for automated program repair of java simple bugs (MSR, 2021) 📝
  - Tfix: Learning to fix coding errors with a text-to-text transformer (PMLR, 2021) 📝
  - VulRepair: a T5-based automated software vulnerability repair (FSE, 2022) 📝
  - Improving automated program repair with domain adaptation (TOSEM, 2022) 📝
  - Vision Transformer-Inspired Automated Vulnerability Repair (TOSEM, 2023) 📝
  - Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework (ICSME, 2023) 📝
  - Pre-trained model-based automated software vulnerability repair: How far are we? (TDSC, 2023) 📝
  - Examining zero-shot vulnerability repair with large language models (SP, 2023) 📝
  - Inferfix: End-to-end program repair with llms (FSE, 2023) 📝
  - Unifying Defect Prediction, Categorization, and Repair by Multi-Task Deep Learning (ASE, 2023) 📝

Benchmarks used in evaluating AI-driven automated program/vulnerability repair

Benchmark	Year	Programming Language	Real-World	Synthesis
Defects4J	2014	Java	✔
ManyBugs	2015	C	✔
BugAID	2016	JavaScript	✔
QuixBugs	2017	Java, Python	✔
CodeFlaws	2017	C	✔
Bugs.jar	2018	Java	✔
SequenceR	2019	Java	✔
Bugs2Fix	2019	Java	✔
ManySStuBs4J	2020	Java	✔
Hoppity	2020	JavaScript	✔
CodeXGLUE	2021	Java	✔	✔
TFix	2021	JavaScript	✔
VRepair	2022	C, C++	✔
Namavar et al.	2022	JavaScript	✔
Pearce et al.	2023	C, C++	✔	✔
Function-SStuBs4J	2023	Java	✔
InferFix	2023	Java, C#	✔

Security Tools in IDEs
- LM-based Security Tool
  - AIBugHunter: A Practical tool for predicting, classifying and repairing software vulnerabilities (EMSE, 2023) 📝

Code Commit

Dependency Management
- No Relevant Publications Identified Using Our Defined Search Strategy

CI/CD Secure Pipelines
- ML
  - Improving missing issue-commit link recovery using positive and unlabeled data (ASE, 2017) 📝
  - MULTI: Multi-objective effort-aware just-in-time software defect prediction (IST, 2018) 📝
  - Class imbalance evolution and verification latency in just-in-time software defect prediction (ICSE, 2019) 📝
  - Fine-grained just-in-time defect prediction (JSS, 2019) 📝
  - Effort-aware semi-supervised just-in-time defect prediction (IST, 2020) 📝
  - Just-in-time defect identification and localization: A two-phase framework (TSE, 2020) 📝
  - Adapting bug prediction models to predict reverted commits at Wayfair (FSE, 2020) 📝
  - JITLine: A simpler, better, faster, finer-grained just-in-time defect prediction (MSR, 2021) 📝
  - Enhancing just-in-time defect prediction using change request-based metrics (SANER, 2021) 📝
- Explainable AI (XAI) For ML
  - Pyexplainer: Explaining the predictions of just-in-time defect models (ASE, 2021) 📝
- RNN
  - DeepLink: Recovering issue-commit links based on deep learning (JSS, 2019) 📝
  - Deeplinedp: Towards a deep learning approach for line-level defect prediction (TSE, 2022) 📝
- Tree-based RNN
  - Lessons learned from using a deep tree-based model for software defect prediction in practice (MSR, 2019) 📝
- Vanilla Transformer
  - Deep just-in-time defect localization (TSE, 2021) 📝
- LM
  - BTLink: automatic link recovery between issues and commits based on pre-trained BERT model (EMSE, 2023) 📝
- CLM
  - EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery (ASE, 2023) 📝
- ML-based Just-In-Time (JIT) Software Defect Prediction (SDP) Tool
  - JITBot: an explainable just-in-time defect prediction bot (ASE, 2020) 📝
  - JITO: a tool for just-in-time defect identification and localization (FSE, 2020) 📝
- ML-based Change Analysis Tool
  - Rex: Preventing bugs and misconfiguration in large services using correlated change analysis (USENIX, 2020) 📝

Benchmarks used in evaluating AI-driven just-in-time (JIT) software defect prediction

Benchmark	Year	Granularity	Programming Language	Real-World
PROMISE	2007	Commit	Java	✔
Kamei et al.	2012	Commit	C, C++, Java, JavaScript, Perl	✔
Qt & OpenStack	2018	Commit/Line	C++, Python	✔
Cabral et al.	2019	Commit/File	Java, JavaScript, Python	✔
Yan et al.	2020	Commit/File	Java	✔
Wattanakriengkrai et al.	2020	Commit	Java	✔
Suh	2020	Commit/File	JavaScript, PHP	✔

Build, Test, and Deployment

Configuration Validation
- ML
  - Tuning configuration of apache spark on public clouds by combining multi-objective optimization and performance prediction model (JSS, 2021) 📝
  - KGSecConfig: A Knowledge Graph Based Approach for Secured Container Orchestrator Configuration (SANER, 2022) 📝
  - CoMSA: A Modeling-Driven Sampling Approach for Configuration Performance Testing (ASE, 2023) 📝
- Feed-Forward Neural Network (FFNN)
  - DeepPerf: Performance prediction for configurable software with deep sparse neural network (ICSE, 2019) 📝
- Generative Adversarial Network (GAN)
  - ACTGAN: automatic configuration tuning for software systems with generative adversarial networks (ASE, 2019) 📝
  - Perf-AL: Performance prediction for configurable software through adversarial learning (ESEM, 2020) 📝

Infrastructure Scanning
- ML
  - Characterizing defective configuration scripts used for continuous deployment (ICST, 2018) 📝
  - Source code properties of defective infrastructure as code scripts (IST, 2019) 📝
  - Within-project defect prediction of infrastructure-as-code using product and process metrics (TSE, 2021) 📝
- Word2Vec-CBOW (Continuous Bag of Words)
  - FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in infrastructure-as-code (EMSE, 2022) 📝

Benchmarks used in evaluating AI-driven infrastructure scanning

Benchmark	Year	Real-World	Synthesis
Rahman and Williams	2018	✔
Rahman and Williams	2019	✔
Dalla Palma et al.	2021	✔
Borovits et al.	2022		✔

Operation & Monitoring

Log Analysis & Anomaly Detection
- ML
  - An anomaly detection system based on variable N-gram features and one-class SVM (IST, 2017) 📝
  - Anomaly detection and diagnosis for cloud services: Practical experiments and lessons learned (JSS, 2018) 📝
  - Adaptive performance anomaly detection in distributed systems using online svms (TDSC, 2018) 📝
  - Log-based anomaly detection with robust feature extraction and online learning (TIFS, 2021) 📝
  - Try with Simpler--An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection (TOSEM, 2023) 📝
  - On the effectiveness of log representation for log-based anomaly detection (EMSE, 2023) 📝
- RNN
  - Deeplog: Anomaly detection and diagnosis from system logs through deep learning (CCS, 2017) 📝
  - Robust log-based anomaly detection on unstable log data (FSE, 2019) 📝
  - Loganomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs (IJCAI, 2019) 📝
  - Anomaly detection in operating system logs with deep learning-based sentiment analysis (TDSC, 2020) 📝
  - SwissLog: Robust anomaly detection and localization for interleaved unstructured logs (TDSC, 2022) 📝
  - DeepSyslog: Deep Anomaly Detection on Syslog Using Sentence Embedding and Metadata (TIFS, 2022) 📝
  - LogOnline: A Semi-Supervised Log-Based Anomaly Detector Aided with Online Learning Mechanism (ASE, 2023) 📝
  - On the effectiveness of log representation for log-based anomaly detection (EMSE, 2023) 📝
- RNN-based AutoEncoder (AE)
  - Lifelong anomaly detection through unlearning (CCS, 2019) 📝
  - Recompose event sequences vs. predict next events: A novel anomaly detection approach for discrete event logs (CCS, 2021) 📝
- GNN
  - LogGraph: Log Event Graph Learning Aided Robust Fine-Grained Anomaly Diagnosis (TDSC, 2023) 📝
- Vanilla Transformer
  - Log-based anomaly detection without log parsing (ASE, 2021) 📝
- XAI For Deep Learning (DL)
  - Deepaid: Interpreting and improving deep learning-based anomaly detection in security applications (CCS, 2021) 📝
  - Towards an interpretable autoencoder: A decision-tree-based autoencoder and its application in anomaly detection (TDSC, 2022) 📝
- Conditional Diffusion Model
  - Maat: Performance Metric Anomaly Anticipation for Cloud Services with Conditional Diffusion (ASE, 2023) 📝

Benchmarks used in evaluating AI-driven log analysis and anomaly detection

Benchmark	Year	Real-World	Synthesis
Yahoo! Webscope	2006	✔	✔
BGL	2007	✔
HDFS	2009	✔
ADFA-LD	2013		✔
SDS	2015		✔
UNSW-NB15	2015		✔
OpenStack	2017	✔
Microsoft	2019	✔
LogHub	2020	✔
Studiawan et al.	2020	✔
Yang et al.	2023	✔

Cyber-Physical Systems
- ML
  - TABOR: A graphical model-based approach for anomaly detection in industrial control systems (CCS, 2018) 📝
  - Adaptive-Correlation-aware Unsupervised Deep Learning for Anomaly Detection in Cyber-physical Systems (TDSC, 2023) 📝
- RNN + GNN
  - Digital Twin-based Anomaly Detection with Curriculum Learning in Cyber-physical Systems (TOSEM, 2023) 📝
- GAN
  - Digital twin-based anomaly detection in cyber-physical systems (ICST, 2021) 📝
- Variational AutoEncoder (VAE)
  - From Point-wise to Group-wise: A Fast and Accurate Microservice Trace Anomaly Detection Approach (FSE, 2023) 📝
- Vanilla Transformer
  - Twin Graph-Based Anomaly Detection via Attentive Multi-Modal Learning for Microservice System (ASE, 2023) 📝
- LM + RNN
  - KDDT: Knowledge Distillation-Empowered Digital Twin for Anomaly Detection (FSE, 2023) 📝

Benchmarks used in evaluating AI-driven anomaly detection in cyber-physical systems

Benchmark	Year	Real-World	Synthesis
Gas Pipeline Dataset	2015	✔
SWaT	2016	✔
WADI	2017	✔
BATADAL	2018		✔
MSDS	2023	✔

Identified 15 Challenges of AI-Driven Security Approaches in DevSecOps (Section 5 in our paper)

DevOps Step	Identified Security Task	Themes of Challenges
Plan	Threat Modeling	-
	Software Impact Analysis	-
Development	Software Vulnerability Detection	C1-1 - Data Imbalance
		C4 - Cross Project
		C5 - MBU Vulnerabilities
		C6 - Data Quality
	Software Vulnerability Classification	C1-2 - Data Imbalance
		C7 - Incomplete CWE Tree
	Automated Vulnerability Repair	C2-1 - Model Explainability
		C8 - Sequence Length and Computing Resource
		C9 - Loss of Pre-Trained Knowledge
		C10 - Automated Repair on Real-World Scenarios
	Security Tools in IDEs	C3-1 - Lack of AI Security Tooling in IDEs
Code Commit	CI/CD Secure Pipelines	C2-2 - Model Explainability
		C3-2 - Lack of AI Security Tooling in CI/CD
		C11 - The Use of RNNs
Build, Test, and Deployment	Configuration Validation	C12 - Complex Feature Space
	Infrastructure Scanning	C3-3 - Lack of AI Security Tooling for Infrastructure Scanning
		C13 - Manual Feature Engineering
Operation and Monitoring	Log Analysis and Anomaly Detection	C2-3 - Model Explainability
		C14 - Normality Drift for Zero-Positive Anomaly Detection
	Cyber-Physical Systems	C15 - Monitoring Multiple Cyber-Attacks Simultaneously

Identified 15 Research Directions of AI-Driven Security Approaches in DevSecOps (Section 5 in our paper)

DevOps Step	Identified Security Task	Research Opportunity
Plan	Threat Modeling	-
	Software Impact Analysis	-
Development	Software Vulnerability Detection	R1-1 - Data augmentation and logit adjustment
		R4 - Evaluate cross-project SVD with diverse CWE-IDs
		R5 - Evaluate SVD on MBU vulnerabilities
		R6 - Address data inaccuracy from automatic data collection
	Software Vulnerability Classification	R1-2 - Meta-learning and LLMs
		R7 - Develop advanced tree-based SVC
	Automated Vulnerability Repair	R2-1 - Evidence-based explainable AI (XAI)
		R8 - Explore transformer variants that can process longer sequences
		R9 - Explore different training paradigms during fine-tuning
		R10 - Address limitations of LLMs
	Security Tools in IDEs	R3-1 - AI tool deployment and comprehensive tool evaluation
Code Commit	CI/CD Secure Pipelines	R2-2 - Explainable AI (XAI) for DL Models
		R3-2 - AI tool deployment in CI/CD pipelines
		R11 - Explore LMs and LLMs
Build, Test, and Deployment	Configuration Validation	R12 - Explore transformers for tabular data
	Infrastructure Scanning	R3-3 - AI tool deployment and post-deployment evaluation
		R13 - Explore DL-based techniques
Operation and Monitoring	Log Analysis and Anomaly Detection	R2-3 - Explainable AI (XAI) for ML Models
		R14 - Enhance normality drift detection
	Cyber-Physical Systems	R15 - Distributed anomaly detection and multi-agent systems
