Automating the Cybersecurity Triage Process: A Comparative Study on the Performance of Large Language Models
Security analysts are tasked with inspecting cybersecurity alerts to filter out false positives and assess their severity, a process known as triage. Triage is complex and time-consuming, limiting both the depth and the speed of investigations. While previously proposed optimizations and automations appear promising, rapid advances in Large Language Models (LLMs) have opened up new possibilities for automating parts of the triage process that previously required human judgment. This research identifies ways in which LLMs can optimize triage, evaluates the performance of these techniques, and compares different LLMs, including GPT-4, Aya, Code Llama, Gemma, Llama 3, Mistral, and Phi-3. The study shows that GPT-4 is the most capable model, while Llama 3 and Mistral achieve competitive results. The findings are expected to help security teams make informed implementation decisions when optimizing the triage process.
The Jupyter Notebook is available at dataNscripts/llm-triage-automation.ipynb. The data and prompts used are available in dataNscripts/data. The Enron email dataset is excluded due to its size; it is available here.
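To illustrate the kind of triage step the notebook automates, the sketch below prompts an LLM to classify a single alert as a true or false positive and assign a severity. This is a minimal illustration, not the study's actual code: it assumes the OpenAI chat completions API (GPT-4 is one of the evaluated models), and the alert text and prompt wording are invented for this example; the study's real prompts are in dataNscripts/data.

```python
# Minimal sketch of LLM-based alert triage (illustrative, not the paper's code).
# Assumes the OpenAI Python client; OPENAI_API_KEY must be set in the environment.
from openai import OpenAI

client = OpenAI()

# A hypothetical alert; the study's real inputs live in dataNscripts/data.
alert = (
    "Suricata alert: SSH session on non-standard port 2222 "
    "from 10.0.0.5 to 203.0.113.7"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a SOC analyst. Given a security alert, reply with a "
                "verdict (true positive / false positive) and a severity "
                "(low / medium / high), each with a one-line justification."
            ),
        },
        {"role": "user", "content": alert},
    ],
    temperature=0,  # deterministic output makes cross-model comparison easier
)

print(response.choices[0].message.content)
```

Swapping the model name and client endpoint would allow the same prompt to be run against the open-weight models in the comparison (Llama 3, Mistral, Phi-3, and so on) served through an OpenAI-compatible API.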