This repository contains our benchmark and the code used to build and evaluate it. The benchmark includes the raw context and the structured, obfuscated context of functions from five software projects (redis, lvgl, fluent, libgit, libvips), together with the inputs for code-generation and code-completion tasks built from this context using various obfuscation strategies. The benchmark is composed as follows:
Software | Original Functions | Symbol Obfuscation Functions | Structure Obfuscation Functions | Semantic Obfuscation Functions | Symbol + Structure Obfuscation Functions | Symbol + Semantic Obfuscation Functions |
---|---|---|---|---|---|---|
redis | 681 | 681 | 215 | 106 | 215 | 106 |
libvips | 203 | 203 | 58 | 17 | 58 | 17 |
lvgl | 303 | 303 | 115 | 15 | 115 | 15 |
libgit2 | 78 | 78 | 32 | 10 | 32 | 10 |
fluent | 89 | 89 | 30 | 11 | 30 | 11 |
Total | 1,354 | 1,354 | 450 | 159 | 450 | 159 |
The code builds the inputs for the code-generation and code-completion tasks and automates the evaluation of LLM performance on both tasks.
Files whose names begin with `complete` build the code-completion task inputs, and the remaining files build the code-generation task inputs. For the code-generation task, the individual files build the original input, the symbol-obfuscated input, the structure-obfuscated input, and the structure+symbol-obfuscated input, respectively. For the code-completion task, the files build the original input, the symbol-obfuscated input, the semantic-obfuscated input, and the semantic+symbol-obfuscated input.
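The naming rule can be expressed as a small lookup. This is a minimal sketch assuming only the `complete` prefix convention and the four input variants per task listed in this README; `task_for_file`, `variants_for_file`, and the example file names are hypothetical and do not correspond to actual scripts in the repository.

```python
from pathlib import Path

# Input variants built for each task, as listed in this README.
VARIANTS = {
    "generation": ["original", "symbol", "struct", "symbol+struct"],
    "completion": ["original", "symbol", "semantic", "semantic+symbol"],
}


def task_for_file(path: str) -> str:
    """Return the task a file builds inputs for, based only on its name prefix."""
    name = Path(path).name
    return "completion" if name.startswith("complete") else "generation"


def variants_for_file(path: str) -> list[str]:
    """Return the input variants that belong to that file's task."""
    return VARIANTS[task_for_file(path)]


if __name__ == "__main__":
    # Hypothetical file names, used only to illustrate the prefix rule.
    for f in ["complete_symbol.py", "struct_input.py"]:
        print(f, task_for_file(f), variants_for_file(f))
```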
We also provide a method to automatically test the code-generation and code-completion capabilities of LLMs on the five software projects. The overall process is: provide the input -> the LLM generates code -> the generated code replaces the original function -> the system tests are executed -> compilation or test error information is collected.
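The replace-build-test loop can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes the generated function replaces the original by exact text match, and that the project is built and tested with shell commands you supply. `EvalResult`, `replace_function`, `run_command`, and `evaluate` are hypothetical names.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class EvalResult:
    stage: str    # "replace", "build", "test", or "pass"
    message: str  # captured error output; empty on success


def replace_function(source: str, original_func: str, generated_func: str) -> str:
    """Swap the first exact occurrence of the original function for the generated one."""
    if original_func not in source:
        raise ValueError("original function not found in source file")
    return source.replace(original_func, generated_func, 1)


def run_command(cmd: list[str], cwd: str) -> subprocess.CompletedProcess:
    """Run a build or test command, capturing its output as text."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)


def evaluate(src_file: str, original_func: str, generated_func: str,
             project_dir: str, build_cmd: list[str], test_cmd: list[str]) -> EvalResult:
    """Patch one function, build, run the tests, then restore the original file."""
    with open(src_file, encoding="utf-8") as fh:
        source = fh.read()
    try:
        patched = replace_function(source, original_func, generated_func)
    except ValueError as exc:
        return EvalResult("replace", str(exc))
    with open(src_file, "w", encoding="utf-8") as fh:
        fh.write(patched)
    try:
        build = run_command(build_cmd, project_dir)
        if build.returncode != 0:
            return EvalResult("build", build.stderr or build.stdout)  # compilation error
        tests = run_command(test_cmd, project_dir)
        if tests.returncode != 0:
            return EvalResult("test", tests.stderr or tests.stdout)  # test failure
        return EvalResult("pass", "")
    finally:
        # Always restore the untouched source so the next sample starts clean.
        with open(src_file, "w", encoding="utf-8") as fh:
            fh.write(source)
```

For redis, a call might look like `evaluate("redis/src/t_zset.c", orig, gen, "redis", ["make"], ["./runtest"])`, where `orig` and `gen` are the original and generated function text. The actual build and test commands used by the benchmark may differ. Restoring the file in `finally` keeps one failing sample from contaminating the next.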
We provide the benchmark and the experimental results (LLMs_Performance.xlsx). The benchmark covers five software projects (redis, lvgl, fluent, libgit, libvips), and the directory for each project contains five main folders:
The context of the corresponding functions, extracted from the source code, in two files:
- The original context
- The context after structure expansion
The files contain the following columns:
- `fileName`: File name
- `funcName`: Function name
- `function_header`: Function header
- `function_code`: Function code
- `comments`: Comments
- `contextGenByGpt`: Context generated by GPT
- `calledFunctions`: Functions called
- `usedStructs`: Structures used
- `usedGloVars`: Global variables used
- `usedMacros`: Macros used
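One way these columns could be combined into a code-generation prompt is sketched below. The prompt layout is purely hypothetical (the README does not show the actual prompt format), and `build_generation_prompt` is an illustrative name. It assumes each row is a dict, for example one produced by `pandas.read_excel(path).to_dict("records")`, and it withholds `function_code`, since that is the ground truth the model must produce.

```python
# Context columns shown to the model, in order (hypothetical prompt layout).
CONTEXT_COLUMNS = [
    ("usedMacros", "Macros used"),
    ("usedStructs", "Structures used"),
    ("usedGloVars", "Global variables used"),
    ("calledFunctions", "Functions called"),
]


def _cell(row: dict, column: str) -> str:
    """Return a stripped string cell; empty cells loaded by pandas arrive as NaN floats."""
    value = row.get(column)
    return value.strip() if isinstance(value, str) else ""


def build_generation_prompt(row: dict) -> str:
    """Assemble a prompt from one context row; function_code is deliberately left out."""
    parts = [f"// File: {row['fileName']}"]
    for column, title in CONTEXT_COLUMNS:
        value = _cell(row, column)
        if value:
            parts.append(f"// {title}:\n{value}")
    comments = _cell(row, "comments")
    if comments:
        parts.append(comments)
    parts.append(f"// Implement the function {row['funcName']}:")
    parts.append(row["function_header"])
    return "\n\n".join(parts)
```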
Files related to symbol obfuscation.
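The README does not describe the renaming scheme, so the following only illustrates the general idea of symbol obfuscation: each listed identifier is renamed consistently to an opaque placeholder. `obfuscate_symbols` and the `sym_N` naming are hypothetical. A real tool would work on the parsed C code rather than on raw text.

```python
import re


def obfuscate_symbols(code: str, symbols: list[str], prefix: str = "sym") -> tuple[str, dict[str, str]]:
    """Rename each listed identifier to an opaque placeholder, consistently."""
    mapping = {name: f"{prefix}_{i}" for i, name in enumerate(symbols)}
    if not mapping:
        return code, mapping
    # \b on both sides restricts matches to whole identifiers, so `len`
    # is not rewritten inside `strlen`. String literals and comments are
    # not skipped in this simplified text-based version.
    pattern = re.compile(r"\b(" + "|".join(re.escape(s) for s in mapping) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code), mapping


if __name__ == "__main__":
    code, names = obfuscate_symbols("int len = strlen(s); return len;", ["len", "s"])
    print(code)   # int sym_0 = strlen(sym_1); return sym_0;
    print(names)  # {'len': 'sym_0', 's': 'sym_1'}
```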
Files related to structure obfuscation, including:
- `struct_confuse_result_xxx.xlsx`: Contains the function's file path, the target function header, the target function body, the functions to be expanded, and the new target function after expansion. All functions have passed compilation and testing.
- `afterconfuse_src_lvgl`: The `src` folder after all target functions have been replaced with their expanded versions; it passes compilation and testing.
Files related to semantic obfuscation.
Files related to the GPT-generated test inputs and outputs, organized into eight subfolders:
- `original`
- `struct`
- `symbol`
- `symbol+struct`
- `complete original`
- `complete symbol`
- `complete semantic`
- `complete semantic+symbol`
Each subfolder contains five files: the input file and one file for each model:
- input file
- gpt-3.5
- deepseek
- gpt-4-1106
- gpt-4-0125
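Before scoring, it can help to check that a results directory is complete: 8 subfolders × 5 files = 40 files. This sketch assumes the subfolder names above are used verbatim as directory names and that every file shares one extension. The default `.xlsx` is a guess, and `expected_files` and `missing_files` are hypothetical helpers.

```python
from pathlib import Path

SETTINGS = [
    "original", "struct", "symbol", "symbol+struct",
    "complete original", "complete symbol", "complete semantic", "complete semantic+symbol",
]
MODELS = ["gpt-3.5", "deepseek", "gpt-4-1106", "gpt-4-0125"]


def expected_files(results_root: str, ext: str = ".xlsx") -> list[Path]:
    """List the input file and one file per model for every setting subfolder."""
    root = Path(results_root)
    files = []
    for setting in SETTINGS:
        files.append(root / setting / f"input{ext}")
        files.extend(root / setting / f"{model}{ext}" for model in MODELS)
    return files


def missing_files(results_root: str, ext: str = ".xlsx") -> list[Path]:
    """Return the expected files that are not present on disk."""
    return [p for p in expected_files(results_root, ext) if not p.exists()]
```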