OBFUSEVAL

This repository contains our benchmark and the code to build and test it. The benchmark includes raw context and structured, obfuscated context information for five software projects (redis, lvgl, fluent, libgit2, libvips), as well as inputs for code-generation and code-completion tasks built from that context using various obfuscation strategies. The benchmark is composed as follows:

| Software | Original Functions | Symbol Obfuscation Functions | Structure Obfuscation Functions | Semantic Obfuscation Functions | Symbol + Structure Obfuscation Functions | Symbol + Semantic Obfuscation Functions |
|---|---:|---:|---:|---:|---:|---:|
| redis | 681 | 681 | 215 | 106 | 215 | 106 |
| libvips | 203 | 203 | 58 | 17 | 58 | 17 |
| lvgl | 303 | 303 | 115 | 15 | 115 | 15 |
| libgit2 | 78 | 78 | 32 | 10 | 32 | 10 |
| fluent | 89 | 89 | 30 | 11 | 30 | 11 |
| Total | 1,354 | 1,354 | 450 | 159 | 450 | 159 |
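To work with these counts in a script, a minimal Python sketch can reproduce the table and check that the per-project rows add up to the Total row. The `COUNTS` dictionary and the `column_totals` helper are our own illustration, not files or functions shipped with the repository.

```python
# Per-project function counts copied from the table above.
# Column order matches the table header.
COLUMNS = ["original", "symbol", "structure", "semantic",
           "symbol+structure", "symbol+semantic"]

COUNTS: dict[str, list[int]] = {
    "redis":   [681, 681, 215, 106, 215, 106],
    "libvips": [203, 203, 58, 17, 58, 17],
    "lvgl":    [303, 303, 115, 15, 115, 15],
    "libgit2": [78, 78, 32, 10, 32, 10],
    "fluent":  [89, 89, 30, 11, 30, 11],
}

# The "Total" row from the table.
TOTAL_ROW = [1354, 1354, 450, 159, 450, 159]


def column_totals(counts: dict[str, list[int]]) -> list[int]:
    """Sum each column across all projects."""
    return [sum(row[i] for row in counts.values()) for i in range(len(COLUMNS))]


if __name__ == "__main__":
    # Label each summed column and compare against the published Total row.
    totals = column_totals(COUNTS)
    print(dict(zip(COLUMNS, totals)))
    print("matches Total row:", totals == TOTAL_ROW)
```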

The code builds the inputs for the code-generation and code-completion tasks, and automates testing of LLM performance on both tasks.

Code

input construction

Files whose names begin with "complete" build the code-completion inputs; the remaining files build the code-generation inputs. For code generation, separate files build the original, symbol-obfuscated, structure-obfuscated, and structure+symbol-obfuscated inputs. For code completion, separate files build the original, symbol-obfuscated, semantic-obfuscated, and semantic+symbol-obfuscated inputs.
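This README does not describe how the symbol-obfuscated inputs are produced. As a rough illustration only, assuming symbol obfuscation means consistently replacing meaningful identifiers (function, variable, or macro names) with uninformative ones, a minimal Python sketch might look like the following. `obfuscate_symbols` and the `sym_N` naming scheme are hypothetical; a production tool would more likely work on a parsed C syntax tree than on regular expressions.

```python
import re


def obfuscate_symbols(code: str, identifiers: list[str]) -> tuple[str, dict[str, str]]:
    """Consistently rename the given identifiers to uninformative names.

    Hypothetical helper: the sym_0, sym_1, ... scheme is an assumption,
    not the benchmark's actual naming scheme.
    """
    mapping = {name: f"sym_{i}" for i, name in enumerate(identifiers)}
    if not mapping:
        # An empty alternation would match the empty string everywhere.
        return code, mapping
    # \b word boundaries keep e.g. "len" from matching inside "strlen".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in mapping) + r")\b"
    )
    # Every occurrence of a listed name is replaced by its mapped name.
    renamed = pattern.sub(lambda m: mapping[m.group(1)], code)
    return renamed, mapping
```

Note that this regex approach also renames matches inside comments and string literals, which is one reason a syntax-tree-based rename is the safer choice for real C sources.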

LLM test

We provide a method for automatically testing the code-generation and code-completion capabilities of LLMs on the five software projects. The overall pipeline is: provide the input -> the LLM generates code -> the generated code replaces the original code -> run the system tests -> collect compilation or test error information.
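The five pipeline stages can be sketched in Python as a single function with pluggable stages. This is a minimal sketch under stated assumptions, not the repository's implementation: `RunResult`, `replace_function`, and `evaluate` are hypothetical names, the LLM call and the test harness are passed in as callables, and code replacement is assumed to be a plain text swap of the target function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    """Outcome of one pipeline run (hypothetical structure)."""
    passed: bool
    error: str = ""  # compilation or test error text; empty on success


def replace_function(source: str, original_func: str, generated_func: str) -> str:
    """Code replacement: swap the target function's text for the generated code."""
    if original_func not in source:
        raise ValueError("target function not found in source")
    return source.replace(original_func, generated_func, 1)  # first occurrence only


def evaluate(prompt: str,
             source: str,
             original_func: str,
             generate: Callable[[str], str],
             run_tests: Callable[[str], RunResult]) -> RunResult:
    """input -> LLM generates code -> code replacement -> system tests -> errors."""
    generated = generate(prompt)                                  # LLM call
    patched = replace_function(source, original_func, generated)  # code replacement
    return run_tests(patched)                                     # build + system tests
```

In practice, `generate` would wrap a call to one of the evaluated models, and `run_tests` would write the patched file into the project tree, run the project's build and system tests (for example with Python's `subprocess.run`), restore the original file, and return the captured compiler or test output. Keeping both stages as parameters lets the same loop run over every model and every obfuscation variant.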

experimental data

We provide the benchmark and the experimental results file (LLMs_Performance.xlsx). The benchmark covers five software projects (redis, lvgl, fluent, libgit2, libvips), and each project contains five main folders:

1. All Context

Context of the target functions, extracted from the source code, in two files:

  • The original context
  • The context after structure expansion

The included columns are:

  • fileName: File name
  • funcName: Function name
  • function_header: Function header
  • function_code: Function code
  • comments: Comments
  • contextGenByGpt: Context generated by GPT
  • calledFunctions: Functions called
  • usedStructs: Structures used
  • usedGloVars: Global variables used
  • usedMacros: Macros used

2. Symbol

Files related to symbol obfuscation.

3. Struct

Files related to structure obfuscation, including:

  • struct_confuse_result_xxx.xlsx: Contains the function's file path, target function header, target function body, functions to be expanded, and the new target function after expansion. All functions have passed compilation and testing.
  • afterconfuse_src_lvgl: The src folder after every target function has been replaced by its expanded version; it passes compilation and testing.

4. Semantic

Files related to semantic obfuscation.

5. Input of Test

Files holding the test inputs and the corresponding model outputs, organized into eight subfolders:

  • original
  • struct
  • symbol
  • symbol+struct
  • complete original
  • complete symbol
  • complete semantic
  • complete semantic+symbol

Each subfolder contains five files:

  1. input file
  2. gpt-3.5
  3. deepseek
  4. gpt-4-1106
  5. gpt-4-0125
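The Input of Test layout (eight subfolders, five files each) can be enumerated programmatically. The sketch below is a minimal Python illustration: the subfolder and file labels are taken from the lists above, but the real on-disk file names and extensions are not stated here, and reading the "complete" prefix as marking a code-completion variant is our interpretation of the input-construction notes. `task_of` and `all_entries` are hypothetical helpers.

```python
from itertools import product

# Subfolder names as listed above; a "complete" prefix is read here as
# marking a code-completion variant (our interpretation).
SUBFOLDERS = [
    "original", "struct", "symbol", "symbol+struct",
    "complete original", "complete symbol",
    "complete semantic", "complete semantic+symbol",
]

# Files in every subfolder: the input file plus one output per model.
FILES = ["input file", "gpt-3.5", "deepseek", "gpt-4-1106", "gpt-4-0125"]


def task_of(subfolder: str) -> str:
    """Return "completion" for complete* subfolders, else "generation"."""
    return "completion" if subfolder.startswith("complete") else "generation"


def all_entries() -> list[tuple[str, str, str]]:
    """Every (task, subfolder, file) triple: 8 subfolders x 5 files = 40."""
    return [(task_of(s), s, f) for s, f in product(SUBFOLDERS, FILES)]
```

Joining these labels onto a local checkout path would give the files to compare per model and variant; check the actual file names in the repository before relying on them.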