BTD - Bin To DNN: A DNN Executables Decompiler

BTD: DNN Executables Decompiler

Research Artifact for our USENIX Security 2023 paper: "Decompiling x86 Deep Neural Network Executables"

BTD is the first deep neural network (DNN) executables decompiler. BTD takes DNN executables (running on x86 CPUs) compiled by DNN compilers (e.g., TVM, Glow, and NNFusion) and outputs full model specifications, including types of DNN operators, network topology, dimensions, and parameters that are (nearly) identical to those of the input models. BTD is evaluated to be robust against complex compiler optimizations, such as operator fusion and memory layout optimization. More details are reported in our paper published at USENIX Security 2023.

Paper: coming soon

Extended version (25 pages): https://arxiv.org/abs/2210.01075

Artifact Appendix in USENIX format: artifact-appendix.pdf

This repo contains all code and data used in the evaluation of BTD. We also provide a Docker image to ease the artifact evaluation (AE) process.

News:

This artifact has been evaluated and awarded the following badges: Available, Functional, Reproduced.

Prerequisites

ubuntu 18.04
git
gcc/g++ (7.5.0)
make (4.1)
python3 (3.6.9 or higher)
  - numpy (1.19.5)
  - torch (1.9.0 or higher)
  - torchvision (0.11.2)
  - fastBPE (0.1.0)
  - tqdm (4.64.1)
Intel pin (3.14) 
IDA Pro (optional)

You can download pin 3.14 from here, or use the docker image with all prerequisites installed.

IDA

BTD relies on IDA Pro (version 7.5) for disassembly. Because IDA is commercial software, we do not provide it in this repo; instead, to reduce the workload of AE reviewers, we provide the disassembly results directly as input for BTD. The scripts used to disassemble DNN executables into assembly functions with IDA are in ida/. IDA Pro is not indispensable: any other full-fledged disassembler can replace it, but we do not provide the relevant code here.

Hardware

We ran our evaluation experiments on a server equipped with an Intel Xeon E5-2683 CPU, 256GB RAM, and an Nvidia GeForce RTX 2080 GPU. Logging and filtering the traces for all DNN executables in the evaluation takes more than a week (sorry, we currently only provide a single-threaded version) and consumes nearly 1TB of disk storage. To ease review by the AE committee, we omit the trace logging process and provide the filtered traces in the Docker image and evaluation data. The trace logger and filter are provided in MyPinTool/ and the trace_filter.py script. Without logging and filtering, the whole evaluation takes roughly one day and requires less than 120GB of disk space. In addition, symbolic execution may consume a lot of memory, so please make sure that the machine running the experiments has sufficient memory.
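
Before starting the roughly one-day run, it can help to check free disk space. The snippet below is a minimal preflight sketch, not part of the artifact: the helper name `has_enough_disk` is hypothetical, and the 120 GB figure is simply the upper bound quoted above.

```python
import shutil


def has_enough_disk(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    # 120 GB is the bound stated above for the evaluation without trace logging.
    print("enough disk space:", has_enough_disk(".", 120))
```

Point `path` at the directory that will hold the evaluation data, since that is the filesystem whose free space matters.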

Dataset

[Figure: compilers]

[Figure: dataset-statistics]

Our evaluation covers the 7 models above, each compiled with 9 different compiler options: Glow-2020, Glow-2021, Glow-2022, TVM-v0.7 (O0 and O3), TVM-v0.8 (O0 and O3), and TVM-v0.9.dev (O0 and O3), for a total of 63 DNN executables. NNFusion-emitted executables are easier to decompile since they contain wrapper functions that invoke target operator implementations in kernel libraries (see our paper for a more detailed discussion). Thus, in this evaluation we focus only on decompiling executables compiled by TVM and Glow.
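
The count of 63 executables follows directly from the option list. The short sketch below only illustrates that arithmetic; the option strings are copied from the paragraph above, and the variable names are illustrative.

```python
# Compiler options as listed above; each TVM version is built at two optimization levels.
glow_options = ["Glow-2020", "Glow-2021", "Glow-2022"]
tvm_options = [
    f"TVM-{version} ({level})"
    for version in ("v0.7", "v0.8", "v0.9.dev")
    for level in ("O0", "O3")
]
compiler_options = glow_options + tvm_options

NUM_MODELS = 7  # number of models stated above
total_executables = len(compiler_options) * NUM_MODELS

print(len(compiler_options), total_executables)  # prints "9 63"
```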

Artifact Evaluation

0. Import Docker Image

Download the packed docker image, then run the command below to unpack the .tar file into a docker image. This may take a while. (You can replace btd-artifact with any image name that does not conflict with existing names.)

cat BTD-artifact.tar | docker import - btd-artifact

Create a container named BTD-AE with the docker image:

docker run -dit --name BTD-AE btd-artifact /bin/bash

Open a bash in the container:

docker exec -it BTD-AE /bin/bash
cd /home

You can then run the evaluation commands (listed in Operator Inference and Decompilation & Rebuild below) within this bash. We strongly recommend reviewers use the provided Docker image for artifact evaluation to avoid errors that may be caused by environments.

1. Prepare

If you are using the provided docker image, you can skip this Prepare section and move to Operator Inference.

Download and unzip Intel pin 3.14, then update the pin home directory (pin_home) in config.py.

git clone https://github.com/monkbai/DNN-decompiler.git
mkdir <path_to_pin_home>/source/tools/MyPinTool/obj-intel64
cd DNN-decompiler
git pull
python3 pin_tools.py

pin_tools.py will copy and compile all pin tools listed in MyPinTool/.

Download and unzip the data (BTD-data) used for artifact evaluation, then update the data directory DATA_DIR in decompile_eval.sh.

Download data.zip and output.zip and unzip them into the operator_inference/data and operator_inference/output directories, respectively.

2. Operator Inference

The code structure and docs of operator inference are provided in operator_inference/README.

cd DNN-decompiler
git pull
./op_infer_eval.sh

op_infer_eval.sh runs the operator inference experiments. Inference results are written to operator/output/<compiler_option>/text/test_000.txt. Each output line has the format: <Compiler Option>-<Model>-<Operator Name/Type> Pred: output. For example, the output below indicates that a libjit_fc_f (Fully-Connected, FC) operator in the vgg16 model compiled with GLOW_2021 is correctly inferred as matmul (Matrix Multiplication).

GLOW_2021-vgg16-libjit_fc_f Pred: matmul
GLOW_2021-vgg16-libjit_fc_f Label: matmul
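
If you want to inspect such a result file yourself, the following is a minimal sketch of pairing each Pred line with its Label line. It is not the artifact's own accuracy script, and it assumes that a Label line directly follows its Pred line, as in the two-line sample above. The function names `pair_predictions` and `accuracy` are hypothetical.

```python
def pair_predictions(lines):
    """Pair each 'Pred:' line with the 'Label:' line that follows it."""
    pairs = []
    pending = None  # (key, predicted operator) waiting for its label
    for raw in lines:
        line = raw.strip()
        if " Pred: " in line:
            key, pred = line.split(" Pred: ", 1)
            pending = (key, pred.strip())
        elif " Label: " in line and pending is not None:
            key, label = line.split(" Label: ", 1)
            if key == pending[0]:
                pairs.append((key, pending[1], label.strip()))
            pending = None
    return pairs


def accuracy(pairs):
    """Fraction of pairs whose prediction equals the label."""
    if not pairs:
        return 0.0
    return sum(pred == label for _, pred, label in pairs) / len(pairs)


sample = [
    "GLOW_2021-vgg16-libjit_fc_f Pred: matmul",
    "GLOW_2021-vgg16-libjit_fc_f Label: matmul",
]
pairs = pair_predictions(sample)
print(pairs, accuracy(pairs))
```

To apply it to a real result file, pass the open file object as `lines`.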

3. Decompilation & Rebuild

cd DNN-decompiler
git pull
./decompile_eval.sh

decompile_eval.sh decompiles and rebuilds all 63 DNN executables. It takes roughly 24 hours to finish all experiments. The outputs of the rebuilt models and the original DNN executables are printed to the screen (see the example in Decompilation Correctness below). The corresponding decompilation outputs are stored in evaluation/<model>_<compiler>_<version>_<opt level>.


Decompilation Output Interpretation

BTD will decompile a DNN executable into ❶ DNN operators and their topological connectivity, ❷ dimensions of each DNN operator, and ❸ parameters of each DNN operator, such as weights and biases.

After executing decompile_eval.sh, for each directory in evaluation/, a topo_list.json containing the network topology (❶), a new_meta_data.json containing dimensions information (❷), and a series of <func_id>.<weights/biases>_<id>.json containing all parameters of the decompiled DNN model (❸) will be generated.
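
The file layout described above can be explored with a few lines of Python. This is a sketch under assumptions: the regular expression is a literal reading of the <func_id>.<weights/biases>_<id>.json pattern, the helper name `classify_outputs` is hypothetical, and the file names in the demo are made up for illustration.

```python
import re
import tempfile
from pathlib import Path

# Literal reading of "<func_id>.<weights/biases>_<id>.json" (an assumption).
PARAM_FILE = re.compile(
    r"^(?P<func_id>[^.]+)\.(?P<kind>weights|biases)_(?P<idx>[^.]+)\.json$"
)


def classify_outputs(model_dir):
    """Sort the .json files of one decompiled model into topology, metadata, parameters."""
    found = {"topology": None, "meta": None, "params": []}
    for path in sorted(Path(model_dir).glob("*.json")):
        if path.name == "topo_list.json":
            found["topology"] = path
        elif path.name == "new_meta_data.json":
            found["meta"] = path
        else:
            match = PARAM_FILE.match(path.name)
            if match:
                found["params"].append(
                    (match.group("func_id"), match.group("kind"), match.group("idx"), path)
                )
    return found


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("topo_list.json", "new_meta_data.json",
                 "0031.biases_0.json", "0049.weights_0.json"):
        (Path(tmp) / name).write_text("[]")
    outputs = classify_outputs(tmp)
    param_keys = [(func_id, kind, idx) for func_id, kind, idx, _ in outputs["params"]]

print(param_keys)
```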

Each item in topo_list.json: ['node id', '<func_id>.txt', 'operator type', [input addresses], 'output address', [input node ids], occurrence index].

Example (vgg16 TVM v0.8 O0):

[
    1,                  // node id
    "0031.txt",         // func id (func name)
    "bias_add",         // operator type
    [                   // input addresses
        "0x50a5e0",     // output address of previous node
        "0x22e2b1e0"    // biases address
    ],
    "0x114b1e0",        // output address
    [
        0               // input node id
    ],
    0                   // occurrence index of the func
],
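
The positional fields of a topo_list.json item can be given names to make the topology easier to walk. The sketch below uses the example item above as a Python literal. The `TopoNode` type and `edges` helper are hypothetical and not part of BTD. It also assumes that the // annotations shown above are explanatory only and do not appear in the generated file, since standard JSON has no comments.

```python
from typing import List, NamedTuple


class TopoNode(NamedTuple):
    node_id: int
    func_file: str             # '<func_id>.txt'
    op_type: str
    input_addrs: List[str]
    output_addr: str
    input_node_ids: List[int]
    occurrence: int            # occurrence index of the function


def edges(nodes):
    """Return (source node id, destination node id) pairs of the network graph."""
    return [(src, node.node_id) for node in nodes for src in node.input_node_ids]


# The example item from above, without the explanatory comments.
item = [1, "0031.txt", "bias_add", ["0x50a5e0", "0x22e2b1e0"], "0x114b1e0", [0], 0]
node = TopoNode(*item)

print(node.op_type, edges([node]))  # prints "bias_add [(0, 1)]"
```

For a whole model, `json.load` the file and build one `TopoNode` per item.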

Each item in new_meta_data.json: ['<func_id>.txt', [operator dimensions], 'operator entry address (in executable)', 'operator type', with_parameter, stride (if exists), padding (if exists)].

Example (vgg16 TVM v0.8 O0):

[
    "0049.txt",     // func_id (or func name)
    [               // dimensions
        [           // filter/weights dimensions
            64.0,
            3.0,
            3,
            3
        ],
        [           // input dimensions
            1,
            3.0,
            226.0,
            226.0
        ],
        [           // output dimensions
            1,
            64.0,
            224,
            224
        ],
        [           // weights layout
            2.0,
            1,
            3,
            3,
            3.0,
            32.0
        ]
    ],
    "0x405040",     // operator entry
    "conv2d",       // operator type
    1,              // has parameters
    1,              // stride = 1
    1               // padding = 1
],
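
As a sanity check, the example item above can be unpacked and tested for internal consistency. This is an illustrative sketch, not BTD code. In this example the recorded input size (226) is consistent with padding already being included (224 + 2 * 1), so the check below applies no extra padding; whether that holds for every operator is an assumption.

```python
from functools import reduce
from operator import mul


def product(values):
    """Multiply all entries (the JSON mixes ints and floats) and return an int."""
    return int(reduce(mul, values, 1))


# The example item from above, without the explanatory comments.
item = [
    "0049.txt",
    [[64.0, 3.0, 3, 3], [1, 3.0, 226.0, 226.0], [1, 64.0, 224, 224],
     [2.0, 1, 3, 3, 3.0, 32.0]],
    "0x405040", "conv2d", 1, 1, 1,
]

func_file, dims, entry_addr, op_type, has_params, *optional = item
stride, padding = optional  # both present for this conv2d entry
filter_dims, input_dims, output_dims, weights_layout = dims

# Output height of a convolution over an already padded input.
kernel_h = int(filter_dims[2])
in_h = int(input_dims[2])
expected_out_h = (in_h - kernel_h) // stride + 1

print(expected_out_h == output_dims[2])                      # True: 224
print(product(filter_dims) == product(weights_layout))       # True: 1728 elements
print(hex(int(entry_addr, 16)))                              # 0x405040
```

The second check shows that the weights layout is a reshaping of the same 64 x 3 x 3 x 3 filter tensor.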

Decompilation Correctness

[Figure: example-input-img]

After decompilation, the DNN model is rebuilt from the decompiled model structure and the extracted parameters (stored in .json format). decompile_eval.sh runs each rebuilt model (implemented in PyTorch) and the original DNN executable with the above example image, in binary format, as input. The output looks like this:

 - vgg16_tvm_v09_O3
 - Rebuilt model output:
Result: 282
Confidence: 9.341153
 - DNN Executable output:
The maximum position in output vector is: 282, with max-value 9.341150.
timing: 566.89 ms (create), 0.54 ms (set_input), 4034.66 ms (run), 0.00 ms (get_output), 0.61 ms (destroy)

In the above example, both the rebuilt model and the DNN executable output 282 as the result (see the 1000 classes of ImageNet), with confidence scores of 9.341153 and 9.341150, respectively. Although the confidence scores (or max values) differ slightly, we attribute this inconsistency to floating-point precision loss between the PyTorch model and the DNN executable, i.e., the decompilation is still correct.
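
The comparison described above can be automated along the following lines. This is a minimal sketch rather than the artifact's checking script: the parsing functions are hypothetical and written against the sample output shown above, and the absolute tolerance of 1e-4 is an assumed value, not a threshold taken from the paper.

```python
import math
import re

NUMBER = r"(-?\d+(?:\.\d+)?)"


def parse_rebuilt(text):
    """Extract (class index, confidence) from the rebuilt model's output."""
    label = int(re.search(r"Result:\s*(\d+)", text).group(1))
    score = float(re.search(r"Confidence:\s*" + NUMBER, text).group(1))
    return label, score


def parse_executable(text):
    """Extract (class index, max value) from the DNN executable's output."""
    match = re.search(
        r"maximum position in output vector is:\s*(\d+), with max-value\s*" + NUMBER,
        text,
    )
    return int(match.group(1)), float(match.group(2))


def outputs_agree(rebuilt, executable, abs_tol=1e-4):
    """Same predicted class, and scores equal up to an assumed tolerance."""
    return rebuilt[0] == executable[0] and math.isclose(
        rebuilt[1], executable[1], abs_tol=abs_tol
    )


rebuilt = parse_rebuilt("Result: 282\nConfidence: 9.341153\n")
executable = parse_executable(
    "The maximum position in output vector is: 282, with max-value 9.341150."
)
print(rebuilt, executable, outputs_agree(rebuilt, executable))
```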

4. Results Summarization

Update: We uploaded scripts to summarize the results of the above experiments.

git pull
./summarization.sh

summarization.sh will invoke scripts including:

  • statistic.py, which collects statistics of DNN executables evaluated in our study (Table 2). Note that the statistics may slightly deviate from the numbers in Table 2 depending on the building environment, but this should not affect our claims in the paper.
  • operator/run_accuracy.py, which calculates the average accuracy of operator inference (Table 3). Note that since we have manually fixed the "Add vs. BiasAdd" issue discussed in Operators with Similar Assembly Code of Section 7.1.1, in some cases, the accuracy may be higher (i.e., better results) than results reported in Table 3.
  • parameter_accuracy.py, which calculates the dimension inference accuracy/parameter inference accuracy of TVM ResNet18 (Table 4). Note that it is difficult to compare the recovered dimensions/parameters with the reference due to compiler optimizations (e.g., operator fusion), i.e., the ground truth of optimized models is not available, as discussed in Sec 7.1.3. Hence, #failures in Table 4 equals the #dimensions or #parameters that need to be fixed before the recovered models can be compiled into executables showing identical behavior to the references. This script only reproduces results for ResNet18 (accuracies for all other models are 100% and therefore need not be included in this script; see results in Sec 7.1.4).
  • recompile_correctness.py, which evaluates the correctness of recompilation (Table 5). Pass means the model is 100% correctly rebuilt. Note that we manually fix errors in TVM ResNet18 as discussed in Sec 7.1.4 to confirm our claim that "all remaining operators in ResNet18 are correctly decompiled". Therefore, we expect to get 63/63 passes by running this script.

When the summarization.sh script finishes running, all results reported in Tables 2-5 should be printed to the screen.

Code Structure

├── MyPinTool/          // Pin tools' source code
├── compiler_opt/       // identify the compilation provenance
├── evaluation/         // scripts for main evaluation including 63 executables
├── ida/                // ida scripts
├── nlp_models/         // nlp models evaluation
├── nnfusion/           // nnfusion evaluation
├── operator_inference/ // infer the type of a DNN operator
├── recompile/          // recompile decompiled models
├── validation/         // to validate the correctness of rebuilt models
├── white-box-attack/   // info about white-box attacks we used
├── config.py
├── decompile_eval.sh   // script for artifact evaluation
├── explain.py          // heuristics used in BTD
├── fused_trace.py
├── mem_slices.py
├── pin_tools.py        // pin tools compilation and usage
├── se_engine.py        // a trace based symbolic execution engine
├── split_funcs.py      // split disassembly output into functions
├── trace_filter.py     // taint analysis to filter logged trace 
└── utils.py

If you are interested in the interfaces of BTD, you can take a look at the decompilation scripts in evaluation/, e.g., vgg16_tvm_O0_decompile.py.

Data

Our dataset is available at https://doi.org/10.5281/zenodo.7219867.

We also provide all datasets via Dropbox for better download speed.
