cross-arch-instr-model.github.io

Thank you for looking at our work! The programs included here were created for the following paper:

"A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis"

Kimberly Redmond, Lannan Luo, and Qiang Zeng

The NDSS Workshop on Binary Analysis Research (BAR), 2019.

############################

The trained cross-architecture instruction embedding model used in our paper is included in the output/ directory. Please remember to unzip the four output files.

Our embeddings were trained with Bivec, a model based on Word2Vec. You may find it here:

https://github.com/lmthang/bivec
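
As a rough sketch of how the unzipped embeddings might be read: the snippet below assumes the output files use the plain-text word2vec layout (a header line with the vocabulary size and dimension, then one token and its vector per line). That layout is an assumption, not something stated here, and `parse_word2vec_text` and the sample tokens are hypothetical; please check the files in output/ before relying on it.

```python
def parse_word2vec_text(lines):
    """Parse embeddings from an iterable of text lines.

    ASSUMPTION: plain-text word2vec layout, i.e. a "<count> <dim>" header
    followed by "<token> <v1> ... <v_dim>" lines. Not confirmed for the
    files in output/.
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])
    vectors = {}
    for line in it:
        parts = line.split()
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # Everything before the last `dim` fields is treated as the token.
        token = " ".join(parts[:-dim])
        vectors[token] = [float(x) for x in parts[-dim:]]
    return dim, vectors


# Made-up sample data, for illustration only.
sample = ["2 3", "tok_a 0.1 0.2 0.3", "tok_b -0.5 0.0 0.25"]
dim, vectors = parse_word2vec_text(sample)
```

With a real file, the same function could be called as `parse_word2vec_text(open(path))`, where `path` is whichever unzipped file you want to inspect.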

############################

ABOUT THESE PROGRAMS

All file paths and instruction selections are hard-coded into these programs. For your convenience, they are listed in variables near the top; feel free to modify them for your use.

./senvec.py

Returns ROC plots and AUC scores for cross-architecture basic block similarity tests. A basic block embedding is calculated by summing the embeddings of the instructions within the block.

Similarity between two blocks is measured by cosine similarity.
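
The two steps described above (sum the instruction embeddings of a block, then compare blocks by cosine similarity) can be sketched as follows. This is a minimal illustration, not the code in senvec.py; the function names and the toy 3-dimensional vectors are made up.

```python
import math


def block_embedding(instr_vectors):
    """Sum the instruction vectors of one basic block, element-wise."""
    if not instr_vectors:
        raise ValueError("a basic block needs at least one instruction")
    return [sum(components) for components in zip(*instr_vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical instruction embeddings for one block per architecture.
block_a = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # two instructions
block_b = [[2.0, 2.0, 6.0]]                   # one instruction

score = cosine_similarity(block_embedding(block_a), block_embedding(block_b))
```

Here the two block embeddings point in the same direction, so `score` is 1 up to floating-point rounding. Cosine similarity ignores vector length, so a longer block does not score higher simply because more vectors were summed.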

./tsne2.py

Returns two t-SNE figures: 1) an unlabeled figure displaying all instructions in one vector space; 2) a labeled figure displaying selected instructions in one vector space.

./instr_sim.py

Returns two ROC plots and AUC scores for instruction-level similarity tests. Instructions are evaluated in pairs, in two settings: 1) mono-architecture; 2) cross-architecture.

The similarity metric used is cosine similarity.
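
To show how an AUC score follows from a set of pair similarities, here is a small sketch. It assumes each instruction pair carries a label (1 for similar, 0 for dissimilar) and a cosine similarity score, and computes AUC as the fraction of (similar, dissimilar) pair combinations that are ranked correctly. The helper `roc_auc` and the numbers are illustrative; the script itself may well use a library routine instead.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive pair scores
    higher than a randomly chosen negative pair; ties count as half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and cosine similarities for five instruction pairs.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

In this toy data, five of the six positive/negative combinations are ordered correctly, so `auc` is 5/6. The quadratic loop keeps the definition visible; for large test sets a sort-based routine is the usual choice.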

./query.py

Given an instruction, returns its top-5 most similar instructions in each architecture, ranked by cosine similarity. The query returns the top 6 instructions from its own architecture (#1 is the instruction itself, leaving 5 neighbors) and the top 5 instructions from the other architecture.
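
The lookup described above can be sketched as a nearest-neighbor search over two vocabularies. This is an illustration under assumptions, not the code in query.py: `query`, `top_k`, the token names, and the 2-dimensional vectors are all hypothetical, and the toy call uses k=2 because the example vocabularies are tiny.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(query_vec, vocab, k):
    """Return the k (instruction, score) pairs most similar to query_vec."""
    scored = [(instr, cosine(query_vec, vec)) for instr, vec in vocab.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def query(instr, own_vocab, other_vocab, k=5):
    """Top k+1 from the query's own architecture (rank 1 is the query
    itself) and top k from the other architecture."""
    vec = own_vocab[instr]
    return top_k(vec, own_vocab, k + 1), top_k(vec, other_vocab, k)


# Made-up embeddings for two architectures.
own = {"a0": [1.0, 0.0], "a1": [0.9, 0.1], "a2": [0.0, 1.0], "a3": [-1.0, 0.0]}
other = {"b0": [1.0, 0.05], "b1": [0.5, 0.5], "b2": [0.0, 1.0]}

same_arch, cross_arch = query("a0", own, other, k=2)
```

Because both vocabularies live in one shared vector space, the same query vector can be compared against either architecture without any translation step.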
