Code library for the RPAL framework from 'The Impact of Active Learning on Availability Data Poisoning for Android Malware Classifiers'.

Recovering from Poisoning through Active Learning (RPAL) Framework

This repository is the official code release for the paper 'The Impact of Active Learning on Availability Data Poisoning for Android Malware Classifiers', published at the Workshop on Recent Advances in Resilient and Trustworthy Machine Learning (ARTMAN) 2024, co-located with ACSAC.

If you plan to use this repository in your projects, please cite the following paper:

@inproceedings{mcfadden2024recovery,
  title = {The Impact of Active Learning on Availability Data Poisoning for Android Malware Classifiers},
  author = {McFadden, Shae and Kan, Zeliang and Cavallaro, Lorenzo and Pierazzi, Fabio},
  booktitle = {Proc. of the Annual Computer Security Applications Conference Workshops (ACSAC Workshops)},
  year = {2024},
}

Disclaimer

Please note that the code in this repository is a research prototype only. It is released under a "Modified (Non-Commercial) BSD License"; see the repository's license file for the terms.


Installation

This project requires tesseract-ml, which must be installed first. From a local copy of the tesseract-ml repository, it can be installed as follows.

pushd ${PATH_TO}/tesseract-ml
python setup.py install  # install tesseract-ml
popd

Once tesseract-ml has been installed, RPAL can be set up as follows.

pip install NumpyEncoder
cd RPAL
pip install -r requirements.txt
pip install .

Repository Contents

RPAL

  • RPAL/classification.py: This code handles training & testing of the classifier and returns the results.
  • RPAL/constraints.py: This code enables easy checking of spatial and temporal bias in the data.
  • RPAL/data.py: This code handles the various data manipulations required.
  • RPAL/grapher.py: This code generates the experiment and results plots.
  • RPAL/loader.py: This code handles loading the dataset.
  • RPAL/poison.py: This code performs all the data poisoning.
  • RPAL/recovery.py: This code generates all the recovery data.
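
As a rough illustration of what RPAL/constraints.py checks, here is a minimal sketch of two constraints in the spirit of those proposed by TESSERACT, the time-aware evaluation methodology implemented by tesseract-ml: temporal training consistency (every training sample predates every test sample) and a realistic malware ratio in the test set (spatial bias). The `(timestamp, label)` sample format, the function names, and the 10% target ratio with a 2% tolerance are hypothetical choices made for this sketch, not RPAL's actual API.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical sample format: (timestamp, label) with label 1 = malware, 0 = goodware.
Sample = Tuple[date, int]


def temporally_consistent(train: List[Sample], test: List[Sample]) -> bool:
    """Temporal check: all training samples must strictly precede all test samples."""
    if not train or not test:
        return True
    latest_train = max(ts for ts, _ in train)
    earliest_test = min(ts for ts, _ in test)
    return latest_train < earliest_test


def malware_ratio(samples: List[Sample]) -> float:
    """Fraction of samples labelled as malware."""
    if not samples:
        return 0.0
    return sum(label for _, label in samples) / len(samples)


def spatially_realistic(test: List[Sample], target: float = 0.10, tol: float = 0.02) -> bool:
    """Spatial check: the test-set malware ratio should be close to an assumed in-the-wild ratio."""
    return abs(malware_ratio(test) - target) <= tol


if __name__ == "__main__":
    train = [(date(2014, 1, 5), 0), (date(2014, 6, 1), 1)]
    # Ten test samples from January 2015, exactly one of which is malware.
    test = [(date(2015, 1, d), 1 if d == 1 else 0) for d in range(1, 11)]
    print(temporally_consistent(train, test))  # True
    print(malware_ratio(test))  # 0.1
    print(spatially_realistic(test))  # True
```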

Results

  • Results/Data/: Contains the data presented in the paper.
  • Results/Scripts/: Contains the scripts used to generate the plots and table data in the paper.

Experiments

  • Drebin-Label-Flip-Deep-Tesseract.py: Runs all DNN experiments shown in the paper.
  • Drebin-Label-Flip-RF-Tesseract.py: Runs all RF experiments shown in the paper.
  • Drebin-Label-Flip-SVM-Tesseract.py: Runs all SVM experiments shown in the paper.
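
The experiment scripts rely on tesseract-ml, whose time-aware evaluation tests a model on successive periods that follow its training window. To show the general shape of recovering through active learning in that setting, here is a framework-free sketch: the model is tested on each period in chronological order, then the `budget` samples it is least certain about are labelled by an oracle and added to the (possibly poisoned) training set before retraining. The nearest-centroid model, the distance-based uncertainty score and every name below are hypothetical stand-ins; the actual scripts use DNN, RF and SVM models, and the selection strategy used in the paper may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class NearestCentroid:
    """Toy binary classifier; a hypothetical stand-in for the DNN, RF and SVM models."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        # Assumes both classes (0 = goodware, 1 = malware) occur in the training set.
        dim = len(X[0])
        self.centroids: Dict[int, List[float]] = {}
        for cls in (0, 1):
            pts = [x for x, label in zip(X, y) if label == cls]
            self.centroids[cls] = [sum(p[i] for p in pts) / len(pts) for i in range(dim)]
        return self

    def margin(self, x: Vector) -> float:
        # Positive when x is closer to the malware centroid than to the goodware one.
        return math.dist(x, self.centroids[0]) - math.dist(x, self.centroids[1])

    def predict(self, X: List[Vector]) -> List[int]:
        return [1 if self.margin(x) > 0 else 0 for x in X]


def uncertainty_sample(model: NearestCentroid, X: List[Vector], budget: int) -> List[int]:
    # Indices of the `budget` samples closest to the decision boundary.
    ranked = sorted(range(len(X)), key=lambda i: abs(model.margin(X[i])))
    return ranked[:budget]


def active_learning_over_time(
    X_train: List[Vector],
    y_train: List[int],
    periods: List[Tuple[List[Vector], List[int]]],
    budget: int,
) -> List[float]:
    """Evaluate on each period in chronological order, then label `budget` samples and retrain."""
    X_tr, y_tr = list(X_train), list(y_train)
    model = NearestCentroid().fit(X_tr, y_tr)
    accuracies = []
    for X_test, y_test in periods:
        # Test before adding this period's labels, so the model has only seen earlier data.
        preds = model.predict(X_test)
        accuracies.append(sum(p == t for p, t in zip(preds, y_test)) / len(y_test))
        # An oracle (e.g. a human analyst) supplies true labels for the selected samples.
        for i in uncertainty_sample(model, X_test, budget):
            X_tr.append(X_test[i])
            y_tr.append(y_test[i])
        model = NearestCentroid().fit(X_tr, y_tr)
    return accuracies


if __name__ == "__main__":
    # Poisoned training set: malware at 9 and 11 has been flipped to goodware (label 0).
    X_train = [(0,), (1,), (10,), (9,), (11,)]
    y_train = [0, 0, 1, 0, 0]
    period = ([(7,), (8,), (0,)], [1, 1, 0])
    print(active_learning_over_time(X_train, y_train, [period, period], budget=2))
    # [0.6666666666666666, 1.0]
```

In this toy run the poisoned model misclassifies the malware sample at 7 in the first period; once the two most uncertain samples from that period are labelled and added to the training set, it classifies the second period correctly.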

Other

  • deepdrebin.py: A scikit-learn-compatible implementation of the architecture used in 'Adversarial Examples for Malware Detection' by Grosse et al.
  • Clean_Label_Poisoning_Mapping.py: Generates the feature-flip mappings used to mimic the label-flip attack.
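
Clean_Label_Poisoning_Mapping.py mimics the label-flip attack through feature flips. For comparison, a plain label-flip attack can be sketched as flipping a fraction of binary training labels at random. The function name, the `rate`, `target` and `seed` parameters, and the uniform random choice of victims are illustrative assumptions, not the implementation in RPAL/poison.py.

```python
import random
from typing import List, Optional


def label_flip(y: List[int], rate: float, target: Optional[int] = None, seed: int = 0) -> List[int]:
    """Hypothetical helper: return a copy of binary labels `y` with a fraction `rate`
    of the eligible labels flipped (0 <-> 1).

    If `target` is given, only samples currently labelled `target` are eligible
    (e.g. target=1 flips malware to goodware); otherwise every sample is eligible.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    eligible = [i for i, label in enumerate(y) if target is None or label == target]
    rng = random.Random(seed)  # seeded so the same poisoned set can be reproduced
    victims = rng.sample(eligible, k=round(rate * len(eligible)))
    poisoned = list(y)  # leave the caller's labels untouched
    for i in victims:
        poisoned[i] = 1 - poisoned[i]
    return poisoned
```

For example, `label_flip(y, 0.1, target=1)` flips 10% of the malware labels to goodware and leaves all goodware labels unchanged.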
