Universal Adversarial Perturbations on PyTorch

This repository implements the targeted and untargeted versions of the Stochastic Gradient Descent (SGD) algorithm, also known as Stochastic Projected Gradient Descent (sPGD) in Mummadi et al. and Deng & Karam, for generating Universal Adversarial Perturbations (UAPs). UAPs are a class of adversarial attacks on deep neural networks in which a single perturbation can fool a model on an entire set of inputs. SGD has been shown to create more effective UAPs than the iterative-DeepFool algorithm originally proposed by Moosavi-Dezfooli et al.
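In outline, the untargeted attack performs gradient ascent on the model's loss with respect to a single shared perturbation, projecting it back into the L-infinity ball after every step. The function below is a minimal sketch, not the repository's exact implementation: the input resolution, learning rate, number of epochs, and the assumption that the loader yields images in [0, 1] are placeholders for illustration.

import torch

def sgd_uap_untargeted(model, loader, eps=10/255, lr=0.01, epochs=5, device='cuda'):
    # Only the perturbation is optimized; the model stays frozen.
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)
    # One shared perturbation, broadcast across every batch.
    delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = loss_fn(model(torch.clamp(x + delta, 0, 1)), y)
            loss.backward()                       # gradient of the loss w.r.t. delta
            with torch.no_grad():
                delta += lr * delta.grad.sign()   # ascend: make the model wrong
                delta.clamp_(-eps, eps)           # project onto the L-inf ball
            delta.grad.zero_()
    return delta.detach()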

For undefended models trained on ImageNet, untargeted UAPs can be expected to achieve an evasion rate above 90% on the ImageNet validation set under an L-infinity perturbation constraint of 10/255.

[Figure: a targeted UAP computed for ResNet18 on CIFAR-10, alongside the resulting shift in the model's output distribution]

An example of a targeted UAP for a ResNet18 model on CIFAR-10 is shown above, together with its effect on the model's output distribution. The original model, which can be downloaded here, achieves 94.02% accuracy on the clean test set.
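The strength of such a UAP is usually reported as an evasion rate: the fraction of inputs whose predicted class changes once the perturbation is added. A minimal sketch under the same assumptions as the attack above:

import torch

@torch.no_grad()
def evasion_rate(model, loader, delta, device='cuda'):
    # Fraction of inputs whose prediction flips under the UAP.
    model.eval()
    delta = delta.to(device)
    changed, total = 0, 0
    for x, _ in loader:
        x = x.to(device)
        clean = model(x).argmax(dim=1)
        adv = model(torch.clamp(x + delta, 0, 1)).argmax(dim=1)
        changed += (clean != adv).sum().item()
        total += x.size(0)
    return changed / total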

This repository contains sample code, interactive Jupyter notebooks, and pre-computed UAPs for the following works:

Universal adversarial robustness of texture and shape-biased models (Co et al., ICIP 2021)
Robustness and Transferability of Universal Attacks on Compressed Models (Matachana et al., 2020)

We encourage you to explore these Jupyter notebooks to generate and evaluate your own UAPs. If you are new to this topic, we suggest running the CIFAR-10 UAP notebooks first.

UAPs for Texture vs. Shape

texture-shape.ipynb visualizes some of the results discussed in the paper, exploring UAPs for texture- and shape-biased models. We thank Geirhos et al. for making their models trained on Stylized-ImageNet available.

Preparation

Refer to instructions here for downloading and preparing the ImageNet dataset.
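Once the validation images are laid out in the standard ImageFolder structure, a typical torchvision loading pipeline looks like the sketch below; the path, batch size, and the choice to keep images in [0, 1] (leaving any normalization to the model) are assumptions for illustration.

import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Standard ImageNet evaluation preprocessing, without normalization,
# so that the UAP can be optimized and clamped in raw pixel space.
transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
])
val_set = ImageFolder('/path/to/imagenet/val', transform=transform)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False, num_workers=4)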

A pre-trained ResNet18 for CIFAR-10 is available here that achieves 94.02% accuracy on the test set. Pre-trained ImageNet models are available online via torchvision.
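For example, a pre-trained ImageNet ResNet18 can be loaded directly from torchvision (the weights download automatically on first use):

import torchvision.models as models

# Newer torchvision versions replace the pretrained flag with, e.g.,
# models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).
model = models.resnet18(pretrained=True)
model.eval()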

Supported Universal Attacks

Universal attacks on CIFAR-10 and ImageNet models are based on the targeted and untargeted SGD (sPGD) algorithm described above.
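The targeted variant flips the direction of the update: it descends on the loss toward a fixed target class instead of ascending on the loss for the true labels. A minimal sketch mirroring the untargeted one above, with the target class and hyperparameters as placeholders:

import torch

def sgd_uap_targeted(model, loader, target_class, eps=10/255, lr=0.01,
                     epochs=5, device='cuda'):
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)
    delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, _ in loader:
            x = x.to(device)
            # Every input is pushed toward the same (placeholder) target class.
            target = torch.full((x.size(0),), target_class,
                                dtype=torch.long, device=device)
            loss = loss_fn(model(torch.clamp(x + delta, 0, 1)), target)
            loss.backward()
            with torch.no_grad():
                delta -= lr * delta.grad.sign()   # descend toward the target
                delta.clamp_(-eps, eps)           # L-inf projection
            delta.grad.zero_()
    return delta.detach()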

We plan to add support for other universal attacks, such as procedural noise and adversarial patches, in the future.

Acknowledgments

Learn more about the Resilient Information Systems Security (RISS) group at Imperial College London. Kenneth Co is partially supported by DataSpartan.

If you find this project useful in your research, please cite:

@inproceedings{co2021universal,
  title={Universal adversarial robustness of texture and shape-biased models},
  author={Co, Kenneth T and Mu{\~n}oz-Gonz{\'a}lez, Luis and Kanthan, Leslie and Glocker, Ben and Lupu, Emil C},
  booktitle={2021 IEEE International Conference on Image Processing (ICIP)},
  pages={799--803},
  year={2021},
  organization={IEEE}
}

@article{matachana2020robustness,
  title={Robustness and Transferability of Universal Attacks on Compressed Models},
  author={Matachana, Alberto G and Co, Kenneth T and Mu{\~n}oz-Gonz{\'a}lez, Luis and Martinez, David and Lupu, Emil C},
  journal={arXiv preprint arXiv:2012.06024},
  year={2020}
}

This project is licensed under the MIT License; see the LICENSE.md file for details.