GAN-Scanner: A Detector for Faces of StyleGAN+

This page describes a StyleGAN+ face image detector developed by Yaser Yacoob at the Computer Vision Lab, University of Maryland, College Park. A detailed description will be released in a forthcoming paper.

Description

This detector is designed for faces. It works best on medium- and high-resolution images (at least 128x128; the best range is 256x256 or higher).

This detector employs StyleGAN2 inversion as the core idea for detection. The hypothesis is that a perfect inversion of a face is highly likely to indicate a GAN-generated image, while a real image is slightly less invertible. Multiple metrics are used to compute a feature vector that reflects the quality of the inversion. The feature vector is then scored on a 0-1.0 scale, where 0 indicates a likely GAN image and 1.0 an authentic image. An evidence image is also provided to help a human decide whether the result of processing is convincing.
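
Conceptually, the pipeline can be summarized as below. This is a minimal sketch of the inversion-based scoring idea, not the released code; the inversion routine, the two metrics, and the scoring rule are all hypothetical stand-ins for the components described above.

```python
# Sketch of inversion-based GAN detection. All names are hypothetical stand-ins.
import numpy as np

def invert_with_stylegan2(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: project `image` into the StyleGAN2 latent space
    and return the re-synthesized (inverted) image. A real inversion would
    optimize a latent code against a trained generator."""
    return image.copy()  # placeholder so the sketch runs end to end

def inversion_features(image: np.ndarray, inverted: np.ndarray) -> np.ndarray:
    """Summarize inversion quality as a feature vector. The detector uses
    multiple metrics; mean squared error and correlation are illustrative."""
    mse = float(np.mean((image - inverted) ** 2))
    corr = float(np.corrcoef(image.ravel(), inverted.ravel())[0, 1])
    return np.array([mse, corr])

def authenticity_score(features: np.ndarray) -> float:
    """Hypothetical stand-in for the trained scorer mapping features to
    [0, 1.0]: near 0 suggests a GAN image (near-perfect inversion),
    near 1.0 an authentic image (slightly worse inversion)."""
    mse, corr = features
    return float(np.clip(10.0 * mse + (1.0 - corr), 0.0, 1.0))  # toy rule

image = np.random.rand(256, 256, 3).astype(np.float32)
inverted = invert_with_stylegan2(image)
print(authenticity_score(inversion_features(image, inverted)))  # 0.0: "GAN"
```

Because the placeholder inversion reproduces the input exactly, the toy score comes out at 0 ("likely GAN"), which illustrates the hypothesis: the more perfectly a face inverts, the more suspicious it is.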

The classifier was trained on a balanced dataset of 81,600 images: 40,800 authentic (FFHQ + CELEBHQ) and 40,800 GAN-generated, with 5,100 images from each of eight sources: raw StyleGAN2 plus three compression levels (10, 25, 50), StyleGAN-ADA (inversions of StyleGAN2 and CELEBHQ images), StyleGAN2 Distillation, and SAM (see references at the bottom).
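
For concreteness, the composition can be tallied as follows; the eight-way breakdown of the GAN side is inferred from the description above.

```python
# Hypothetical tally of the training-set composition described above;
# the eight GAN sources (5,100 images each) are inferred from the text.
gan_sources = {
    "StyleGAN2 raw": 5100,
    "StyleGAN2 compression 10": 5100,
    "StyleGAN2 compression 25": 5100,
    "StyleGAN2 compression 50": 5100,
    "StyleGAN-ADA (StyleGAN2 inversions)": 5100,
    "StyleGAN-ADA (CELEBHQ inversions)": 5100,
    "StyleGAN2 Distillation": 5100,
    "SAM": 5100,
}
authentic = 40800                    # FFHQ + CELEBHQ
gan = sum(gan_sources.values())      # 8 * 5,100 = 40,800
print(authentic + gan)               # 81,600 training images in total
```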

Although the processing is completely automatic, the detector lets a human make the final judgement, which helps balance detection accuracy when operating at large scale.

Request Code or Data

Please fill out the Form if you are interested in the software. Please note that access will be granted to "image-consuming entities for public interest". Individuals (including students and researchers), re-packagers, as well as opaque entities will not be granted access.

For questions or feedback, please e-mail yaser@umd.edu with the subject [About the GAN-Scanner].

ROC of the GAN-Scanner Classifier

The trained classifier is shown, in the graph below, under two configurations. The full-face configuration assumes that the face is fully visible with a reasonable background area. The second configuration assumes that the face may be partially cut off or occluded, for example by photo composition or by other people in a multi-person scene. The performance characteristics deteriorate slightly, from an AUC of 0.91 to 0.88.

Open-World Detection Accuracy

Several targeted experiments were conducted to determine how GAN-Scanner performs in an open-world environment. Three datasets of real images were used to assess performance under different/unknown data distributions.

  1. MFC 19-20 is a NIST-collected dataset for forensic analysis.

  2. UMD-BLEND is derived from over 50K images from: (a) the IARPA JANUS-CS3 face recognition dataset, (b) the Facebook Fairness Dataset, (c) a Flickr-based dataset of the Boston and Marine Corps Marathons (UMD), and (d) a Flickr-based dataset of the Women's March in DC (UMD). UMD-BLEND is the most diverse of the three datasets, and therefore the best reflection of real-world data at scale. It covers two decades of camera technology, subject matter, resolution, and image composition.

  3. FFHQ-Extension is a set of 27K previously unreleased FFHQ-sourced images from Nvidia's original collection (released specifically for this detection task).

The GAN generators varied between StyleGAN2-related algorithms (which operate largely in the same latent space), mixed or significantly modified variants of StyleGAN2, non-CNN architectures, and StyleGAN3. The following detection accuracies reflect the open-world performance of GAN-Scanner. Note that for StyleGAN3 we provide four test results corresponding to the two configurations provided by Nvidia, config-r and config-t. For each, we report performance on uncompressed data as well as on rescaled and compressed data (of unknown parameters) that simulates real-world and/or adversarial attacks.

Two accuracies are reported, Accuracy-F and Accuracy-P. The former is for the full-face classifier, and the latter is for the partial (or boundary) face classifier. The results suggest that the partial-face classifier is preferable overall when GAN image detection is the priority, while the full-face classifier is better at reducing false alarms.

| Dataset | Type | Data Size | Detection Accuracy-F | Detection Accuracy-P | Notes |
| --- | --- | --- | --- | --- | --- |
| MFC19-20 | Real data | 7.6K | 95.1% | 93.3% | Real-world diversity |
| UMD-BLEND | Real data | 21K | 93.6% | 85.0% | Real-world diversity |
| FFHQ-Extension | Real data | 27K | 92.7% | 86.7% | Real-world diversity |
| SAM | StyleGAN2 | 13.8K | 98.4% | 96.4% | StyleGAN2 latent space |
| NAVIGAN | StyleGAN2 | 8.4K | 99.8% | 99.2% | StyleGAN2 latent space |
| StyleMixing | StyleGAN2+Pix2Pix | 93.4K | 42.2% | 55.8% | StyleGAN2+Pix2Pix |
| MobileStyleGAN.v1 | StyleGAN2-reduced | 5K | 31.2% | 37.8% | StyleGAN2 variant |
| MobileStyleGAN.v2 | StyleGAN2-reduced | 5K | 27.1% | 35.9% | StyleGAN2 variant |
| CIPS | GAN | 7K | 63.2% | 76.5% | Non-CNN, positional encoding |
| StyleGAN3-config-r | StyleGAN3 | 20K | 60.1% | 82.8% | StyleGAN2+positional encoding |
| StyleGAN3-config-t | StyleGAN3 | 20K | 60.1% | 82.8% | StyleGAN2+positional encoding |
| StyleGAN3-config-r_rescale-compression | StyleGAN3 | 20K | 42.1% | 66.6% | StyleGAN2+positional encoding |
| StyleGAN3-config-t_rescale-compression | StyleGAN3 | 20K | 43.2% | 65.2% | StyleGAN2+positional encoding |

Open-World ROC

ROC is an important tool for reflecting overall classifier performance. We provide two experiments that evaluate the classifier on authentic/StyleGAN3 data. The first authentic image dataset is UMD-BLEND and the second is FFHQ-Extension. UMD-BLEND is the more diverse dataset, while FFHQ-Extension shares the image distribution that inspired Nvidia's StyleGAN frameworks (1, 2, and 3). In each case we compare performance against four StyleGAN3 datasets: config-r, config-t, and rescaled-and-compressed versions of each. These datasets were generously provided by Nvidia, and their characteristics are not known to us. Consequently, this experiment is a good proxy for open-world performance on both real and generated face images.
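
For reference, an ROC and its AUC can be computed directly from the detector's 0-1.0 authenticity scores. The sketch below assumes scikit-learn; the `scores` and `labels` arrays are synthetic stand-ins (1 = authentic, 0 = GAN-generated), not the evaluation data used here.

```python
# Minimal sketch of computing an ROC/AUC from 0-1.0 authenticity scores.
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)        # stand-in ground truth
scores = 0.3 * labels + 0.7 * rng.random(1000)  # stand-in detector scores

# Higher score = more likely authentic, so authentic (1) is the positive class.
fpr, tpr, thresholds = roc_curve(labels, scores, pos_label=1)
print(f"AUC = {auc(fpr, tpr):.3f}")
```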

The performance has been stable and consistent across these experiments.

Notes

  1. This is research code; there is no liability for use or guarantee of performance.
  2. A Docker-based deployment (currently tested on Linux-based systems, but it should also run on Windows) is most suitable. It supports processing a single image or a folder of images. The technical demands are low: install Docker and run the Docker image.
  3. The software requires access to an Nvidia GPU in the computing environment.
  4. There are incremental improvements beyond what is described here. The purpose of this page is to describe how GAN-Scanner operated in an open world after being frozen in early 2021.

Related Work

  1. NIST Media Forensic Challenge
  2. Only a Matter of Style: Age Transformation Using a Style-Based Regression Model, SIGGRAPH 2021
  3. Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation, CVPR 2021
  4. StyleGAN2 Distillation for Feed-forward Image Manipulation, ECCV 2020
  5. MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis
  6. Training Generative Adversarial Networks with Limited Data
  7. Image Generators with Conditionally-Independent Pixel Synthesis
  8. Navigating the GAN Parameter Space for Semantic Image Editing, CVPR 2021
  9. Alias-Free Generative Adversarial Networks

Research Acknowledgement

This work is supported by the US Defense Advanced Research Projects Agency (DARPA) Semantic Forensics (SemaFor) Program under HR001120C0124. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of DARPA.
