Update datasets, licensing, scenarios, baseline models, and metrics #826

Merged
merged 15 commits into from
Oct 23, 2020
15 changes: 10 additions & 5 deletions docs/adversarial_datasets.md
Original file line number Diff line number Diff line change
@@ -42,9 +42,11 @@ Example attack module for image classification scenario:
### Image Datasets
| `name` | `adversarial_key` | Description | Attack | Source Split | x_shape | x_type | y_shape | y_type | Size |
|:------------------------------:|:------------------------------:|:-----------------------------------------:|:----------------------------------:|:------------:|:----------------:|:------:|:-------:|:------:|:--------------:|
| "apricot_dev_adversarial" | "adversarial" | Physical Adversarial Attacks on Object Detection| Targeted, universal patch | dev | (N, variable_height, variable_width, 3) | uint8 | n/a | dict | 138 images |
| "imagenet_adversarial" | "adversarial" | ILSVRC12 adversarial image dataset for ResNet50 | | | (N, 224, 224, 3) | uint8 | (N,) | int64 | 1000 images |
| "resisc45_adversarial_224x224" | "adversarial_univpatch" | REmote Sensing Image Scene Classification | Targeted, universal patch | test | (N, 224, 224, 3) | uint8 | (N,) | int64 | 5 images/class |
| "resisc45_adversarial_224x224" | "adversarial_univperturbation" | REmote Sensing Image Scene Classification | Untargeted, universal perturbation | test | (N, 224, 224, 3) | uint8 | (N,) | int64 | 5 images/class |
| "apricot_dev_adversarial" | "adversarial" | Physical Adversarial Attacks on Object Detection| Targeted, universal patch | dev | (N, variable_height, variable_width, 3) | uint8 | n/a | dict | 138 images |


Note: the APRICOT dataset contains labels and bounding boxes for both COCO objects and physical adversarial patches.
The label used to signify the patch is the `ADV_PATCH_MAGIC_NUMBER_LABEL_ID` defined in
@@ -54,16 +56,19 @@ patch and a varying number of COCO objects (including zero).


### Audio Datasets
| `name` | `adversarial_key` | Description | Attack | Source Split | x_shape | x_type | y_shape | y_type | sampling_rate | Size |
| `name` | `adversarial_key` | Description | Attack | Source Split | x_shape | x_type | y_shape | y_type | sampling_rate | Size |
|:-------------------------:|:-----------------:|:--------------------------------------------------:|:----------------------------------:|:------------:|:---------:|:------:|:-------:|:------:|:-------------:|:--------------:|
| "librispeech_adversarial" | "adversarial" | Librispeech dev dataset for speaker identification | Untargeted, universal perturbation | test | (N, 3000) | int64 | (N,) | int64 | 16 kHz | ~5 sec/speaker |
| "librispeech_adversarial" | "adversarial_perturbation" | Librispeech dev dataset for speaker identification | | test | (N, variable_length) | int64 | (N,) | int64 | 16 kHz | ~5 sec/speaker |
| "librispeech_adversarial" | "adversarial_univperturbation" | Librispeech dev dataset for speaker identification | Untargeted, universal perturbation | test | (N, variable_length) | int64 | (N,) | int64 | 16 kHz | ~5 sec/speaker |


### Video Datasets
| `name` | `adversarial_key` | Description | Attack | Source Split | x_shape | x_type | y_shape | y_type | Size |
|:----------------------------:|:--------------------------:|:--------------------------:|:----------------------------------:|:------------:|:---------------------------------:|:------:|:-------:|:------:|:--------------:|
| "ucf101_adversarial_112x112" | "adversarial_patch" | UCF 101 Action Recognition | Untargeted, universal perturbation | test | (N, variable_frames, 112, 112, 3) | uint8 | (N,) | int64 | 5 videos/class |
| "ucf101_adversarial_112x112" | "adversarial_perturbation" | UCF 101 Action Recognition | Targeted, patch | test | (N, variable_frames, 112, 112, 3) | uint8 | (N,) | int64 | 5 videos/class |
| "ucf101_adversarial_112x112" | "adversarial_perturbation" | UCF 101 Action Recognition | Untargeted, universal perturbation | test | (N, variable_frames, 112, 112, 3) | uint8 | (N,) | int64 | 5 videos/class |

### Poison Datasets
To be added
| `name` | `adversarial_key` | Description | Attack | Source Split | x_shape | x_type | y_shape | y_type | Size |
|:------------------------------:|:------------------------------:|:-----------------------------------------:|:----------------------------------:|:------------:|:----------------:|:------:|:-------:|:------:|:--------------:|
| "gtsrb_poison" | None | German Traffic Sign Poison Dataset | Data poisoning | | (N, 48, 48, 3) | float32 | (N,) | int64 | 2220 images |
4 changes: 3 additions & 1 deletion docs/baseline_models.md
@@ -28,6 +28,7 @@ The model files can be found in [armory/baseline_models/keras](../armory/baselin
| Micronnet CNN | |
| MNIST CNN | `undefended_mnist_5epochs.h5` |
| ResNet50 CNN | `resnet50_imagenet_v1.h5` |
| so2sat CNN | `multimodal_baseline_weights.h5` |


### PyTorch
@@ -36,11 +37,12 @@ The model files can be found in [armory/baseline_models/pytorch](../armory/basel
| Model | S3 weight_files |
|:----------: | :-----------: |
| Cifar10 CNN | |
| DeepSpeech 2 | |
| Sincnet CNN | `sincnet_librispeech_v1.pth` |
| MARS | `mars_ucf101_v1.pth` , `mars_kinetics_v1.pth` |
| ResNet50 CNN | `resnet50_imagenet_v1.pth` |
| MNIST CNN | `undefended_mnist_5epochs.pth` |
| xView Faster-RCNN | xview_model_state_dict_epoch_99_loss_0p67 |
| xView Faster-RCNN | `xview_model_state_dict_epoch_99_loss_0p67` |


### TensorFlow 1
24 changes: 23 additions & 1 deletion docs/dataset_licensing.md
@@ -16,7 +16,9 @@ the Creative Commons 4.0 International ShareAlike license and are Copyright Two
| GTSRB | [CC0 Public Domain](https://www.kaggle.com/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign)|
| Imagenette | [Apache 2.0](https://github.com/fastai/imagenette/blob/master/LICENSE) |
| UCF101 | Fair use exception |
| RESISC45 | Fair use exception |
| RESISC45 | Fair use exception |
| xView | [Creative Commons Attribution-Noncommercial-ShareAlike 4.0 International](https://arxiv.org/pdf/1802.07856) |
| so2sat | [Creative Commons 4.0](https://mediatum.ub.tum.de/1454690) |

## Attributions

@@ -88,6 +90,26 @@ practicable. Please direct inquiries to <armory@twosixlabs.com>.
| Dataset link | https://github.com/fastai/imagenette |
| Modification | (Slight) Representation of images as binary tensors |

### xView
|Attribution | |
|------------------------------|--------------|
| Creator/attribution parties | Defense Innovation Unit Experimental (DIUx) and the National Geospatial-Intelligence Agency (NGA) |
| Copyright notice | |
| Public license notice | http://xviewdataset.org/terms.html |
| Disclaimer notice | |
| Dataset link | http://xviewdataset.org/#dataset |
| Modification | |

### so2sat
|Attribution | |
|------------------------------|--------------|
| Creator/attribution parties | Xiaoxiang Zhu, Jingliang Hu, Chunping Qiu, Yilei Shi, Jian Kang, Lichao Mou, Hossein Bagheri, Matthias Haeberle, Yuansheng Hua, Rong Huang, Lloyd Hughes, Hao Li, Yao Sun, Guichen Zhang, Shiyao Han, Michael Schmitt, and Yuanyuan Wang |
| Copyright notice | |
| Public license notice | https://mediatum.ub.tum.de/1454690 |
| Disclaimer notice | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. |
| Dataset link | https://mediatum.ub.tum.de/1454690 |
| Modification | |

## Fair use notes for RESISC-45 and UCF101
* Two Six Labs does not charge users for access to the Armory repository,
nor the datasets therein, nor does it derive a profit directly from use of the
27 changes: 18 additions & 9 deletions docs/datasets.md
@@ -17,36 +17,45 @@ These tfrecord files will be pulled from S3 if not available on your

| Dataset | Description | x_shape | x_dtype | y_shape | y_dtype | splits |
|:----------: |:-----------: |:-------: |:--------: |:--------: |:-------: |:------: |
| [cifar10](https://www.cs.toronto.edu/~kriz/cifar.html) | CIFAR 10 classes image dataset | (N, 32, 32, 3) | uint8 | (N,) | int64 | train, test |
| [cifar10](https://www.cs.toronto.edu/~kriz/cifar.html) | CIFAR 10 classes image dataset | (N, 32, 32, 3) | float32 | (N,) | int64 | train, test |
| [german_traffic_sign](http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset) | German traffic sign dataset | (N, variable_height, variable_width, 3) | uint8 | (N,) | int64 | train, test |
| [imagenette](https://github.com/fastai/imagenette) | Smaller subset of 10 classes from Imagenet | (N, variable_height, variable_width, 3) | uint8 | (N,) | int64 | train, validation |
| [mnist](http://yann.lecun.com/exdb/mnist/) | MNIST handwritten digit image dataset | (N, 28, 28, 1) | uint8 | (N,) | int64 | train, test |
| [resisc45](https://arxiv.org/abs/1703.00121) | REmote Sensing Image Scene Classification | (N, 256, 256, 3) | uint8 | (N,) | int64 | train, validation, test |
| imagenet_adversarial | ILSVRC12 adversarial dataset from ResNet50 | (N, 224, 224, 3) | uint8 | (N,) | int64 | NA |
| [xView](https://arxiv.org/pdf/1802.07856) | Objects in Context in Overhead Imagery | (N, variable_height, variable_width, 3) | uint8 | n/a | dict | train, test |
| [resisc45](https://arxiv.org/abs/1703.00121) | REmote Sensing Image Scene Classification | (N, 256, 256, 3) | float32 | (N,) | int64 | train, validation, test |
| [xView](https://arxiv.org/pdf/1802.07856) | Objects in Context in Overhead Imagery | (N, variable_height, variable_width, 3) | float32 | n/a | dict | train, test |

<br>

### Audio Datasets
| Dataset | Description | x_shape | x_dtype | y_shape | y_dtype | sampling_rate | splits |
|:----------: |:-----------: |:-------: |:--------: |:--------: |:-------: |:-------: |:------: |
| [digit](https://github.com/Jakobovski/free-spoken-digit-dataset) | Audio dataset of spoken digits | (N, variable_length) | int64 | (N,) | int64 | 8 kHz | train, test |
| [librispeech_dev_clean](http://www.openslr.org/12/) | Librispeech dev dataset for speaker identification | (N, variable_length) | int64 | (N,) | int64 | 16 kHz | train, validation, test |
| [librispeech](http://www.openslr.org/12/) | Librispeech dataset for automatic speech recognition | (N, variable_length) | float32 | (N,) | bytes | 16 kHz | dev_clean, dev_other, test_clean, train_clean100 |
| [librispeech_dev_clean](http://www.openslr.org/12/) | Librispeech dev dataset for speaker identification | (N, variable_length) | float32 | (N,) | int64 | 16 kHz | train, validation, test |
| [librispeech_dev_clean_asr](http://www.openslr.org/12) | Librispeech dev dataset for automatic speech recognition | (N, variable_length) | float32 | (N,) | bytes | 16 kHz | train, validation, test |

<br>

### Video Datasets
| Dataset | Description | x_shape | x_dtype | y_shape | y_dtype | splits |
|:----------: |:-----------: |:-------: |:--------: |:--------: |:-------: |:------: |
| [ucf101](https://www.crcv.ucf.edu/data/UCF101.php) | UCF 101 Action Recognition | (N, variable_frames, 240, 320, 3) | uint8 | (N,) | int64 | train, test |
| [ucf101](https://www.crcv.ucf.edu/data/UCF101.php) | UCF 101 Action Recognition | (N, variable_frames, 240, 320, 3) | float32 | (N,) | int64 | train, test |

<br>

### Multimodal Datasets
| Dataset | Description | x_shape | x_dtype | y_shape | y_dtype | splits |
|:----------: |:-----------: |:-------: |:--------: |:--------: |:-------: |:------: |
| [so2sat](https://mediatum.ub.tum.de/1454690) | Co-registered synthetic aperture radar and multispectral optical images | (N, 32, 32, 14) | float32 | (N,) | int64 | train, validation |

<br>

### Preprocessing

Input-modifying preprocessing of datasets occurs as part of a model used within Armory. The cached
datasets are preprocessed into tfrecords, however this preprocessing primarily consists of changing the
representation of inputs, e.g. running pydub on flac audio files.
Armory applies preprocessing to convert each dataset to canonical form (e.g. normalize the range of values, set the data type).
Any additional preprocessing that is desired should occur as part of the model under evaluation.

Canonical preprocessing is not yet supported when `framework` is `tf` or `pytorch`.
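
As a rough illustration of what "canonical form" can mean for image data, the sketch below rescales uint8 pixel values and stores them as 32-bit floats. The `[0.0, 1.0]` target range and the helper name `to_canonical_image` are assumptions made for this example; they are not taken from Armory's code and may not match its actual behavior.

```python
from array import array
from typing import Iterable


def to_canonical_image(pixels: Iterable[int]) -> array:
    """Hypothetical helper: map uint8 pixels (0..255) to floats in [0.0, 1.0]."""
    values = list(pixels)
    for value in values:
        if not 0 <= value <= 255:
            raise ValueError(f"pixel value {value} is outside the uint8 range")
    # Dividing by 255 is the "normalize the range of values" step (assumed range).
    # Typecode "f" stores each element as a C float (typically 32 bits),
    # which stands in for the "set the data type" step.
    return array("f", (value / 255.0 for value in values))


canonical = to_canonical_image([0, 51, 255])
print(canonical.typecode, list(canonical))
```

Any further model-specific preprocessing (resizing, mean subtraction, and so on) would then live inside the model under evaluation, as the text above describes.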

### Splits

14 changes: 12 additions & 2 deletions docs/metrics.md
@@ -17,13 +17,23 @@ is a JSON-able dict.
| Name | Type | Description |
|:-------: |:-------: |:-------: |
| categorical_accuracy | Task | Categorical Accuracy |
| top_5_categorical_accuracy | Task | Top-5 Categorical Accuracy |
| object_detection_AP_per_class | Task | Average Precision @ IOU=0.5 |
| top_n_categorical_accuracy | Task | Top-n Categorical Accuracy |
| top_5_categorical_accuracy | Task | Top-5 Categorical Accuracy |
| word_error_rate | Task | Word Error Rate |
| image_circle_patch_diameter | Perturbation | Patch Diameter |
| lp | Perturbation | L-p norm |
| linf | Perturbation | L-infinity norm |
| l2 | Perturbation | L2 norm |
| l1 | Perturbation | L1 norm |
| l0 | Perturbation | L0 "norm" |
| image_circle_patch_diameter | Perturbation | Patch Diameter |
| mars_mean_l2 | Perturbation | Mean L2 norm across video stacks |
| mars_mean_patch | Perturbation | Mean patch diameter across video stacks |
| norm | Perturbation | L-p norm |
| snr | Perturbation | Signal-to-noise ratio |
| snr_db | Perturbation | Signal-to-noise ratio (decibels) |
| snr_spectrogram | Perturbation | Signal-to-noise ratio of spectrogram |
| snr_spectrogram_db | Perturbation | Signal-to-noise ratio of spectrogram (decibels) |
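
For readers unfamiliar with the perturbation metrics listed above, here is a minimal sketch of the conventional definitions of the L-p norms and the signal-to-noise ratio, computed between a benign input `x` and its adversarial counterpart `x_adv`. The function names and signatures are hypothetical and use flat Python sequences; Armory's own implementations may differ in input shapes, batching, and edge-case handling.

```python
import math
from typing import Sequence


def lp_norm(x: Sequence[float], x_adv: Sequence[float], p: float) -> float:
    """L-p norm of the perturbation x_adv - x (p = 0 counts changed elements)."""
    diff = [abs(b - a) for a, b in zip(x, x_adv)]
    if p == math.inf:
        return float(max(diff, default=0.0))
    if p == 0:
        # L0 is not a true norm: it counts the nonzero entries.
        return float(sum(1 for d in diff if d != 0))
    return float(sum(d ** p for d in diff) ** (1.0 / p))


def snr(x: Sequence[float], x_adv: Sequence[float]) -> float:
    """Signal power of x divided by the power of the perturbation x_adv - x."""
    signal_power = sum(a * a for a in x)
    noise_power = sum((b - a) ** 2 for a, b in zip(x, x_adv))
    if noise_power == 0:
        return math.inf  # no perturbation at all
    return signal_power / noise_power


def snr_db(x: Sequence[float], x_adv: Sequence[float]) -> float:
    """The same ratio expressed in decibels: 10 * log10(snr)."""
    ratio = snr(x, x_adv)
    if ratio == 0:
        return -math.inf  # silent signal with a nonzero perturbation
    return 10.0 * math.log10(ratio)


benign = [10.0, 0.0]
adversarial = [11.0, 0.0]
print(lp_norm(benign, adversarial, 2), snr(benign, adversarial), snr_db(benign, adversarial))
```

Larger norm values indicate a larger perturbation, while larger SNR values indicate a smaller one relative to the signal.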

<br>
