twosixlabs · davidslater · Oct 23, 2020 · Oct 21, 2020
diff --git a/docs/adversarial_datasets.md b/docs/adversarial_datasets.md
@@ -42,9 +42,11 @@ Example attack module for image classification scenario:
 ### Image Datasets
 |             `name`             |        `adversarial_key`       |                Description                |               Attack               | Source Split |      x_shape     | x_type | y_shape | y_type |      Size      |
 |:------------------------------:|:------------------------------:|:-----------------------------------------:|:----------------------------------:|:------------:|:----------------:|:------:|:-------:|:------:|:--------------:|
+| "apricot_dev_adversarial"      | "adversarial"                  | Physical Adversarial Attacks on Object Detection| Targeted, universal patch    | dev          | (N, variable_height, variable_width, 3) | uint8 | n/a | dict | 138 images |
+| "imagenet_adversarial"         | "adversarial"                  | ILSVRC12 adversarial image dataset for ResNet50 |                                    |              | (N, 224, 224, 3) | uint8  | (N,)    | int64  | 1000 images    |
 | "resisc45_adversarial_224x224" |     "adversarial_univpatch"    | REmote Sensing Image Scene Classification |      Targeted, universal patch     |     test     | (N, 224, 224, 3) |  uint8 |   (N,)  |  int64 | 5 images/class |
 | "resisc45_adversarial_224x224" | "adversarial_univperturbation" | REmote Sensing Image Scene Classification | Untargeted, universal perturbation |     test     | (N, 224, 224, 3) |  uint8 |   (N,)  |  int64 | 5 images/class |
-| "apricot_dev_adversarial"      | "adversarial"                  | Physical Adversarial Attacks on Object Detection| Targeted, universal patch    | dev          | (N, variable_height, variable_width, 3) | uint8 | n/a | dict | 138 images |
+
 
 Note: the APRICOT dataset contains labels and bounding boxes for both COCO objects and physical adversarial patches. 
 The label used to signify the patch is the `ADV_PATCH_MAGIC_NUMBER_LABEL_ID` defined in 
@@ -54,16 +56,19 @@ patch and a varying number of COCO objects (including zero).
 
 
 ### Audio Datasets
-|           `name`          | `adversarial_key` |                     Description                    |               Attack               | Source Split |  x_shape  | x_type | y_shape | y_type | sampling_rate |      Size      |
+|           `name`          | `adversarial_key`              |                     Description                    |               Attack               | Source Split |  x_shape  | x_type | y_shape | y_type | sampling_rate |      Size      |
 |:-------------------------:|:-----------------:|:--------------------------------------------------:|:----------------------------------:|:------------:|:---------:|:------:|:-------:|:------:|:-------------:|:--------------:|
-| "librispeech_adversarial" |   "adversarial"   | Librispeech dev dataset for speaker identification | Untargeted, universal perturbation |     test     | (N, 3000) |  int64 |   (N,)  |  int64 |    16 kHz     | ~5 sec/speaker |
+| "librispeech_adversarial" | "adversarial_perturbation"     | Librispeech dev dataset for speaker identification |                                    |     test     | (N, variable_length) |  int64 |   (N,)  |  int64 |    16 kHz     | ~5 sec/speaker |
+| "librispeech_adversarial" | "adversarial_univperturbation" | Librispeech dev dataset for speaker identification | Untargeted, universal perturbation |     test     | (N, variable_length) |  int64 |   (N,)  |  int64 |    16 kHz     | ~5 sec/speaker |
 
 
 ### Video Datasets
 |            `name`            |      `adversarial_key`     |         Description        |               Attack               | Source Split |              x_shape              | x_type | y_shape | y_type |      Size      |
 |:----------------------------:|:--------------------------:|:--------------------------:|:----------------------------------:|:------------:|:---------------------------------:|:------:|:-------:|:------:|:--------------:|
 | "ucf101_adversarial_112x112" |     "adversarial_patch"    | UCF 101 Action Recognition | Untargeted, universal perturbation |     test     | (N, variable_frames, 112, 112, 3) |  uint8 |   (N,)  |  int64 | 5 videos/class |
-| "ucf101_adversarial_112x112" | "adversarial_perturbation" | UCF 101 Action Recognition |           Targeted, patch          |     test     | (N, variable_frames, 112, 112, 3) |  uint8 |   (N,)  |  int64 | 5 videos/class |
+| "ucf101_adversarial_112x112" | "adversarial_perturbation" | UCF 101 Action Recognition | Untargeted, universal perturbation          |     test     | (N, variable_frames, 112, 112, 3) |  uint8 |   (N,)  |  int64 | 5 videos/class |
 
 ### Poison Datasets
-To be added
+|             `name`             |        `adversarial_key`       |                Description                |               Attack               | Source Split |      x_shape     | x_type  | y_shape | y_type |      Size      |
+|:------------------------------:|:------------------------------:|:-----------------------------------------:|:----------------------------------:|:------------:|:----------------:|:------:|:-------:|:------:|:--------------:|
+| "gtsrb_poison"                 | None                           | German Traffic Sign Poison Dataset        | Data poisoning                     |              |  (N, 48, 48, 3)  | float32 | (N,)    | int64  | 2220 images    |
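
The APRICOT note in the hunk above says each image carries one adversarial patch plus a varying number of COCO objects, and that the patch is marked by `ADV_PATCH_MAGIC_NUMBER_LABEL_ID`. A minimal sketch of separating the two kinds of boxes follows. It assumes the label dict holds parallel `"labels"` and `"boxes"` lists, which this diff does not confirm; the constant's value below is a placeholder, not the real one, and `split_patch_and_objects` is a hypothetical helper rather than part of Armory.

```python
# Placeholder only: the real value is defined in Armory's source, not in this diff.
ADV_PATCH_MAGIC_NUMBER_LABEL_ID = -1


def split_patch_and_objects(y):
    """Separate adversarial-patch boxes from COCO-object boxes.

    `y` is assumed to be a dict with parallel lists under "labels" and "boxes".
    Returns (patch_boxes, object_annotations), where object_annotations is a
    list of (label, box) pairs.
    """
    patch_boxes = []
    object_annotations = []
    for label, box in zip(y["labels"], y["boxes"]):
        if label == ADV_PATCH_MAGIC_NUMBER_LABEL_ID:
            patch_boxes.append(box)
        else:
            object_annotations.append((label, box))
    return patch_boxes, object_annotations


# Example label dict: one COCO object and one patch (values are made up).
example_y = {
    "labels": [3, ADV_PATCH_MAGIC_NUMBER_LABEL_ID],
    "boxes": [[10, 20, 50, 60], [100, 120, 180, 200]],
}
patches, objects = split_patch_and_objects(example_y)
```

Keeping the label alongside each COCO box makes it easy to score object detection separately from patch localization.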
diff --git a/docs/baseline_models.md b/docs/baseline_models.md
@@ -28,6 +28,7 @@ The model files can be found in [armory/baseline_models/keras](../armory/baselin
 | Micronnet CNN |  |
 | MNIST CNN | `undefended_mnist_5epochs.h5` |
 | ResNet50 CNN | `resnet50_imagenet_v1.h5` |
+| so2sat CNN | `multimodal_baseline_weights.h5` |
 
 
 ### PyTorch
@@ -36,11 +37,12 @@ The model files can be found in [armory/baseline_models/pytorch](../armory/basel
 | Model   | S3 weight_files   | 
 |:----------: | :-----------: | 
 | Cifar10 CNN |  |  
+| DeepSpeech 2 |   |
 | Sincnet CNN | `sincnet_librispeech_v1.pth` |
 | MARS | `mars_ucf101_v1.pth` , `mars_kinetics_v1.pth` |
 | ResNet50 CNN | `resnet50_imagenet_v1.pth` |
 | MNIST CNN | `undefended_mnist_5epochs.pth` |
-| xView Faster-RCNN | xview_model_state_dict_epoch_99_loss_0p67 |
+| xView Faster-RCNN | `xview_model_state_dict_epoch_99_loss_0p67` |
 
 
 ### TensorFlow 1

diff --git a/docs/dataset_licensing.md b/docs/dataset_licensing.md
@@ -16,7 +16,9 @@ the Creative Commons 4.0 International ShareAlike license and are Copyright Two
 | GTSRB | [CC0 Public Domain](https://www.kaggle.com/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign)|
 | Imagenette | [Apache 2.0](https://github.com/fastai/imagenette/blob/master/LICENSE) |  
 | UCF101 | Fair use exception |
-| RESISC45 | Fair use exception |
+| RESISC45 | Fair use exception |
+| xView | [Creative Commons Attribution-Noncommercial-ShareAlike 4.0 International](https://arxiv.org/pdf/1802.07856) |
+| so2sat | [Creative Commons 4.0](https://mediatum.ub.tum.de/1454690) |
 
 ## Attributions
 
@@ -88,6 +90,26 @@ practicable. Please direct inquiries to <armory@twosixlabs.com>.
 | Dataset link | https://github.com/fastai/imagenette |
 | Modification | (Slight) Representation of images as binary tensors |
 
+### xView
+|Attribution                   |              |  
+|------------------------------|--------------|
+| Creator/attribution parties  | Defense Innovation Unit Experimental (DIUx) and the National Geospatial-Intelligence Agency (NGA)  |
+| Copyright notice             |  |
+| Public license notice        | http://xviewdataset.org/terms.html |
+| Disclaimer notice            | |
+| Dataset link | http://xviewdataset.org/#dataset |
+| Modification |  |
+
+### so2sat
+|Attribution                   |              |  
+|------------------------------|--------------|
+| Creator/attribution parties  | Xiaoxiang Zhu, Jingliang Hu, Chunping Qiu, Yilei Shi, Jian Kang, Lichao Mou, Hossein Bagheri, Matthias Haeberle, Yuansheng Hua, Rong Huang, Lloyd Hughes, Hao Li, Yao Sun, Guichen Zhang, Shiyao Han, Michael Schmitt, and Yuanyuan Wang  |
+| Copyright notice             |  |
+| Public license notice        | https://mediatum.ub.tum.de/1454690 |
+| Disclaimer notice            | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. |
+| Dataset link | https://mediatum.ub.tum.de/1454690 |
+| Modification |  |
+
 ## Fair use notes for RESISC-45 and UCF101
 * Two Six Labs does not charge users for access to the Armory repository, 
 nor the datasets therein, nor does it derive a profit directly from use of the 

diff --git a/docs/datasets.md b/docs/datasets.md
@@ -17,36 +17,45 @@ These tfrecord files will be pulled from S3 if not available on your
 
 | Dataset    | Description | x_shape | x_dtype  | y_shape  | y_dtype | splits |
 |:----------: |:-----------: |:-------: |:--------: |:--------: |:-------: |:------: |
-| [cifar10](https://www.cs.toronto.edu/~kriz/cifar.html) | CIFAR 10 classes image dataset | (N, 32, 32, 3) | uint8 | (N,) | int64 | train, test |
+| [cifar10](https://www.cs.toronto.edu/~kriz/cifar.html) | CIFAR 10 classes image dataset | (N, 32, 32, 3) | float32 | (N,) | int64 | train, test |
 | [german_traffic_sign](http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset) | German traffic sign dataset | (N, variable_height, variable_width, 3) | uint8 | (N,) | int64 | train, test | 
 | [imagenette](https://github.com/fastai/imagenette) | Smaller subset of 10 classes from Imagenet | (N, variable_height, variable_width, 3) | uint8  | (N,) | int64 | train, validation |
 | [mnist](http://yann.lecun.com/exdb/mnist/) | MNIST hand written digit image dataset | (N, 28, 28, 1) | uint8 | (N,) | int64 | train, test | 
-| [resisc45](https://arxiv.org/abs/1703.00121) | REmote Sensing Image Scene Classification | (N, 256, 256, 3) | uint8 | (N,) | int64 | train, validation, test | 
-| imagenet_adversarial | ILSVRC12 adversarial dataset from ResNet50 | (N, 224, 224, 3) | uint8 | (N,) | int64 | NA |
-| [xView](https://arxiv.org/pdf/1802.07856) | Objects in Context in Overhead Imagery | (N, variable_height, variable_width, 3) | uint8 | n/a | dict | train, test | 
+| [resisc45](https://arxiv.org/abs/1703.00121) | REmote Sensing Image Scene Classification | (N, 256, 256, 3) | float32 | (N,) | int64 | train, validation, test | 
+| [xView](https://arxiv.org/pdf/1802.07856) | Objects in Context in Overhead Imagery | (N, variable_height, variable_width, 3) | float32 | n/a | dict | train, test | 
 
 <br>
 
 ### Audio Datasets
 | Dataset    | Description | x_shape | x_dtype  | y_shape  | y_dtype | sampling_rate | splits |
 |:----------: |:-----------: |:-------: |:--------: |:--------: |:-------: |:-------: |:------: |
 | [digit](https://github.com/Jakobovski/free-spoken-digit-dataset) | Audio dataset of spoken digits | (N, variable_length) | int64 | (N,) | int64 | 8 kHz | train, test |
-| [librispeech_dev_clean](http://www.openslr.org/12/) | Librispeech dev dataset for speaker identification  | (N, variable_length)  | int64 | (N,)  | int64 | 16 kHz | train, validation, test |
+| [librispeech](http://www.openslr.org/12/) | Librispeech dataset for automatic speech recognition  | (N, variable_length)  | float32 | (N,)  | bytes | 16 kHz | dev_clean, dev_other, test_clean, train_clean100 |
+| [librispeech_dev_clean](http://www.openslr.org/12/) | Librispeech dev dataset for speaker identification  | (N, variable_length)  | float32 | (N,)  | int64 | 16 kHz | train, validation, test |
+| [librispeech_dev_clean_asr](http://www.openslr.org/12) | Librispeech dev dataset for automatic speech recognition | (N, variable_length) | float32 | (N,) | bytes | 16 kHz | train, validation, test |
 
 <br>
 
 ### Video Datasets
 | Dataset    | Description | x_shape | x_dtype  | y_shape  | y_dtype | splits |
 |:----------: |:-----------: |:-------: |:--------: |:--------: |:-------: |:------: |
-| [ucf101](https://www.crcv.ucf.edu/data/UCF101.php) | UCF 101 Action Recognition | (N, variable_frames, 240, 320, 3) | uint8 | (N,) | int64 | train, test |
+| [ucf101](https://www.crcv.ucf.edu/data/UCF101.php) | UCF 101 Action Recognition | (N, variable_frames, 240, 320, 3) | float32 | (N,) | int64 | train, test |
+
+<br>
+
+### Multimodal Datasets
+| Dataset    | Description | x_shape | x_dtype  | y_shape  | y_dtype | splits |
+|:----------: |:-----------: |:-------: |:--------: |:--------: |:-------: |:------: |
+| [so2sat](https://mediatum.ub.tum.de/1454690) | Co-registered synthetic aperture radar and multispectral optical images | (N, 32, 32, 14) | float32 | (N,) | int64 | train, validation |
 
 <br>
 
 ### Preprocessing
 
-Input-modifying preprocessing of datasets occurs as part of a model used within Armory. The cached
-datasets are preprocessed into tfrecords, however this preprocessing primarily consists of changing the
-representation of inputs, e.g. running pydub on flac audio files.
+Armory preprocesses each dataset into a canonical form (e.g., normalizing the range of values and setting the data type).
+Any additional preprocessing should occur as part of the model under evaluation.
+
+Canonical preprocessing is not yet supported when `framework` is `tf` or `pytorch`.
 
 ### Splits
 

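The preprocessing note in `docs/datasets.md` describes canonical form as a normalized value range plus a fixed data type, and the tables change several `x_dtype` entries to `float32`. A minimal sketch of that idea follows. It assumes 8-bit images map to `[0.0, 1.0]` and 16-bit PCM audio maps to `[-1.0, 1.0)`; neither range is stated in this diff, and both function names are hypothetical, not Armory's API.

```python
def to_canonical_image(pixels):
    """Scale 8-bit pixel values (0..255) to floats in [0.0, 1.0]."""
    scaled = []
    for value in pixels:
        if not 0 <= value <= 255:
            raise ValueError(f"pixel value outside 8-bit range: {value}")
        scaled.append(value / 255.0)
    return scaled


def to_canonical_audio(samples):
    """Scale 16-bit PCM samples (-32768..32767) to floats in [-1.0, 1.0)."""
    scaled = []
    for value in samples:
        if not -32768 <= value <= 32767:
            raise ValueError(f"sample outside 16-bit range: {value}")
        scaled.append(value / 32768.0)
    return scaled


image = to_canonical_image([0, 255])
audio = to_canonical_audio([-32768, 0, 16384])
```

Anything beyond this kind of range and type normalization (resizing, feature extraction, and so on) would belong in the model under evaluation, as the note says.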
diff --git a/docs/metrics.md b/docs/metrics.md
@@ -17,13 +17,23 @@ is a JSON-able dict.
 | Name | Type | Description |
 |:-------: |:-------: |:-------: |
 | categorical_accuracy | Task | Categorical Accuracy |
-| top_5_categorical_accuracy | Task | Top-5 Categorical Accuracy |
 | object_detection_AP_per_class | Task | Average Precision @ IOU=0.5 |
+| top_n_categorical_accuracy | Task | Top-n Categorical Accuracy |
+| top_5_categorical_accuracy | Task | Top-5 Categorical Accuracy |
+| word_error_rate | Task | Word Error Rate |
+| image_circle_patch_diameter | Perturbation | Patch Diameter |
+| lp   | Perturbation | L-p norm |
 | linf | Perturbation | L-infinity norm |
 | l2 | Perturbation | L2 norm |
 | l1 | Perturbation | L1 norm |
 | l0 | Perturbation | L0 "norm" |
-| image_circle_patch_diameter | Perturbation | Patch Diameter |
+| mars_mean_l2 | Perturbation | Mean L2 norm across video stacks |
+| mars_mean_patch | Perturbation | Mean patch diameter across video stacks |
+| norm | Perturbation | L-p norm |
+| snr | Perturbation | Signal-to-noise ratio |
+| snr_db | Perturbation | Signal-to-noise ratio (decibels) |
+| snr_spectrogram | Perturbation | Signal-to-noise ratio of spectrogram |
+| snr_spectrogram_db | Perturbation | Signal-to-noise ratio of spectrogram (decibels) |
 
 <br>
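
The perturbation metrics added in the table above compare a benign input with its adversarial counterpart. A minimal sketch of the usual textbook definitions of the L-p norms and the signal-to-noise ratio follows. It is an illustration under those standard definitions, not Armory's implementation; the function names and the flat-list inputs are assumptions made for the example.

```python
import math


def lp_norm(x, x_adv, p):
    """L-p norm of the perturbation x_adv - x; p may be 0, a positive number, or math.inf."""
    diff = [abs(a - b) for a, b in zip(x_adv, x)]
    if p == math.inf:
        return max(diff, default=0.0)
    if p == 0:
        # L0 "norm": the number of changed elements.
        return float(sum(1 for d in diff if d != 0))
    return sum(d ** p for d in diff) ** (1.0 / p)


def snr(x, x_adv):
    """Ratio of signal power to perturbation power (infinite if nothing changed)."""
    signal_power = sum(a * a for a in x)
    noise_power = sum((b - a) ** 2 for a, b in zip(x, x_adv))
    if noise_power == 0:
        return math.inf
    return signal_power / noise_power


def snr_db(x, x_adv):
    """Signal-to-noise ratio expressed in decibels."""
    ratio = snr(x, x_adv)
    if ratio == 0:
        return -math.inf
    return 10.0 * math.log10(ratio)


x = [3.0, 4.0]
x_adv = [3.0, 5.0]
l2 = lp_norm(x, x_adv, 2)
linf = lp_norm(x, x_adv, math.inf)
l0 = lp_norm(x, x_adv, 0)
ratio = snr(x, x_adv)
ratio_db = snr_db(x, x_adv)
```

Here one element changes by 1.0, so the L2, L-infinity, and L0 values all equal 1.0, and the signal power of 25.0 over a noise power of 1.0 gives an SNR of 25.0.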