# Yet another sleeper agent bug fix (#1886) (#1887)
* Correction to finding the poison indices

* Fix variable name; also raise an error instead of a warning

* Better noise values for the DP-InstaHide defense

* Remove a leftover line

* Sleeper Agent baseline results update

Co-authored-by: lcadalzo <39925313+lcadalzo@users.noreply.github.com>
Co-authored-by: swsuggs <15131284+swsuggs@users.noreply.github.com>
Authored by 3 people on Feb 13, 2023 (1 parent 562f150, commit f529d88)
Showing 5 changed files with 57 additions and 64 deletions.
**armory/scenarios/poison.py** (3 additions, 2 deletions)
```diff
@@ -340,7 +340,7 @@ def fit(self):
         log.info(f"Training with {type(self.trainer)} Trainer defense...")
         if self.fit_generator:
             self.trainer.fit_generator(
-                self.data_generator, np_epochs=self.train_epochs
+                data_generator, np_epochs=self.train_epochs
             )
         else:
             self.trainer.fit(
```
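For context on the hunk above: the trainer defense was being handed `self.data_generator`, but the generator is (to judge from the fix) a local variable built earlier in `fit`, not an attribute of the scenario object. A toy sketch of that failure mode, with all names hypothetical:

```python
class Scenario:
    """Toy stand-in for the Armory scenario; all names here are hypothetical."""

    def fit(self):
        # The generator is a local variable built inside fit(); no attribute
        # of the same name is assigned, so self.data_generator does not exist.
        data_generator = iter([("batch-1", "labels-1"), ("batch-2", "labels-2")])
        try:
            self.run_training(self.data_generator)  # the old, buggy reference
        except AttributeError as err:
            print(f"old call fails: {err}")
        self.run_training(data_generator)  # the fixed reference

    def run_training(self, generator):
        for x, y in generator:
            print(f"training on {x}, {y}")


Scenario().fit()
```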
```diff
@@ -392,7 +392,8 @@ def load_fairness_metrics(self):
         if explanatory_config:
             self.explanatory_model = ExplanatoryModel.from_config(explanatory_config)
         else:
-            log.warning(
+            # compute_fairness_metrics was true, but there is no explanatory config
+            raise ValueError(
                 "If computing fairness metrics, must specify 'explanatory_model' under 'adhoc'"
             )
```
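The new error message spells out the config contract this change enforces. A hedged sketch of the relevant slice of a scenario config, written as a Python dict; only the key names cited in the error message come from the source, the inner fields are placeholders:

```python
# Illustrative slice of a poisoning scenario config. Only "adhoc",
# "compute_fairness_metrics", and "explanatory_model" are from the source;
# the inner values are placeholders for whatever ExplanatoryModel.from_config
# actually expects.
adhoc = {
    "compute_fairness_metrics": True,
    # Omitting this key while compute_fairness_metrics is true now raises
    # ValueError instead of logging a warning and silently skipping the metrics.
    "explanatory_model": {
        "module": "path.to.some.module",   # placeholder
        "name": "some_explanatory_model",  # placeholder
    },
}
```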

**armory/scenarios/poisoning_sleeper_agent.py** (1 addition, 6 deletions)
```diff
@@ -159,18 +159,13 @@ def poison_dataset(self):
 
             # Manually find the poison indices. Although the attack can return them, they
             # will be the index within the target class, not the whole dataset.
-            # In addition, they may include images that aren't actually perturbed.
             poison_index = np.array(
                 [
                     i
                     for i in range(len(self.x_clean))
-                    if (self.x_clean[i] != self.x_poison[i]).all()
+                    if (self.x_clean[i] != self.x_poison[i]).any()
                 ]
             )
-            n_target = (self.y_clean == self.target_class).sum()
-            log.info(
-                f"Actual amount of poison returned by attack: {len(poison_index)} samples or {len(poison_index)/n_target} percent"
-            )
 
         else:
             self.x_poison, self.y_poison, poison_index = (
```
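The one-token change above is the heart of the commit: `(a != b).all()` only flags an image when every pixel differs, while `(a != b).any()` flags it when at least one pixel does, which is the right test for "was this image perturbed at all". A quick self-contained illustration:

```python
import numpy as np

clean = np.zeros((4, 4))          # a stand-in "image"
poisoned = clean.copy()
poisoned[0, 0] = 0.5              # perturb a single pixel

print((clean != poisoned).all())  # False: not every pixel changed,
                                  # so the old test missed this image
print((clean != poisoned).any())  # True: at least one pixel changed,
                                  # so the fixed test finds it
```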
**docs/baseline_results/cifar10_sleeper_agent_results.md** (51 additions, 54 deletions)
```diff
@@ -1,76 +1,73 @@
 # Cifar10 Sleeper Agent Baseline Evaluation
 
-Results obtained using Armory 0.16.0
+Results obtained using Armory 0.16.4
 
-## Undefended
+### Undefended
 
-**accuracy_on_benign_test_data_source_class**
+Mean of 3 runs
 
-|Poison percentage |run1 |run2 |run3 |**mean** |std |
-|------------------------------------------|------|------|------|------|------|
-|0 |0.871 |0.881 |0.876 |**0.876** |0.004 |
-|1 |0.882 |0.895 |0.88 |**0.886** |0.007 |
-|5 |0.878 |0.893 |0.894 |**0.888** |0.007 |
-|10 |0.879 |0.882 |0.881 |**0.881** |0.001 |
-|20 |0.882 |0.871 |0.879 |**0.877** |0.005 |
-|30 |0.88 |0.878 |0.87 |**0.876** |0.004 |
-|50 |0.889 |0.881 |0.875 |**0.882** |0.006 |
+| Poison Percentage | Benign all classes | Benign source class | Adv. all classes | Attack success rate |
+| ------- | ------- | ------- | ------- | ------- |
+| 00 | 0.735 | 0.740 | - | - |
+| 01 | 0.739 | 0.770 | 0.726 | 0.038 |
+| 05 | 0.738 | 0.771 | 0.722 | 0.135 |
+| 10 | 0.739 | 0.788 | 0.715 | 0.212 |
+| 20 | 0.743 | 0.780 | 0.698 | 0.524 |
+| 30 | 0.731 | 0.794 | 0.670 | 0.753 |
 
 
-**accuracy_on_benign_test_data_all_classes**
+### Random Filter
 
-|Poison percentage |run1 |run2 |run3 |**mean** |std |
-|------------------------------------------|------|------|------|------|------|
-|0 |0.852 |0.849 |0.845 |**0.849** |0.003 |
-|1 |0.852 |0.852 |0.849 |**0.851** |0.002 |
-|5 |0.849 |0.851 |0.85 |**0.85** |0.001 |
-|10 |0.852 |0.848 |0.843 |**0.848** |0.004 |
-|20 |0.841 |0.847 |0.845 |**0.845** |0.002 |
-|30 |0.843 |0.841 |0.84 |**0.841** |0.001 |
-|50 |0.841 |0.84 |0.844 |**0.842** |0.001 |
+Mean of 3 runs
 
+| Poison Percentage | Benign all classes | Benign source class | Adv. all classes | Attack success rate |
+| ------- | ------- | ------- | ------- | ------- |
+| 00 | 0.690 | 0.761 | - | - |
+| 01 | 0.703 | 0.791 | 0.700 | 0.029 |
+| 05 | 0.713 | 0.777 | 0.696 | 0.176 |
+| 10 | 0.711 | 0.810 | 0.700 | 0.079 |
+| 20 | 0.705 | 0.745 | 0.676 | 0.296 |
+| 30 | 0.708 | 0.745 | 0.678 | 0.346 |
 
-**attack_success_rate**
 
-|Poison percentage |run1 |run2 |run3 |**mean** |std |
-|------------------------------------------|------|------|------|------|------|
-|0 |- |- |- |**-** |- |
-|1 |0.99 |0.817 |0.077 |**0.628** |0.396 |
-|5 |0.917 |0.913 |0.866 |**0.899** |0.023 |
-|10 |0.979 |0.451 |0.586 |**0.672** |0.224 |
-|20 |0.86 |0.987 |0.713 |**0.853** |0.112 |
-|30 |0.617 |0.744 |0.868 |**0.743** |0.102 |
-|50 |0.935 |0.598 |1 |**0.844** |0.176 |
+### Perfect Filter
 
+Mean of 3 runs
 
-**accuracy_on_poisoned_test_data_all_classes**
+| Poison Percentage | Benign all classes | Benign source class | Adv. all classes | Attack success rate |
+| ------- | ------- | ------- | ------- | ------- |
+| 00 | 0.749 | 0.800 | - | - |
+| 01 | 0.727 | 0.694 | 0.715 | 0.045 |
+| 05 | 0.741 | 0.749 | 0.729 | 0.018 |
+| 10 | 0.741 | 0.767 | 0.731 | 0.028 |
+| 20 | 0.731 | 0.778 | 0.725 | 0.009 |
+| 30 | 0.741 | 0.807 | 0.736 | 0.013 |
 
-|Poison percentage |run1 |run2 |run3 |**mean** |std |
-|------------------------------------------|------|------|------|------|------|
-|0 |- |- |- |**-** |- |
-|1 |0.764 |0.762 |0.761 |**0.762** |0.001 |
-|5 |0.762 |0.762 |0.761 |**0.761** |0 |
-|10 |0.764 |0.76 |0.754 |**0.759** |0.004 |
-|20 |0.753 |0.76 |0.766 |**0.76** |0.005 |
-|30 |0.755 |0.753 |0.753 |**0.754** |0.001 |
-|50 |0.752 |0.752 |0.756 |**0.754** |0.002 |
 
+### Activation Clustering
 
-## Perfect Filter
+Mean of 3 runs
 
-Coming soon
+| Poison Percentage | Benign all classes | Benign source class | Adv. all classes | Attack success rate |
+| ------- | ------- | ------- | ------- | ------- |
+| 00 | 0.650 | 0.659 | - | - |
+| 01 | 0.646 | 0.661 | 0.642 | 0.031 |
+| 05 | 0.652 | 0.647 | 0.647 | 0.053 |
+| 10 | 0.664 | 0.776 | 0.658 | 0.029 |
+| 20 | 0.662 | 0.696 | 0.640 | 0.188 |
+| 30 | 0.666 | 0.668 | 0.630 | 0.462 |
 
 
-## Random Filter
+### Spectral Signatures
 
-Coming soon
+Mean of 3 runs
 
+| Poison Percentage | Benign all classes | Benign source class | Adv. all classes | Attack success rate |
+| ------- | ------- | ------- | ------- | ------- |
+| 00 | 0.684 | 0.738 | - | - |
+| 01 | 0.675 | 0.768 | 0.671 | 0.044 |
+| 05 | 0.668 | 0.660 | 0.656 | 0.098 |
+| 10 | 0.676 | 0.694 | 0.664 | 0.131 |
+| 20 | 0.661 | 0.709 | 0.632 | 0.356 |
+| 30 | 0.656 | 0.729 | 0.625 | 0.387 |
 
-## Activation Defense
-
-Coming soon
-
-
-## Spectral Signatures Defense
-
-Coming soon
```
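A note for reading the new tables, based on the metric names in the old ones: "Benign all classes" and "Benign source class" are clean-test accuracy, "Adv. all classes" is accuracy on the triggered (poisoned) test set, and "Attack success rate" is, on our reading, the fraction of triggered source-class images that the model classifies as the attacker's target class. A sketch of that last quantity (the helper below is hypothetical, not Armory's API):

```python
import numpy as np

def attack_success_rate(triggered_preds, true_labels, target_class):
    """Fraction of triggered non-target-class images classified as the target.

    Hypothetical helper: Armory computes this metric internally; this only
    illustrates the quantity reported in the tables above.
    """
    mask = true_labels != target_class  # images already in the target class don't count
    return float((triggered_preds[mask] == target_class).mean())

# Toy example: 3 of 4 triggered source-class images land in target class 9.
preds = np.array([9, 9, 3, 9])
labels = np.array([0, 0, 0, 0])
print(attack_success_rate(preds, labels, target_class=9))  # 0.75
```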
**DP-InstaHide scenario config** (1 addition, 1 deletion; file path not shown)
```diff
@@ -51,7 +51,7 @@
                     1
                 ],
                 "noise": "laplacian",
-                "scale": 0.1
+                "scale": 0.015
             },
             "module": "art.defences.trainer",
             "name": "DPInstaHideTrainer",
```
**Second DP-InstaHide scenario config** (1 addition, 1 deletion; file path not shown)
```diff
@@ -72,7 +72,7 @@
                     1
                 ],
                 "noise": "laplacian",
-                "scale": 0.1
+                "scale": 0.03
             },
             "module": "art.defences.trainer",
             "name": "DPInstaHideTrainer",
```
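Both JSON edits shrink the scale of the Laplacian noise that the `DPInstaHideTrainer` defense adds to its training inputs; the old scale of 0.1 was presumably too destructive for images normalized to [0, 1]. A minimal numpy sketch of what `scale` controls, under that [0, 1] assumption and with the rest of the DP-InstaHide pipeline (mixup-style augmentations) omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_laplacian_noise(batch, scale):
    # DP-InstaHide-style additive noise; inputs assumed to lie in [0, 1].
    noisy = batch + rng.laplace(loc=0.0, scale=scale, size=batch.shape)
    return np.clip(noisy, 0.0, 1.0)

batch = rng.random((2, 32, 32, 3))    # toy CIFAR-10-shaped batch
for scale in (0.1, 0.03, 0.015):      # old value vs. the two new ones
    distortion = np.abs(add_laplacian_noise(batch, scale) - batch).mean()
    print(f"scale={scale}: mean per-pixel distortion ~ {distortion:.3f}")
```

Since the mean absolute deviation of a Laplace(0, b) draw is b, the expected per-pixel distortion falls from roughly 10% of the dynamic range at the old value to about 3% and 1.5% at the new ones.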
