# Yet another sleeper agent bug fix (#1886) (#1887)
* Correction to finding the poison indices

* Fix variable name; also raise an error instead of a warning

* Better noise values for the DP-InstaHide defense

* Remove a leftover line

* Sleeper Agent baseline results update

Co-authored-by: lcadalzo <39925313+lcadalzo@users.noreply.github.com>
Co-authored-by: swsuggs <15131284+swsuggs@users.noreply.github.com>
Authored by 3 people on Feb 13, 2023 (1 parent 562f150, commit f529d88)
Showing 5 changed files with 57 additions and 64 deletions.
**armory/scenarios/poison.py** (3 additions, 2 deletions)
```diff
@@ -340,7 +340,7 @@ def fit(self):
         log.info(f"Training with {type(self.trainer)} Trainer defense...")
         if self.fit_generator:
             self.trainer.fit_generator(
-                self.data_generator, np_epochs=self.train_epochs
+                data_generator, np_epochs=self.train_epochs
             )
         else:
             self.trainer.fit(
```
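For context on the hunk above: the trainer defense was being handed `self.data_generator`, but the generator is (to judge from the fix) a local variable built earlier in `fit`, not an attribute of the scenario object. A toy sketch of that failure mode, with all names hypothetical:

```python
class Scenario:
    """Toy stand-in for the Armory scenario; all names here are hypothetical."""

    def fit(self):
        # The generator is a local variable built inside fit(); no attribute
        # of the same name is assigned, so self.data_generator does not exist.
        data_generator = iter([("batch-1", "labels-1"), ("batch-2", "labels-2")])
        try:
            self.run_training(self.data_generator)  # the old, buggy reference
        except AttributeError as err:
            print(f"old call fails: {err}")
        self.run_training(data_generator)  # the fixed reference

    def run_training(self, generator):
        for x, y in generator:
            print(f"training on {x}, {y}")


Scenario().fit()
```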
```diff
@@ -392,7 +392,8 @@ def load_fairness_metrics(self):
         if explanatory_config:
             self.explanatory_model = ExplanatoryModel.from_config(explanatory_config)
         else:
-            log.warning(
+            # compute_fairness_metrics was true, but there is no explanatory config
+            raise ValueError(
                 "If computing fairness metrics, must specify 'explanatory_model' under 'adhoc'"
             )
```
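The new error message spells out the config contract this change enforces. A hedged sketch of the relevant slice of a scenario config, written as a Python dict; only the key names cited in the error message come from the source, the inner fields are placeholders:

```python
# Illustrative slice of a poisoning scenario config. Only "adhoc",
# "compute_fairness_metrics", and "explanatory_model" are from the source;
# the inner values are placeholders for whatever ExplanatoryModel.from_config
# actually expects.
adhoc = {
    "compute_fairness_metrics": True,
    # Omitting this key while compute_fairness_metrics is true now raises
    # ValueError instead of logging a warning and silently skipping the metrics.
    "explanatory_model": {
        "module": "path.to.some.module",   # placeholder
        "name": "some_explanatory_model",  # placeholder
    },
}
```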

**armory/scenarios/poisoning_sleeper_agent.py** (1 addition, 6 deletions)
```diff
@@ -159,18 +159,13 @@ def poison_dataset(self):
 
             # Manually find the poison indices. Although the attack can return them, they
             # will be the index within the target class, not the whole dataset.
-            # In addition, they may include images that aren't actually perturbed.
             poison_index = np.array(
                 [
                     i
                     for i in range(len(self.x_clean))
-                    if (self.x_clean[i] != self.x_poison[i]).all()
+                    if (self.x_clean[i] != self.x_poison[i]).any()
                 ]
             )
-            n_target = (self.y_clean == self.target_class).sum()
-            log.info(
-                f"Actual amount of poison returned by attack: {len(poison_index)} samples or {len(poison_index)/n_target} percent"
-            )
 
         else:
             self.x_poison, self.y_poison, poison_index = (
```
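The one-token change above is the heart of the commit: `(a != b).all()` only flags an image when every pixel differs, while `(a != b).any()` flags it when at least one pixel does, which is the right test for "was this image perturbed at all". A quick self-contained illustration:

```python
import numpy as np

clean = np.zeros((4, 4))          # a stand-in "image"
poisoned = clean.copy()
poisoned[0, 0] = 0.5              # perturb a single pixel

print((clean != poisoned).all())  # False: not every pixel changed,
                                  # so the old test missed this image
print((clean != poisoned).any())  # True: at least one pixel changed,
                                  # so the fixed test finds it
```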
**docs/baseline_results/cifar10_sleeper_agent_results.md** (51 additions, 54 deletions)
```diff
@@ -1,76 +1,73 @@
 # Cifar10 Sleeper Agent Baseline Evaluation
 
-Results obtained using Armory 0.16.0
+Results obtained using Armory 0.16.4
 
-## Undefended
+### Undefended
 
-**accuracy_on_benign_test_data_source_class**
+Mean of 3 runs
 
-|Poison percentage |run1 |run2 |run3 |**mean** |std |
-|------------------------------------------|------|------|------|------|------|
-|0 |0.871 |0.881 |0.876 |**0.876** |0.004 |
-|1 |0.882 |0.895 |0.88 |**0.886** |0.007 |
-|5 |0.878 |0.893 |0.894 |**0.888** |0.007 |
-|10 |0.879 |0.882 |0.881 |**0.881** |0.001 |
-|20 |0.882 |0.871 |0.879 |**0.877** |0.005 |
-|30 |0.88 |0.878 |0.87 |**0.876** |0.004 |
-|50 |0.889 |0.881 |0.875 |**0.882** |0.006 |
+| Poison Percentage | Benign all classes | Benign source class | Adv. all classes | Attack success rate |
+| ------- | ------- | ------- | ------- | ------- |
+| 00 | 0.735 | 0.740 | - | - |
+| 01 | 0.739 | 0.770 | 0.726 | 0.038 |
+| 05 | 0.738 | 0.771 | 0.722 | 0.135 |
+| 10 | 0.739 | 0.788 | 0.715 | 0.212 |
+| 20 | 0.743 | 0.780 | 0.698 | 0.524 |
+| 30 | 0.731 | 0.794 | 0.670 | 0.753 |
 
 
-**accuracy_on_benign_test_data_all_classes**
+### Random Filter
 
-|Poison percentage |run1 |run2 |run3 |**mean** |std |
-|------------------------------------------|------|------|------|------|------|
-|0 |0.852 |0.849 |0.845 |**0.849** |0.003 |
-|1 |0.852 |0.852 |0.849 |**0.851** |0.002 |
-|5 |0.849 |0.851 |0.85 |**0.85** |0.001 |
-|10 |0.852 |0.848 |0.843 |**0.848** |0.004 |
-|20 |0.841 |0.847 |0.845 |**0.845** |0.002 |
-|30 |0.843 |0.841 |0.84 |**0.841** |0.001 |
-|50 |0.841 |0.84 |0.844 |**0.842** |0.001 |
+Mean of 3 runs
 
+| Poison Percentage | Benign all classes | Benign source class | Adv. all classes | Attack success rate |
+| ------- | ------- | ------- | ------- | ------- |
+| 00 | 0.690 | 0.761 | - | - |
+| 01 | 0.703 | 0.791 | 0.700 | 0.029 |
+| 05 | 0.713 | 0.777 | 0.696 | 0.176 |
+| 10 | 0.711 | 0.810 | 0.700 | 0.079 |
+| 20 | 0.705 | 0.745 | 0.676 | 0.296 |
+| 30 | 0.708 | 0.745 | 0.678 | 0.346 |
 
-**attack_success_rate**
 
-|Poison percentage |run1 |run2 |run3 |**mean** |std |
-|------------------------------------------|------|------|------|------|------|
-|0 |- |- |- |**-** |- |
-|1 |0.99 |0.817 |0.077 |**0.628** |0.396 |
-|5 |0.917 |0.913 |0.866 |**0.899** |0.023 |
-|10 |0.979 |0.451 |0.586 |**0.672** |0.224 |
-|20 |0.86 |0.987 |0.713 |**0.853** |0.112 |
-|30 |0.617 |0.744 |0.868 |**0.743** |0.102 |
-|50 |0.935 |0.598 |1 |**0.844** |0.176 |
+### Perfect Filter
 
+Mean of 3 runs
 
-**accuracy_on_poisoned_test_data_all_classes**
+| Poison Percentage | Benign all classes | Benign source class | Adv. all classes | Attack success rate |
+| ------- | ------- | ------- | ------- | ------- |
+| 00 | 0.749 | 0.800 | - | - |
+| 01 | 0.727 | 0.694 | 0.715 | 0.045 |
+| 05 | 0.741 | 0.749 | 0.729 | 0.018 |
+| 10 | 0.741 | 0.767 | 0.731 | 0.028 |
+| 20 | 0.731 | 0.778 | 0.725 | 0.009 |
+| 30 | 0.741 | 0.807 | 0.736 | 0.013 |
 
-|Poison percentage |run1 |run2 |run3 |**mean** |std |
-|------------------------------------------|------|------|------|------|------|
-|0 |- |- |- |**-** |- |
-|1 |0.764 |0.762 |0.761 |**0.762** |0.001 |
-|5 |0.762 |0.762 |0.761 |**0.761** |0 |
-|10 |0.764 |0.76 |0.754 |**0.759** |0.004 |
-|20 |0.753 |0.76 |0.766 |**0.76** |0.005 |
-|30 |0.755 |0.753 |0.753 |**0.754** |0.001 |
-|50 |0.752 |0.752 |0.756 |**0.754** |0.002 |
 
+### Activation Clustering
 
-## Perfect Filter
+Mean of 3 runs
 
-Coming soon
+| Poison Percentage | Benign all classes | Benign source class | Adv. all classes | Attack success rate |
+| ------- | ------- | ------- | ------- | ------- |
+| 00 | 0.650 | 0.659 | - | - |
+| 01 | 0.646 | 0.661 | 0.642 | 0.031 |
+| 05 | 0.652 | 0.647 | 0.647 | 0.053 |
+| 10 | 0.664 | 0.776 | 0.658 | 0.029 |
+| 20 | 0.662 | 0.696 | 0.640 | 0.188 |
+| 30 | 0.666 | 0.668 | 0.630 | 0.462 |
 
 
-## Random Filter
+### Spectral Signatures
 
-Coming soon
+Mean of 3 runs
 
+| Poison Percentage | Benign all classes | Benign source class | Adv. all classes | Attack success rate |
+| ------- | ------- | ------- | ------- | ------- |
+| 00 | 0.684 | 0.738 | - | - |
+| 01 | 0.675 | 0.768 | 0.671 | 0.044 |
+| 05 | 0.668 | 0.660 | 0.656 | 0.098 |
+| 10 | 0.676 | 0.694 | 0.664 | 0.131 |
+| 20 | 0.661 | 0.709 | 0.632 | 0.356 |
+| 30 | 0.656 | 0.729 | 0.625 | 0.387 |
 
-## Activation Defense
-
-Coming soon
-
-
-## Spectral Signatures Defense
-
-Coming soon
```
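A note for reading the new tables, based on the metric names in the old ones: "Benign all classes" and "Benign source class" are clean-test accuracy, "Adv. all classes" is accuracy on the triggered (poisoned) test set, and "Attack success rate" is, on our reading, the fraction of triggered source-class images that the model classifies as the attacker's target class. A sketch of that last quantity (the helper below is hypothetical, not Armory's API):

```python
import numpy as np

def attack_success_rate(triggered_preds, true_labels, target_class):
    """Fraction of triggered non-target-class images classified as the target.

    Hypothetical helper: Armory computes this metric internally; this only
    illustrates the quantity reported in the tables above.
    """
    mask = true_labels != target_class  # images already in the target class don't count
    return float((triggered_preds[mask] == target_class).mean())

# Toy example: 3 of 4 triggered source-class images land in target class 9.
preds = np.array([9, 9, 3, 9])
labels = np.array([0, 0, 0, 0])
print(attack_success_rate(preds, labels, target_class=9))  # 0.75
```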
**DP-InstaHide scenario config** (1 addition, 1 deletion; file path not shown)
```diff
@@ -51,7 +51,7 @@
                     1
                 ],
                 "noise": "laplacian",
-                "scale": 0.1
+                "scale": 0.015
             },
             "module": "art.defences.trainer",
             "name": "DPInstaHideTrainer",
```
**Second DP-InstaHide scenario config** (1 addition, 1 deletion; file path not shown)
```diff
@@ -72,7 +72,7 @@
                     1
                 ],
                 "noise": "laplacian",
-                "scale": 0.1
+                "scale": 0.03
             },
             "module": "art.defences.trainer",
             "name": "DPInstaHideTrainer",
```
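Both JSON edits shrink the scale of the Laplacian noise that the `DPInstaHideTrainer` defense adds to its training inputs; the old scale of 0.1 was presumably too destructive for images normalized to [0, 1]. A minimal numpy sketch of what `scale` controls, under that [0, 1] assumption and with the rest of the DP-InstaHide pipeline (mixup-style augmentations) omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_laplacian_noise(batch, scale):
    # DP-InstaHide-style additive noise; inputs assumed to lie in [0, 1].
    noisy = batch + rng.laplace(loc=0.0, scale=scale, size=batch.shape)
    return np.clip(noisy, 0.0, 1.0)

batch = rng.random((2, 32, 32, 3))    # toy CIFAR-10-shaped batch
for scale in (0.1, 0.03, 0.015):      # old value vs. the two new ones
    distortion = np.abs(add_laplacian_noise(batch, scale) - batch).mean()
    print(f"scale={scale}: mean per-pixel distortion ~ {distortion:.3f}")
```

Since the mean absolute deviation of a Laplace(0, b) draw is b, the expected per-pixel distortion falls from roughly 10% of the dynamic range at the old value to about 3% and 1.5% at the new ones.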
