Internals
Understanding how Counterfit interacts with a target and a backend framework is crucial for successful, repeatable assessments. This section covers the objects, properties, and data flows involved.
A target is a user-created class that is the interface between a target model and the attacks included in a framework. Microsoft Counterfit implements the user interface this way to handle the wide variety of ways a user could interact with a target model. There is no universal interface to a machine learning endpoint: Counterfit can interact with a model on disk or hosted behind an API. To provide a clean interface, some strict requirements are imposed on a target class. The requirements can seem restrictive at first, but once you understand how Counterfit interacts with backend frameworks through this interface, attacking new targets will become a breeze.
There are two ways to create a new target: execute the new command and follow the instructions, or hack together a new target from an existing one. On start, targets are loaded from the targets folder defined in counterfit/core/config.py, which by default is counterfit/targets. Each target module is contained in its own folder, which is used to store any resources the target may need, such as data, models, and other Python files.
A target class should inherit from the chosen framework's base class. This base class acts as an interface that ensures all the information and methods required to build, manage, and run attacks for the framework are present. Inside the target class, a user needs to set a few meta-properties and implement two methods. These properties and methods are the same for all targets in Counterfit.
The required properties for a target are:
Property | Description |
---|---|
model_name | Should be unique among all targets. This is used to uniquely identify a target model within Counterfit, for example in the output of list targets. |
model_data_type | The type of data the target model uses. This is used to attach the relevant attacks to a given target. Models for which model_data_type is text will be compatible with attacks from the TextAttack framework. Adversarial Robustness Toolbox works with the numpy and image data types. |
model_endpoint | API route or model file location where Counterfit will collect outputs. This may be used in the __init__ function to load a model file (when referring to a filename), or in the __call__ function during an attack to interact with an API (when referring to an API route). |
model_input_shape | The shape of the input to a target model. Backend frameworks use this to understand the shape of a sample. The __call__ function expects a batch of inputs of this shape, i.e., an input of size (batch_size, ) + model_input_shape, where batch_size is typically just 1. |
model_output_classes | A list of all output labels, in the same order as the probabilities returned by the __call__ function. Counterfit's outputs_to_labels function uses it to convert numerical outputs to the labels you define, which helps Counterfit determine whether an attack has been successful. |
X | The sample data, of shape (N, ) + model_input_shape, where N is the number of samples included in the target definition. |
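As a rough illustration of how these properties fit together, the sketch below computes the batch shape handed to __call__ and maps each row of probabilities back to a label by picking the highest-scoring class. The helper names batch_shape and probs_to_labels are hypothetical; this is not Counterfit's outputs_to_labels, whose implementation may differ.

```python
from typing import List, Sequence, Tuple

# Example property values, taken from the cat classifier used later on this page.
model_input_shape: Tuple[int, ...] = (3, 256, 256)
model_output_classes: List[str] = ["cat", "not_a_cat"]


def batch_shape(batch_size: int, input_shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Shape of the batch passed to __call__: (batch_size,) + model_input_shape."""
    return (batch_size,) + tuple(input_shape)


def probs_to_labels(outputs: Sequence[Sequence[float]], classes: Sequence[str]) -> List[str]:
    """Hypothetical helper: map each row of probabilities to its most likely label."""
    labels = []
    for row in outputs:
        if len(row) != len(classes):
            raise ValueError(f"expected {len(classes)} probabilities, got {len(row)}")
        # Position of the largest probability selects the label at the same position.
        best = max(range(len(row)), key=lambda i: row[i])
        labels.append(classes[best])
    return labels


print(batch_shape(1, model_input_shape))  # (1, 3, 256, 256)
print(probs_to_labels([[0.9, 0.1], [0.2, 0.8]], model_output_classes))  # ['cat', 'not_a_cat']
```

The position-based lookup is why the order of model_output_classes must match the order of the probabilities.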
The required methods for a target are:
Method | Description |
---|---|
__init__(self) | This function should load models and load and process input data. |
__call__(self, x) | This function is the primary interface between a target model and an attack algorithm. An attack algorithm uses this function to submit inputs via x and collect the outputs via the return value. This function must return a list of probabilities for each input sample; that is, for each row in x there should be an output row of the form [prob_class_0, prob_class_1, ..., prob_class_n], with one entry per class in model_output_classes. |
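One way to sanity-check the __call__ contract before launching attacks is to run a batch through it and verify the output shape. The sketch below assumes a stand-in callable (fake_call, hypothetical, scoring a sample by its mean value) in place of a real target, and checks for one output row per input row with one entry per class.

```python
from typing import Callable, List, Sequence

import numpy as np


def check_call_contract(
    call: Callable[[np.ndarray], Sequence[Sequence[float]]],
    x: np.ndarray,
    classes: Sequence[str],
) -> List[List[float]]:
    """Run call(x) and verify it returns one probability row per input row,
    with one entry per class in model_output_classes."""
    outputs = [list(row) for row in call(x)]
    if len(outputs) != len(x):
        raise ValueError(f"{len(x)} inputs but {len(outputs)} output rows")
    for i, row in enumerate(outputs):
        if len(row) != len(classes):
            raise ValueError(f"row {i} has {len(row)} entries, expected {len(classes)}")
    return outputs


def fake_call(x: np.ndarray) -> List[List[float]]:
    """Hypothetical stand-in for a target's __call__: P(cat) is the sample mean."""
    rows = []
    for sample in x:
        p = float(np.clip(sample.mean(), 0.0, 1.0))
        rows.append([p, 1.0 - p])
    return rows


batch = np.full((4, 3, 8, 8), 0.25)  # (batch_size,) + a small example input shape
print(check_call_contract(fake_call, batch, ["cat", "not_a_cat"]))
```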
At the top of the file are standard module imports and the base class of the selected Counterfit framework. Targets are loaded only if they inherit from a correct base class. There are no limits on what you can import; however, target modules are loaded on start, so heavy ML libraries could slow down startup.
import requests
import numpy as np
from counterfit.core.interfaces import ArtTarget
Next, the target class is created, and the required properties are defined.
class CatClassifier(ArtTarget):
    model_name = 'catclassifier'
    model_data_type = 'image'
    model_endpoint = "http://contoso.ai/predict"
    model_input_shape = (3, 256, 256)
    model_output_classes = ["cat", "not_a_cat"]
    X = []
Property | Description |
---|---|
model_name | A unique name. Counterfit references each target by name. All logs and results will be stored in this class. |
model_data_type | The cat classifier target, hosted at http://contoso.ai/predict, requires pictures of cats, so model_data_type reflects that the input data should be an image. Counterfit uses this field to process and reshape data correctly. |
model_endpoint | This is where Counterfit will collect outputs from the target. This is used in the __call__ function during an attack. |
model_input_shape | This is the shape of our sample data. It is important to note that this is not necessarily the shape of the target model input, but rather the shape of the sample data. |
model_output_classes | The possible classes for the samples: the image is either a cat or not a cat. |
X | The sample data. This will be populated in the __init__ function. |
Next, the __init__ function should load the required resources. This function is not called until you interact with a target. Sample data should be loaded into self.X as a list of lists, arrays, or vectors. Sample selection for both targeted and untargeted attacks is done by referencing an index of this list. As noted earlier, it is important that the shape of each input sample matches model_input_shape. There are no limits to what you can do inside __init__: load models, process samples, execute functions written elsewhere, etc.
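def __init__(self):
    # Placeholder samples: replace x1, x2, x3 with your own data, each with
    # shape model_input_shape, e.g., (3, 256, 256) for this target.
    self.X = [x1, x2, x3]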
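One way to populate self.X is to keep samples in the target's folder and load them in __init__. The sketch below assumes a hypothetical samples.npy file saved with NumPy and a hypothetical load_samples helper (neither is part of Counterfit); it checks every sample against model_input_shape before returning them as a list.

```python
import os
import tempfile
from typing import List, Tuple

import numpy as np


def load_samples(path: str, input_shape: Tuple[int, ...]) -> List[np.ndarray]:
    """Load an (N,) + input_shape array from a .npy file and return it as a list
    of N samples, the form described above for self.X."""
    data = np.load(path)
    expected = tuple(input_shape)
    if data.shape[1:] != expected:
        raise ValueError(f"samples have shape {data.shape[1:]}, expected {expected}")
    return list(data)


# Demo with a throwaway file; a target might instead build the path from its own folder.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.npy")
    np.save(path, np.zeros((3, 3, 32, 32), dtype=np.float32))
    X = load_samples(path, (3, 32, 32))

print(len(X), X[0].shape)  # 3 (3, 32, 32)
```

Failing fast on a shape mismatch in __init__ surfaces the problem when you interact with the target, rather than partway through an attack.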
Finally, the __call__ function. An attack algorithm uses this function to send a query to the model_endpoint. x is the input the algorithm has provided and has shape (1,) + model_input_shape, or (1, 3, 256, 256) for this target. Conventionally, ML frameworks use "batches" of inputs, and it is best practice for the __call__ function to handle an entire batch, e.g., by sending each sample in the batch to an API that may handle only a single query at a time. However, since most attacks in Counterfit do not require a batch size greater than one, this example uses x[0] to reference the first sample in the batch.
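def __call__(self, x):
    # x has shape (1,) + model_input_shape; send only the first sample.
    sample = x[0].tolist()
    # Assumes the API accepts a JSON body and returns {"confidence": <P(cat)>}.
    response = requests.post(self.model_endpoint, json={"input": sample})
    results = response.json()
    cat_proba = results["confidence"]
    not_a_cat_proba = 1 - cat_proba
    # Same order as model_output_classes: ["cat", "not_a_cat"].
    return [cat_proba, not_a_cat_proba]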
The __call__ function MUST return a list of probabilities that is the same length and in the same order as model_output_classes. Backend frameworks use both at attack runtime: if they are not the same length, an error will be thrown, and if the returned probabilities are in the wrong order, the attack will alter the input incorrectly. There are no limits to what you can do inside __call__, including reshaping arrays to images, executing webhooks, or additional logging. Learn more about the flexibility of Counterfit targets in [Advanced Use].
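If you want __call__ to handle a whole batch rather than only x[0], one approach is to query the endpoint once per sample and collect one output row per input row. In this sketch query_one is a hypothetical stand-in for the HTTP request: it fakes a confidence locally so the code runs offline. In a real target it would post the sample to model_endpoint and read the confidence from the response.

```python
from typing import List

import numpy as np

model_output_classes = ["cat", "not_a_cat"]


def query_one(sample: list) -> float:
    """Hypothetical stand-in for one API call; returns P(cat).
    A real target might do something like
    requests.post(model_endpoint, json={"input": sample}).json()["confidence"]."""
    flat = np.asarray(sample, dtype=float).ravel()
    return float(np.clip(flat.mean(), 0.0, 1.0))


def call_batch(x: np.ndarray) -> List[List[float]]:
    """Return one [P(cat), P(not_a_cat)] row per sample, in model_output_classes order."""
    outputs = []
    for sample in x:  # iterate over the batch dimension
        cat_proba = query_one(sample.tolist())
        outputs.append([cat_proba, 1.0 - cat_proba])
    return outputs


batch = np.stack([np.full((3, 4, 4), 0.9), np.full((3, 4, 4), 0.2)])
print(call_batch(batch))
```

Sending samples one at a time keeps the code simple for APIs that accept a single query; an API that accepts batches could be sent the whole batch in one request instead.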
Note: Pay attention to the channels of an image. By default, backend framework wrappers and Counterfit are configured to use channels first rather than last, (3, 256, 256) vs (256, 256, 3). This can be overridden by adding self.channels_first=False to the target class.
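If your images are stored channels last (height, width, channels), as common image-loading libraries return them, you can either set self.channels_first=False as described above or transpose each sample to channels first. A minimal NumPy sketch of the transposition, using a dummy image:

```python
import numpy as np

# A dummy 256x256 RGB image in channels-last layout: (height, width, channels).
image_hwc = np.zeros((256, 256, 3), dtype=np.float32)
image_hwc[..., 0] = 1.0  # fill the first channel so we can check where it ends up

# Move the channel axis to the front: (channels, height, width).
image_chw = np.transpose(image_hwc, (2, 0, 1))
print(image_chw.shape)  # (3, 256, 256)

# Transposing back restores the original channels-last layout.
restored = np.transpose(image_chw, (1, 2, 0))
```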
Putting it all together, the complete target file looks like this:
import requests
import numpy as np
from counterfit.core.interfaces import ArtTarget
class CatClassifier(ArtTarget):
    model_name = 'catclassifier'
    model_data_type = 'image'
    model_endpoint = "http://contoso.ai/predict"
    model_input_shape = (3, 256, 256)
    model_output_classes = ["cat", "not_a_cat"]
    X = []
    def __init__(self):
        # Placeholder samples: replace x1, x2, x3 with your own data, each with
        # shape model_input_shape.
        self.X = [x1, x2, x3]
    def __call__(self, x):
        sample = x[0].tolist()
        # Assumes the API accepts a JSON body and returns {"confidence": <P(cat)>}.
        response = requests.post(self.model_endpoint, json={"input": sample})
        results = response.json()
        cat_proba = results["confidence"]
        not_a_cat_proba = 1 - cat_proba
        return [cat_proba, not_a_cat_proba]
Counterfit uses existing adversarial ML frameworks for attack algorithms. Some of them are heavy to load, and loading them all on start would make for a slow experience. Instead, Counterfit loads a framework only when it is requested with the load command. Each framework has its own base class that handles the information coming from a target class. You will notice that not every attack from each framework is included: some are missing because they are whitebox attacks; others are missing due to incompatibility with Counterfit.
Counterfit includes a number of blackbox attacks against text models from the TextAttack framework. These attacks have no parameters, and the user need only set a target_sample when running an attack. TextAttack requires that self.X be a list of sentences to be used as input to a model.
When implementing a target for TextAttack, note that TextAttack currently expects model_output_classes to be a list of ordered integers beginning at 0 (e.g., [0, 1, 2]) rather than a list of labels (e.g., ['cat', 'dog', 'horse']), because it uses the class label as an index.
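For example, a hypothetical two-class sentiment target could keep its human-readable names separately and expose integer indices as model_output_classes, so each index lines up with the position of its probability in the output. The class below is an illustrative stand-in: it does not inherit from a Counterfit base class, and a toy keyword scorer takes the place of a real model.

```python
from typing import List


class SentimentTargetSketch:
    """Illustrative stand-in for a TextAttack-style target (not a Counterfit class)."""

    label_names = ["negative", "positive"]                # for humans
    model_output_classes = list(range(len(label_names)))  # [0, 1] for TextAttack
    model_data_type = "text"

    def __init__(self) -> None:
        # TextAttack expects self.X to be a list of sentences.
        self.X = ["what a great movie", "this was a terrible film"]

    def __call__(self, x: List[str]) -> List[List[float]]:
        # Toy scorer: one probability row per sentence, ordered as model_output_classes.
        outputs = []
        for sentence in x:
            p_pos = 0.9 if "great" in sentence else 0.1
            outputs.append([1.0 - p_pos, p_pos])
        return outputs


target = SentimentTargetSketch()
print(target.model_output_classes)  # [0, 1]
```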
Counterfit includes a number of blackbox evasion attacks suitable for targets of the 'numpy' or 'image' data type using the Adversarial Robustness Toolbox (ART). ART expects self.X to be a list of lists; that is, each row in the list corresponds to an input sample, and each input sample is a vector of numbers or an image. Thus, self.X is typically an array of dimensions (N, dim) for numpy, (N, channels, height, width) for image with channels_first=True (the default), or (N, height, width, channels) for image with channels_first=False.
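As a quick check of these layouts, the sketch below stacks N samples into a single array and confirms its shape is (N,) + model_input_shape for each case. The helper stack_samples is hypothetical and the shapes are example values, not requirements.

```python
import numpy as np


def stack_samples(samples, input_shape):
    """Stack samples into one (N,) + input_shape array, checking each sample's shape."""
    arr = np.stack([np.asarray(s, dtype=np.float32) for s in samples])
    if arr.shape[1:] != tuple(input_shape):
        raise ValueError(f"got sample shape {arr.shape[1:]}, expected {tuple(input_shape)}")
    return arr


tabular = stack_samples([np.zeros(30) for _ in range(5)], (30,))  # numpy: (N, dim)
chw = stack_samples([np.zeros((3, 32, 32)) for _ in range(2)], (3, 32, 32))  # channels first
hwc = stack_samples([np.zeros((32, 32, 3)) for _ in range(2)], (32, 32, 3))  # channels last
print(tabular.shape, chw.shape, hwc.shape)  # (5, 30) (2, 3, 32, 32) (2, 32, 32, 3)
```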
ART attacks have parameters that may be set to adjust how the algorithm interacts with a target model. Detailing the parameters for each algorithm is out of the scope of this document. Users may use show info to learn more about an attack algorithm and its parameters.
creditfraud>hop_skip_jump> show info --attack hop_skip_jump
Attack Information
-----------------------------------------------------------------------------------------------------------
attack name hop_skip_jump
attack type evasion
attack category blackbox
attack tags ['image', 'numpy']
attack framework art
attack docs Implementation of the HopSkipJump attack from Jianbo et al. (2019). This is a
powerful black-box attack that only requires final class prediction, and is an
advanced version of the boundary attack. | Paper link:
https://arxiv.org/abs/1904.02144
Attack Parameter (type) Default
---------------------------------------
targeted (bool) False
norm (int) 2
max_iter (int) 50
max_eval (int) 10000
init_eval (int) 100
init_size (int) 100
sample_index (int) 0
target_class (int) 0
In this case, the help shows that more information about hop_skip_jump can be gleaned by reading the academic paper.
Note that sample_index and target_class are properties of all attacks. However, target_class is only used by algorithms that support targeted attacks, and only when targeted is set to True.
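Conceptually, sample_index picks a row of self.X and target_class picks a position in model_output_classes. The hypothetical helper below mirrors that description, returning a desired class only when targeted is True; it illustrates the idea and is not Counterfit's internal code.

```python
from typing import Optional, Sequence, Tuple


def select_inputs(
    X: Sequence,
    classes: Sequence[str],
    sample_index: int = 0,
    target_class: int = 0,
    targeted: bool = False,
) -> Tuple[object, Optional[str]]:
    """Return the sample to attack and, for targeted attacks, the desired label."""
    sample = X[sample_index]
    if not targeted:
        return sample, None  # target_class is ignored for untargeted attacks
    return sample, classes[target_class]


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
classes = ["cat", "not_a_cat"]
print(select_inputs(X, classes, sample_index=2))  # ([0.5, 0.6], None)
print(select_inputs(X, classes, sample_index=2, target_class=1, targeted=True))  # ([0.5, 0.6], 'not_a_cat')
```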
Commands in Counterfit provide the functionality that allows objects to interact. The commands are structured to provide a workflow similar to other offensive security tools, where you typically interact with one target at a time and execute actions against that target. That said, thanks to cmd2, you can also script actions against multiple targets: to drop into a scripting environment, run ipy from the terminal.
Counterfit maintains a state that tracks all objects available in the session. A command can access these objects by importing CFState from counterfit.core.state and querying the state via CFState.get_instance(). Commands use cmd2 for command categorization and argparse for argument handling. For example, here is the interact command:
import argparse
import cmd2
from counterfit.core.state import CFState
parser = argparse.ArgumentParser()
# Restrict the argument to the names of targets loaded in this session.
parser.add_argument("target", choices=CFState.get_instance().loaded_targets.keys())
@cmd2.with_argparser(parser)
@cmd2.with_category("Counterfit Commands")
def do_interact(self, args):
    """Sets the active target."""
    CFState.get_instance().set_active_target(args.target)
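CFState.get_instance() follows the singleton pattern: every command that asks for the state receives the same object, so a target made active by one command is visible to the next. The class below is a minimal, hypothetical reimplementation of that pattern, borrowing the attribute and method names from the interact example; Counterfit's real CFState has more responsibilities.

```python
from typing import Dict, Optional


class StateSketch:
    """Minimal singleton holding session objects (illustrative, not Counterfit's CFState)."""

    _instance: Optional["StateSketch"] = None

    def __init__(self) -> None:
        self.loaded_targets: Dict[str, object] = {}
        self.active_target: Optional[object] = None

    @classmethod
    def get_instance(cls) -> "StateSketch":
        # Create the shared instance on first use, then always return the same one.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def set_active_target(self, name: str) -> None:
        if name not in self.loaded_targets:
            raise KeyError(f"target not loaded: {name}")
        self.active_target = self.loaded_targets[name]


state = StateSketch.get_instance()
state.loaded_targets["catclassifier"] = "CatClassifier instance"
StateSketch.get_instance().set_active_target("catclassifier")
print(state.active_target)  # CatClassifier instance
```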
Adding a new command is simple. Create a new file in the counterfit/core/commands/ folder and set up the command structure:
import argparse
import cmd2
from counterfit.core.state import CFState
parser = argparse.ArgumentParser()
parser.add_argument(…)
You could change the category or keep it the same. Changing the category will cause the command to display separately from Counterfit commands. Next, write the function and use the objects to provide information or change the state.
@cmd2.with_argparser(parser)
@cmd2.with_category("Custom Commands")
def do_thing(self, args):
    """Do things with active target."""
    active_target = CFState.get_instance().active_target
    print(active_target.model_name)
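cmd2 builds on Python's standard-library cmd module, which maps a typed command such as thing to a method named do_thing. The sketch below uses the standard cmd module alone (no Counterfit or cmd2 imports) to show that naming convention; the ThingShell class and its stand-in active target name are hypothetical.

```python
import cmd


class ThingShell(cmd.Cmd):
    """Tiny shell showing how the command `thing` dispatches to do_thing."""

    prompt = "demo> "

    def __init__(self) -> None:
        super().__init__()
        self.active_target_name = "catclassifier"  # stand-in for the active target

    def do_thing(self, arg: str) -> None:
        """Do things with active target."""
        print(self.active_target_name)


shell = ThingShell()
shell.onecmd("thing")  # prints: catclassifier
```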
While attacking targets is fun, an attack can only come after the user has written the target. Because writing a target is something of a development process, Counterfit provides some convenience commands that make writing new targets a little easier.
Command | Description |
---|---|
new | This command will create a new target in the targets folder, and then load it into the session. |
reload | When editing a target, this command will reload the target to reflect the changes made. |
predict | Send a single query to the target model. |
back | Exit the active attack or active target. |
For example, the target creation workflow is as follows: execute new to create a fresh target, open the new target Python file in your favorite code editor, make changes to the code, and execute reload. Use the predict command to ensure inputs and outputs are as expected.
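The reload step relies on the fact that Python can re-execute a module that has already been imported. The sketch below shows the standard-library mechanism, importlib.reload, on a throwaway module written to a temporary folder; whether Counterfit's reload command uses exactly this call is an assumption.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc files in this quick demo

tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
path = os.path.join(tmp, "demo_target.py")

# First version of a tiny "target" module.
with open(path, "w") as f:
    f.write('model_output_classes = ["cat", "not_a_cat"]\n')
importlib.invalidate_caches()  # make sure the import system sees the new file
demo_target = importlib.import_module("demo_target")
print(demo_target.model_output_classes)  # ['cat', 'not_a_cat']

# "Edit" the file, then reload it so the running session picks up the change.
with open(path, "w") as f:
    f.write('model_output_classes = ["cat", "dog", "other"]\n')
demo_target = importlib.reload(demo_target)
print(demo_target.model_output_classes)  # ['cat', 'dog', 'other']
```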
These commands gather and present relevant information about the current session, and relevant information about targets and attacks.
Command | Description |
---|---|
list | This command prints loaded objects in the session |
show | Shows information about targets and attacks; for example, show info displays details about an attack algorithm and its parameters. |