-
Notifications
You must be signed in to change notification settings - Fork 363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT: Job Role Generator attack module from Project Moonshot #506
base: main
Are you sure you want to change the base?
Conversation
@microsoft-github-policy-service agree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
Let me know the suggestions make sense!
race = ["Malay", "Chinese", "Indian", "White", "Black", "African-American"] | ||
|
||
for g in gender: | ||
result_list.append(f"{prompt} ({g})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these should be placeholders in the template, not just one ("prompt"). Also, you can give more reasonable names. The placeholder name need not be prompt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, parametrized them and the manual attack is extracted to the notebook instead, 9efa460 004d054
With the "manual attack" no longer in the JobRoleGenerator class, I'm thinking the test file test_job_role_converter.py
is not necessary.
Is it okay to remove, or are there other useful tests I could add to it? whether with a mock target or something else...
|
||
super().__init__(converter_target=converter_target, prompt_template=self.prompt_template) | ||
|
||
def manual_attack_demographics(self, prompt) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This converter should implement convert_async() like all converters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thank you! 9efa460
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romanlutz This converter should implement convert_async() like all converters
Is it alright to implement convert_async()
implicitly like in NoiseConverter
?
Now, I have the JobRoleGenerator
set up with a parametrized job role, and the prompt
arg to convert_async(...)
is the appended demographic in the template. It ends up calling super().convert_async(...)
all the same, which passes the pre-commit checks (as opposed to overriding LLMGenericTextConverter
, which doesn't pass)
Same with input_supported
, which is not different from the base LLMGenericTextConverter
.
…into job_role_converter
self._prompt_template = ( | ||
prompt_template | ||
if prompt_template | ||
else PromptTemplate.from_yaml_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you merge the latest changes from main, this should change to SeedPromptTemplate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update; SeedPrompt
description: | | ||
This template is used to test whether a model is biased against certain demographic group. | ||
The job role generator adds demographic groups to the job role. | ||
harm_category: NA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an updated prompt template structure in which harm_category has been replaced with harm_categories
. Here's an example template for reference https://github.com/Azure/PyRIT/blob/main/pyrit/datasets/orchestrators/tree_of_attacks/red_teaming_system_prompt.yaml
This template is used to test whether a model is biased against certain demographic group. | ||
The job role generator adds demographic groups to the job role. | ||
harm_category: NA | ||
author: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no author value, we could remove this
The job role generator adds demographic groups to the job role. | ||
harm_category: NA | ||
author: | ||
group: Project Moonshot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an updated prompt template structure in which group has been replaced with groups
which is a list. Here's an example template for reference to update here https://github.com/Azure/PyRIT/blob/main/pyrit/datasets/orchestrators/tree_of_attacks/red_teaming_system_prompt.yaml
author: | ||
group: Project Moonshot | ||
source: https://github.com/aiverify-foundation/moonshot-data | ||
should_be_blocked: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could remove should_be_blocked
field from the template
parameters: | ||
- job | ||
- prompt | ||
template: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template
-> value
@Wren-cpu are you still planning on finishing up this PR? Just asking since we can help bring it over the finish line in case you don't. No pressure if you need more time! |
Description
This PR adds the Job Role Generator attack module from Project Moonshot. It is accompanied by a default template
job_role_converter.yaml
file. The attack module asks the target to identify whether a certain demographic (i.e. gender, ethnicity) is more proficient at the job given by the prompt. This tests for stereotypical/biased representation within the system.Job Role Generator: This attack module adds demographic groups to the job role.
Related: #427, with parent issue #376
Tests and Documentation
test_job_role_converter.py
runs minor, static tests to ensure the source code generates the manual prompts correctly and handles "text" input types.job_role_generator.ipynb
has been generated by JupyText within thedoc
folder. This notebook follows the functionperform_attack_manually()
from Project Moonshot.(P.S. Accidentally committed from alternate git config details; All changes are from me 😸)