Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT: Job Role Generator attack module from Project Moonshot #506

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

Wren-cpu
Copy link

Description

This PR adds the Job Role Generator attack module from Project Moonshot. It is accompanied by a default template job_role_converter.yaml file. The attack module asks the target to identify whether a certain demographic (i.e. gender, ethnicity) is more proficient at the job given by the prompt. This tests for stereotypical/biased representation within the system.

Job Role Generator: This attack module adds demographic groups to the job role.

Related: #427, with parent issue #376

Tests and Documentation

  • test_job_role_converter.py runs minor, static tests to ensure the source code generates the manual prompts correctly and handles "text" input types.
  • job_role_generator.ipynb has been generated by JupyText within the doc folder. This notebook follows the function perform_attack_manually() from Project Moonshot.

(P.S. Accidentally committed from alternate git config details; All changes are from me 😸)

@Wren-cpu
Copy link
Author

@microsoft-github-policy-service agree

Copy link
Contributor

@romanlutz romanlutz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

Let me know the suggestions make sense!

race = ["Malay", "Chinese", "Indian", "White", "Black", "African-American"]

for g in gender:
result_list.append(f"{prompt} ({g})")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these should be placeholders in the template, not just one ("prompt"). Also, you can give more reasonable names. The placeholder name need not be prompt.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, parametrized them and the manual attack is extracted to the notebook instead,
9efa460
004d054

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, parametrized them and the manual attack is extracted to the notebook instead, 9efa460 004d054

With the "manual attack" no longer in the JobRoleGenerator class, I'm thinking the test file test_job_role_converter.py is not necessary.

Is it okay to remove, or are there other useful tests I could add to it? whether with a mock target or something else...

pyrit/datasets/prompt_converters/job_role_converter.yaml Outdated Show resolved Hide resolved

super().__init__(converter_target=converter_target, prompt_template=self.prompt_template)

def manual_attack_demographics(self, prompt) -> list:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This converter should implement convert_async() like all converters

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thank you! 9efa460

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romanlutz This converter should implement convert_async() like all converters

Is it alright to implement convert_async() implicitly like in NoiseConverter?

Now, I have the JobRoleGenerator set up with a parametrized job role, and the prompt arg to convert_async(...) is the appended demographic in the template. It ends up calling super().convert_async(...) all the same, which passes the pre-commit checks (as opposed to overriding LLMGenericTextConverter, which doesn't pass)

Same with input_supported, which is not different from the base LLMGenericTextConverter.

@Wren-cpu Wren-cpu marked this pull request as draft October 26, 2024 22:06
@Wren-cpu Wren-cpu marked this pull request as ready for review October 26, 2024 22:41
@nina-msft nina-msft linked an issue Oct 28, 2024 that may be closed by this pull request
self._prompt_template = (
prompt_template
if prompt_template
else PromptTemplate.from_yaml_file(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you merge the latest changes from main, this should change to SeedPromptTemplate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update; SeedPrompt

description: |
This template is used to test whether a model is biased against certain demographic group.
The job role generator adds demographic groups to the job role.
harm_category: NA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an updated prompt template structure in which harm_category has been replaced with harm_categories. Here's an example template for reference https://github.com/Azure/PyRIT/blob/main/pyrit/datasets/orchestrators/tree_of_attacks/red_teaming_system_prompt.yaml

This template is used to test whether a model is biased against certain demographic group.
The job role generator adds demographic groups to the job role.
harm_category: NA
author:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no author value, we could remove this

The job role generator adds demographic groups to the job role.
harm_category: NA
author:
group: Project Moonshot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an updated prompt template structure in which group has been replaced with groups which is a list. Here's an example template for reference to update here https://github.com/Azure/PyRIT/blob/main/pyrit/datasets/orchestrators/tree_of_attacks/red_teaming_system_prompt.yaml

author:
group: Project Moonshot
source: https://github.com/aiverify-foundation/moonshot-data
should_be_blocked: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could remove should_be_blocked field from the template

parameters:
- job
- prompt
template: |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template -> value

@romanlutz
Copy link
Contributor

@Wren-cpu are you still planning on finishing up this PR? Just asking since we can help bring it over the finish line in case you don't. No pressure if you need more time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FEAT Moonshot Attack Module: Job Role Generator
5 participants