FEAT: Job Role Generator attack module from Project Moonshot #506

Wren-cpu · 2024-10-26T19:08:29Z

Description

This PR adds the Job Role Generator attack module from Project Moonshot, together with a default template file, job_role_converter.yaml. The attack module asks the target to identify whether a certain demographic (e.g. gender, ethnicity) is more proficient at the job given by the prompt. This tests for stereotypical or biased representation within the system.

Job Role Generator: This attack module adds demographic groups to the job role.

Related: #427, with parent issue #376

Tests and Documentation

test_job_role_converter.py runs minor, static tests to ensure the source code generates the manual prompts correctly and handles "text" input types.

job_role_generator.ipynb has been generated by JupyText within the doc folder. This notebook follows the function perform_attack_manually() from Project Moonshot.

(P.S. Accidentally committed from alternate git config details; all changes are from me 😸)

Wren-cpu · 2024-10-26T19:08:40Z

@microsoft-github-policy-service agree

romanlutz

Thanks for the contribution!

Let me know if the suggestions make sense!

romanlutz · 2024-10-26T19:48:52Z

pyrit/prompt_converter/job_role_converter.py

+        race = ["Malay", "Chinese", "Indian", "White", "Black", "African-American"]
+
+        for g in gender:
+            result_list.append(f"{prompt} ({g})")


Both of these should be placeholders in the template, not just one ("prompt"). Also, you can give more reasonable names. The placeholder name need not be prompt.

Alright, I parametrized them, and the manual attack is now extracted to the notebook instead:
9efa460
004d054
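
The suggestion in this thread (one placeholder per varying part, with descriptive names) can be sketched as follows. This is only an illustration using a plain Python format string, not PyRIT's template class; the placeholder names `job` and `demographic` and the helper `render_prompts` are assumptions and do not come from the PR.

```python
# Hypothetical template text with two named placeholders instead of a single "prompt".
TEMPLATE = "{job} ({demographic})"


def render_prompts(job: str, demographics: list[str]) -> list[str]:
    """Fill both placeholders once per demographic group."""
    return [TEMPLATE.format(job=job, demographic=d) for d in demographics]


# Illustrative inputs only; the real lists live in the PR's notebook.
prompts = render_prompts("Software engineer", ["Male", "Female"])
```

Each entry keeps the `job (demographic)` shape produced by the original `f"{prompt} ({g})"` line, but both parts are now supplied by the caller.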

With the "manual attack" no longer in the JobRoleGenerator class, I'm thinking the test file test_job_role_converter.py is not necessary.

Is it okay to remove it, or are there other useful tests I could add, whether with a mock target or something else?

pyrit/datasets/prompt_converters/job_role_converter.yaml

romanlutz · 2024-10-26T19:50:52Z

pyrit/prompt_converter/job_role_converter.py

+
+        super().__init__(converter_target=converter_target, prompt_template=self.prompt_template)
+
+    def manual_attack_demographics(self, prompt) -> list:


This converter should implement convert_async() like all converters

Okay, thank you! 9efa460

@romanlutz This converter should implement convert_async() like all converters

Is it alright to implement convert_async() implicitly like in NoiseConverter?

I now have the JobRoleGenerator set up with a parametrized job role, and the prompt argument to convert_async(...) is the demographic that gets appended in the template. It ends up calling super().convert_async(...) all the same, which passes the pre-commit checks (explicitly overriding the method from LLMGenericTextConverter does not pass).

Same with input_supported, which is not different from the base LLMGenericTextConverter.
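
A rough sketch of the "implicit" approach described here, where the subclass only fixes the job role and inherits `convert_async()` and `input_supported()` unchanged. Every class below is a hypothetical stand-in written for this illustration; none of them is PyRIT's actual base class or result type, and no LLM target is called.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SketchResult:
    """Hypothetical result type standing in for whatever the real converter returns."""

    output_text: str
    output_type: str = "text"


class SketchBaseConverter:
    """Hypothetical base class (not PyRIT's) that owns convert_async and input_supported."""

    def __init__(self, template: str) -> None:
        self._template = template

    def input_supported(self, input_type: str) -> bool:
        return input_type == "text"

    async def convert_async(self, *, prompt: str, input_type: str = "text") -> SketchResult:
        if not self.input_supported(input_type):
            raise ValueError(f"Input type not supported: {input_type}")
        return SketchResult(output_text=self._template.format(prompt=prompt))


class SketchJobRoleGenerator(SketchBaseConverter):
    """Fixes the job role at construction and inherits convert_async unchanged."""

    def __init__(self, job: str) -> None:
        # Escape braces in the job so that only {prompt} remains a placeholder.
        safe_job = job.replace("{", "{{").replace("}", "}}")
        super().__init__(template=f"{safe_job} ({{prompt}})")


result = asyncio.run(SketchJobRoleGenerator(job="Nurse").convert_async(prompt="Male"))
```

Because the subclass defines neither method, there is nothing to keep in sync with the base class, which is the trade-off being asked about.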

rdheekonda · 2024-10-31T21:55:52Z

pyrit/prompt_converter/job_role_converter.py

+        self._prompt_template = (
+            prompt_template
+            if prompt_template
+            else PromptTemplate.from_yaml_file(


Once you merge the latest changes from main, this should change to SeedPromptTemplate

Update (rlundeen2, Nov 2, 2024): this should now be SeedPrompt.

rdheekonda · 2024-10-31T21:58:28Z

pyrit/datasets/prompt_converters/job_role_converter.yaml

+description: |
+  This template is used to test whether a model is biased against certain demographic group.
+  The job role generator adds demographic groups to the job role.
+harm_category: NA


We have an updated prompt template structure in which harm_category has been replaced with harm_categories. Here's an example template for reference https://github.com/Azure/PyRIT/blob/main/pyrit/datasets/orchestrators/tree_of_attacks/red_teaming_system_prompt.yaml

rdheekonda · 2024-10-31T21:59:39Z

pyrit/datasets/prompt_converters/job_role_converter.yaml

+  This template is used to test whether a model is biased against certain demographic group.
+  The job role generator adds demographic groups to the job role.
+harm_category: NA
+author:


If there is no author value, we could remove this

rdheekonda · 2024-10-31T22:00:15Z

pyrit/datasets/prompt_converters/job_role_converter.yaml

+  The job role generator adds demographic groups to the job role.
+harm_category: NA
+author:
+group: Project Moonshot


We have an updated prompt template structure in which group has been replaced with groups which is a list. Here's an example template for reference to update here https://github.com/Azure/PyRIT/blob/main/pyrit/datasets/orchestrators/tree_of_attacks/red_teaming_system_prompt.yaml

rdheekonda · 2024-10-31T22:00:41Z

pyrit/datasets/prompt_converters/job_role_converter.yaml

+author:
+group: Project Moonshot
+source: https://github.com/aiverify-foundation/moonshot-data
+should_be_blocked: true


We could remove should_be_blocked field from the template

rdheekonda · 2024-10-31T22:00:54Z

pyrit/datasets/prompt_converters/job_role_converter.yaml

+parameters:
+  - job
+  - prompt
+template: |


template -> value
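
Taken together, the review comments on job_role_converter.yaml amount to a small metadata migration. The sketch below applies them to a plain dictionary that mirrors the fields shown in the diff. The function name `migrate_template_metadata` is made up for illustration, the template body is elided, and wrapping `harm_category` in a list is an assumption (the review only states that `groups` is a list).

```python
def migrate_template_metadata(old: dict) -> dict:
    """Apply the renames and removals requested in the review comments."""
    new = dict(old)
    # harm_category -> harm_categories (wrapping the value in a list is an assumption).
    if "harm_category" in new:
        new["harm_categories"] = [new.pop("harm_category")]
    # group -> groups, which the review says is a list.
    if "group" in new:
        new["groups"] = [new.pop("group")]
    # template -> value.
    if "template" in new:
        new["value"] = new.pop("template")
    # Drop should_be_blocked, and drop author when it has no value.
    new.pop("should_be_blocked", None)
    if not new.get("author"):
        new.pop("author", None)
    return new


old_metadata = {
    "harm_category": "NA",
    "author": None,
    "group": "Project Moonshot",
    "source": "https://github.com/aiverify-foundation/moonshot-data",
    "should_be_blocked": True,
    "parameters": ["job", "prompt"],
    "template": "...",  # template body elided in this sketch
}
new_metadata = migrate_template_metadata(old_metadata)
```

The linked red_teaming_system_prompt.yaml remains the authoritative reference for the expected structure.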

romanlutz · 2024-11-13T04:00:07Z

@Wren-cpu are you still planning on finishing up this PR? Just asking since we can help bring it over the finish line in case you don't. No pressure if you need more time!

Kawaritai and others added 4 commits October 27, 2024 04:48

Adapt job role generator from Project Moonshot

60b3899

Adapt job role generator from Project Moonshot

2af1e54

Merge branch 'job_role_converter' of https://github.com/Wren-cpu/PyRIT …

a72edf2

…into job_role_converter

Add basic tests for job role converter

7cbe77e

romanlutz reviewed Oct 26, 2024

Wren-cpu added 3 commits October 27, 2024 07:15

Fix typos in template file.

4566935

Implement convert_async and parametrize job/demographic.

9efa460

Update notebook with manual attack logic

004d054

Wren-cpu marked this pull request as draft October 26, 2024 22:06

Wren-cpu added 5 commits October 27, 2024 09:31

Fix pre-commit checks and simplify convert_async implementation

e9f4bb2

Fix docstrings in attack module

275e610

Fix docstring comments; Remove initial test file

6f4ca22

Merge branch 'job_role_converter' of https://github.com/Wren-cpu/PyRIT …

771e9b3

…into job_role_converter

Remove unused class field in job_role_converter.py

d074d68

Wren-cpu marked this pull request as ready for review October 26, 2024 22:41

Remove duplicate input_supported(...)

616e943

nina-msft linked an issue Oct 28, 2024 that may be closed by this pull request

FEAT Moonshot Attack Module: Job Role Generator #427

Open

rdheekonda reviewed Oct 31, 2024

rlundeen2 Nov 2, 2024
