RPO on multiple responses #311

Draft
wants to merge 20 commits into base: main

@Davood-M Davood-M commented Sep 24, 2024

What does this PR do?

Adds RPO on multiple responses for alignment. RPO can take a dataset in which each prompt has a variable number of responses.

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • Each dataset record should be formatted like this:
{
  "prompt": ...,
  "responses": [ list of responses ],
  "rewards": [ list of rewards ]
}
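
As a rough illustration of the record layout above, the following sketch checks records before they are handed to a trainer. It is not code from this PR: `validate_record` is a hypothetical helper, and the one-to-one pairing of `rewards` with `responses` is an assumption inferred from the field names rather than something the description states. The example values are made up.

```python
import json


def validate_record(record):
    """Hypothetical check for one record; returns the number of responses."""
    for key in ("prompt", "responses", "rewards"):
        if key not in record:
            raise ValueError(f"missing key: {key}")

    responses = record["responses"]
    rewards = record["rewards"]
    if not isinstance(responses, list) or not isinstance(rewards, list):
        raise ValueError("'responses' and 'rewards' must both be lists")

    # Assumption: rewards[i] is the reward for responses[i].
    if len(responses) != len(rewards):
        raise ValueError(
            f"{len(responses)} responses but {len(rewards)} rewards"
        )
    return len(responses)


# Two made-up records with different numbers of responses per prompt.
example_lines = [
    json.dumps({
        "prompt": "What is 2 + 2?",
        "responses": ["4", "5"],
        "rewards": [1.0, 0.0],
    }),
    json.dumps({
        "prompt": "Name a prime number.",
        "responses": ["7", "8", "9"],
        "rewards": [1.0, 0.0, 0.0],
    }),
]

counts = [validate_record(json.loads(line)) for line in example_lines]
```

Here `counts` holds the number of responses found for each prompt, which differs between the two records. Any on-disk format, such as one JSON object per line, is likewise an assumption of this sketch.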

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore the model state and all other states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

shengyangs and others added 19 commits July 3, 2024 15:16
Signed-off-by: Shengyang Sun <shengyangs@nvidia.com>
…-ref-policy

Signed-off-by: Shengyang Sun <shengyangs@nvidia.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@cw-dfw-cs-001-dc-01.cm.cluster>
@Davood-M Davood-M changed the title from "Davidm/rpo multi resp" to "RPO on multiple responses" on Sep 24, 2024
2 participants