
feat: DPO support for global padding of seq_len to a multiple #386

Merged: 1 commit merged into main on Nov 15, 2024

Conversation

@terrykong (Collaborator) commented on Nov 7, 2024

What does this PR do ?

  • adds `pad_length_to_multiple_of` for DPO, which is required for sequence parallelism (which is in turn required for MoE models with TP)
    • when `pad_length_to_multiple_of` is > 0, all minibatches are padded to the same length, rounded up to that multiple; when it is 0, the behavior is unchanged

Needed for:

  • sequence parallel
  • mamba
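The padding rule described above can be sketched roughly as follows. This is an illustrative, per-minibatch sketch, not the actual NeMo-Aligner implementation; the function name, `pad_id` parameter, and list-based batch representation are all assumptions (the real code operates on the DPO data pipeline, and pads all minibatches to a common length globally):

```python
import math

def pad_batch_to_multiple(batch, pad_length_to_multiple_of, pad_id=0):
    """Pad every sequence in a minibatch to the same length, rounded up
    to the next multiple of `pad_length_to_multiple_of`.

    With pad_length_to_multiple_of == 0, the batch is returned unchanged
    (the pre-PR behavior).
    """
    if pad_length_to_multiple_of <= 0:
        return batch
    max_len = max(len(seq) for seq in batch)
    target = math.ceil(max_len / pad_length_to_multiple_of) * pad_length_to_multiple_of
    # Right-pad each sequence with pad_id up to the common target length.
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]
```

The motivation: sequence parallelism shards the sequence dimension across tensor-parallel ranks, so the sequence length must divide evenly; rounding every batch up to a common multiple guarantees that.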

Rebase stack

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
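As a hypothetical usage sketch (the exact key path is an assumption, not confirmed by this PR; consult the modified examples/nlp/gpt/conf/gpt_dpo.yaml for the real location):

```yaml
# Hypothetical placement of the new option in the DPO config
model:
  data:
    pad_length_to_multiple_of: 8   # 0 keeps the previous (no-padding) behavior
```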

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all model states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

@trias702 (Collaborator) left a comment:

LGTM, just a few minor questions. Also, to confirm: this PR just lets us pad DPO sequences to a certain multiple; it is not the PR that adds sequence-parallelism support to DPO, correct?

Review threads:
examples/nlp/gpt/conf/gpt_dpo.yaml (outdated; resolved)
examples/nlp/gpt/train_gpt_dpo.py (resolved)
@terrykong terrykong force-pushed the tk/dpo-pad-to-multiple branch 2 times, most recently from 9079d4b to cbf87d8 Compare November 14, 2024 22:18
@terrykong terrykong added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Nov 14, 2024
@trias702 (Collaborator) left a comment:

Just one added question

Review threads:
examples/nlp/gpt/conf/gpt_dpo.yaml (outdated; resolved)
nemo_aligner/data/nlp/builders.py (resolved)
@terrykong terrykong force-pushed the tk/dpo-pad-to-multiple branch from cbf87d8 to 32da0a9 Compare November 14, 2024 23:04
trias702 previously approved these changes Nov 14, 2024
@trias702 (Collaborator) left a comment:

Sound

ashors1 previously approved these changes Nov 15, 2024
Signed-off-by: Terry Kong <terryk@nvidia.com>

dpo pad fix if none

Signed-off-by: Terry Kong <terryk@nvidia.com>

rm variable_seq_len && fix comment on pad_multiple

Signed-off-by: Terry Kong <terryk@nvidia.com>

rm not resolver

Signed-off-by: Terry Kong <terryk@nvidia.com>

typo

Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong dismissed stale reviews from ashors1 and trias702 via 1f4f3e6 November 15, 2024 00:24
@terrykong terrykong force-pushed the tk/dpo-pad-to-multiple branch from 32da0a9 to 1f4f3e6 Compare November 15, 2024 00:24
@terrykong terrykong added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Nov 15, 2024
@terrykong terrykong enabled auto-merge (squash) November 15, 2024 00:41
@terrykong terrykong merged commit e2b4b3f into main Nov 15, 2024
17 checks passed
@terrykong terrykong deleted the tk/dpo-pad-to-multiple branch November 15, 2024 00:47
abukharin3 pushed a commit to abukharin3/NeMo-Aligner that referenced this pull request Nov 22, 2024
…#386)

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: abukharin <abukharin@nvidia.com>
Labels: Algorithms, Run CICD Set + un-set to retrigger
3 participants