feat: Add support for loading NeMo 2.0 checkpoints #412

Merged
4 commits merged into dev from hemil/nemo-2-ckpt-support
Dec 7, 2024

Conversation

@hemildesai (Collaborator) commented Nov 20, 2024

Depends on NVIDIA/NeMo#11452

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all model states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the Utils label Nov 20, 2024
@terrykong terrykong changed the title Add support for loading NeMo 2.0 checkpoints feat: Add support for loading NeMo 2.0 checkpoints Nov 20, 2024
@github-actions github-actions bot added the CI label Nov 21, 2024
@hemildesai hemildesai force-pushed the hemil/nemo-2-ckpt-support branch from 088548f to c60471b Compare December 2, 2024 22:34
@hemildesai hemildesai added CI Run CICD Set + un-set to retrigger and removed CI labels Dec 2, 2024
@hemildesai hemildesai force-pushed the hemil/nemo-2-ckpt-support branch from 9d3eec5 to 3ec4c14 Compare December 3, 2024 21:59
@hemildesai hemildesai added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 4, 2024
@hemildesai hemildesai force-pushed the hemil/nemo-2-ckpt-support branch from 3ec4c14 to 154d537 Compare December 5, 2024 23:57
@hemildesai hemildesai added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 6, 2024
@hemildesai hemildesai requested a review from terrykong December 6, 2024 00:05
@hemildesai hemildesai added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 6, 2024
@terrykong terrykong changed the base branch from main to dev December 6, 2024 20:45
@hemildesai hemildesai added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 6, 2024
@terrykong terrykong force-pushed the hemil/nemo-2-ckpt-support branch 3 times, most recently from 45ff6f9 to 1a50692 Compare December 6, 2024 21:08
@terrykong terrykong marked this pull request as ready for review December 6, 2024 21:11
@terrykong terrykong added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 6, 2024
terrykong previously approved these changes Dec 6, 2024
@terrykong terrykong enabled auto-merge (squash) December 6, 2024 21:26
@terrykong terrykong added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 7, 2024
terrykong previously approved these changes Dec 7, 2024
hemildesai and others added 2 commits December 7, 2024 01:43
Signed-off-by: Hemil Desai <hemild@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: NeMo-Aligner CI <nemo-aligner-ci@nvidia.com>

fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: NeMo-Aligner CI <nemo-aligner-ci@nvidia.com>

adds the scaffolding of an e2e unit test

Signed-off-by: Terry Kong <terryk@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: NeMo-Aligner CI <nemo-aligner-ci@nvidia.com>

e2e test

Signed-off-by: Hemil Desai <hemild@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: NeMo-Aligner CI <nemo-aligner-ci@nvidia.com>

Fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

Update paths

Signed-off-by: Hemil Desai <hemild@nvidia.com>

fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

Update nemo commit

Signed-off-by: Hemil Desai <hemild@nvidia.com>

fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: NeMo-Aligner CI <nemo-aligner-ci@nvidia.com>

move nemorun install

Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong force-pushed the hemil/nemo-2-ckpt-support branch from 9dd4f3e to d8d2799 Compare December 7, 2024 01:43
@terrykong terrykong added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 7, 2024
Signed-off-by: Terry Kong <terryk@nvidia.com>
terrykong previously approved these changes Dec 7, 2024
@terrykong terrykong added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 7, 2024
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 7, 2024
@terrykong terrykong merged commit cf14d1c into dev Dec 7, 2024
21 checks passed
@terrykong terrykong deleted the hemil/nemo-2-ckpt-support branch December 7, 2024 04:01
Labels
CI Run CICD Set + un-set to retrigger Utils

2 participants