Hierarchical Test template for support Grok/GPT via PAXML #141

Merged: 24 commits, Aug 2, 2024


@srivatsankrishnan (Collaborator) commented Jul 9, 2024

Summary

Modifies the JAX Toolbox test template to be hierarchical, so that a single template supports both GPT and Grok models via PAXML.

Test Plan

Internal testing and pytests

Additional Notes

Needs additional testing on the same container. The XLA flags seem to have changed, so getting Grok to work on the same container requires debugging.

@srivatsankrishnan srivatsankrishnan changed the title Draft: Hierarchical Test template for support Grok/GPT via PAXML Hierarchical Test template for support Grok/GPT via PAXML Jul 15, 2024
@TaekyungHeo TaekyungHeo added the enhancement New feature or request label Jul 24, 2024
@TaekyungHeo (Member) previously approved these changes Jul 24, 2024

Approved. Let's see if @amaslenn has any comments and wait for @srinivas212 to merge this.

@amaslenn, this PR supports two different subtests in the JAX Toolbox test templates. This allows us to support Grok and GPT simultaneously in a hierarchical manner. The related PR is https://github.com/Mellanox/cloudaix/pull/2.

@amaslenn (Contributor) left a comment

This PR comes with zero tests, yet changes a lot. Please add unit tests.

src/cloudai/_core/test_template.py (outdated, resolved)
@amaslenn previously approved these changes Jul 25, 2024
@TaekyungHeo previously approved these changes Jul 30, 2024
@srinivas212 srinivas212 merged commit 570181a into NVIDIA:main Aug 2, 2024
2 checks passed
Labels
enhancement New feature or request
4 participants