Add Nemotron model via PAXML to CloudAI + optimization for large GPU runs #171

Merged
merged 33 commits into from
Sep 5, 2024
Changes from 15 commits
Commits
2770e9a
Separate Slurm command gen class for GPT in jaxtoolbox
srivatsankrishnan Aug 13, 2024
dbc4841
Separate Slurm command gen class for Grok in jaxtoolbox
srivatsankrishnan Aug 13, 2024
4aa27ad
Create a selection strategy method in base class to select between GPT…
srivatsankrishnan Aug 13, 2024
802c556
Make conditional import to avoid circular imports for pytest
srivatsankrishnan Aug 13, 2024
487a5c5
Adding support for enabling registry with multiple strategies.
srivatsankrishnan Aug 14, 2024
7cb10b5
Revert "Separate Slurm command gen class for Grok in jaxtoolbox"
srivatsankrishnan Aug 14, 2024
fc0fb31
Add the grok command gen class back
srivatsankrishnan Aug 14, 2024
1a0bb4c
remove grok slurm gen
srivatsankrishnan Aug 15, 2024
1d968f7
Fix the imports in init
srivatsankrishnan Aug 15, 2024
405778c
remove create strategy from slurm command gen
srivatsankrishnan Aug 15, 2024
687dde8
keep extract test name for the pytest
srivatsankrishnan Aug 15, 2024
3dd6fb2
Adding nemotron 340b support existing approach
srivatsankrishnan Aug 15, 2024
40026ab
lint fix
srivatsankrishnan Aug 16, 2024
b460246
Functional script generation for nemotron340b in Paxml
srivatsankrishnan Aug 16, 2024
0e06184
Pushing the separate class for grok and nemotron
srivatsankrishnan Aug 17, 2024
04da262
removing separate classes for grok/gpt/nemo
srivatsankrishnan Aug 17, 2024
6e21956
Final working version with nemotron/grok/gpt + cleaning pytest
srivatsankrishnan Aug 17, 2024
2a79ef1
Added new unit test for the env handling refactor
srivatsankrishnan Aug 17, 2024
a92e753
Merge branch 'main' into nemotron
srivatsankrishnan Aug 17, 2024
757c819
Add number of nodes in srun for pre-test
srivatsankrishnan Aug 21, 2024
0c208b9
remove mpi for jax runs and nsys to sqlite db conversion
srivatsankrishnan Aug 22, 2024
212491c
modify to have only nsys profile for node and rank 0 for both profile…
srivatsankrishnan Aug 23, 2024
f770573
modify the slurm commands to start and load container
srivatsankrishnan Aug 24, 2024
2c01152
Add command generation for XLA flags for perf and profile stages
srivatsankrishnan Aug 24, 2024
ebc4580
Change list
srivatsankrishnan Aug 27, 2024
854e06f
Disable PGLE path for bf16 runs
srivatsankrishnan Sep 4, 2024
4b4f342
disabling pgle based on a flag (bf16 doesn't need pgle)
srivatsankrishnan Sep 4, 2024
c01b010
Merge branch 'main' into nemotron
srivatsankrishnan Sep 4, 2024
ee4fdcb
ruff fixes + merge conflict fixes
srivatsankrishnan Sep 4, 2024
2d97509
fix pytest with job_status_retrival_stratergy
srivatsankrishnan Sep 5, 2024
d25d85e
ruffing
srivatsankrishnan Sep 5, 2024
b1c53c0
fix the test for large run optimization (container pre-loads etc)
srivatsankrishnan Sep 5, 2024
c69f1f3
Fixing Taekyung's comments
srivatsankrishnan Sep 5, 2024
4 changes: 2 additions & 2 deletions src/cloudai/_core/test_template_parser.py
@@ -41,7 +41,7 @@ class TestTemplateParser(BaseMultiFileParser):
 
     __test__ = False
 
-    VALID_DATA_TYPES = ["preset", "bool", "int", "str"]
+    VALID_DATA_TYPES = ["preset", "bool", "int", "str", "float"]
 
     def __init__(self, system: System, directory_path: Path) -> None:
         """
@@ -214,7 +214,7 @@ def _check_and_set_defaults(self, details: Dict[str, Any], arg: str, arg_type: s
 
         if details["type"] == "bool":
             self._validate_boolean(details, arg, arg_type)
-        elif details["type"] in ["int", "str"]:
+        elif details["type"] in ["int", "str", "float"]:
             self._validate_type(details, arg, arg_type)
         elif details["type"] == "preset":
             self._validate_preset(details, arg, arg_type)
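
For readers who want to see what accepting "float" might look like in practice, here is a minimal standalone sketch. It is not the CloudAI implementation: the body of `_validate_type` is not shown in this diff, so `check_default`, `_PYTHON_TYPES`, and the assumption that a `"default"` key holds the value to check are all hypothetical names introduced for illustration. Only `VALID_DATA_TYPES` is taken from the diff.

```python
from typing import Any, Dict

# Copied from the updated line in the diff above.
VALID_DATA_TYPES = ["preset", "bool", "int", "str", "float"]

# Hypothetical mapping from schema type names to Python types.
_PYTHON_TYPES = {"int": int, "str": str, "float": float}


def check_default(details: Dict[str, Any], arg: str) -> Any:
    """Hypothetical stand-in for the int/str/float branch of the parser.

    Returns the default value if it matches the declared type, otherwise
    raises ValueError. Assumes the value lives under a "default" key.
    """
    declared = details.get("type")
    if declared not in VALID_DATA_TYPES:
        raise ValueError(f"Unsupported type '{declared}' for argument '{arg}'")

    value = details.get("default")
    expected = _PYTHON_TYPES.get(declared)
    if expected is None or value is None:
        # bool/preset are handled elsewhere in this sketch; nothing to check.
        return value

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, expected):
        raise ValueError(
            f"Default for '{arg}' must be {declared}, got {type(value).__name__}"
        )
    return value
```

With this sketch, `check_default({"type": "float", "default": 0.5}, "lr")` returns `0.5`, while a string default such as `"0.5"` raises `ValueError`. The check is deliberately strict, so an integer default like `1` is also rejected for a `float` argument; whether the real parser is that strict is not visible in this diff.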