Add automatic remote uploader downloader for composer profiler #2653

Merged
j316chuck merged 18 commits into dev from chuck/refactor_remote_downloader on Oct 19, 2023

j316chuck (Contributor) commented Oct 18, 2023

What does this PR do?

Add profiling remote downloader to composer

j316chuck requested a review from a team as a code owner October 18, 2023 00:50
j316chuck changed the title from "Add remote downloader to composer" to "Add profiling remote downloader to composer" Oct 18, 2023
j316chuck force-pushed the chuck/refactor_remote_downloader branch from 4ca739d to a81484c October 18, 2023 00:55
j316chuck requested review from eracah and dakinggg October 18, 2023 03:39
j316chuck changed the title from "Add profiling remote downloader to composer" to "Add automatic remote downloader detection for composer profiler" Oct 18, 2023
j316chuck changed the title from "Add automatic remote downloader detection for composer profiler" to "Add automatic remote uploader downloader for composer profiler" Oct 18, 2023
composer/trainer/trainer.py (outdated, resolved)
composer/profiler/profiler.py (outdated, resolved)
composer/profiler/profiler.py (resolved)
j316chuck and others added 2 commits October 18, 2023 13:16
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
j316chuck requested a review from dakinggg October 18, 2023 20:17
eracah (Contributor) left a comment

LGTM, but note that:

  1. all the torch_folder/filename arguments could be reduced to a single argument for URI parsing
  2. logger.upload_file in the torch profiler will upload profile traces to all object paths, including ones used for checkpointing
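
A minimal sketch of the reviewer's first point, assuming the folder and filename are combined into one URI such as s3://bucket/path. The helper split_remote_uri and the example bucket and file names are hypothetical and use only the Python standard library; this is not Composer's own URI-parsing code.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_remote_uri(uri: str) -> Tuple[str, str, str]:
    """Split a single URI into (backend, bucket, object_path).

    Hypothetical helper for illustration only; it is not Composer's API.
    A POSIX-style local path yields an empty backend and bucket.
    """
    parsed = urlparse(uri)
    # The scheme (e.g. "s3") names the object-store backend, the network
    # location is the bucket, and the rest of the path is the object key.
    # Only strip the leading "/" for remote URIs so local paths stay intact.
    path = parsed.path.lstrip("/") if parsed.netloc else parsed.path
    return parsed.scheme, parsed.netloc, path


# One URI replaces a separate folder argument plus a filename argument.
backend, bucket, path = split_remote_uri("s3://my-bucket/traces/ep0-ba5-rank0.json")
print(backend, bucket, path)  # s3 my-bucket traces/ep0-ba5-rank0.json
```

Under the same assumption, an empty backend could be read as "no remote uploader needed", which is one plausible rule for deciding automatically whether to attach one; that rule is likewise an illustration, not the behavior merged in this PR.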

j316chuck merged commit d534e0a into dev Oct 19, 2023
15 checks passed
j316chuck deleted the chuck/refactor_remote_downloader branch October 19, 2023 00:06
b-chu added a commit that referenced this pull request Oct 27, 2023
* Remove apex test and clean up fsdp warnings  (#2616)

* patch default (#2628)

* Add logging for generate callbacks (#2630)

* Update generate.py

* add missing imports

* Expose input_names and output_names when exporting to ONNX (#2601)

* Expose input_names and output_names when exporting to ONNX

* assert sample_input type for pyright

* fix mocks

---------

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* Bump version to 0.16.4 (#2627)

* bump version

* filter warning

* remove slack failure

* composer

* ckdn

* commit change

* commit change

* commit change

* commit change

* rename

* revert

* cleanup

* move around tests

* log

* fix slack

* clean test

* composer

* rearrange

* remove logs

* skip

* remove log

---------

Co-authored-by: Chuck Tang <chuck@mosaicml.com>

* many logs

* typos

* logs

* filter

* logs

* fix logs

* monkeypatch sharded tensor

* Add partial state dict functionality for FSDP (#2637)

* Use pytorch chunking

commit-id:e4c9b78f

* Add partial state dict functionality for FSDP

commit-id:2a2cae33

* Update monai requirement from <1.3,>=0.9.1 to >=0.9.1,<1.4 (#2643)

Updates the requirements on [monai](https://github.com/Project-MONAI/MONAI) to permit the latest version.
- [Release notes](https://github.com/Project-MONAI/MONAI/releases)
- [Changelog](https://github.com/Project-MONAI/MONAI/blob/dev/CHANGELOG.md)
- [Commits](Project-MONAI/MONAI@0.9.1...1.3.0)

---
updated-dependencies:
- dependency-name: monai
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump pytest-codeblocks from 0.16.1 to 0.17.0 (#2645)

Bumps [pytest-codeblocks](https://github.com/nschloe/pytest-codeblocks) from 0.16.1 to 0.17.0.
- [Release notes](https://github.com/nschloe/pytest-codeblocks/releases)
- [Commits](nschloe/pytest-codeblocks@v0.16.1...v0.17.0)

---
updated-dependencies:
- dependency-name: pytest-codeblocks
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* remove flush on close (#2646)

* update latest (#2650)

* HSDP Support (#2648)

* add hsdp

* add tuple support

* mod wide

* update

* set default

* disable error validation

* hsdp

* gate import

* Log profile averages (#2647)

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* bump

* daily key (#2655)

* Add automatic remote uploader downloader for composer profiler (#2653)

* Update the AWS_OFI_NCCL version and add in the MPI HWLOC install (#2651)

* Update the AWS_OFI_NCCL version and add in the MPI HWLOC install

* Move the HWLOC down to the appropriate stage

* Move the HWLOC to the apt-get install

* Remove extra debug arg

---------

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Charles Tang <j316chuck@users.noreply.github.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: Anna <anna@mosaicml.com>
Co-authored-by: Antoine Broyelle <antoine.broyelle@helsing.ai>
Co-authored-by: Chuck Tang <chuck@mosaicml.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: willgleich <22464726+willgleich@users.noreply.github.com>
3 participants