Hydra Release 2021.11 #1

Open
16 tasks
knagrecha opened this issue Nov 11, 2021 · 1 comment
Labels
enhancement New feature or request planned

Comments

@knagrecha
Owner

knagrecha commented Nov 11, 2021

Current Release:

  • Standard model-parallel sharding supported
  • Pilot-run style partitioning
  • Sharded-LRTF Scheduling
  • Standard linear execution patterns for forward/backward passes
  • Arbitrarily deep models can be trained on one GPU
  • Near-linear speedups for end-to-end runtimes in single-node multi-GPU setting
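
For anyone unfamiliar with the scheduling item above, here is a rough sketch of the idea, assuming "LRTF" means longest remaining time first applied at shard granularity. This is not Hydra's actual scheduler: the function name `sharded_lrtf`, the `shard_times` input, and the tie-breaking rule are all made up for illustration, and real runtimes would come from measurement rather than a hand-written dict.

```python
import heapq


def sharded_lrtf(shard_times, num_gpus):
    """Simulate greedy longest-remaining-time-first scheduling over shards.

    shard_times: dict of model name -> list of per-shard runtimes (hypothetical
                 numbers). Shards of one model run in order, one at a time.
    Returns (makespan, schedule), where schedule holds
    (start, end, gpu, model, shard_index) tuples.
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU")

    next_shard = {m: 0 for m in shard_times}
    remaining = {m: sum(t) for m, t in shard_times.items()}
    # Models that still have shards left and are not currently on a GPU.
    idle = {m for m, t in shard_times.items() if t}
    free_gpus = list(range(num_gpus))
    running = []  # min-heap of (end_time, gpu, model)
    schedule = []
    now = 0.0

    while idle or running:
        # Hand every free GPU to the idle model with the most work left
        # (ties broken by name so the result is deterministic).
        while free_gpus and idle:
            model = min(idle, key=lambda m: (-remaining[m], m))
            idle.remove(model)
            gpu = free_gpus.pop(0)
            i = next_shard[model]
            end = now + shard_times[model][i]
            heapq.heappush(running, (end, gpu, model))
            schedule.append((now, end, gpu, model, i))

        # Jump to the next shard completion and release its GPU.
        now, gpu, model = heapq.heappop(running)
        remaining[model] -= shard_times[model][next_shard[model]]
        next_shard[model] += 1
        free_gpus.append(gpu)
        free_gpus.sort()
        if next_shard[model] < len(shard_times[model]):
            idle.add(model)

    return now, schedule


if __name__ == "__main__":
    jobs = {"a": [3, 3], "b": [2], "c": [1, 1]}  # hypothetical shard runtimes
    makespan, plan = sharded_lrtf(jobs, num_gpus=2)
    print(makespan)
    for entry in plan:
        print(entry)
```

Scheduling per shard instead of per model is what lets a GPU pick up a different model as soon as a shard finishes, which is where the multi-GPU speedup in this toy simulation comes from.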

Next Release TODOs (Target Completion: 2022.06):

  • Tensor Parallel Support
  • Recurrent/Autoregressive Network Support
  • Cluster setting scheduler (prioritize turnaround time vs makespan)
  • Disk spilling
  • Multi-node scaling
  • More examples
  • Additional documentation
  • More robust partitioner
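
To make the turnaround-vs-makespan item above concrete, here is a toy comparison of two job orderings on identical workers. It is only a sketch under simple assumptions (all jobs arrive at time 0, no preemption, hypothetical durations), and `list_schedule` is an illustrative helper, not part of Hydra.

```python
import heapq


def list_schedule(durations, num_workers):
    """Run jobs in the given order, each on whichever worker frees up first.

    Returns (makespan, mean_turnaround), assuming every job arrives at time 0,
    so a job's turnaround time equals its finish time.
    """
    free_at = [0.0] * num_workers  # min-heap of times at which workers free up
    heapq.heapify(free_at)
    finish_times = []
    for d in durations:
        start = heapq.heappop(free_at)
        end = start + d
        finish_times.append(end)
        heapq.heappush(free_at, end)
    return max(finish_times), sum(finish_times) / len(finish_times)


if __name__ == "__main__":
    jobs = [5, 4, 3, 3, 3]  # hypothetical job durations
    longest_first = list_schedule(sorted(jobs, reverse=True), num_workers=2)
    shortest_first = list_schedule(sorted(jobs), num_workers=2)
    print("longest first :", longest_first)
    print("shortest first:", shortest_first)
```

On this input, longest-first finishes the whole batch sooner, while shortest-first gives a lower average per-job turnaround, which is the trade-off a cluster-setting scheduler would have to prioritize.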

Long-term TODOs

  • (Potential) FlexFlow integration?
  • Pip package
  • AMD GPU support
  • Unit tests
  • Website
  • TensorFlow support
  • Data Parallel Support
  • Model Selection APIs
@knagrecha knagrecha added enhancement New feature or request planned labels Nov 11, 2021
@knagrecha
Owner Author

Also, we really need to add support for PyTorch 1.10 and the corresponding torchtext release. The codebase itself should be fine, and there are even some possible optimization opportunities (see the saved_intermediates branch), but the examples break entirely when I try to upgrade PyTorch.
