Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to register algorithm passes #1377

Merged
merged 4 commits into from
Aug 8, 2022

Conversation

hanlint
Copy link
Contributor

@hanlint hanlint commented Aug 6, 2022

The order in which algorithms are run matters significantly during composition. For example, FusedLayerNorm must run after GatedLinearUnits (which adds layer norms), to ensure that all layer norms are converted into fused versions. To enforce these, we use algorithm passes, which operate on lists of algorithms.

This PR refactors these algorithm passes into its own passes module, and allows the user to register custom passes (for custom algorithms) into the Engine.

Also coming along for the ride are some readability and code quality improvements to the engine.

todos:

  • add tests

@hanlint hanlint requested review from a team as code owners August 6, 2022 20:55
@hanlint hanlint requested a review from mvpatel2000 August 6, 2022 21:10
@hanlint hanlint changed the title [WIP] Add ability to register algorithm passes Add ability to register algorithm passes Aug 7, 2022
Copy link
Contributor

@ravi-mosaicml ravi-mosaicml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together. I like what this API is trying to accomplish -- agree we need a better API to sort the ordering of algorithms -- but wonder if it would be helpful to work through the design a bit more? My main questions with this API are:

  1. Should an algorithm register a pass, or do we want this functionality to live outside of the algorithm? I was thinking for modularity it could be helpful for each algorithm to self-contain this information. But, this would not be possible via this API (nor via the status quo), since algorithms do not have access to the engine.
  2. Would it be confusing to set the ordering of passes (via the index argument correctly? Mainly thinking if multiple (different) sources are both inserting passes -- then the order in which engine.register_pass is called is important.
  3. Do we want a full function for algorithm sorting that would be opaque to the engine? Or would it be helpful to give the engine more visibility into the scheduling requirements with something like a DAG , where each algorithm could have a run_before(self, event) -> Sequence[Type[Algorithm]] and a run_after(self, event) -> Sequence[Type[Algorithm]] method. The engine could then rearrange algorithms (so long as it satisfies the DAG requirements) for optimal performance (e.g. when running with XLA and lazy execution).

I didn't review the code in detail; happy to do that if we would like to go with this design.

@hanlint
Copy link
Contributor Author

hanlint commented Aug 8, 2022

  1. Some passes may be cross-algorithms (for example, see the one warning about multiple interpolate_loss), so they should live outside of algorithms. This API actually does support an extension in the future. We could eventually add a register_pass method to the Algorithm that the Trainer uses this API to register passes.
  2. The default behavior is to append at the end. index is for power users, as we are all consenting adults.
  3. This sounds like pre-mature narrowing / redesign of the API. Let's retain flexibility until we know what kind of passes we want to do.

I dont think this warrants a full design discussion, this is refactoring the existing design for better extensibility and readability (rather than hiding all the algorithm passes inside _compile).

Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Interactions are messy between algorithms -- I don't think you can include per algorithm
  2. Also don't love weird indexing, but fine leaving as a super user feature
  3. We likely will need to rewrite this with a more complicated scheduling algorithm -- agree with many of the points Ravi said. With that said, I'm fine with this refactor (which is much better than current code imo) because I don't think we're at the point where we need to do a full design on algorithm DAGs -- Id punt this to later

composer/core/engine.py Show resolved Hide resolved
composer/core/engine.py Show resolved Hide resolved
composer/core/engine.py Show resolved Hide resolved
@hanlint hanlint merged commit af8bb17 into mosaicml:dev Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants