-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checkpoint Refactor #65
Checkpoint Refactor #65
Conversation
This should be ready for review now. I had to mess around with my fork to change it to the public PR so hopefully nothing weird happened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial comments
Hey @akshaysubr , @NickGeneva and I talked and I will refactor some of the registry and entrypoints stuff to be a bit more in line with fsspec. You can look at other parts of the code but I wouldn't bother seeing too much until I make these changes. |
Linking with Modulus launch PR that should be merged at the same time, NVIDIA/modulus-launch#42 |
Hey @akshaysubr and @NickGeneva, This is ready for another round of revisions. I think everything here is in a pretty good place however there is one key design decision that I will explain bellow. So for the entrypoint exposure I followed the example here to the letter, https://amir.rachum.com/python-entry-points/. In particular I follow their final revision and expose all our models through entrypoints and treat internal and external models the same. You can see where I get the list of models here, https://github.com/loliverhennigh/modulus/blob/fea-checkpoint_refactor/modulus/models/registry.py#L20. This goes somewhat counter to the way that fsspec does things. Instead of exposing their protocols through entrypoints and getting them from there they manually write them in a dict here https://github.com/fsspec/filesystem_spec/blob/master/fsspec/registry.py#L62C2-L62C2 . Functionally this leads to the same design and interface for users and they have good documentation on how to add protocols through entrypoints. I would say the advantage of this is they can easily add protocols from outside packages and you can see a few examples of this in the dict. If we did the same thing maybe we could bring in hugging face models in for example. I am open to setting all this up following the fsspec example but at this point there is no benefit to doing it I think. I would say we go with the design I have in this PR and then if we want we can switch to the fsspec setup at any time. This should be easy to do given how I structured the code and will basically just be adding a dict to the Also, just to make it a bit more concrete on how a user can expose a model to modulus via entrypoints I included a little example here https://github.com/loliverhennigh/modulus/blob/fea-checkpoint_refactor/examples/ExternalPackage/example/load_model.py. Ill delete this after but it was really good to wrap my head around this stuff. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@loliverhennigh This is great work! Left some comments about suggested changes. Would also be good to change the PR title and description to something more verbose and descriptive.
I think it is good to keep this example. Just need to ensure it gets run every so often to ensure that it doesn't go stale. |
Hey @akshaysubr and @NickGeneva , Ready for another round of reviews! I think I addressed everything. Could I get another look when you have time. Thanks! |
@loliverhennigh I tested this now and it works great! Really liking how this turned out! |
/blossom-ci |
/blossom-ci |
/blossom-ci |
3 similar comments
/blossom-ci |
/blossom-ci |
/blossom-ci |
/blossom-ci |
1 similar comment
/blossom-ci |
Description
This feature adds better saving and loading of models that store arguments in a json file. This allows loading the model without needing to keep track of model parameters. Along with this merge there are several related side features. from_torch allows users to bring in models from torch seamlessly assuming they have json serializable inputs. Also, we have exposed the models through endpoints following discussions about expanding usability of modulus.
Here is a list of high level features,