
Checkpoint support for torch.package #2570

Open · dasturge opened this issue May 13, 2022 · 3 comments

@dasturge

🚀 Feature

Currently, checkpointing is centered around objects with state_dict and load_state_dict methods, but the new torch.package serialization option breaks with this pattern. It doesn't seem that I can simply plug in a custom save_handler to handle package import/export.

torch offers a new, interesting method for serializing models along with their code and dependencies (and it is not limited to pytorch/base Python types). It would be cool to leverage this for checkpointing, so that models produced by the checkpointer are packaged and ready to go, helping to bridge the gap with deployment workflows.
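
For context, a minimal sketch of the difference (file, package, and module names are illustrative, and a stand-in model is used):

import torch
from torch.package import PackageExporter

model = torch.nn.Linear(4, 2)  # stand-in; imagine a user-defined model

# Classic checkpoint: weights only; loading requires the model class to be importable
torch.save(model.state_dict(), 'checkpoint.pt')

# torch.package: code and weights travel together in one archive
with PackageExporter('model_package.pt') as pe:
    pe.extern('torch.**')  # assume torch is available in the loading environment
    # pe.intern('models.**') would bundle your own model code into the archive
    pe.save_pickle('my_package', 'model.pkl', model)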

Something along the lines of:

TorchPackageCheckpoint(package=my_package, internal_module="models", ...), which would allow one to pass an object that works along with:

with PackageExporter(f'{filepath}.pt') as pe:
    pe.save_pickle(self.internal_module, self.internal_file, self.package)

Obviously this would require some refactoring of private methods if it's to use the base Checkpoint class, since the responsibility for calling state_dict would need to be offloaded to the DiskSaver/save_handlers. I didn't see a clean way to simply extend the Checkpoint class.
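
To illustrate the constraint (a minimal sketch, assuming ignite's current behavior of extracting state_dicts from to_save before the save_handler is invoked; package_save_handler is a hypothetical name):

import torch
from ignite.handlers import Checkpoint

model = torch.nn.Linear(4, 2)  # stand-in model

def package_save_handler(checkpoint, filename):
    # By the time a custom save_handler runs, `checkpoint` is already a
    # plain state dict of tensors, not the nn.Module itself, so there is
    # nothing left here for PackageExporter.save_pickle to package.
    torch.save(checkpoint, filename)

handler = Checkpoint(to_save={'model': model}, save_handler=package_save_handler)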

@sdesrozis (Contributor)

@dasturge Thank you for highlighting this very interesting point.

IMO a package is very different from a checkpoint. I'm no expert on what was recently done, but it doesn't sound like a new way to checkpoint that replaces the current load/save from state dicts. I would say it should be very useful at the end of the training process, to help with deployment.

A specific handler could be an idea, but at the moment I don't see how to automatically reuse the training code to produce an inference script. Maybe it's more a matter of a guideline for writing applications.

@sadra-barikbin (Collaborator)

The raw way to do the job:

from torch.package import PackageExporter
from ignite.engine import Events

@trainer.on(Events.COMPLETED)
def package_model():
    with PackageExporter('package.pt') as pe:
        # Some action pattern settings, depending on what you're packaging
        pe.intern('models.**')  # example
        pe.extern('numpy.**')   # example

        pe.save_pickle('my_package', 'model.pkl', model)

As @dasturge said, we could have an API like:

TorchPackageCheckpoint(path: str, package_name: str, interns: List[str] = [], externs: List[str] = [], mocked: List[str] = [], to_save: Dict[str, Any])

to do the job, but since the user does not know the interns and externs beforehand, they would have to repeat the steps below again and again until every dependency has an assigned action:

  1. Call some pe.intern, pe.extern or pe.mock
  2. Face a packaging error
  3. Refine the patterns and go to step 1

If we have to write those statements anyway, why not just fall back to the raw way I described at first?
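
For concreteness, a rough sketch of what such a handler could look like (entirely hypothetical; none of these names exist in ignite, and the pattern handling is the assumed part):

from typing import Any, Dict, List, Optional
from torch.package import PackageExporter

class TorchPackageCheckpoint:
    # Hypothetical handler: packages the `to_save` objects when called.
    def __init__(self, path: str, package_name: str,
                 interns: Optional[List[str]] = None,
                 externs: Optional[List[str]] = None,
                 mocked: Optional[List[str]] = None,
                 to_save: Optional[Dict[str, Any]] = None):
        self.path = path
        self.package_name = package_name
        self.interns = interns or []
        self.externs = externs or []
        self.mocked = mocked or []
        self.to_save = to_save or {}

    def __call__(self, engine):
        with PackageExporter(self.path) as pe:
            for pattern in self.interns:
                pe.intern(pattern)
            for pattern in self.externs:
                pe.extern(pattern)
            for pattern in self.mocked:
                pe.mock(pattern)
            for name, obj in self.to_save.items():
                pe.save_pickle(self.package_name, f'{name}.pkl', obj)

# usage: trainer.add_event_handler(Events.COMPLETED, TorchPackageCheckpoint(...))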

@sdesrozis (Contributor)

Having a new, specific handler for packaging would be interesting if we can make it genuinely helpful. Maybe we could have checkpoints during training and packaging at the end. However, packaging is more than checkpointing: it embeds what is needed for inference and is tied to deployment.

Let's think about it. It would be nice to have a package importer and exporter for training, and why not an inference engine based on that.
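
To illustrate that last point, loading and running a packaged model needs only the archive itself (names match the earlier example):

import torch
from torch.package import PackageImporter

imp = PackageImporter('package.pt')
model = imp.load_pickle('my_package', 'model.pkl')

model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 4))  # illustrative input shape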
