in the clean filter, auto-detect checkpoint handler based on file extension #250

Open
julien-c opened this issue Apr 30, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@julien-c

julien-c commented Apr 30, 2024

let me know if it'd be interesting to contribute this!

(Example collection on HF with some pushed end2ends from git-theta: https://huggingface.co/collections/julien-c/git-theta-6630db4045e53fe3e2ac3108)

@blester125
Collaborator

I think this would be a cool addition. I would say the two things we would want in an implementation are:

  1. For it to be possible to override the checkpoint handler (for example, the GIT_THETA_CHECKPOINT_TYPE env variable defaults to "sniff" which does autodetect, but you can manually set it to force git-theta to use a specific one)
  2. We would want to avoid having to enumerate checkpoint types in git-theta itself. Maybe we would need something like a new git_theta.plugins.checkpoints_sniff plugin entry point and then each handler implementation can provide and register a sniff function. Then the git-theta sniff function would load/run the sniffer for each plugin until one of the sniffers says "yeah, this is a checkpoint I can open"
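The two requirements above could be sketched roughly as follows. This is a minimal illustration, not git-theta's actual API: `detect_checkpoint_type`, the `Sniffer` signature, the two example sniffers, and the extensions they check are all hypothetical. Only the `GIT_THETA_CHECKPOINT_TYPE` variable and its proposed "sniff" default come from the comment. The registry is passed in as a plain dict here so the example stays self-contained; in a plugin-based design it would presumably be populated from the proposed `git_theta.plugins.checkpoints_sniff` entry point group.

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical sniffer signature: given a file path, report whether
# the corresponding checkpoint handler believes it can open the file.
Sniffer = Callable[[str], bool]


def sniff_pytorch(path: str) -> bool:
    # Assumption: illustrative extensions only, not git-theta's real list.
    return path.endswith((".pt", ".pth"))


def sniff_safetensors(path: str) -> bool:
    return path.endswith(".safetensors")


# Stand-in for a registry that plugins would fill via entry points.
SNIFFERS = {"pytorch": sniff_pytorch, "safetensors": sniff_safetensors}


def detect_checkpoint_type(
    path: str,
    sniffers: Mapping[str, Sniffer],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the name of the handler to use for `path`, or None."""
    env = os.environ if env is None else env
    # Requirement 1: an explicit value overrides auto-detection.
    forced = env.get("GIT_THETA_CHECKPOINT_TYPE", "sniff")
    if forced != "sniff":
        return forced
    # Requirement 2: ask each registered sniffer in turn; first match wins.
    for name, sniffer in sniffers.items():
        if sniffer(path):
            return name
    return None
```

With this shape, git-theta itself never enumerates checkpoint types: each handler package ships its own sniffer, and the core only iterates over whatever is registered. What to do when no sniffer matches (raise, or fall back to a default handler) is left open here.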

As a further extension to this idea (which def isn't needed in an initial implementation), some checkpoint formats have magic numbers (pytorch does https://pytorch.org/docs/stable/_modules/torch/serialization.html) which might be possible to use in a plugin's sniffer to do fancier things than just looking at the filenames (although I'm not sure how easy it is to peek at the stdin pipe (used in the filters) without consuming it).
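On the question of peeking at the pipe: one possible approach, sketched below, relies on the fact that `sys.stdin.buffer` is an `io.BufferedReader`, whose `peek()` method returns buffered bytes without advancing the read position. The helper names are hypothetical, and the zip signature is used only as an example, on the assumption that the checkpoint is a zip archive (as files written by recent versions of `torch.save` are by default). Note that `peek()` may return fewer bytes than requested, so a real sniffer would need to handle short reads.

```python
import io

# Local-file-header signature that starts a zip archive.
ZIP_MAGIC = b"PK\x03\x04"


def peek_bytes(stream: io.BufferedReader, n: int) -> bytes:
    """Return up to n leading bytes without consuming them from the stream."""
    # peek() may return more or fewer bytes than asked for; trim to n.
    return stream.peek(n)[:n]


def looks_like_zip(stream: io.BufferedReader) -> bool:
    # Hypothetical magic-number sniffer: compare the first bytes to the zip signature.
    return peek_bytes(stream, len(ZIP_MAGIC)) == ZIP_MAGIC
```

In a filter, this would be called as `looks_like_zip(sys.stdin.buffer)` before handing the same stream to the chosen handler, which would still see the data from the first byte.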

I'm pretty busy atm, but I would be happy to help anyone who wants to implement this and get it merged.

@blester125 blester125 added the enhancement New feature or request label Apr 30, 2024