
[P1] Compatibility with tooling that expects a HF transformer model #36

Open
chris-aeviator opened this issue Apr 8, 2024 · 3 comments
Labels
question Further information is requested

Comments

chris-aeviator commented Apr 8, 2024

I'm raising this issue because, in terms of "production readiness" (a stated goal), pyreft, though designed as a very thoughtful library, will need to work together with tooling that expects a loadable vanilla transformer model. A real-world reproducible example is loading a pyvene-trained model with https://github.com/outlines-dev/outlines in order to create structured JSON / schema-following outputs.

While the model can be accessed via pyref_model.model, it is not loadable on its own, and in any case one tool would miss the other's functionality when loaded this way. What would be an advisable strategy for integrating with other tooling? May I also suggest that different backend engines (e.g. vllm, ollama, llama.cpp) will need interfaces to pyreft. Maybe I'm overlooking some documentation here, but I'm unsure how to proceed.
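For concreteness, here is roughly the pattern I mean (a minimal sketch; the API names follow the pyreft README, and the model name, layer, and rank are placeholders):

```python
import torch, transformers, pyreft

# Load a base HF model and wrap it with a ReFT intervention (per the README).
model = transformers.AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16)
reft_config = pyreft.ReftConfig(representations={
    "layer": 15, "component": "block_output",
    "intervention": pyreft.LoreftIntervention(
        embed_dim=model.config.hidden_size, low_rank_dimension=4)})
reft_model = pyreft.get_reft_model(model, reft_config)

# Downstream tooling (outlines, vllm, ...) expects a plain HF model. The
# inner module is reachable, but using it alone drops the trained edits:
plain = reft_model.model  # a vanilla transformer again; interventions gone
```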

Is merging a pyvene intervention into the base model possible or is pyvene/pyreft more of an active component that will require code changes in any case?

aryamanarora (Collaborator) commented Apr 8, 2024

Hey! So:

  1. We got similar questions on Twitter about accelerating inference with different backends (vllm, mlx, etc.). Currently, pyvene is a major dependency for which no alternative exists: it manages the torch hooks that pyreft uses to intervene on hidden representations at the token level (see the sketch below this list). To support non-HF and/or non-torch models, we would need to replicate some pyvene functionality. We have thought about how to do this simply without porting pyvene entirely¹, but it's a long-term software engineering task that we don't immediately have the time/resources/people for. Maybe in the summer, once pyreft is known to be stable for a variety of models and tasks, we will invest time into this.
  2. The LoReFT intervention can't be merged into the base model, for two reasons. (1) It is a complex function applied directly to the hidden state, so it operates differently from existing model components (which add to the hidden state via residual connections) and, as far as we can tell, can't be folded into them. (2) It operates only on some tokens, not all, whereas model weights are the same for every token.

So overall, using LoReFT in a model requires either torch-style hooking functionality or code changes to the model to support token-level interventions.
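To make the mechanism concrete, here is a toy sketch of the kind of forward hook pyvene manages: a LoReFT-style low-rank edit, Φ(h) = h + Rᵀ(Wh + b − Rh), applied only at selected token positions. The layer index, rank, and random untrained weights are purely illustrative, not pyreft's actual implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

d = model.config.hidden_size
r = 4                                # low-rank dimension (toy)
R = torch.randn(r, d) / d ** 0.5     # untrained stand-in projections
W = torch.randn(r, d) / d ** 0.5
b = torch.zeros(r)
positions = [0, -1]                  # intervene on first and last token only

def intervene(module, inputs, output):
    h = output[0]                              # block output: (batch, seq, hidden)
    edit = (h @ W.T + b - h @ R.T) @ R         # R^T (W h + b - R h), per token
    mask = torch.zeros_like(h[..., :1])
    mask[:, positions] = 1.0                   # token-level: selected positions only
    return (h + mask * edit,) + output[1:]

handle = model.transformer.h[6].register_forward_hook(intervene)
ids = tok("Hello world", return_tensors="pt").input_ids
logits = model(ids).logits                     # hook fires during this forward
handle.remove()
```

Reproducing exactly this hook-and-position-mask behavior inside a compiled/graph-based backend is the part that has no off-the-shelf equivalent.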

Footnotes

  1. E.g. we could load pyvene just for the KV-cache population when processing the prompt, and then use the efficient backend for generation (sketched below). But in the future we want to support intervening on decoding steps as well, which is messier.
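A sketch of that two-phase idea, using a plain HF model as a stand-in for both the intervened model and the fast backend (assumptions: greedy decoding, single sequence):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
intervened = AutoModelForCausalLM.from_pretrained("gpt2")  # hooks would live here
backend = intervened                                       # stand-in for vllm/mlx/etc.

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    # Phase 1: prefill with the intervened model; hooks fire, KV cache fills.
    out = intervened(ids, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(-1, keepdim=True)
    generated = [next_id]
    # Phase 2: hook-free greedy decoding that reuses the cache.
    for _ in range(10):
        out = backend(next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(-1, keepdim=True)
        generated.append(next_id)
print(tok.decode(torch.cat([ids, *generated], dim=-1)[0]))
```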

aryamanarora added the question label Apr 8, 2024
frankaging changed the title from "Compatibility with tooling that expects a HF transformer model" to "[P1] Compatibility with tooling that expects a HF transformer model" Apr 8, 2024
frankaging (Collaborator) commented

Assigning P1 since there is no blocker.

chris-aeviator (Author) commented Apr 10, 2024

An elegant solution could be for pyreft to provide an importable AutoModel that encapsulates the hooks while preserving compatibility with other libraries. Is this possible at a high level? If so, I'd be willing to contribute. My interest here also lies in supporting high-throughput vllm and per-request model switching, both of which vllm already supports, and which just load an HF AutoModel in the end.
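Roughly what I have in mind (a hypothetical class, not in pyreft today; since the hooks stay registered on the base modules, delegated calls like generate would still pass through them):

```python
import torch
from transformers import AutoModelForCausalLM

class ReftAutoModel(torch.nn.Module):
    """Hypothetical wrapper: attaches intervention hooks on load, but
    otherwise duck-types as the underlying HF model."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        # Placeholder identity hook; pyreft/pyvene would own the real edit.
        self._handle = base.get_input_embeddings().register_forward_hook(
            lambda mod, inp, out: out)

    def __getattr__(self, name):
        try:
            return super().__getattr__(name)   # nn.Module bookkeeping first
        except AttributeError:
            return getattr(self.base, name)    # config, generate, dtype, ...

    def forward(self, *args, **kwargs):
        return self.base(*args, **kwargs)

    @classmethod
    def from_pretrained(cls, name, **kwargs):
        return cls(AutoModelForCausalLM.from_pretrained(name, **kwargs))

# Usage: anything that only needs from_pretrained + HF model behaviour works,
# and generation still runs through the hooked base modules.
model = ReftAutoModel.from_pretrained("gpt2")
```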
