[Summary] Ability to pass in initial state #175

Open
CompRhys opened this issue Feb 14, 2024 · 3 comments

Comments

@CompRhys

Several issues have requested the ability to pass in an initial state, but many of them were closed by their authors without the request being resolved. This issue collates those earlier issues so that the maintainers' responses to them are easier to find when searching open issues.

#155
#146
#141
#127
#101

TL;DR: this functionality is a work in progress.

@CompRhys
Author

#258

@SamPruden

SamPruden commented Mar 24, 2024

I'd also benefit a lot from this feature. I have multiple training sequences with long common prefixes, and I'd like to run the model over each prefix once, then fork the state for each continuation. Because this would be used during training, I would need gradients to flow through the pause/resume process, contrary to what @albertfgu said in #101.
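A minimal sketch of the prefix-forking pattern described above, assuming a toy scalar linear recurrence in place of Mamba's selective scan. The parameters `A`, `B`, `C` and the helpers `scan` and `fork_from_prefix` are hypothetical and are not part of the `mamba_ssm` package. The point is only that a scan which accepts an initial state `h0` and returns its final state lets the shared prefix be processed once and reused for every continuation.

```python
from typing import List, Sequence, Tuple

# Toy stand-in for an SSM layer: a scalar linear recurrence
#   h_t = A * h_{t-1} + B * x_t,   y_t = C * h_t
# A, B, C are hypothetical fixed values, not Mamba's learned,
# input-dependent parameters.
A, B, C = 0.9, 0.5, 2.0


def scan(x: Sequence[float], h0: float = 0.0) -> Tuple[List[float], float]:
    """Run the recurrence from initial state h0; return outputs and final state."""
    h = h0
    ys: List[float] = []
    for xt in x:
        h = A * h + B * xt
        ys.append(C * h)
    return ys, h


def fork_from_prefix(
    prefix: Sequence[float], continuations: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Process the shared prefix once, then resume each continuation from its state."""
    _, h_prefix = scan(prefix)  # the prefix cost is paid only once
    return [scan(cont, h0=h_prefix)[0] for cont in continuations]


if __name__ == "__main__":
    prefix = [1.0, 2.0, 3.0]
    continuations = [[0.5, -1.0], [4.0]]
    forked = fork_from_prefix(prefix, continuations)
    for cont, ys in zip(continuations, forked):
        full, _ = scan(list(prefix) + list(cont))
        # Resuming from the saved state reproduces the tail of a full run.
        assert ys == full[len(prefix):]
    print(forked)
```

In an autograd framework such as PyTorch, the same pattern preserves gradient flow as long as the saved state is not detached (for example, not passed through `.detach()`): each continuation's loss then backpropagates through the shared prefix computation. Whether the fused Mamba kernels can accept, and differentiate through, such an initial state is what this issue tracks.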

@radarFudan

Actually, if you don't mind a 10x slowdown and 2x the GPU memory usage, there is a workaround for now: #51

But I suspect that true Mamba support for initial hidden states will require a CUDA expert to implement it in the kernels.
