[Summary] Ability to pass in initial state #175

CompRhys · 2024-02-14T01:35:49Z

Several issues have requested the ability to pass in an initial state, but many of them were closed by their authors without being resolved. This issue collates those earlier issues so that the maintainers' responses are easier to find when searching open issues.

#155
#146
#141
#127
#101

TL;DR: this functionality is a work in progress.

CompRhys · 2024-03-20T22:50:47Z

#258

SamPruden · 2024-03-24T14:31:46Z

I'd also benefit a lot from this feature. I have multiple training sequences with long common prefixes, and I'd like to run the model over each prefix once, then fork the state for each continuation. This would be used during training, so, contrary to what @albertfgu said in #101, I would need gradients to flow through the pause/resume process.
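
To make the prefix-sharing pattern concrete, here is a minimal sketch using a toy scalar linear recurrence, not Mamba's selective scan. The function `scan`, its `h0` argument, and the constants are hypothetical illustrations and are not part of the `mamba_ssm` API; the sketch only shows that resuming from a saved state reproduces the outputs of a full pass.

```python
def scan(xs, a, b, h0=0.0):
    """Toy recurrence h_t = a * h_{t-1} + b * x_t (hypothetical stand-in for an SSM).

    Returns the list of per-step states and the final state.
    """
    h = h0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(h)
    return ys, h


# Assumed toy parameters and data, chosen only for illustration.
a, b = 0.9, 0.5
prefix = [1.0, 2.0, 3.0]
continuations = [[4.0, 5.0], [-1.0, 0.5, 2.0]]

# Run the shared prefix once and keep its final state.
_, prefix_state = scan(prefix, a, b)

# Fork: each continuation resumes from the same saved state.
forked = [scan(c, a, b, h0=prefix_state)[0] for c in continuations]

# Reference: recompute the prefix for every continuation.
full = [scan(prefix + c, a, b)[0][len(prefix):] for c in continuations]

# Both routes apply the same operations in the same order.
assert forked == full
```

In an autograd framework the same pattern would carry gradients as long as the saved state stays attached to the computation graph, so that the loss from every continuation accumulates into the shared prefix computation.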

radarFudan · 2024-03-24T14:44:44Z

Actually, if you don't mind it being 10x slower and using 2x the GPU memory, there is a workaround for now: #51

But I guess a proper Mamba implementation with initial hidden states will need someone with real CUDA expertise to write it.

CompRhys mentioned this issue Feb 23, 2024

Simple inference example #187

Closed
