Add the Bamba Model #34982
base: main
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Hi @fabianlim, do you have a paper reference for this model or any details on the trained checkpoints?
@Rocketknight1 thanks for reaching out. Yes, my colleagues are preparing a paper and a GitHub repo with the (training) code. The checkpoints will be 1.8T, 2T, 2.2T, and an SFT model. We will update the PR accordingly. cc: @raghukiran1224
The data used is all open; we plan to share any and all details the community would want! Open source is the name of the game 😄
Cool! @molbap will be the point of contact at Hugging Face for this PR, so feel free to ping me or him if you have any questions as you're working on it.
What does this PR do?
This PR merges the `BambaModel`, a hybrid Mamba2 architecture with SwiGLU. The checkpoints were jointly trained by IBM, Princeton, and UIUC. The implementation is based on ai21labs/Jamba-v0.1 and on the Mamba2 implementation that was ported over to HF for the Codestral model.
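The description above only names the ingredients, so here is a rough, framework-free sketch of what "hybrid Mamba2 with SwiGLU" usually means, assuming a Jamba-style stack where most decoder layers use a Mamba2 mixer, a few use attention, and each layer's feed-forward block is a SwiGLU MLP. The helper names (`swiglu_mlp`, `layer_schedule`) and the `attn_layer_indices` parameter are hypothetical illustrations, not the `transformers` API and not the actual Bamba code.

```python
import math


def silu(x: float) -> float:
    # Numerically stable SiLU (Swish-1): x * sigmoid(x)
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    z = math.exp(x)
    return x * z / (1.0 + z)


def matvec(w, x):
    # w is a list of rows; returns the product w @ x as a list
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_mlp(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: W_down (SiLU(W_gate x) * (W_up x)), elementwise product
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def layer_schedule(num_layers, attn_layer_indices):
    # Hypothetical helper: mark which decoder layers use attention;
    # every other layer uses a Mamba2 mixer (hybrid, Jamba-style stack).
    attn = set(attn_layer_indices)
    return ["attention" if i in attn else "mamba2" for i in range(num_layers)]
```

For example, `layer_schedule(4, [1])` gives `["mamba2", "attention", "mamba2", "mamba2"]`. Keeping attention in only a few layers is the usual motivation for such hybrids: the Mamba2 layers carry a fixed-size state instead of a growing KV cache.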
cc: @ani300, @raghukiran1224
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.