Add the Bamba Model #34982

Draft: wants to merge 20 commits into base: main

Conversation

fabianlim
Contributor

@fabianlim fabianlim commented Nov 28, 2024

What does this PR do?

This PR adds BambaModel, a hybrid Mamba2 architecture with SwiGLU. The checkpoints are jointly trained by IBM, Princeton, and UIUC.

The implementation is based on ai21labs/Jamba-v0.1 and on the mamba2 implementation ported over to HF for the codestral model.
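
For readers unfamiliar with the SwiGLU feed-forward block mentioned above, here is a minimal, dependency-free sketch of the general technique. It is not the code in this PR: the function names (`silu`, `matvec`, `swiglu_mlp`) and the weight layout are assumptions made for illustration, and bias terms are omitted.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """Illustrative SwiGLU block: down(silu(gate(x)) * up(x)).

    w_gate and w_up map the input to the hidden size;
    w_down maps the hidden size back to the output size.
    """
    gate = [silu(g) for g in matvec(w_gate, x)]   # gating branch
    up = matvec(w_up, x)                          # linear branch
    hidden = [g * u for g, u in zip(gate, up)]    # elementwise product
    return matvec(w_down, hidden)
```

The gate branch decides, per hidden unit, how much of the linear branch passes through; a gate pre-activation of zero suppresses that unit entirely.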

cc: @ani300, @raghukiran1224

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@fabianlim fabianlim marked this pull request as draft November 28, 2024 00:35
@fabianlim fabianlim changed the title from "initial commit for PR" to "Add the Bamba Model" Nov 28, 2024
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
@Rocketknight1
Member

Hi @fabianlim, do you have a paper reference for this model or any details on the trained checkpoints?

@fabianlim
Contributor Author

@Rocketknight1 thanks for reaching out. Yes, my colleagues are preparing a paper and a GitHub repo with the (training) code. The checkpoints will be 1.8T, 2T, 2.2T, and an SFT model. We will update the PR accordingly.

cc: @raghukiran1224

@raghukiran1224

The data used is all open, and we plan to share any and all details the community would want! Open source is the name of the game 😄

@Rocketknight1
Member

Cool! @molbap will be the point of contact at Hugging Face for this PR, so feel free to ping me or him if you have any questions as you're working on it.

@fabianlim fabianlim mentioned this pull request Dec 5, 2024
@molbap molbap added the "State space models" and "New model" labels Dec 9, 2024
ani300 and others added 4 commits December 14, 2024 00:05