WIP: [MPT] Support MPT-7b-instruct model #460
Conversation
Hi @vvchernov, please remove the WIP in the title when you feel the PR is ready.
…MPTModel was refactored
…ve corresponding TODOs. other torch replacements
I'm also interested in the TVM-native attention! I want to fuse all of … I wonder if such fusion is possible in the presence of KV cache updates, though.
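For context, a single decode step with a KV cache looks roughly like the sketch below (plain PyTorch, illustrative names only — `wqkv` is assumed to be a fused QKV projection such as `nn.Linear(d_model, 3 * d_model)`; this is not the PR's actual code). The cache write sits between the QKV projection and the score/softmax/value matmuls, which is what makes fusing the whole sequence into a single kernel awkward.

```python
import torch
import torch.nn.functional as F

def decode_step(x, wqkv, k_cache, v_cache, n_heads):
    # x: [batch, 1, d_model] -- hidden state of the single new token.
    b, _, d = x.shape
    q, k, v = wqkv(x).chunk(3, dim=-1)

    # KV cache update: the new key/value must be appended before attention runs,
    # so this write sits in the middle of the op sequence a fused kernel would
    # have to cover.
    k_cache = torch.cat([k_cache, k], dim=1)  # [batch, seq_len + 1, d_model]
    v_cache = torch.cat([v_cache, v], dim=1)

    def heads(t):
        return t.view(b, -1, n_heads, d // n_heads).transpose(1, 2)

    q, ks, vs = heads(q), heads(k_cache), heads(v_cache)
    scores = q @ ks.transpose(-1, -2) / (d // n_heads) ** 0.5
    out = (F.softmax(scores, dim=-1) @ vs).transpose(1, 2).reshape(b, 1, d)
    return out, k_cache, v_cache
```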
…ul in float32 to avoid inf generation
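The float32 matmul mentioned in the commit above is presumably a guard against fp16 overflow: attention-score (or logit) dot products can exceed fp16's ~65504 range and become inf. A minimal sketch of the idea, assuming fp16 inputs (illustrative only, not the PR's code):

```python
import torch

def scores_in_fp32(q: torch.Tensor, k: torch.Tensor, scale: float) -> torch.Tensor:
    # q, k are fp16; upcast the operands so the matmul accumulates in fp32,
    # avoiding inf values that would otherwise propagate through softmax.
    scores = torch.matmul(q.float(), k.float().transpose(-1, -2)) * scale
    return scores  # cast back to fp16 after softmax if needed
```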
Just found that the decoder attention kernel in FasterTransformer supports rotary fusion. It also takes the KV cache as input.
…s code, comment unnecessary code parts, upstream layer names for correct mapping
@vvchernov How far away is this PR from being ready such that it works with the MPT model family?
Hello guys, sorry for the late response. I was transferred to another task (accuracy benchmarking of LLMs) and could not finish this one. I have now upstreamed my latest changes to the MPT model and the mlc-llm pipeline, with and without the KV cache. It still requires debugging due to low model accuracy, and the branch still needs to be rebased onto the top of mlc-llm. It would be great if you could help me.
There is an implementation of the original mpt-7b-instruct model from Hugging Face on Relax, plus some updates on the mlc-llm pipeline side to launch it via mlc_chat_cli.
Current state:
Note: the pending PR needs to be merged and a newer version of TVM used for the MPT model to work correctly.
cc @yzh119 @masahi