Hitting a bit of an issue here when trying to read in the converted BLOOM models on HF:

```
[2023-04-01T16:59:13Z INFO llama_cli] Warning: Bad token in vocab at index 0
thread 'main' panicked at 'Could not load model: ReadExactFailed { source: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }, bytes: 4 }', llama-cli/src/main.rs:267:10
```
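The panic comes from a fixed-size read hitting the end of the file, i.e. the model file ends before the loader has read every field it expects. A minimal sketch of the pattern (illustrative; `read_u32` is not the crate's actual loader code):

```rust
use std::fs::File;
use std::io::Read;

// If the file is truncated, or the converter wrote a different layout than
// the loader expects, read_exact fails with UnexpectedEof
// ("failed to fill whole buffer").
fn read_u32(file: &mut File) -> std::io::Result<u32> {
    let mut buf = [0u8; 4];
    file.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}
```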
```rust
/// The weights for the BLOOM model. All the mutable state is split into a
/// separate struct `InferenceSession`.
pub struct BLOOM {
```
I'd go with `Bloom`:

> In UpperCamelCase, acronyms and contractions of compound words count as one word: use `Uuid` rather than `UUID`, `Usize` rather than `USize` or `Stdin` rather than `StdIn`. In snake_case, acronyms and contractions are lower-cased: `is_xid_start`.
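Applied here, that would mean (illustrative sketch of just the rename):

```rust
// The acronym counts as one word in UpperCamelCase, so BLOOM becomes Bloom.
pub struct Bloom {
    // ...weights as before...
}
```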
Sorry for taking so long to look at this - it looks good! Once @setzer22 gives his OK (just so that he's happy with the overall structure of things), I'll get it ready for the PR merge chain. Don't update the PR yet - there are some other changes we'll likely need to land first, and it'll be easier for you to do them all at once.
Sure. I decided against further restructuring the model load function to cut down on code duplication, since there are multiple PRs either changing it or using it. I will re-open #74 and refactor it after completing this one (#85).
I hope this could be in another repo. Having the BLOOM model in `llama-rs` seems out of place.
Yes - the reason we're keeping it here for now is that the two models share a lot of architectural commonalities, and the base LLaMA library keeps changing. We will probably do one or more of the following over time:
What's the status of this? It looks like it's not very up-to-date. Is there anything I can do to help?
It's on pause until most of the major changes have been merged in; see the discussion.
There's also a chance that we investigate #137 before we tackle this again, just to reduce the rework if we go down that road, but I'm not sure yet. (Would appreciate your thoughts on it, btw!)
I would not go that far. Making our own computation graphs is a significant undertaking, and support for Bloom has been here for a while now. I would prioritize merging this before making any other big refactors. In my experience, abstractions always come out better when you don't design them against a single test use case. If we build the computation graph API and only make sure it works for LLaMA, chances are we're going to have to rework it later anyway when adding support for other models. It would be better to have multiple models in first, to make sure the abstraction we come up with is more solid.
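For instance (purely illustrative, not the crate's actual API), a shared abstraction validated against both architectures might look like:

```rust
// Hypothetical sketch of a model trait exercised by more than one
// architecture; `InferenceSession` and `TokenId` stand in for the
// crate's real types.
pub type TokenId = i32;
pub struct InferenceSession { /* mutable inference state */ }

pub trait Model {
    /// Run one forward pass over `tokens`, updating the session state.
    fn evaluate(&self, session: &mut InferenceSession, tokens: &[TokenId]);
}

// Implementing this for both Llama and Bloom before freezing the design
// surfaces assumptions (e.g. ALiBi vs. rotary embeddings) that a
// LLaMA-only abstraction would bake in.
```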
Completes #45
Will refactor and remove most of the duplicate code before merging.
- Added `ggml_alibi`, `ggml_compute_forward_alibi_f32`, `ggml_compute_forward_alibi_f16`, and `ggml_compute_forward_alibi` to `ggml.c` (see the ALiBi sketch after this list)
- Added `ggml_view_2d`, `ggml_alibi`, and `ggml_gelu` to `lib.rs` in the ggml-raw crate
- Moved shared code into a `commons` folder and the models into a `models` folder
- Updated `main.rs` in `llama-cli`
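For context on the new ops: ALiBi replaces positional embeddings by biasing each attention score linearly with the key-query distance, which is what BLOOM uses where LLaMA uses rotary embeddings. A minimal sketch of the math (illustrative Rust, not the actual `ggml_alibi` kernel; assumes a power-of-two head count):

```rust
/// Each head h (1-based) gets a slope m_h = 2^(-8h / n_head); the score for
/// query position i attending to key position j is biased by -m_h * (i - j).
fn alibi_slopes(n_head: usize) -> Vec<f32> {
    let base = 2f32.powf(-8.0 / n_head as f32);
    (1..=n_head).map(|h| base.powi(h as i32)).collect()
}

fn alibi_bias(slope: f32, query_pos: usize, key_pos: usize) -> f32 {
    // Keys further in the past get a larger negative bias, so attention
    // decays with distance without any learned position embedding.
    -slope * (query_pos - key_pos) as f32
}
```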