Conversation
Now it can load the model, but it's not working: something is off in the math / tensor loading.
Merged in the latest.

Do you have an ETA for when you'll be done with the separate crate? Also, do you think it would be possible to restructure the loader so that it can parse any valid GGML file and do LLaMA-specific logic on top of that? It'd be nice to support BLOOM / RWKV / etc. models encoded in GGML-format files without having to copy the entire loader.
1-2 weeks.
No. Different models have different hyperparameters. Currently llama.cpp's model format is ad hoc, so I don't know what "any valid GGML file" would mean. If safetensors supports q4_0 or q4_1 I will use that. It has a problem (unaligned memory), but it is at least well-defined.
Hm, do you think it's worth shipping the version without a crate for now, and then getting the crate in for the next release? I'd like to release 0.1 this weekend if possible.
Yeah, I know - but as far as I know, the tensors themselves are stored the same way (including with names), and it's only really the hyperparameters that are different. It seems to me that you could parameterise over the hyperparameter struct and support loading any file that has the same {container, model-specific hyperparameters, vocabulary, tensor data} structure.
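For illustration, a minimal sketch of the parameterisation being proposed here. The `Hyperparameters` trait, `read_u32` helper, and all other names are hypothetical, not the actual llama-rs API:

```rust
use std::io::{self, Read};

// Stand-in helper assumed by this sketch: read a little-endian u32.
fn read_u32(reader: &mut impl Read) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

// Hypothetical trait: each architecture supplies only its hyperparameter
// parsing; the container, vocabulary, and tensor handling stay shared.
trait Hyperparameters: Sized {
    fn read(reader: &mut impl Read) -> io::Result<Self>;
}

// LLaMA-specific hyperparameters (field set abbreviated for the sketch).
struct LlamaHparams {
    n_vocab: u32,
    n_embd: u32,
    n_layer: u32,
}

impl Hyperparameters for LlamaHparams {
    fn read(reader: &mut impl Read) -> io::Result<Self> {
        Ok(Self {
            n_vocab: read_u32(reader)?,
            n_embd: read_u32(reader)?,
            n_layer: read_u32(reader)?,
        })
    }
}

// Shared skeleton over {container, hparams, vocabulary, tensor data}.
fn load_model<H: Hyperparameters>(reader: &mut impl Read) -> io::Result<H> {
    let _magic = read_u32(reader)?; // container magic/version check elided
    let hparams = H::read(reader)?;
    // ...vocabulary and tensor loading would follow, identical across models
    Ok(hparams)
}

fn main() -> io::Result<()> {
    // Tiny smoke test with an in-memory "file": magic plus three u32 hparams.
    let bytes: Vec<u8> = [0x67676d66u32, 32000, 4096, 32]
        .iter()
        .flat_map(|v| v.to_le_bytes())
        .collect();
    let hparams: LlamaHparams = load_model(&mut io::Cursor::new(bytes))?;
    println!("n_vocab = {}", hparams.n_vocab);
    Ok(())
}
```

With this shape, a BLOOM or RWKV loader would only need its own `Hyperparameters` impl rather than a copy of the whole loader.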
Apparently some models just have token dupes? /shrug
keep the multi-file part
That seems like a separate issue. In rwkv.cpp the model is parametrized with …
It's a LLaMA model, not a RWKV model. RWKV models wouldn't be expected to work currently. The problem is the code is treating the model …
```diff
@@ -62,7 +62,7 @@ pub(crate) fn load(
     ggml_loader::load_model_from_reader(&mut reader, &mut loader)
         .map_err(|err| LoadError::from_ggml_loader_error(err, path.clone()))?;
 
-    Ok(loader.model.expect("model should be initialized"))
+    loader.model.ok_or(LoadError::ModelNotCreated { path })
```
The way I wrote the code, `Err` is actually impossible here.
It's possible if there are no tensors at all in the model (e.g. `tensor_buffer` never gets called). Users should be aware of that.
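For context, the pattern in the diff above turns a missing model into a typed error instead of a panic. A standalone sketch: the `ModelNotCreated` variant name comes from the diff, everything else (the `Model` alias, `finish_load`) is hypothetical:

```rust
use std::path::PathBuf;

// Hypothetical error type; the variant name comes from the diff above.
#[derive(Debug)]
enum LoadError {
    ModelNotCreated { path: PathBuf },
}

// Stand-in: the real loader builds the model from tensor callbacks,
// so `None` here means no tensors were ever seen.
type Model = String;

fn finish_load(model: Option<Model>, path: PathBuf) -> Result<Model, LoadError> {
    // ok_or turns the None case into a recoverable, typed error rather
    // than a panic, covering the "file with zero tensors" edge case.
    model.ok_or(LoadError::ModelNotCreated { path })
}

fn main() {
    match finish_load(None, PathBuf::from("model.bin")) {
        Ok(model) => println!("loaded: {model}"),
        Err(err) => println!("error: {err:?}"),
    }
}
```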
`llama-rs/src/loader2.rs` (Outdated)
```diff
-        self.vocab
-            .push_token(id, token, score)
-            .expect("vocab should be valid");
+        if let Err(err) = self.vocab.push_token(id, token, score) {
```
Again, given the API usage, this is impossible.
Hm, sure. It mattered more when there was an error for duplicate tokens, but in this case encountering an ID out of step would mean the loading invariants were broken. I'll change that back to a panic.
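A sketch of the invariant being described, with a hypothetical `Vocabulary` type (not the actual llama-rs one): token IDs must arrive in order, so an out-of-step ID indicates a loader bug, which makes a panic reasonable:

```rust
struct Vocabulary {
    tokens: Vec<Vec<u8>>,
    scores: Vec<f32>,
}

impl Vocabulary {
    fn push_token(&mut self, id: u32, token: Vec<u8>, score: f32) {
        // The next valid ID is exactly the current length; anything else
        // breaks the loading invariant, so panic rather than return an error.
        assert_eq!(
            id as usize,
            self.tokens.len(),
            "vocabulary tokens must be pushed in ID order"
        );
        self.tokens.push(token);
        self.scores.push(score);
    }
}

fn main() {
    let mut vocab = Vocabulary { tokens: Vec::new(), scores: Vec::new() };
    vocab.push_token(0, b"<unk>".to_vec(), 0.0);
    vocab.push_token(1, b"hello".to_vec(), -1.5);
    // vocab.push_token(5, ...) would panic here: ID out of step.
    println!("{} tokens", vocab.tokens.len());
}
```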
I'm leaning towards disabling multi-file loading entirely and using loader1 to create a converter from GGML/GGMF multipart to GGJT. Keeping two loaders around just to handle a fairly niche use-case doesn't seem worth it to me.
Yeah, as Kerfuffle said, that's an issue with an existing LLaMA model (you can Google that filename). That being said, I'm thinking we should address that in another PR.
Should we merge this now?
Give me a sec; just going to make multi-file loading on loader2 panic for now, and then I'll merge.
The current loader2 code doesn't detect multiple files; it simply assumes there is only one file. Please make it instruct using …
Why use weird environment variables instead of just having a command-line option? Since multi-part models are a rare case, it seems like it makes the most sense to have single-file as the default and to enable multi-file mode with a CLI option.
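For what that suggestion might look like in practice, a hypothetical sketch assuming clap v4 with the derive feature (the flag and struct names are invented, not the actual llama-rs CLI):

```rust
use clap::Parser;

/// Single-file loading is the default; multi-file mode is an explicit opt-in.
#[derive(Parser)]
struct Args {
    /// Path to the model file (first part, for multi-part models)
    model_path: std::path::PathBuf,

    /// Opt in to loading additional parts alongside the base file
    #[arg(long)]
    multi_file: bool,
}

fn main() {
    let args = Args::parse();
    println!("multi-file mode: {}", args.multi_file);
}
```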
loader2 is the default. I was considering making it a CLI option myself, but with …
My plan is to remove support for multipart + loader1 entirely and relegate it to that separate tool. I'm guessing you haven't found that many multipart models in the wild, right?
Are you OK with me removing …
I didn't try.
Yes.
Alright fellas, last chance to object before merge. After this, I'm going to …
I separated the loading logic into a crate.
Supersedes #114, #122.