
Standalone loader #125

Merged
merged 46 commits into from
Apr 22, 2023

Conversation

@iacore (Contributor) commented Apr 8, 2023

I separated the loading logic into a crate.

Supersedes #114 and #122.

iacore marked this pull request as ready for review April 8, 2023 12:10
philpax added this to the 0.1 milestone Apr 10, 2023
iacore changed the title from standalone loader to [WIP] standalone loader on Apr 10, 2023
iacore mentioned this pull request Apr 12, 2023
@philpax (Collaborator) commented Apr 13, 2023

Merged in the latest main and confirmed that it works with 7B GGML and 13B GGJT.

Do you have an ETA for when you'll be done with the separate crate?

Also, do you think it would be possible to restructure the loader so that it can parse any valid GGML file and do LLaMA-specific logic on top of that? It'd be nice to support BLOOM / RWKV / etc models encoded in GGML-format files without having to copy the entire loader.

@iacore (Contributor, Author) commented Apr 13, 2023

Do you have an ETA for when you'll be done with the separate crate?

1-2 weeks.

Also, do you think it would be possible to restructure the loader so that it can parse any valid GGML file and do LLaMA-specific logic on top of that? It'd be nice to support BLOOM / RWKV / etc models encoded in GGML-format files without having to copy the entire loader.

No. Different models have different hyperparameters.

Currently, llama.cpp's model format is ad hoc. I don't know what "any valid GGML file" would even mean.

If safetensors supported q4_0 or q4_1, I would use that. It has a problem (unaligned memory), but at least it is well defined.

@philpax (Collaborator) commented Apr 13, 2023

1-2 weeks.

Hm, do you think it's worth shipping the version without a crate for now and getting the crate in for the next release? I'd like to release 0.1 this weekend if possible.

No. Different models have different hyper parameters.

Yeah, I know, but as far as I know the tensors themselves are stored the same way (including their names), and it's really only the hyperparameters that differ. It seems to me that you could parameterise over the hyperparameter struct and support loading any file that has the same {container, model-specific hyperparameters, vocabulary, tensor data} structure.
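For what it's worth, here's a minimal sketch of what parameterising over the hyperparameter struct could look like. The trait, field list, and helper are hypothetical, not the actual ggml_loader API:

```rust
use std::io::{BufRead, Read};

/// Hypothetical error type, just for this sketch.
#[derive(Debug)]
pub enum LoadError {
    Io(std::io::Error),
}

fn read_u32<R: Read>(reader: &mut R) -> Result<u32, LoadError> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf).map_err(LoadError::Io)?;
    Ok(u32::from_le_bytes(buf))
}

/// Each architecture (LLaMA, BLOOM, RWKV, ...) supplies its own
/// hyperparameter struct; the container, vocabulary and tensor handling
/// stay shared.
pub trait Hyperparameters: Sized {
    fn read<R: BufRead>(reader: &mut R) -> Result<Self, LoadError>;
}

/// LLaMA-style hyperparameters (field list abbreviated).
pub struct LlamaHyperparameters {
    pub n_vocab: u32,
    pub n_embd: u32,
    pub n_layer: u32,
}

impl Hyperparameters for LlamaHyperparameters {
    fn read<R: BufRead>(reader: &mut R) -> Result<Self, LoadError> {
        Ok(Self {
            n_vocab: read_u32(reader)?,
            n_embd: read_u32(reader)?,
            n_layer: read_u32(reader)?,
        })
    }
}

/// The shared loader is generic over H; only the hyperparameter block differs.
pub fn load_model<H: Hyperparameters, R: BufRead>(reader: &mut R) -> Result<H, LoadError> {
    // ... validate the container magic/version here ...
    let hparams = H::read(reader)?;
    // ... then read the vocabulary and tensor data, which share a layout ...
    Ok(hparams)
}
```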

This was referenced Apr 13, 2023
philpax changed the title from [WIP] standalone loader to Standalone loader on Apr 22, 2023
@iacore (Contributor, Author) commented Apr 22, 2023

1. Do we need to keep loader1 around? Can we replace it entirely with loader2?

Keep the multi-file part.

2. As mentioned by @KerfuffleV2, `gpt4-x-alpaca-13b-native-4bit-128g.bin` doesn't load: `Could not load model: HyperparametersF16Invalid { ftype: 4 }`. Is that something you would be able to fix?

That seems like a separate issue.

In rwkv.cpp, the model is parametrized by vtype as well.
Keywords: wtype, vtype. Search for those in the rwkv.cpp repo and in this repo.

@KerfuffleV2 (Contributor):

That seems like a separate issue.

It's a LLaMA model, not an RWKV model. RWKV models wouldn't be expected to work currently.

The problem is that the code treats the model-level f16/ftype field as if it were the tensor element type, and they aren't the same. See the top part of: #125 (comment)
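To illustrate the distinction, here's a sketch (hypothetical enums, not the crate's actual types; the numbering follows llama.cpp's ftype convention as far as I know):

```rust
/// Model-level file type, stored once in the hyperparameters: "mostly X".
/// Numbering follows llama.cpp's ftype convention as I understand it.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileType {
    AllF32,            // 0
    MostlyF16,         // 1
    MostlyQ4_0,        // 2
    MostlyQ4_1,        // 3
    MostlyQ4_1SomeF16, // 4 -- what the gpt4-x-alpaca file reports
}

/// Per-tensor element type, stored next to each tensor's header.
/// This, not the model-level ftype, decides how a tensor's data is laid out.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElementType {
    F32,
    F16,
    Q4_0,
    Q4_1,
}
```

A loader that maps ftype 4 directly onto a single tensor type will reject files like this one; it needs to read each tensor's own type field instead.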

llama-rs/src/vocabulary.rs (outdated review thread)
@@ -62,7 +62,7 @@ pub(crate) fn load(
     ggml_loader::load_model_from_reader(&mut reader, &mut loader)
         .map_err(|err| LoadError::from_ggml_loader_error(err, path.clone()))?;

-    Ok(loader.model.expect("model should be initialized"))
+    loader.model.ok_or(LoadError::ModelNotCreated { path })
Contributor (Author):

The way I wrote the code, `Err` is actually impossible here.

Collaborator:

It's possible if there are no tensors at all in the model (e.g. `tensor_buffer` never gets called). Users should be aware of that.

-    self.vocab
-        .push_token(id, token, score)
-        .expect("vocab should be valid");
+    if let Err(err) = self.vocab.push_token(id, token, score) {
Contributor (Author):

Again, given the API usage, this is impossible.

Collaborator:

Hm, sure. It mattered more when there was an error for duplicate tokens; in this case, encountering an ID out of step would mean the loading invariants were broken. I'll change that back to a panic.

@philpax (Collaborator) commented Apr 22, 2023

keep the multi-file part

I'm leaning towards disabling multi-file loading entirely and using loader1 to create a converter from GGML/GGMF multipart to GGJT. Keeping two loaders around just to handle a fairly niche use-case doesn't seem worth it to me.

That seems like a separate issue.

Yeah, as Kerfuffle said, that's an issue with an existing LLaMA model (you can google that filename). That being said, I'm thinking we should address that in another PR.

@iacore (Contributor, Author) commented Apr 22, 2023

Should we merge this now?

@philpax (Collaborator) commented Apr 22, 2023

Give me a sec; I'm just going to make multi-file loading on loader2 panic for now, and then I'll merge.

@iacore (Contributor, Author) commented Apr 22, 2023

The current loader2 code doesn't detect multiple files; it simply assumes there is only one file.

Please make it instruct the user to set the environment variable GGML_LOADER=1 for multi-file loading.

@KerfuffleV2 (Contributor):

Why use weird environment variables instead of just having a command-line option?

Since multi-part models are a rare case, it seems like it makes the most sense to have single-file loading as the default and to enable multi-file mode with a CLI option.
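For example, a single-file-by-default switch could look roughly like this (a sketch using clap's derive API; the `--loader` flag and the `Loader` enum are hypothetical, not the actual llama-rs CLI):

```rust
// Cargo.toml: clap = { version = "4", features = ["derive"] }
use clap::{Parser, ValueEnum};

/// Which loader implementation to use (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, ValueEnum)]
enum Loader {
    /// Legacy loader with multi-file (multipart) support.
    Loader1,
    /// New standalone loader; single-file only.
    Loader2,
}

#[derive(Debug, Parser)]
struct Args {
    /// Path to the model file.
    model_path: std::path::PathBuf,

    /// Loader to use; defaults to the new single-file loader.
    #[arg(long, value_enum, default_value = "loader2")]
    loader: Loader,
}

fn main() {
    let args = Args::parse();
    match args.loader {
        Loader::Loader1 => println!("using loader1 (multi-file capable)"),
        Loader::Loader2 => println!("using loader2 (single-file only)"),
    }
}
```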

@philpax (Collaborator) commented Apr 22, 2023

loader2 is the default. I was considering making it a CLI option myself, but with

I'm leaning towards disabling multi-file loading entirely and using loader1 to create a converter from GGML/GGMF multipart to GGJT. Keeping two loaders around just to handle a fairly niche use-case doesn't seem worth it to me.

my plan is to remove support entirely for multipart + loader1 and relegate it to that separate tool. I'm guessing you haven't found that many multipart models in the wild, right?

@philpax (Collaborator) commented Apr 22, 2023

The current loader2 code doesn't detect multiple files; it simply assumes there is only one file.

Please make it instruct the user to set the environment variable GGML_LOADER=1 for multi-file loading.

Are you OK with me removing `LoadHandler::load_multipart` entirely?

@KerfuffleV2 (Contributor):

I'm guessing you haven't found that many multipart models in the wild, right?

I didn't try.

@iacore (Contributor, Author) commented Apr 22, 2023

The current loader2 code doesn't detect multiple files; it simply assumes there is only one file.
Please make it instruct the user to set the environment variable GGML_LOADER=1 for multi-file loading.

Are you OK with me removing `LoadHandler::load_multipart` entirely?

Yes.

@philpax (Collaborator) commented Apr 22, 2023

Alright fellas, last chance to object before merge. After this, I'm going to

  • merge Ported quantize.cpp #84
  • use it to build a utility to convert multipart models to singlepart GGJT; I'll likely rename `ggml_loader` to `ggml_format` and write code for writing GGJT from hyperparameters + vocab + tensors (a rough sketch of such a writer follows after this list)
  • create an issue for loading GPT4-X
  • bring BLOOM Refactor #141 up to date
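
For reference, here's the rough sketch of that GGJT writer (hypothetical `Tensor` and vocabulary types; the field order and 32-byte alignment reflect my reading of the GGJT container and should be checked against llama.cpp before relying on this):

```rust
use std::io::{Seek, Write};

/// Hypothetical tensor record; the real loader crate's types will differ.
struct Tensor {
    name: String,
    dims: Vec<u32>,
    element_type: u32, // ggml type id
    data: Vec<u8>,
}

/// Write a single-file GGJT container from already-loaded parts.
fn write_ggjt<W: Write + Seek>(
    out: &mut W,
    hparams: &[u8],           // model-specific hyperparameter block, already serialized
    vocab: &[(Vec<u8>, f32)], // (token bytes, score)
    tensors: &[Tensor],
) -> std::io::Result<()> {
    out.write_all(&0x6767_6a74u32.to_le_bytes())?; // "ggjt" magic, read as a LE u32
    out.write_all(&1u32.to_le_bytes())?;           // container version
    out.write_all(hparams)?;
    for (token, score) in vocab {
        out.write_all(&(token.len() as u32).to_le_bytes())?;
        out.write_all(token)?;
        out.write_all(&score.to_le_bytes())?;
    }
    for t in tensors {
        out.write_all(&(t.dims.len() as u32).to_le_bytes())?;
        out.write_all(&(t.name.len() as u32).to_le_bytes())?;
        out.write_all(&t.element_type.to_le_bytes())?;
        for d in &t.dims {
            out.write_all(&d.to_le_bytes())?;
        }
        out.write_all(t.name.as_bytes())?;
        // GGJT aligns each tensor's data to a 32-byte boundary so it can be mmapped.
        let pos = out.stream_position()?;
        let pad = (32 - (pos % 32) as usize) % 32;
        out.write_all(&vec![0u8; pad])?;
        out.write_all(&t.data)?;
    }
    Ok(())
}
```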

llama-rs/src/vocabulary.rs (outdated review thread)
philpax merged commit 2e62a18 into rustformers:main Apr 22, 2023
danforbes mentioned this pull request Apr 30, 2023