Conversation
Now it can load the model, but it's not working: something is off in the math / tensor loading.
Merged in the latest.

Do you have an ETA for when you'll be done with the separate crate? Also, do you think it would be possible to restructure the loader so that it can parse any valid GGML file and do LLaMA-specific logic on top of that? It'd be nice to support BLOOM / RWKV / etc. models encoded in GGML-format files without having to copy the entire loader.
1-2 weeks.
No. Different models have different hyperparameters. Currently llama.cpp's model format is ad hoc, so I don't know what "any valid GGML file" would mean. If safetensors supports q4_0 or q4_1 I will use that. It has a problem (unaligned memory), but it is at least well-defined.
Hm, do you think it's worth shipping the version without a crate for now, and then getting the crate in for the next release? I'd like to release 0.1 this weekend if possible.
Yeah, I know - but as far as I know, the tensors themselves are stored the same way (including with names), and it's only really the hyperparameters that are different. It seems to me that you could parameterise over the hyperparameter struct and support loading any file that has the same {container, model-specific hyperparameters, vocabulary, tensor data} structure.
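For illustration, a minimal sketch of the parameterisation being proposed here. The `Hyperparameters` trait, `read_u32` helper, and all other names are hypothetical, not the actual llama-rs API:

```rust
use std::io::{self, Read};

// Stand-in helper assumed by this sketch: read a little-endian u32.
fn read_u32(reader: &mut impl Read) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

// Hypothetical trait: each architecture supplies only its hyperparameter
// parsing; the container, vocabulary, and tensor handling stay shared.
trait Hyperparameters: Sized {
    fn read(reader: &mut impl Read) -> io::Result<Self>;
}

// LLaMA-specific hyperparameters (field set abbreviated for the sketch).
struct LlamaHparams {
    n_vocab: u32,
    n_embd: u32,
    n_layer: u32,
}

impl Hyperparameters for LlamaHparams {
    fn read(reader: &mut impl Read) -> io::Result<Self> {
        Ok(Self {
            n_vocab: read_u32(reader)?,
            n_embd: read_u32(reader)?,
            n_layer: read_u32(reader)?,
        })
    }
}

// Shared skeleton over {container, hparams, vocabulary, tensor data}.
fn load_model<H: Hyperparameters>(reader: &mut impl Read) -> io::Result<H> {
    let _magic = read_u32(reader)?; // container magic/version check elided
    let hparams = H::read(reader)?;
    // ...vocabulary and tensor loading would follow, identical across models
    Ok(hparams)
}

fn main() -> io::Result<()> {
    // Tiny smoke test with an in-memory "file": magic plus three u32 hparams.
    let bytes: Vec<u8> = [0x67676d66u32, 32000, 4096, 32]
        .iter()
        .flat_map(|v| v.to_le_bytes())
        .collect();
    let hparams: LlamaHparams = load_model(&mut io::Cursor::new(bytes))?;
    println!("n_vocab = {}", hparams.n_vocab);
    Ok(())
}
```

With this shape, a BLOOM or RWKV loader would only need its own `Hyperparameters` impl rather than a copy of the whole loader.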
Apparently some models just have token dupes? /shrug
keep the multi-file part
That seems like a separate issue. In rwkv.cpp the model is parametrized with …
It's a LLaMA model, not a RWKV model. RWKV models wouldn't be expected to work currently. The problem is the code is treating the model …
```diff
@@ -62,7 +62,7 @@ pub(crate) fn load(
     ggml_loader::load_model_from_reader(&mut reader, &mut loader)
         .map_err(|err| LoadError::from_ggml_loader_error(err, path.clone()))?;
 
-    Ok(loader.model.expect("model should be initialized"))
+    loader.model.ok_or(LoadError::ModelNotCreated { path })
```
The way I wrote the code, `Err` is actually impossible here.
It's possible if there are no tensors at all in the model (e.g. `tensor_buffer` never gets called). Users should be aware of that.
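For context, the pattern in the diff above turns a missing model into a typed error instead of a panic. A standalone sketch: the `ModelNotCreated` variant name comes from the diff, everything else (the `Model` alias, `finish_load`) is hypothetical:

```rust
use std::path::PathBuf;

// Hypothetical error type; the variant name comes from the diff above.
#[derive(Debug)]
enum LoadError {
    ModelNotCreated { path: PathBuf },
}

// Stand-in: the real loader builds the model from tensor callbacks,
// so `None` here means no tensors were ever seen.
type Model = String;

fn finish_load(model: Option<Model>, path: PathBuf) -> Result<Model, LoadError> {
    // ok_or turns the None case into a recoverable, typed error rather
    // than a panic, covering the "file with zero tensors" edge case.
    model.ok_or(LoadError::ModelNotCreated { path })
}

fn main() {
    match finish_load(None, PathBuf::from("model.bin")) {
        Ok(model) => println!("loaded: {model}"),
        Err(err) => println!("error: {err:?}"),
    }
}
```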
`llama-rs/src/loader2.rs` (Outdated)
```diff
-        self.vocab
-            .push_token(id, token, score)
-            .expect("vocab should be valid");
+        if let Err(err) = self.vocab.push_token(id, token, score) {
```
Again, given the API usage, this is impossible.
Hm, sure. It mattered more when there was an error for duplicate tokens, but in this case encountering an ID out of step would mean the loading invariants were broken. I'll change that back to a panic.
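A sketch of the invariant being described, with a hypothetical `Vocabulary` type (not the actual llama-rs one): token IDs must arrive in order, so an out-of-step ID indicates a loader bug, which makes a panic reasonable:

```rust
struct Vocabulary {
    tokens: Vec<Vec<u8>>,
    scores: Vec<f32>,
}

impl Vocabulary {
    fn push_token(&mut self, id: u32, token: Vec<u8>, score: f32) {
        // The next valid ID is exactly the current length; anything else
        // breaks the loading invariant, so panic rather than return an error.
        assert_eq!(
            id as usize,
            self.tokens.len(),
            "vocabulary tokens must be pushed in ID order"
        );
        self.tokens.push(token);
        self.scores.push(score);
    }
}

fn main() {
    let mut vocab = Vocabulary { tokens: Vec::new(), scores: Vec::new() };
    vocab.push_token(0, b"<unk>".to_vec(), 0.0);
    vocab.push_token(1, b"hello".to_vec(), -1.5);
    // vocab.push_token(5, ...) would panic here: ID out of step.
    println!("{} tokens", vocab.tokens.len());
}
```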
I'm leaning towards disabling multi-file loading entirely and using loader1 to create a converter from GGML/GGMF multipart to GGJT. Keeping two loaders around just to handle a fairly niche use-case doesn't seem worth it to me.
Yeah, as Kerfuffle said, that's an issue with an existing LLaMA model (you can Google that filename). That being said, I'm thinking we should address that in another PR.
Should we merge this now?
Give me a sec; just going to make multi-file loading on loader2 panic for now, and then I'll merge.
The current loader2 code doesn't detect multiple files; it simply assumes there is only one file. Please make it instruct using …
Why use weird environment variables instead of just having a command-line option? Since multi-part models are a rare case, it seems like it makes the most sense to have single-file as the default and to enable multi-file mode with a CLI option.
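For what that suggestion might look like in practice, a hypothetical sketch assuming clap v4 with the derive feature (the flag and struct names are invented, not the actual llama-rs CLI):

```rust
use clap::Parser;

/// Single-file loading is the default; multi-file mode is an explicit opt-in.
#[derive(Parser)]
struct Args {
    /// Path to the model file (first part, for multi-part models)
    model_path: std::path::PathBuf,

    /// Opt in to loading additional parts alongside the base file
    #[arg(long)]
    multi_file: bool,
}

fn main() {
    let args = Args::parse();
    println!("multi-file mode: {}", args.multi_file);
}
```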
loader2 is the default. I was considering making it a CLI option myself, but with …
My plan is to remove support for multipart + loader1 entirely and relegate it to that separate tool. I'm guessing you haven't found that many multipart models in the wild, right?
Are you OK with me removing …
I didn't try.
Yes.
Alright fellas, last chance to object before merge. After this, I'm going to …
I separated the loading logic into a crate.
Supersedes #114, #122.