Conversation
…-> InferenceSession::from_snapshot
Everything looks good. I think the way modules are split makes sense 👍
Hopefully this wouldn't cause too many conflicts with existing PRs? e.g. #125 comes to mind
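For context on the rename in the title: a minimal, self-contained sketch of what an `InferenceSession::from_snapshot` constructor might look like. The stub types and the exact signature are assumptions for illustration, not necessarily what this PR lands.

```rust
// Hypothetical sketch; `Model`, `InferenceSnapshot`, and `SnapshotError`
// are stand-ins for the crate's real types.
pub struct Model;
pub struct InferenceSnapshot;
pub struct InferenceSession;
#[derive(Debug)]
pub struct SnapshotError;

impl Model {
    /// Starts a fresh session with empty state.
    pub fn start_session(&self) -> InferenceSession {
        InferenceSession
    }
}

impl InferenceSession {
    /// Rebuilds a session from a previously serialized snapshot by
    /// starting a fresh session on the model and restoring its state.
    pub fn from_snapshot(
        snapshot: InferenceSnapshot,
        model: &Model,
    ) -> Result<InferenceSession, SnapshotError> {
        let mut session = model.start_session();
        session.restore(snapshot)?;
        Ok(session)
    }

    fn restore(&mut self, _snapshot: InferenceSnapshot) -> Result<(), SnapshotError> {
        // In the real crate this would copy the snapshot's KV cache and
        // token history into the session; here it is a no-op.
        Ok(())
    }
}
```

Making this an associated constructor on `InferenceSession`, rather than a free function, keeps the session's invariants in one place.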
-    let (_, vocabulary) = args.model_load.load();
-    let toks = match vocabulary.tokenize(&prompt, false) {
+    let model = args.model_load.load();
I agree with the change. I had considered doing this a few times before. Since both the model and the vocabulary are meant to be immutable after creation, bundling them into the same struct is unlikely to cause any issues.
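A minimal, self-contained sketch of the bundling agreed to above; the field and accessor names are assumptions, not necessarily this PR's exact API.

```rust
// Placeholder types: the crate's real `Model` also owns the weights,
// hyperparameters, and so on.
pub struct Vocabulary {
    pub tokens: Vec<String>,
}

pub struct Model {
    vocabulary: Vocabulary,
}

impl Model {
    /// Both the model and the vocabulary are immutable once loaded, so a
    /// shared accessor is all that callers need.
    pub fn vocabulary(&self) -> &Vocabulary {
        &self.vocabulary
    }
}
```

With something like this, the call site in the diff above becomes `let model = args.model_load.load();` followed by `model.vocabulary().tokenize(&prompt, false)`.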
// The size of a scratch buffer used for inference. This is used for temporary
// storage of intermediate results during inference.
//
// The specific value was copied from `llama.cpp`.
const SCRATCH_SIZE: usize = 512 * 1024 * 1024;
I think llama.cpp figured out a proper way to compute this value. We should have a look at that - not in this PR, of course 👍
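For illustration only, a sketch of what computing this value could look like: scratch sizes bucketed by model size rather than one hardcoded constant. The thresholds and sizes below are invented for the sketch; they are not llama.cpp's actual values.

```rust
/// Illustrative stand-in for a computed scratch size: bucket by layer
/// count instead of hardcoding 512 MiB for every model. All numbers
/// here are placeholders, not llama.cpp's real tables.
fn scratch_size(n_layer: usize) -> usize {
    const MIB: usize = 1024 * 1024;
    match n_layer {
        0..=32 => 256 * MIB,  // roughly 7B-class models
        33..=40 => 320 * MIB, // roughly 13B-class
        41..=60 => 384 * MIB, // roughly 30B-class
        _ => 512 * MIB,       // 65B-class and larger
    }
}
```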
Yeah, I'm a little concerned about that one myself, but I don't think it should be too bad - it'll mostly just be discarding the changes to
This is a little overdue, I think. It'll cause problems for the other open PRs, but it also makes the codebase much easier to maintain.
I've made a few controversial changes in the last three commits. The rest are pretty straightforward.