Awesome work! I've left some feedback; it's not the most Rust-y code, but that's fine as it's a port and we can fix that up later on 🙂 Really appreciate you doing this, it's great to get one step closer to being completely standalone 🚀
Your comments aren't showing up in the PR, and I fully agree. I'm going to go ahead and fix all the Clippy issues, plus see if I can improve some of the logic. Do you have any recommendations on data reading and writing? I reuse the same buffer multiple times since it's more efficient when working on embedded Rust systems, so I did the same here.
Weird, I can definitely see the comments here and in the diff. Not sure what's happening there. For data read/write, not sure: reusing the same buffer seems reasonable if they're semantically similar, but I'd just use a new buffer if they're not. Do you have any examples of something you'd want advice on?
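A minimal sketch of the buffer reuse being discussed, assuming a hypothetical little-endian layout of `u32` fields followed by a length-prefixed name; `read_u32s` and `read_name_into` are illustrative helpers, not functions from this PR. Values of the same kind share one fixed-size scratch buffer, while the variable-length read fills a buffer that the caller owns and can pass in again.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `count` little-endian u32 values, reusing one
/// 4-byte scratch buffer for every read instead of allocating per value.
fn read_u32s<R: Read>(reader: &mut R, count: usize) -> io::Result<Vec<u32>> {
    let mut scratch = [0u8; 4]; // same size and same meaning for each value
    let mut out = Vec::with_capacity(count);
    for _ in 0..count {
        reader.read_exact(&mut scratch)?;
        out.push(u32::from_le_bytes(scratch));
    }
    Ok(out)
}

/// Hypothetical helper: reads a u32 length prefix, then that many bytes into a
/// caller-owned buffer. The buffer's previous contents are discarded, and it
/// only reallocates when the new data is larger than its current capacity.
fn read_name_into<R: Read>(reader: &mut R, buf: &mut Vec<u8>) -> io::Result<()> {
    let mut len_bytes = [0u8; 4];
    reader.read_exact(&mut len_bytes)?;
    let len = u32::from_le_bytes(len_bytes) as usize;
    buf.clear();
    buf.resize(len, 0);
    reader.read_exact(buf)?;
    Ok(())
}
```

Because `read_name_into` borrows the buffer from the caller, a loader can keep one `Vec<u8>` alive across every tensor name it reads; for data with a different meaning, a separate buffer keeps the code easier to follow.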
I've updated the PR, but the output seems to be incorrect:
Probably an assumption somewhere that I broke. Need to look into it further; any ideas? I'd also like to support loading unversioned models and GGJT, so this is going to be a bit of a headache in general :(
Looking at your commits, I found a couple of places where it could've broken. I'll check them out and see if that fixes it.
@philpax Regarding supporting other types of models, if you can point me to the relevant issues, I can look into making those work.
Ok, updated to the latest

```rust
assert_eq!(src.len(), n as usize);
assert_eq!(dst.len(), n as usize);
assert!(hist.len() >= 16);
```
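The `hist.len() >= 16` check follows from 4-bit quantization: every quantized value is a nibble in `0..=15`, so 16 buckets cover all possible codes. A simplified Q4_0-style quantizer illustrating this is sketched below. The block size of 32, the `amax / 7` scale with a `+ 8` offset, and the function names are assumptions for illustration, and the output layout (scales and packed nibbles in separate vectors) is not ggml's actual block format.

```rust
/// Block size assumed for this sketch.
const QK: usize = 32;

/// Hypothetical Q4_0-style quantizer: one f32 scale per block of QK values,
/// each value stored as a 4-bit code in 0..=15, two codes packed per byte.
/// `hist` counts how often each 4-bit code occurs, hence the 16-bucket minimum.
fn quantize_q4_0_sketch(src: &[f32], hist: &mut [i64]) -> (Vec<f32>, Vec<u8>) {
    assert!(src.len() % QK == 0, "input must be a whole number of blocks");
    assert!(hist.len() >= 16, "one histogram bucket per 4-bit code");

    let mut scales = Vec::with_capacity(src.len() / QK);
    let mut packed = Vec::with_capacity(src.len() / 2);

    for block in src.chunks_exact(QK) {
        // Choose the scale so the largest magnitude in the block maps to +/-7.
        let amax = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
        let d = amax / 7.0;
        let id = if d != 0.0 { 1.0 / d } else { 0.0 };
        scales.push(d);

        for pair in block.chunks_exact(2) {
            // Round to -7..=7, then shift by 8 into the unsigned range 1..=15.
            let q0 = ((pair[0] * id).round() as i32 + 8) as u8;
            let q1 = ((pair[1] * id).round() as i32 + 8) as u8;
            hist[q0 as usize] += 1;
            hist[q1 as usize] += 1;
            // Low nibble holds the first value, high nibble the second.
            packed.push(q0 | (q1 << 4));
        }
    }
    (scales, packed)
}

/// Inverse of the sketch above, useful for checking round-trip error.
fn dequantize_q4_0_sketch(scales: &[f32], packed: &[u8]) -> Vec<f32> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for (block_idx, bytes) in packed.chunks_exact(QK / 2).enumerate() {
        let d = scales[block_idx];
        for &b in bytes {
            out.push(((b & 0x0F) as i32 - 8) as f32 * d);
            out.push(((b >> 4) as i32 - 8) as f32 * d);
        }
    }
    out
}
```

With this scheme, each reconstructed value is within half a scale step (`d / 2`) of the original, which makes a simple round-trip test a quick way to catch broken assumptions in a port.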
What about
The issue is the size: it should be equivalent to the size of the original array in bytes. I guess we could shove `4 * size` onto the user; it's not that big of a deal.
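If the byte size is computed inside the wrapper rather than pushed onto the user, a small helper can do the `4 * size` arithmetic once, with overflow checking. This is a sketch with hypothetical helper names; `std::mem::size_of_val` gives the same result directly from an `f32` slice.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical helper: byte size of `n` f32 values (the `4 * n` computation),
/// returning `None` instead of wrapping if the multiplication overflows.
fn f32_array_size_in_bytes(n: usize) -> Option<usize> {
    n.checked_mul(size_of::<f32>())
}

/// Hypothetical helper: byte size of any slice, i.e. `len * size_of::<T>()`.
fn slice_size_in_bytes<T>(s: &[T]) -> usize {
    size_of_val(s)
}
```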
Merged in main again; comments/questions from above still apply.
Cool then, let's see if #125 happens soon and in the meantime we can fix 2/3. |
#125 is going to be merged soon if all goes well, but its ggml-format loader doesn't work in its current state. Given that, I think we're OK to merge this once that's in. Let's try to get support for the other formats as soon as possible, but I won't let that block merging.
I implemented write support for the loader (now
I went ahead and ported the main `quantize.cpp` file. My changes port the file while keeping the internal C++ function calls intact; I plan to port those function calls in a future PR to remove the ggml dependencies, though.
During the porting process, I ran into some challenges because I was not familiar with how to use `Context`. As a result, I added the `half` library to handle the f16->f32 conversion. I could remove the dependency if needed, but I'll need some help working with `Context`. Note that if there are plans to move away from ggml, the `half` library will be necessary anyway.
Additionally, I included some print statements inside the function to mimic the original behavior of `quantize.cpp`. I can remove those if needed.
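For reference, the f16->f32 conversion that the `half` crate provides (`half::f16::from_bits(bits).to_f32()`) can also be written by hand from the IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits with a bias of 15, and 10 mantissa bits. The sketch below illustrates that bit manipulation under those assumptions; it is not the code in this PR, and `f16_bits_to_f32` / `f16_slice_to_f32` are hypothetical names.

```rust
/// Hypothetical helper: converts IEEE 754 binary16 bits to f32 without any
/// external crate. Every f16 value is exactly representable as an f32.
fn f16_bits_to_f32(h: u16) -> f32 {
    let negative = (h & 0x8000) != 0;
    let exp = ((h >> 10) & 0x1F) as u32;
    let mant = (h & 0x03FF) as u32;
    let sign_bit = if negative { 1u32 << 31 } else { 0 };

    match exp {
        // Zero or subnormal: the value is mant * 2^-24, exact in f32.
        0 => {
            let mag = mant as f32 * (1.0 / 16_777_216.0);
            if negative { -mag } else { mag }
        }
        // Infinity (mant == 0) or NaN (mant != 0); the NaN payload is kept.
        31 => f32::from_bits(sign_bit | 0x7F80_0000 | (mant << 13)),
        // Normal: rebias the exponent from 15 to 127 (+112) and widen the
        // mantissa from 10 to 23 bits.
        _ => f32::from_bits(sign_bit | ((exp + 112) << 23) | (mant << 13)),
    }
}

/// Hypothetical helper: converts a whole buffer of raw f16 bits.
fn f16_slice_to_f32(src: &[u16]) -> Vec<f32> {
    src.iter().map(|&h| f16_bits_to_f32(h)).collect()
}
```

Keeping the `half` dependency is simpler and well-tested; the hand-written version mainly shows what the conversion involves if dropping the crate becomes a goal.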
Currently, there is no way to access the function since I did not implement a CLI function for it.
I am open to feedback and suggestions on how to improve this pull request.
Resolves #40