Handle OOM better on smaller/older GPUs, or bigger models on regular GPUs #150
Comments
We can monkey-patch the finetune config using GPU runtime information, or we can simply add another config for …
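A minimal sketch of what such a runtime patch could look like, assuming the finetune settings live in a plain dict; the function name, the "tokens" key, and the 10 GiB threshold are illustrative placeholders, not the project's actual config schema or API:

```python
# Hedged sketch: pick a smaller context length when the detected GPU has
# little VRAM. `finetune_cfg`, the "tokens" key, and the 10 GiB threshold
# are hypothetical, not the project's real config structure.
import torch

def patch_config_for_gpu(finetune_cfg: dict) -> dict:
    if torch.cuda.is_available():
        total_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
        if total_gib < 10:
            # Halve the context length on small cards to reduce the chance of OOM.
            finetune_cfg["tokens"] = min(finetune_cfg.get("tokens", 4096), 2048)
    return finetune_cfg
```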
I was having the same error. I set the token limit to 2048, but now I get another CUDA error when running the Filter step:
How much VRAM is actually needed for fine-tuning the 1.6B model?
It doesn't say "out of memory" for you. 🤔 Not sure how to debug this. @bonswouar, what GPU do you have?
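When the error doesn't mention memory explicitly, a quick generic check (not part of the project's tooling) is to ask torch which GPU the process sees and how much VRAM is actually free:

```python
# Print the visible GPU and its free/total VRAM; helps distinguish a plain
# OOM from some other CUDA failure.
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(torch.cuda.get_device_name(0))
    print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")
else:
    print("CUDA is not available in this process")
```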
I've just tried on Linux to see if the output is any different (I noticed the model seems much faster to load, btw). This time I always get (also with 2048 tokens):
Only a GTX 970. I was hoping this would be enough since I can run 7B quantized models, but I guess I was a bit optimistic :)
Workaround: change the tokens parameter from 4096 to 2048 for Refact/1.6B.
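For illustration only, the workaround amounts to something like the snippet below; where the setting actually lives depends on your installation, so treat the key names as placeholders:

```python
# Hypothetical finetune settings dict; only the "tokens" value matters here.
finetune_cfg = {
    "model_name": "Refact/1.6B",
    "tokens": 2048,  # lowered from the default 4096 to fit in less VRAM
}
```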