
Big speed regression with top-p sampling #246

Closed
xefoci7612 opened this issue Aug 6, 2023 · 5 comments

@xefoci7612

xefoci7612 commented Aug 6, 2023

On my system it drops from 40 tok/s down to 33 tok/s, almost 20% slower...

$ ./run /tmp/ramdisk/model110m.bin -s 1 -p 0
Once upon a time <stripped> make a big difference in the world.
achieved tok/s: 40.298507
$ ./run /tmp/ramdisk/model110m.bin -s 1
Once upon a time <stripped> make a big difference in the world.
achieved tok/s: 33.588093

The slowdown is even more dramatic on smaller models.
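For context on where the time likely goes: a sort-based top-p (nucleus) sampler sorts the entire probability vector, i.e. the full vocabulary (~32K tokens for Llama 2), with qsort at every generated token, whereas plain temperature sampling is a single linear pass. Below is a minimal sketch of such a sort-based sampler; it is an illustration only, not the repository's exact code, and ProbIndex, coin (a uniform random number in [0, 1)), and the function names are assumptions.

```c
#include <stdlib.h>

typedef struct { float prob; int index; } ProbIndex;

// descending comparator on prob for qsort
static int compare_desc(const void* a, const void* b) {
    const ProbIndex* pa = (const ProbIndex*)a;
    const ProbIndex* pb = (const ProbIndex*)b;
    if (pa->prob > pb->prob) return -1;
    if (pa->prob < pb->prob) return 1;
    return 0;
}

// Sort-based nucleus (top-p) sampling: sort all n probabilities descending,
// keep the smallest prefix whose cumulative mass exceeds topp, then sample
// within that prefix. coin is a uniform random number in [0, 1);
// buf is caller-provided scratch space of n entries.
int sample_topp_naive(const float* probs, int n, float topp, float coin, ProbIndex* buf) {
    for (int i = 0; i < n; i++) { buf[i].prob = probs[i]; buf[i].index = i; }
    qsort(buf, n, sizeof(ProbIndex), compare_desc);   // O(n log n) on every token

    // find the truncation point where cumulative probability exceeds topp
    float cum = 0.0f;
    int last = n - 1;
    for (int i = 0; i < n; i++) {
        cum += buf[i].prob;
        if (cum > topp) { last = i; break; }
    }
    // sample from the truncated, renormalized distribution
    float r = coin * cum, acc = 0.0f;
    for (int i = 0; i <= last; i++) {
        acc += buf[i].prob;
        if (r < acc) return buf[i].index;
    }
    return buf[last].index;                           // rounding fallback
}
```

The full-vocabulary qsort on every token is the step the next comment proposes to avoid.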

@xefoci7612
Author

Sorting all the probabilities seems like overkill.

Picking the K largest values can be done in O(n) with a selection algorithm, like std::nth_element in the C++ standard library.

Alternatively, we could keep the full sort but run it only after a top-K step. Top-K and top-P can be chained, as described here:

While in theory, Top-p seems more elegant than Top-K, both methods work well in practice. Top-p can also be used in combination with Top-K, which can avoid very low ranked words while allowing for some dynamic selection.

For instance, a top-K step with k=60 before top-p reduces the final sort from ~30K vocabulary elements to just 60, with a very high probability of producing the same result as a full sort.
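To make that suggestion concrete, here is a hedged sketch in C (run.c is C, so there is no std::nth_element; a plain quickselect plays the same role). It reuses the ProbIndex struct and compare_desc comparator from the sketch above; k = 60 and all names are illustrative, not taken from the repo.

```c
static void swap_pi(ProbIndex* a, ProbIndex* b) { ProbIndex t = *a; *a = *b; *b = t; }

// One quickselect partition step on prob, descending: afterwards a[lo..p-1]
// have prob >= a[p].prob and a[p+1..hi] have prob < a[p].prob.
static int partition_desc(ProbIndex* a, int lo, int hi) {
    float pivot = a[hi].prob;
    int i = lo;
    for (int j = lo; j < hi; j++) {
        if (a[j].prob >= pivot) { swap_pi(&a[j], &a[i]); i++; }
    }
    swap_pi(&a[hi], &a[i]);
    return i;
}

// Move the k highest-probability entries to a[0..k-1] (unordered),
// expected O(n) -- the C analogue of std::nth_element.
static void select_topk(ProbIndex* a, int n, int k) {
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int p = partition_desc(a, lo, hi);
        if (p == k - 1) return;
        if (p < k - 1) lo = p + 1; else hi = p - 1;
    }
}

// Chain top-K and top-P: select the k most probable tokens in O(n),
// sort only those k, then apply the usual nucleus truncation.
int sample_topk_topp(const float* probs, int n, int k, float topp, float coin, ProbIndex* buf) {
    if (k > n) k = n;
    for (int i = 0; i < n; i++) { buf[i].prob = probs[i]; buf[i].index = i; }
    select_topk(buf, n, k);                               // expected O(n)
    qsort(buf, k, sizeof(ProbIndex), compare_desc);       // O(k log k), k << n

    float cum = 0.0f;
    int last = k - 1;
    for (int i = 0; i < k; i++) {
        cum += buf[i].prob;
        if (cum > topp) { last = i; break; }
    }
    float r = coin * cum, acc = 0.0f;
    for (int i = 0; i <= last; i++) {
        acc += buf[i].prob;
        if (r < acc) return buf[i].index;
    }
    return buf[last].index;                               // rounding fallback
}
```

Quickselect gives expected O(n); a bounded min-heap of size k is an alternative with a guaranteed O(n log k) bound.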

@twobob

twobob commented Aug 6, 2023

./run
Usage: run [options]
Example: run model.bin -t 0.9 -n 256 -p "Once upon a time"
Options:
-t temperature, default 0.9
-s random seed, default time(NULL)
-n number of steps to run for, default 256. 0 = max_seq_len
-p prompt string, default none
-o top_p, default 1.0
-k print_tokens, 1 or 0 flag default 1
-f saveFileBool, 1 or 0 flag default 1
-l saveLogBool, 1 or 0 flag saves timestamps for tokens gen default 0
-b singleBOS, 1 or 0 flag default 1
-d dirname string, default none

qlora\llama2.c> 1..3 | ForEach-Object { 'runmingw.exe', 'rungcc.exe', 'run.exe' | ForEach-Object { $env:OMP_NUM_THREADS=4; & "./$_" '../out/model110M.bin' -t 0 -n 0 -p "Once upon a time" -s 42 -o 1.0 -k 0 -f 0 -l 0 -b 1 -d inbox } }

achieved tok/s: 89.689637 for MINGW
achieved tok/s: 90.555015 for GCC
achieved tok/s: 99.960915 for CLANG

achieved tok/s: 90.555015 for MINGW
achieved tok/s: 90.426942 for GCC
achieved tok/s: 95.162791 for CLANG

achieved tok/s: 90.187781 for MINGW
achieved tok/s: 91.062845 for GCC
achieved tok/s: 101.037037 for CLANG

qlora\llama2.c> 1..3 | ForEach-Object { 'runmingw.exe', 'rungcc.exe', 'run.exe' | ForEach-Object { $env:OMP_NUM_THREADS=4; & "./$_" '../out/model110M.bin' -t 0 -n 0 -p "Once upon a time" -s 42 -o .5 -k 0 -f 0 -l 0 -b 1 -d inbox } }

achieved tok/s: 88.000000 for MINGW using top_p
achieved tok/s: 89.080460 for GCC using top_p
achieved tok/s: 98.745174 for CLANG using top_p

achieved tok/s: 88.594440 for MINGW using top_p
achieved tok/s: 89.072704 for GCC using top_p
achieved tok/s: 98.745174 for CLANG using top_p

achieved tok/s: 87.879048 for MINGW using top_p
achieved tok/s: 86.372847 for GCC using top_p
achieved tok/s: 87.533156 for CLANG using top_p

Hard to say for certain, since this is a CPU-bound system with a lot of other overhead, but having run it with a variety of builds over a period of time, I would probably concur with these findings. fwiw.

(Only a single set of 3 runs is shown for each test, but after a number of such runs these are as representative as any other.)

@twobob

twobob commented Aug 9, 2023

I ran a few more tests, some with top_p and some without, and looking at the logs there now seem to be two tiers of speed, which may point to a noticeable slowdown when using top_p.
(I have actually shaved many of the other slowdowns out of this version of the logging, so an A/B comparison is doable.)
[screenshot: token timing logs]
I'll add some individualised logging for top_p RSN; these were just incidental findings, far from conclusive.
[screenshot: token timing logs]
but yeah, might need some more work.

@rdentato
Contributor

Well, I submitted a patch; not sure if it's the way to go, but it seems to help.
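The patch itself isn't quoted in this thread, so the following is an illustration only, not necessarily what was submitted: a well-known way to speed up sort-based top-p is to discard, in one linear pass, every token whose probability is below (1 - topp) / (n - 1). Such a token (other than the most likely one) cannot belong to the smallest prefix whose mass exceeds topp, so only the few survivors need to be sorted. The sketch reuses ProbIndex and compare_desc from the first sketch above.

```c
// Top-p with a linear-time pre-filter (illustration, not necessarily the
// submitted patch). Any token with prob < (1 - topp) / (n - 1), except the
// argmax, cannot be in the nucleus: if it were, the tokens ranked at or below
// it (at most n-1 of them, each no more probable) would have to cover the
// remaining 1 - topp mass, forcing its probability up to at least the cutoff.
int sample_topp_filtered(const float* probs, int n, float topp, float coin, ProbIndex* buf) {
    const float cutoff = (1.0f - topp) / (n - 1);
    int m = 0;
    for (int i = 0; i < n; i++) {
        if (probs[i] >= cutoff) { buf[m].prob = probs[i]; buf[m].index = i; m++; }
    }
    if (m == 0) {                      // pathological topp < 1/n: fall back to argmax
        int best = 0;
        for (int i = 1; i < n; i++) if (probs[i] > probs[best]) best = i;
        return best;
    }
    qsort(buf, m, sizeof(ProbIndex), compare_desc);   // only m candidates, typically few

    float cum = 0.0f;
    int last = m - 1;
    for (int i = 0; i < m; i++) {
        cum += buf[i].prob;
        if (cum > topp) { last = i; break; }
    }
    float r = coin * cum, acc = 0.0f;
    for (int i = 0; i <= last; i++) {
        acc += buf[i].prob;
        if (r < acc) return buf[i].index;
    }
    return buf[last].index;                           // rounding fallback
}
```

For typical settings (e.g. topp = 0.9 and a 32000-token vocabulary) the cutoff keeps only tokens with probability above roughly 3e-6, so the subsequent sort covers a tiny fraction of the vocabulary.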

@karpathy
Owner

I think it's fixed now, ty
