Big speed regression with top-p sampling #246

xefoci7612 · 2023-08-06T16:59:07Z

On my system throughput drops from 40 tok/s to 33 tok/s, roughly 17% slower.

$ ./run /tmp/ramdisk/model110m.bin -s 1 -p 0
Once upon a time <stripped> make a big difference in the world.
achieved tok/s: 40.298507
$ ./run /tmp/ramdisk/model110m.bin -s 1
Once upon a time <stripped> make a big difference in the world.
achieved tok/s: 33.588093

The slowdown is even more dramatic on smaller models.

xefoci7612 · 2023-08-06T20:12:22Z

Sorting all the probabilities seems like overkill.

Picking the K largest values can be done in O(n) with a selection algorithm, such as std::nth_element in the C++ standard library.

Alternatively, we can keep the full sort but run it after a top-k step. It should be possible to chain top-k and top-p, as described here:

While in theory, Top-p seems more elegant than Top-K, both methods work well in practice. Top-p can also be used in combination with Top-K, which can avoid very low ranked words while allowing for some dynamic selection.

For instance, a top-k step with k=60 before top-p reduces the final sort from the roughly 30K entries of the vocab to just 60, with a very high probability of giving the same result as a full sort.
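
The top-k then top-p chain suggested above could look roughly like the following C++ sketch. It is only an illustration under assumptions, not code from the repository: the names `ProbIndex` and `sample_topk_topp` are made up here, `probs` is assumed to be an already normalized distribution, and `coin` is assumed to be a uniform random number in [0, 1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a probability paired with its token index.
struct ProbIndex {
    float prob;
    int index;
};

// Sketch: top-k pre-selection followed by top-p (nucleus) sampling.
// probs: normalized distribution over the vocabulary (assumption).
// coin:  uniform random number in [0, 1) supplied by the caller (assumption).
int sample_topk_topp(const std::vector<float>& probs, std::size_t k,
                     float topp, float coin) {
    assert(!probs.empty() && k > 0);

    std::vector<ProbIndex> cand(probs.size());
    for (std::size_t i = 0; i < probs.size(); ++i) {
        cand[i] = {probs[i], static_cast<int>(i)};
    }

    auto by_prob_desc = [](const ProbIndex& a, const ProbIndex& b) {
        return a.prob > b.prob;
    };

    // Selection step: average O(n), moves the k largest entries to the front
    // without fully ordering them.
    k = std::min(k, cand.size());
    if (k < cand.size()) {
        std::nth_element(cand.begin(), cand.begin() + k, cand.end(), by_prob_desc);
    }
    cand.resize(k);

    // Full sort, but only over the k survivors.
    std::sort(cand.begin(), cand.end(), by_prob_desc);

    // Find the shortest prefix whose cumulative probability exceeds topp.
    float cum = 0.0f;
    std::size_t last = k - 1;
    for (std::size_t i = 0; i < k; ++i) {
        cum += cand[i].prob;
        if (cum > topp) {
            last = i;
            break;
        }
    }

    // Sample from the truncated set, rescaling coin to its total mass.
    const float r = coin * cum;
    float cdf = 0.0f;
    for (std::size_t i = 0; i <= last; ++i) {
        cdf += cand[i].prob;
        if (r < cdf) {
            return cand[i].index;
        }
    }
    return cand[last].index;  // guard against rounding error
}
```

With k=60 the sort touches 60 entries instead of the whole vocab. The result can differ from a full sort only when the 60 most likely tokens together hold less than top_p of the probability mass.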

twobob · 2023-08-06T23:58:08Z

./run
Usage: run [options]
Example: run model.bin -t 0.9 -n 256 -p "Once upon a time"
Options:
-t temperature, default 0.9
-s random seed, default time(NULL)
-n number of steps to run for, default 256. 0 = max_seq_len
-p prompt string, default none
-o top_p, default 1.0
-k print_tokens, 1 or 0 flag default 1
-f saveFileBool, 1 or 0 flag default 1
-l saveLogBool, 1 or 0 flag saves timestamps for tokens gen default 0
-b singleBOS, 1 or 0 flag default 1
-d dirname string, default none

qlora\llama2.c> 1..3 | ForEach-Object { 'runmingw.exe', 'rungcc.exe', 'run.exe' | ForEach-Object { $env:OMP_NUM_THREADS=4; & "./$_" '../out/model110M.bin' -t 0 -n 0 -p "Once upon a time" -s 42 -o 1.0 -k 0 -f 0 -l 0 -b 1 -d inbox } }

achieved tok/s: 89.689637 for MINGW
achieved tok/s: 90.555015 for GCC
achieved tok/s: 99.960915 for CLANG

achieved tok/s: 90.555015 for MINGW
achieved tok/s: 90.426942 for GCC
achieved tok/s: 95.162791 for CLANG

achieved tok/s: 90.187781 for MINGW
achieved tok/s: 91.062845 for GCC
achieved tok/s: 101.037037 for CLANG

qlora\llama2.c> 1..3 | ForEach-Object { 'runmingw.exe', 'rungcc.exe', 'run.exe' | ForEach-Object { $env:OMP_NUM_THREADS=4; & "./$_" '../out/model110M.bin' -t 0 -n 0 -p "Once upon a time" -s 42 -o .5 -k 0 -f 0 -l 0 -b 1 -d inbox } }

achieved tok/s: 88.000000 for MINGW using top_p
achieved tok/s: 89.080460 for GCC using top_p
achieved tok/s: 98.745174 for CLANG using top_p

achieved tok/s: 88.594440 for MINGW using top_p
achieved tok/s: 89.072704 for GCC using top_p
achieved tok/s: 98.745174 for CLANG using top_p

achieved tok/s: 87.879048 for MINGW using top_p
achieved tok/s: 86.372847 for GCC using top_p
achieved tok/s: 87.533156 for CLANG using top_p

Hard to say, since this is a CPU-bound system with a lot of other overhead, but having run it with a variety of builds over a period of time, I would probably concur with these findings. FWIW.

(Only a single batch of 3 runs is shown for each test, but after a number of such batches these ones are as representative as any other.)

twobob · 2023-08-09T22:13:44Z

I ran a few more tests, some with top_p and some without, and the logs now seem to show two tiers of speed, which may point to a noticeable slowdown when using top_p.
(I have actually shaved many of the other slowdowns from this version of the logging, so an A/B is doable.)

I'll add some individualised logging for top_p RSN; these were just incidental findings, far from conclusive.

but yeah. might need some fluffing

rdentato · 2023-08-10T16:01:17Z

Well, I submitted a patch, not sure if it's the way to go but seems to help.

karpathy · 2023-08-14T14:53:48Z

I think fixed now ty

jrudolph mentioned this issue Aug 13, 2023

optimize sample_topp by filtering out small value elements up front #276

Merged
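
The title of #276 describes filtering out small elements before sorting. The following C++ sketch shows one way that idea can work; it is not the merged code. The threshold `(1 - topp) / (n - 1)` is an assumption chosen here: a token below it that is not the single most likely one cannot belong to the nucleus, because the at most n - 1 tokens from it onward would sum to less than 1 - topp. The names `Candidate` and `sample_topp_prefiltered` are hypothetical, and `coin` is assumed to be a uniform random number in [0, 1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a probability paired with its token index.
struct Candidate {
    float prob;
    int index;
};

// Sketch: top-p sampling that drops tiny probabilities before sorting.
// probs: normalized distribution over the vocabulary (assumption).
// coin:  uniform random number in [0, 1) supplied by the caller (assumption).
int sample_topp_prefiltered(const std::vector<float>& probs, float topp, float coin) {
    assert(probs.size() > 1 && topp > 0.0f && topp < 1.0f);
    const std::size_t n = probs.size();

    // Assumed threshold: smaller values cannot be part of the nucleus.
    const float cutoff = (1.0f - topp) / static_cast<float>(n - 1);

    std::vector<Candidate> cand;
    for (std::size_t i = 0; i < n; ++i) {
        if (probs[i] >= cutoff) {
            cand.push_back({probs[i], static_cast<int>(i)});
        }
    }

    // Degenerate case (very small topp): nothing survived the filter,
    // so fall back to the most likely token.
    if (cand.empty()) {
        return static_cast<int>(std::max_element(probs.begin(), probs.end()) - probs.begin());
    }

    // Sort only the survivors, largest probability first.
    std::sort(cand.begin(), cand.end(), [](const Candidate& a, const Candidate& b) {
        return a.prob > b.prob;
    });

    // Find the shortest prefix whose cumulative probability exceeds topp.
    float cum = 0.0f;
    std::size_t last = cand.size() - 1;
    for (std::size_t i = 0; i < cand.size(); ++i) {
        cum += cand[i].prob;
        if (cum > topp) {
            last = i;
            break;
        }
    }

    // Sample from the truncated set, rescaling coin to its total mass.
    const float r = coin * cum;
    float cdf = 0.0f;
    for (std::size_t i = 0; i <= last; ++i) {
        cdf += cand[i].prob;
        if (r < cdf) {
            return cand[i].index;
        }
    }
    return cand[last].index;  // guard against rounding error
}
```

For a peaked distribution most of the vocab falls below the threshold, so the sort runs over a handful of entries rather than all of them.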

karpathy closed this as completed Aug 14, 2023

Majdoddin mentioned this issue Aug 17, 2023

const cuttoff in sample_topp #313

Closed
