PS C:\DATA\TestLLama\llama> ./main_pizza -m ./models/65B/ggml-model-f16.bin -p "This is a long story about how programming came to be:" -n 100 -t 32 --temp 0.2 -c 2048 -s 132456 --ignore-eos --no-mmap -b 512
main: seed = 132456
llama.cpp: loading model from ./models/65B/ggml-model-f16.bin
llama.cpp: loading model from ./models/65B/ggml-model-f16.bin.1
llama.cpp: loading model from ./models/65B/ggml-model-f16.bin.2
llama.cpp: loading model from ./models/65B/ggml-model-f16.bin.3
llama.cpp: loading model from ./models/65B/ggml-model-f16.bin.4
llama.cpp: loading model from ./models/65B/ggml-model-f16.bin.5
llama.cpp: loading model from ./models/65B/ggml-model-f16.bin.6
llama.cpp: loading model from ./models/65B/ggml-model-f16.bin.7
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 8192
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 64
llama_model_load_internal: n_layer = 80
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 1
llama_model_load_internal: n_ff = 22016
llama_model_load_internal: n_parts = 8
llama_model_load_internal: type = 4
llama_model_load_internal: ggml ctx size = 127513778.86 KB
llama_model_load_internal: mem required = 127085.17 MB (+ 5120.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 5120.00 MB

system_info: n_threads = 32 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2048, n_batch = 512, n_predict = 100, n_keep = 0

This is a long story about how programming came to be: The first programmable computer was invented by Konrad Zuse in 1941. It was called the Z3 and it used binary code. In 1950, John Mauchly and J. Presper Eckert built a machine called the UNIVAC (Universal Automatic Computer). This machine could perform calculations at the rate of one every second. In 1964, IBM developed its first computer for business use. It

llama_print_timings: load time = 106470.18 ms
llama_print_timings: sample time = 42.30 ms / 100 runs ( 0.42 ms per run)
llama_print_timings: prompt eval time = 10114.21 ms / 13 tokens ( 778.02 ms per token)
llama_print_timings: eval time = 188545.77 ms / 99 runs ( 1904.50 ms per run)
llama_print_timings: total time = 295063.89 ms
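As a sanity check on the reported "kv self size": for an f16 KV cache, each layer holds a K and a V tensor of n_ctx x n_embd half-precision values, so the total is 2 * n_layer * n_ctx * n_embd * 2 bytes. A minimal sketch (the function name is just for illustration):

    # KV cache size check, assuming one f16 K and one f16 V tensor of
    # n_ctx x n_embd values (2 bytes each) per layer.
    def kv_cache_mib(n_layer: int, n_ctx: int, n_embd: int) -> float:
        return 2 * n_layer * n_ctx * n_embd * 2 / 2**20

    print(kv_cache_mib(80, 2048, 8192))  # 5120.0 -> matches the 65B run above
    print(kv_cache_mib(32, 2048, 4096))  # 1024.0 -> matches the 7B runs below

Both values line up exactly with the "kv self size" lines llama.cpp prints.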
PS C:\DATA\TestLLama\llama> ./main_pizza -m ./models/7B/ggml-model-q4_0.bin -p "This is a long story about how programming came to be:" -n 100 -t 24 --temp 0.2 -c 2048 -s 132456 --ignore-eos --no-mmap
main: seed = 132456
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 2
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: type = 1
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 1024.00 MB

system_info: n_threads = 24 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2048, n_batch = 8, n_predict = 100, n_keep = 0

This is a long story about how programming came to be: In the early days of computers, they were used for one thing only: calculating. They were huge machines with lots of vacuum tubes and switches that ran on punch cards (think old-school IBM Selectric). Programming was done by hand, using a language called Fortran. Fortran was a very low-level language, which meant it had to be translated into machine code before the computer could run it. This was a slow process and took up

llama_print_timings: load time = 2200.97 ms
llama_print_timings: sample time = 45.29 ms / 100 runs ( 0.45 ms per run)
llama_print_timings: prompt eval time = 1034.14 ms / 13 tokens ( 79.55 ms per token)
llama_print_timings: eval time = 9059.07 ms / 99 runs ( 91.51 ms per run)
llama_print_timings: total time = 11580.57 ms
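The "ggml ctx size" of the q4_0 model is also easy to account for, assuming the original ggml q4_0 block layout (every 32 weights stored as a 4-byte f32 scale plus 16 bytes of 4-bit quants, i.e. 20 bytes per block) and an approximate 7B parameter count of 6.74e9; both of those figures are assumptions, not printed in the log:

    # Rough model-size estimate for the old ggml q4_0 format:
    # one 32-weight block = 4-byte f32 scale + 16 bytes of 4-bit quants.
    def q4_0_kib(n_params: float) -> float:
        return n_params / 32 * 20 / 1024

    print(q4_0_kib(6.74e9))  # ~4.11e6 KB, close to the 4113739.11 KB above

The f16 variant at 2 bytes per weight comes to about 12.6 GiB, which is most of the ~14.3 GiB "mem required" the f16 runs below report once scratch buffers are added.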
PS C:\DATA\TestLLama\llama> ./main_pizza -m ./models/7B/ggml-model-f16.bin -p "This is a long story about how programming came to be:" -n 100 -t 24 --temp 0.2 -c 2048 -s 132456 --ignore-eos
main: seed = 132456
llama.cpp: loading model from ./models/7B/ggml-model-f16.bin
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 1
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: type = 1
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 14645.07 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 1024.00 MB

system_info: n_threads = 24 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2048, n_batch = 8, n_predict = 100, n_keep = 0

This is a long story about how programming came to be: In the 1950s, computers were big and expensive. They were used by governments and large corporations for scientific research and business applications. In the 1960s, they became smaller and cheaper, and more people started using them. The first computer programmers were mostly men who worked in government or industry. They wrote programs to solve problems that needed a lot of computing power.
Programming was hard work because computers used punch cards and paper t

llama_print_timings: load time = 5746.67 ms
llama_print_timings: sample time = 45.58 ms / 100 runs ( 0.46 ms per run)
llama_print_timings: prompt eval time = 1842.97 ms / 13 tokens ( 141.77 ms per token)
llama_print_timings: eval time = 19799.40 ms / 99 runs ( 199.99 ms per run)
llama_print_timings: total time = 25877.21 ms

PS C:\DATA\TestLLama\llama> ./main_pizza -m ./models/7B/ggml-model-q4_0.bin -p "This is a long story about how programming came to be:" -n 100 -t 24 --temp 0.2 -c 2048 -s 132456 --ignore-eos
main: seed = 132456
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 2
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: type = 1
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 1024.00 MB

system_info: n_threads = 24 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2048, n_batch = 8, n_predict = 100, n_keep = 0

This is a long story about how programming came to be: In the early days of computers, they were used for one thing only: calculating. They were huge machines with lots of vacuum tubes and switches that ran on punch cards (think old-school IBM Selectric). Programming was done by hand, using a language called Fortran. Fortran was a very low-level language, which meant it had to be translated into machine code before the computer could run it. This was a slow process and took up

llama_print_timings: load time = 1442.17 ms
llama_print_timings: sample time = 45.80 ms / 100 runs ( 0.46 ms per run)
llama_print_timings: prompt eval time = 1404.30 ms / 13 tokens ( 108.02 ms per token)
llama_print_timings: eval time = 8940.16 ms / 99 runs ( 90.30 ms per run)
llama_print_timings: total time = 10700.46 ms

PS C:\DATA\TestLLama\llama> ./main_pizza -m ./models/alpaca-native-enhanced-7B/ggml-model-q4_0.bin -p "This is a long story about how programming came to be:" -n 100 -t 24 --temp 0.2 -c 2048 -s 132456 --ignore-eos --no-mmap
main: seed = 132456
llama.cpp: loading model from ./models/alpaca-native-enhanced-7B/ggml-model-q4_0.bin
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 2
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: type = 1
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 1024.00 MB

system_info: n_threads = 24 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2048, n_batch = 8, n_predict = 100, n_keep = 0

This is a long story about how programming came to be: In the 1950s, computers were huge and expensive. They were used for military and government purposes only. In order to make them more accessible to scientists and engineers, they had to be miniaturized and made more affordable. This is when the concept of programming was born. Programming languages were developed that allowed users to give instructions to a computer in a language that it could understand. The first programming language was called Fortran, which stands for Formula Trans

llama_print_timings: load time = 2304.64 ms
llama_print_timings: sample time = 45.66 ms / 100 runs ( 0.46 ms per run)
llama_print_timings: prompt eval time = 1118.61 ms / 13 tokens ( 86.05 ms per token)
llama_print_timings: eval time = 8963.01 ms / 99 runs ( 90.54 ms per run)
llama_print_timings: total time = 11581.31 ms

PS C:\DATA\TestLLama\llama> ./main_pizza -m ./models/alpaca-native-enhanced-7B/ggml-model-f16.bin -p "This is a long story about how programming came to be:" -n 100 -t 24 --temp 0.2 -c 2048 -s 132456 --ignore-eos
main: seed = 132456
llama.cpp: loading model from ./models/alpaca-native-enhanced-7B/ggml-model-f16.bin
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 1
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: type = 1
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 14645.07 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 1024.00 MB

system_info: n_threads = 24 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2048, n_batch = 8, n_predict = 100, n_keep = 0

This is a long story about how programming came to be: In the 1950s, computers were large and expensive machines that could only be programmed by experts. In order to make them more accessible, a new language called FORTRAN was developed which allowed users to write programs in English-like syntax. This made it easier for non-experts to learn how to use the computer. In 1964, John Backus designed the first high-level programming language called ALGOL.
It was an important

llama_print_timings: load time = 5936.85 ms
llama_print_timings: sample time = 46.48 ms / 100 runs ( 0.46 ms per run)
llama_print_timings: prompt eval time = 1985.97 ms / 13 tokens ( 152.77 ms per token)
llama_print_timings: eval time = 20602.80 ms / 99 runs ( 208.11 ms per run)
llama_print_timings: total time = 26879.55 ms

PS C:\DATA\TestLLama\llama> ./main_pizza -m ./models/gpt4all-7B/gpt4all-lora-quantized.bin -p "This is a long story about how programming came to be:" -n 100 -t 24 --temp 0.2 -c 2048 -s 132456 --ignore-eos --no-mmap
main: seed = 132456
llama.cpp: loading model from ./models/gpt4all-7B/gpt4all-lora-quantized.bin
llama_model_load_internal: n_vocab = 32001
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 2
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: type = 1
llama_model_load_internal: ggml ctx size = 4113744.11 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 1024.00 MB

system_info: n_threads = 24 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2048, n_batch = 8, n_predict = 100, n_keep = 0

This is a long story about how programming came to be: In the early 1940s, computers were large and expensive machines that were used for scientific and military purposes. They were operated by programmers who wrote programs in machine code, which was a series of ones and zeros that corresponded to instructions for the computer to perform. Programming languages such as Fortran, Cobol, and Algol emerged during this time period, but they were still too complex for most people to use. Then came the 19

llama_print_timings: load time = 1860.44 ms
llama_print_timings: sample time = 44.91 ms / 100 runs ( 0.45 ms per run)
llama_print_timings: prompt eval time = 776.42 ms / 13 tokens ( 59.72 ms per token)
llama_print_timings: eval time = 8780.59 ms / 99 runs ( 88.69 ms per run)
llama_print_timings: total time = 10957.94 ms

PS C:\DATA\TestLLama\llama> ./main_pizza -m ./models/gpt4all-7B/gpt4all-lora-quantized-ggjt.bin -p "This is a long story about how programming came to be:" -n 100 -t 24 --temp 0.2 -c 2048 -s 132456 --ignore-eos --no-mmap
main: seed = 132456
llama.cpp: loading model from ./models/gpt4all-7B/gpt4all-lora-quantized-ggjt.bin
llama_model_load_internal: n_vocab = 32001
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 2
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: type = 1
llama_model_load_internal: ggml ctx size = 4113744.11 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 1024.00 MB

system_info: n_threads = 24 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2048, n_batch = 8, n_predict = 100, n_keep = 0

This is a long story about how programming came to be: In the early 1940s, computers were large and expensive machines that were used for scientific and military purposes. They were operated by programmers who wrote programs in machine code, which was a series of ones and zeros that corresponded to instructions for the computer to perform. Programming languages such as Fortran, Cobol, and Algol emerged during this time period, but they were still too complex for most people to use. Then came the 19

llama_print_timings: load time = 2209.80 ms
llama_print_timings: sample time = 45.36 ms / 100 runs ( 0.45 ms per run)
llama_print_timings: prompt eval time = 989.79 ms / 13 tokens ( 76.14 ms per token)
llama_print_timings: eval time = 9716.19 ms / 99 runs ( 98.14 ms per run)
llama_print_timings: total time = 12257.87 ms

PS C:\DATA\TestLLama\llama> ./main_pizza -m ./models/gpt4all-7B/gpt4all-lora-quantized-ggjt.bin -p "This is a long story about how programming came to be:" -n 100 -t 24 --temp 0.2 -c 2048 -s 132456 --ignore-eos
main: seed = 132456
llama.cpp: loading model from ./models/gpt4all-7B/gpt4all-lora-quantized-ggjt.bin
llama_model_load_internal: n_vocab = 32001
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 2
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: type = 1
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 1024.00 MB

system_info: n_threads = 24 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2048, n_batch = 8, n_predict = 100, n_keep = 0

This is a long story about how programming came to be: In the early 1940s, computers were large and expensive machines that were used for scientific and military purposes. They were operated by programmers who wrote programs in machine code, which was a series of ones and zeros that corresponded to instructions for the computer to perform. Programming languages such as Fortran, Cobol, and Algol emerged during this time period, but they were still too complex for most people to use.
Then came the 19

llama_print_timings: load time = 1101.60 ms
llama_print_timings: sample time = 45.48 ms / 100 runs ( 0.45 ms per run)
llama_print_timings: prompt eval time = 1055.18 ms / 13 tokens ( 81.17 ms per token)
llama_print_timings: eval time = 9163.82 ms / 99 runs ( 92.56 ms per run)
llama_print_timings: total time = 10588.21 ms

PS C:\DATA\TestLLama\llama> ./main_pizza -m ./models/65B/ggml-model-q4_0.bin -p "This is a long story about how programming came to be:" -n 100 -t 24 --temp 0.2 -c 2048 -s 132456 --ignore-eos --no-mmap -b 512
main: seed = 132456
llama.cpp: loading model from ./models/65B/ggml-model-q4_0.bin
llama.cpp: loading model from ./models/65B/ggml-model-q4_0.bin.1
llama.cpp: loading model from ./models/65B/ggml-model-q4_0.bin.2
llama.cpp: loading model from ./models/65B/ggml-model-q4_0.bin.3
llama.cpp: loading model from ./models/65B/ggml-model-q4_0.bin.4
llama.cpp: loading model from ./models/65B/ggml-model-q4_0.bin.5
llama.cpp: loading model from ./models/65B/ggml-model-q4_0.bin.6
llama.cpp: loading model from ./models/65B/ggml-model-q4_0.bin.7
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 8192
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 64
llama_model_load_internal: n_layer = 80
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 2
llama_model_load_internal: n_ff = 22016
llama_model_load_internal: n_parts = 8
llama_model_load_internal: type = 4
llama_model_load_internal: ggml ctx size = 39851698.86 KB
llama_model_load_internal: mem required = 41477.67 MB (+ 5120.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 5120.00 MB

system_info: n_threads = 24 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2048, n_batch = 512, n_predict = 100, n_keep = 0

This is a long story about how programming came to be: The first computers were programmed by hand, literally. The earliest computers had front panels with switches and lights that could be used to enter programs directly into memory. Later machines had punched paper tape readers. Early versions of the Fortran language allowed the use of a "do loop" which was a series of statements that were executed repeatedly until some condition was met. The number of times through the loop was controlled with a variable known as the "counter". The counter usually

llama_print_timings: load time = 24007.64 ms
llama_print_timings: sample time = 47.19 ms / 100 runs ( 0.47 ms per run)
llama_print_timings: prompt eval time = 7241.78 ms / 13 tokens ( 557.06 ms per token)
llama_print_timings: eval time = 66380.09 ms / 99 runs ( 670.51 ms per run)
llama_print_timings: total time = 90440.56 ms
PS C:\DATA\TestLLama\llama>
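With ten runs in one session, it helps to reduce each llama_print_timings block to a single throughput number. A minimal sketch that reads a saved log on stdin and converts the per-run eval time into tokens per second (the regex simply matches the format printed above):

    import re
    import sys

    # Match lines like:
    # llama_print_timings: eval time = 66380.09 ms / 99 runs ( 670.51 ms per run)
    pat = re.compile(r"eval time\s*=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*runs")

    for line in sys.stdin:
        m = pat.search(line)
        if m:
            ms_per_token = float(m.group(1)) / int(m.group(2))
            print(f"{ms_per_token:8.2f} ms/token = {1000 / ms_per_token:6.2f} tokens/sec")

For example, the 65B q4_0 run above (670.51 ms per run) works out to about 1.5 tokens/sec, versus roughly 11 tokens/sec for the 7B q4_0 runs and about 0.5 tokens/sec for 65B at f16.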