forked from ggerganov/llama.cpp
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1 from MagnusS0/noramistral-tokenizer
feat: add compatibility with noramistral
Showing 27 changed files with 1,359 additions and 10 deletions.
@@ -0,0 +1,106 @@
ied 4 ½ months
__ggml_vocab_test__
Führer
__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__


__ggml_vocab_test__



__ggml_vocab_test__




__ggml_vocab_test__


__ggml_vocab_test__
Hello world
__ggml_vocab_test__
 Hello world
__ggml_vocab_test__
Hello World
__ggml_vocab_test__
 Hello World
__ggml_vocab_test__
 Hello World!
__ggml_vocab_test__
Hello, world!
__ggml_vocab_test__
 Hello, world!
__ggml_vocab_test__
 this is 🦙.cpp
__ggml_vocab_test__
w048 7tuijk dsdfhu
__ggml_vocab_test__
нещо на Български
__ggml_vocab_test__
កាន់តែពិសេសអាចខលចេញ
__ggml_vocab_test__
🚀 (normal) 😶🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)
__ggml_vocab_test__
Hello
__ggml_vocab_test__
 Hello
__ggml_vocab_test__
  Hello
__ggml_vocab_test__
   Hello
__ggml_vocab_test__
    Hello
__ggml_vocab_test__
    Hello
    Hello
__ggml_vocab_test__
 (
__ggml_vocab_test__

 =
__ggml_vocab_test__
' era
__ggml_vocab_test__
Hello, y'all! How are you 😁 ?我想在apple工作1314151天~
__ggml_vocab_test__
3
__ggml_vocab_test__
33
__ggml_vocab_test__
333
__ggml_vocab_test__
3333
__ggml_vocab_test__
33333
__ggml_vocab_test__
333333
__ggml_vocab_test__
3333333
__ggml_vocab_test__
33333333
__ggml_vocab_test__
333333333
__ggml_vocab_test__











🚀 (normal) 😶🌫️ (multiple emojis concatenated) ✅ 🦙🦙 3 33 333 3333 33333 333333 3333333 33333333 3.3 3..3 3...3 កាន់តែពិសេសអាច😁 ?我想在apple工作1314151天~ ------======= нещо на Български ''''''```````""""......!!!!!!?????? I've been 'told he's there, 'RE you sure? 'M not sure I'll make it, 'D you like some tea? We'Ve a'lL
__ggml_vocab_test__
@@ -0,0 +1,43 @@
5187 879 59261 21535
42 6395 3776 266

225
261
264
202
203
420
3712
11208
10564 7550
28137 7550
10564 10288
28137 10288
28137 10288 5
10564 16 7550 5
28137 16 7550 5
472 453 9919 104 252 18 3029
91 13392 1577 54321 19498 364 46363 8437
36655 13633 1769 14501 54827 21893 3849 10107 13878 41078
13065 227 50218 13065 246 25763 238 13065 242 25763 229 13065 249 13065 120 13065 258 25763 228 13065 258 13065 100 50218 13065 232 13065 228 13065 254 13065 232 25763 228 13065 236
8000 253 227 301 4411 13 9919 251 119 2965 240 8000 239 109 26726 301 10186 3520 23869 302 45604 13 12284 255 232 301 2895 53752 810 1533 2920 4613 3565 13
10564
28137
225 28137
261 28137
264 28137
264 28137 287 28137
301
203 278
11 225 3742
10564 16 711 11 474 5 8294 1021 1212 9919 251 228 959 10133 23692 5928 9173 33543 1330 1254 13567 22873 44634 257
23
1103
9581
3303
20428
13652
3303 9581
8274
8274 23
319 655 7239 11489 274 6881 12642 16716 203 8000 253 227 301 4411 13 9919 251 119 2965 240 8000 239 109 26726 301 10186 3520 23869 302 45604 13 12284 255 232 9919 104 252 8000 104 252 795 8104 38292 795 9581 795 3303 795 20428 795 13652 795 3303 9581 795 18 23 795 419 23 795 1713 23 225 13065 227 50218 13065 246 25763 238 13065 242 25763 229 13065 249 13065 120 13065 258 25763 228 13065 258 13065 100 50218 13065 232 8000 251 228 959 10133 23692 5928 9173 33543 1330 1254 13567 22873 44634 257 36031 12434 16706 13633 1769 14501 54827 21893 3849 10107 13878 41078 7095 9107 30834 2678 1246 1246 40651 13911 5366 23681 7887 527 10105 3081 363 88 1505 1063 1476 2866 16 363 495 1212 4509 35 363 49 691 4509 527 9104 2554 605 16 363 40 1212 4156 2681 4594 69 35 2893 11 30247 323 11 80 48
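The `.inp` and `.out` hunks above are paired test data: every input string in the `.inp` file is followed by a `__ggml_vocab_test__` line, and the `.out` file holds one line of space-separated token IDs per input, in the same order (43 of each here; an input that encodes to no tokens gets an empty line). A minimal Python sketch of reading such a pair back, assuming each input was written followed by the exact separator `\n__ggml_vocab_test__\n` and that both files are read as UTF-8 text. `parse_vocab_tests` and `SEPARATOR` are hypothetical names for illustration, not part of llama.cpp.

```python
from typing import List, Tuple

# Hypothetical name for the marker that is assumed to follow every input in a .inp file.
SEPARATOR = "\n__ggml_vocab_test__\n"


def parse_vocab_tests(inp_text: str, out_text: str) -> List[Tuple[str, List[int]]]:
    """Pair each test input from a .inp file with its expected token IDs from the .out file."""
    # Every input is assumed to be followed by SEPARATOR, so the last split piece is empty.
    inputs = inp_text.split(SEPARATOR)
    if inputs and inputs[-1] == "":
        inputs.pop()

    # One .out line per input; a trailing newline leaves an empty final piece to drop.
    out_lines = out_text.split("\n")
    if out_lines and out_lines[-1] == "":
        out_lines.pop()

    if len(inputs) != len(out_lines):
        raise ValueError(f"{len(inputs)} inputs but {len(out_lines)} output lines")

    # str.split() with no argument ignores surrounding spaces, so an empty line yields [].
    return [(text, [int(tok) for tok in line.split()]) for text, line in zip(inputs, out_lines)]


if __name__ == "__main__":
    # Sample IDs copied from the first .out hunk: "Hello world", the empty input, and "\n".
    sample_inp = "Hello world" + SEPARATOR + "" + SEPARATOR + "\n" + SEPARATOR
    sample_out = " 10564 7550\n\n 203\n"
    for text, ids in parse_vocab_tests(sample_inp, sample_out):
        print(repr(text), ids)
```

Splitting on the full separator rather than on single lines keeps inputs that themselves contain newlines (such as the multi-line whitespace cases) intact.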
@@ -0,0 +1,106 @@
ied 4 ½ months
__ggml_vocab_test__
Führer
__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__


__ggml_vocab_test__



__ggml_vocab_test__




__ggml_vocab_test__


__ggml_vocab_test__
Hello world
__ggml_vocab_test__
 Hello world
__ggml_vocab_test__
Hello World
__ggml_vocab_test__
 Hello World
__ggml_vocab_test__
 Hello World!
__ggml_vocab_test__
Hello, world!
__ggml_vocab_test__
 Hello, world!
__ggml_vocab_test__
 this is 🦙.cpp
__ggml_vocab_test__
w048 7tuijk dsdfhu
__ggml_vocab_test__
нещо на Български
__ggml_vocab_test__
កាន់តែពិសេសអាចខលចេញ
__ggml_vocab_test__
🚀 (normal) 😶🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)
__ggml_vocab_test__
Hello
__ggml_vocab_test__
 Hello
__ggml_vocab_test__
  Hello
__ggml_vocab_test__
   Hello
__ggml_vocab_test__
    Hello
__ggml_vocab_test__
    Hello
    Hello
__ggml_vocab_test__
 (
__ggml_vocab_test__

 =
__ggml_vocab_test__
' era
__ggml_vocab_test__
Hello, y'all! How are you 😁 ?我想在apple工作1314151天~
__ggml_vocab_test__
3
__ggml_vocab_test__
33
__ggml_vocab_test__
333
__ggml_vocab_test__
3333
__ggml_vocab_test__
33333
__ggml_vocab_test__
333333
__ggml_vocab_test__
3333333
__ggml_vocab_test__
33333333
__ggml_vocab_test__
333333333
__ggml_vocab_test__











🚀 (normal) 😶🌫️ (multiple emojis concatenated) ✅ 🦙🦙 3 33 333 3333 33333 333333 3333333 33333333 3.3 3..3 3...3 កាន់តែពិសេសអាច😁 ?我想在apple工作1314151天~ ------======= нещо на Български ''''''```````""""......!!!!!!?????? I've been 'told he's there, 'RE you sure? 'M not sure I'll make it, 'D you like some tea? We'Ve a'lL
__ggml_vocab_test__
@@ -0,0 +1,43 @@
1009 699 35137 3294
39832 261

225
6733
53448
202
203
203 203
203 203 203
202 203
17964 1568
29546 1568
17964 3519
29546 3519
29546 3519 5
17964 16 1568 5
29546 16 1568 5
555 337 5060 104 252 18 71 428
91 3079 28 964 30274 48013 267 87 6649 14811
9024 6983 146 236 6294 52261 4933 244 146 237 13905 32390 46632 51078
162 257 227 162 257 119 162 257 246 162 258 238 162 257 242 162 258 229 162 257 249 162 257 120 162 257 258 162 258 228 162 257 258 162 257 100 162 257 119 162 257 232 162 257 228 162 257 254 162 257 232 162 258 228 162 257 236
3753 253 227 406 17453 13 10278 119 54678 3753 239 109 16598 406 52806 1504 5752 78 276 2365 851 697 13 38607 406 20529 5752 12069 413 671 983 1469 30658 13
17964
29546
225 29546
6733 29546
53448 29546
53448 29546 203 53448 29546
406
203 3887
11 15453
17964 16 361 11 476 5 1953 459 426 10278 228 4985 167 235 244 167 230 116 57520 106 33974 166 120 103 46520 255 2281 2237 42047 47551 107 176 126 257
23
3837
45768
3837 3837
3837 45768
3837 3837 3837
3837 3837 45768
3837 3837 3837 3837
3837 3837 3837 45768
203 225 203 203 225 203 203 203 225 202 225 202 202 225 202 203 6733 203 53448 203 13607 203 13607 225 203 3753 253 227 406 17453 13 10278 119 54678 3753 239 109 16598 406 52806 1504 5752 78 276 2365 851 697 13 38607 5060 104 252 3753 104 252 589 8235 54381 589 45768 54381 3837 54381 45768 54381 3837 3837 54381 3837 45768 589 18 23 589 466 23 589 714 23 34376 257 227 162 257 119 162 257 246 162 258 238 162 257 242 162 258 229 162 257 249 162 257 120 162 257 258 162 258 228 162 257 258 162 257 100 162 257 119 162 257 232 32164 228 4985 167 235 244 167 230 116 57520 106 33974 166 120 103 46520 255 2281 2237 42047 47551 107 176 126 257 485 6624 17 30007 14589 33 36028 6983 146 236 6294 52261 4933 244 146 237 13905 32390 46632 51078 1268 12228 12228 11 51396 51396 51396 68 30699 30699 21828 11344 1844 20800 4300 324 1990 927 1268 88 939 540 507 899 16 1268 3136 426 2158 35 1268 49 586 2158 324 2202 1066 436 16 1268 40 426 917 822 11788 35 628 11 30868 264 11 80 48