fix: inconsistent tokenization by llama tokenizer #3006

congchan · 2024-02-03T09:09:50Z

Why are these changes needed?

Fix the tokenization mismatch produced by the Llama tokenizer.

Testing

inputs:

[
    {
      "from": "system",
      "value": "You are an AI."
    },
    {
      "from": "human",
      "value": "What is up?"
    },
    {
      "from": "gpt",
      "value": "Hello! How can I help you today?"
    },
    {
      "from": "human",
      "value": "Who are you?"
    },
    {
      "from": "gpt",
      "value": "You can call me Vicuna, and I was trained by Large Model Systems Organization (LMSYS) researchers as a language model."
    },
    {
      "from": "human",
      "value": "Goodbye"
    },
    {
      "from": "gpt",
      "value": "Goodbye! If you have any more questions in the future, don't hesitate to ask."
    }
]
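The token dumps below show how this conversation is laid out before tokenization: each turn is wrapped as `<|role|>`, a newline, the message text, `</s>`, and another newline, with `human` and `gpt` rendered as `user` and `assistant`. A minimal sketch that reconstructs that prompt string from the input, inferred from the dumps rather than copied from FastChat's conversation template code; `render_prompt` and `ROLE_TAGS` are hypothetical names, and the leading `<s>` is assumed to be added by the tokenizer as BOS.

```python
import json

# Mapping from the input's "from" field to the role tags seen in the dumps
# (an assumption read off the token output, not FastChat's own table).
ROLE_TAGS = {"system": "<|system|>", "human": "<|user|>", "gpt": "<|assistant|>"}
EOS = "</s>"


def render_prompt(messages):
    """Render a conversation as "<|role|>\n{text}</s>\n" per turn.

    The BOS token <s> is not included; the tokenizer is assumed to prepend it.
    """
    return "".join(f"{ROLE_TAGS[m['from']]}\n{m['value']}{EOS}\n" for m in messages)


conversation = json.loads('''[
  {"from": "system", "value": "You are an AI."},
  {"from": "human", "value": "What is up?"},
  {"from": "gpt", "value": "Hello! How can I help you today?"}
]''')
print(render_prompt(conversation))
```

Note that the same `<` character is encoded as 529 at the very start of the text but as 29966 after a newline in the dumps, so the token count of a turn depends on its context; counts taken from turns tokenized in isolation cannot simply be summed.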

Testing with the Llama 2 tokenizer passed:

1 	 -100 	 <s>
529 	 -100 	 <
29989 	 -100 	 |
5205 	 -100 	 system
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

3492 	 -100 	 You
526 	 -100 	 are
385 	 -100 	 an
319 	 -100 	 A
29902 	 -100 	 I
29889 	 -100 	 .
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
1792 	 -100 	 user
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

5618 	 -100 	 What
338 	 -100 	 is
701 	 -100 	 up
29973 	 -100 	 ?
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
465 	 -100 	 ass
22137 	 -100 	 istant
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

10994 	 10994 	 Hello
29991 	 29991 	 !
1128 	 1128 	 How
508 	 508 	 can
306 	 306 	 I
1371 	 1371 	 help
366 	 366 	 you
9826 	 9826 	 today
29973 	 29973 	 ?
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
1792 	 -100 	 user
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

22110 	 -100 	 Who
526 	 -100 	 are
366 	 -100 	 you
29973 	 -100 	 ?
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
465 	 -100 	 ass
22137 	 -100 	 istant
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

3492 	 3492 	 You
508 	 508 	 can
1246 	 1246 	 call
592 	 592 	 me
13423 	 13423 	 Vic
4347 	 4347 	 una
29892 	 29892 	 ,
322 	 322 	 and
306 	 306 	 I
471 	 471 	 was
16370 	 16370 	 trained
491 	 491 	 by
8218 	 8218 	 Lar
479 	 479 	 ge
8125 	 8125 	 Model
23985 	 23985 	 Systems
9205 	 9205 	 Organ
2133 	 2133 	 ization
313 	 313 	 (
29931 	 29931 	 L
4345 	 4345 	 MS
21554 	 21554 	 YS
29897 	 29897 	 )
5925 	 5925 	 research
414 	 414 	 ers
408 	 408 	 as
263 	 263 	 a
4086 	 4086 	 language
1904 	 1904 	 model
29889 	 29889 	 .
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
1792 	 -100 	 user
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

18420 	 -100 	 Good
26966 	 -100 	 bye
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
465 	 -100 	 ass
22137 	 -100 	 istant
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

18420 	 18420 	 Good
26966 	 26966 	 bye
29991 	 29991 	 !
960 	 960 	 If
366 	 366 	 you
505 	 505 	 have
738 	 738 	 any
901 	 901 	 more
5155 	 5155 	 questions
297 	 297 	 in
278 	 278 	 the
5434 	 5434 	 future
29892 	 29892 	 ,
1016 	 1016 	 don
29915 	 29915 	 '
29873 	 29873 	 t
19066 	 19066 	 hes
10388 	 10388 	 itate
304 	 304 	 to
2244 	 2244 	 ask
29889 	 29889 	 .
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

0 	 -100 	 <unk>
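Each row of the dump reads `token_id  label  decoded_token`. A label of -100 marks a position excluded from the loss (it is the default `ignore_index` of PyTorch's cross-entropy loss), so only the assistant replies are trained on, while the role headers and the `</s>` that closes each turn are masked. A minimal sketch of that masking, using the first assistant turn of the dump above; `build_labels` and `dump` are hypothetical helpers, not FastChat functions, and the segment boundaries are given by hand rather than computed from a tokenizer.

```python
IGNORE_TOKEN_ID = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments):
    """Concatenate (token_ids, trainable) segments into input_ids and labels.

    Trainable segments keep their ids as labels; every other position gets
    IGNORE_TOKEN_ID so it does not contribute to the loss.
    """
    input_ids, labels = [], []
    for ids, trainable in segments:
        input_ids.extend(ids)
        labels.extend(ids if trainable else [IGNORE_TOKEN_ID] * len(ids))
    return input_ids, labels


def dump(input_ids, labels, id_to_token):
    """Print rows in the 'id \t label \t token' layout used by the dumps."""
    for tok, lab in zip(input_ids, labels):
        print(f"{tok} \t {lab} \t {id_to_token.get(tok, '?')}")


# Ids and pieces copied from the dump; newline and the bare space piece are
# shown as '\\n' and '▁' so every row stays on one line.
vocab = {29966: "<", 29989: "|", 465: "ass", 22137: "istant", 29958: ">",
         13: "\\n", 10994: "Hello", 29991: "!", 1128: "How", 508: "can",
         306: "I", 1371: "help", 366: "you", 9826: "today", 29973: "?",
         2: "</s>", 29871: "▁"}
segments = [
    ([29966, 29989, 465, 22137, 29989, 29958, 13], False),  # "<|assistant|>\n"
    ([10994, 29991, 1128, 508, 306, 1371, 366, 9826, 29973], True),  # reply
    ([2, 29871, 13], False),  # "</s>" followed by "\n"
]
ids, labels = build_labels(segments)
dump(ids, labels, vocab)
```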

Testing also passed with the TinyLlama-1.1B-Chat-v1.0/ tokenizer and the TinyLlama template. Note that tokenizer.pad_token must be set to tokenizer.unk_token; otherwise total_len is computed incorrectly.

1 	 -100 	 <s>
529 	 -100 	 <
29989 	 -100 	 |
5205 	 -100 	 system
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

3492 	 -100 	 You
526 	 -100 	 are
385 	 -100 	 an
319 	 -100 	 A
29902 	 -100 	 I
29889 	 -100 	 .
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
1792 	 -100 	 user
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

5618 	 -100 	 What
338 	 -100 	 is
701 	 -100 	 up
29973 	 -100 	 ?
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
465 	 -100 	 ass
22137 	 -100 	 istant
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

10994 	 10994 	 Hello
29991 	 29991 	 !
1128 	 1128 	 How
508 	 508 	 can
306 	 306 	 I
1371 	 1371 	 help
366 	 366 	 you
9826 	 9826 	 today
29973 	 29973 	 ?
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
1792 	 -100 	 user
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

22110 	 -100 	 Who
526 	 -100 	 are
366 	 -100 	 you
29973 	 -100 	 ?
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
465 	 -100 	 ass
22137 	 -100 	 istant
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

3492 	 3492 	 You
508 	 508 	 can
1246 	 1246 	 call
592 	 592 	 me
13423 	 13423 	 Vic
4347 	 4347 	 una
29892 	 29892 	 ,
322 	 322 	 and
306 	 306 	 I
471 	 471 	 was
16370 	 16370 	 trained
491 	 491 	 by
8218 	 8218 	 Lar
479 	 479 	 ge
8125 	 8125 	 Model
23985 	 23985 	 Systems
9205 	 9205 	 Organ
2133 	 2133 	 ization
313 	 313 	 (
29931 	 29931 	 L
4345 	 4345 	 MS
21554 	 21554 	 YS
29897 	 29897 	 )
5925 	 5925 	 research
414 	 414 	 ers
408 	 408 	 as
263 	 263 	 a
4086 	 4086 	 language
1904 	 1904 	 model
29889 	 29889 	 .
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
1792 	 -100 	 user
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

18420 	 -100 	 Good
26966 	 -100 	 bye
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

29966 	 -100 	 <
29989 	 -100 	 |
465 	 -100 	 ass
22137 	 -100 	 istant
29989 	 -100 	 |
29958 	 -100 	 >
13 	 -100 	 

18420 	 18420 	 Good
26966 	 26966 	 bye
29991 	 29991 	 !
960 	 960 	 If
366 	 366 	 you
505 	 505 	 have
738 	 738 	 any
901 	 901 	 more
5155 	 5155 	 questions
297 	 297 	 in
278 	 278 	 the
5434 	 5434 	 future
29892 	 29892 	 ,
1016 	 1016 	 don
29915 	 29915 	 '
29873 	 29873 	 t
19066 	 19066 	 hes
10388 	 10388 	 itate
304 	 304 	 to
2244 	 2244 	 ask
29889 	 29889 	 .
2 	 -100 	 </s>
29871 	 -100 	 
13 	 -100 	 

0 	 -100 	 <unk>
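Why the pad token matters can be sketched as follows, assuming total_len is computed as the number of positions whose id differs from `tokenizer.pad_token_id`. If the pad token were `</s>` (id 2), the `</s>` that ends every turn would be counted as padding as well; with `<unk>` (id 0), which does not occur inside the template, only the real padding is excluded. In `transformers` the setting itself is the assignment `tokenizer.pad_token = tokenizer.unk_token`. The helpers `total_len` and `pad_to` below are hypothetical, and the token ids are the first assistant turn from the dump above.

```python
EOS_ID = 2  # </s> in the dumps above
UNK_ID = 0  # <unk>, used as the pad token in the dumps above


def total_len(input_ids, pad_token_id):
    """Hypothetical helper: count the positions that are not padding."""
    return sum(1 for tok in input_ids if tok != pad_token_id)


def pad_to(input_ids, length, pad_token_id):
    """Right-pad input_ids with pad_token_id up to the given length."""
    return input_ids + [pad_token_id] * (length - len(input_ids))


# "Hello! How can I help you today?" + "</s>" + "▁" + "\n" from the dump.
turn = [10994, 29991, 1128, 508, 306, 1371, 366, 9826, 29973, EOS_ID, 29871, 13]

for name, pad_id in [("pad = <unk>", UNK_ID), ("pad = </s>", EOS_ID)]:
    padded = pad_to(turn, 14, pad_id)
    print(f"{name}: total_len = {total_len(padded, pad_id)} (expected {len(turn)})")
```

With `<unk>` as the pad token the count matches the 12 real tokens; with `</s>` it drops to 11 because the turn's own `</s>` is treated as padding, which would make the computed length disagree with the tokens the template actually produced.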

Related issue number (if applicable)

This PR fixes #2871 and #2992.

Checks

I've run format.sh to lint the changes in this PR.
I've included any doc changes needed.
I've made sure the relevant tests are passing (if applicable).


Oscarjia · 2024-04-23T11:53:42Z

Hi @congchan, could you update the code to support Llama 3?

congchan added 2 commits February 3, 2024 16:33

fix: inconsistent tokenization by llama tokenizer

e34bc7b

explicitly set the pad_token_id to unk_token_id

55c2258

congchan mentioned this pull request Feb 3, 2024

Bugs when fine-tune tiny-llama with instructions using tiny-llama's conversation template #2992

Closed

merrymercy merged commit 3bef934 into lm-sys:main Feb 3, 2024
1 check passed

congchan deleted the fix_training_with_template branch February 17, 2024 09:01

congchan mentioned this pull request Feb 17, 2024

Using train_with_template on mistral end up in a model with a loop #3055

Open
