support qwen2 #5037

Merged 1 commit into ggerganov:master on Jan 19, 2024
Conversation

simonJJJ
Contributor

This PR adds support for the upcoming Qwen2 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen. @ggerganov

@ggerganov ggerganov merged commit 9b75cb2 into ggerganov:master Jan 19, 2024
36 of 45 checks passed
@JianbangZ

@simonJJJ is the tiktoken tokenizer being used? Last time I checked, although llama.cpp supports Qwen, it did not seem to use tiktoken.

@sorasoras

> @simonJJJ is the tiktoken tokenizer being used? Last time I checked, although llama.cpp supports Qwen, it did not seem to use tiktoken.

I think it's the same as Qwen1, which uses tiktoken as well.

@simonJJJ
Contributor Author

> @simonJJJ is the tiktoken tokenizer being used? Last time I checked, although llama.cpp supports Qwen, it did not seem to use tiktoken.

The open-sourced version will not use tiktoken.
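(For reference, you can check which tokenizer type a converted GGUF file actually records in its metadata. A minimal sketch, assuming the gguf Python package from llama.cpp's gguf-py directory; the file name is a placeholder and the exact string-decoding pattern follows gguf-py's reader layout, so treat it as an assumption:)

from gguf import GGUFReader

# Sketch: read the tokenizer type recorded in a GGUF file's metadata.
# The file name is a placeholder.
reader = GGUFReader("ggml-model-f16.gguf")

# "tokenizer.ggml.model" names the tokenizer type, e.g. "gpt2" for a BPE tokenizer.
field = reader.fields["tokenizer.ggml.model"]
print(field.parts[field.data[0]].tobytes().decode("utf-8"))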

@JianbangZ

> @simonJJJ is the tiktoken tokenizer being used? Last time I checked, although llama.cpp supports Qwen, it did not seem to use tiktoken.
>
> The open-sourced version will not use tiktoken.

Could you take a look at the related issue #4331?
In my case, when I use master llama.cpp and try out Qwen, it keeps outputting [PAD151643].
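(A quick way to see what token id 151643 is in Qwen's vocabulary is to decode it with the HF tokenizer. A sketch, using the base Qwen1.5 repo mentioned later in this thread; the id is taken from the [PAD151643] output above and should correspond to Qwen's end-of-text special token:)

from transformers import AutoTokenizer

# Sketch: look up token id 151643 (the id shown in the [PAD151643] output above).
tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B")
print(tok.convert_ids_to_tokens(151643))  # expected: Qwen's <|endoftext|> special token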

@sorasoras

[Screenshot: output with the GGUF conversion]
[Screenshot: output with transformers]

@simonJJJ Can you help with debugging this?
The output of Qwen1.5 is now incoherent in llama.cpp.
My guess is that this has something to do with convert-hf-to-gguf.py.

@ggerganov
Owner

Can you provide the HF repo with the model and the main command that you are using?

@sorasoras

> Can you provide the HF repo with the model and the main command that you are using?

https://huggingface.co/SakuraLLM/Sakura-1B8-Qwen2beta-v0.9/tree/main
This prompt is what this model is fine-tuned for:

<|im_start|>system
你是一个轻小说翻译模型,可以流畅通顺地以日本轻小说的风格将日文翻译成简体中文,并联系上下文正确使用人称代词,不擅自添加原文中没有的代词。<|im_end|>
<|im_start|>user
将下面的日文文本翻译成中文:"some Japanese"<|im_end|>
<|im_start|>assistant
"some Chinese"<|im_end|>

You can try it with something like:
"かくして、魔法使いの国は本当の姿を現しました。" ("And so, the country of mages revealed its true form.")
You should expect something like:
于是,魔法师国家真正的姿态在我眼前展开。 ("Thus, the mage nation's true form unfolded before my eyes.")
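(For reference, the system prompt above instructs the model to translate Japanese light-novel text into fluent Simplified Chinese without inventing pronouns. A minimal sketch of assembling this ChatML-style prompt and passing it to llama.cpp's ./main via its -f flag; the file name prompt.txt is a placeholder:)

# Sketch: build the ChatML prompt this fine-tune expects and write it to a file
# that can be passed to ./main with -f. The output file name is a placeholder.
system = (
    "你是一个轻小说翻译模型,可以流畅通顺地以日本轻小说的风格将日文翻译成简体中文,"
    "并联系上下文正确使用人称代词,不擅自添加原文中没有的代词。"
)
user_text = "かくして、魔法使いの国は本当の姿を現しました。"

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n将下面的日文文本翻译成中文:{user_text}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

with open("prompt.txt", "w", encoding="utf-8") as f:
    f.write(prompt)

# Then: ./main -m ggml-model-f16.gguf -f prompt.txt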

@ggerganov
Owner

This looks like a fine-tuned model - I need the original Qwen2 models

@sorasoras

sorasoras commented Feb 21, 2024

> This looks like a fine-tuned model - I need the original Qwen2 models

https://huggingface.co/Qwen/Qwen1.5-1.8B/tree/main
That would be the original.
P.S. this might be helpful: #5459 (comment)

@ggerganov
Owner

Looks to be working on my side (ps. I don't know Chinese):

make -j main && ./main -m ./models/qwen-1.8b-v1.5/ggml-model-f16.gguf -p "我相信生命的意义在于" -s 3 -ngl 99

...

我相信生命的意义在于创造。创造的途径有多种,但无论是哪种途径,都离不开两个要素,即:实践和创新。
我们今天要谈的话题是“如何正确看待学习中的困难”。在我们的日常生活中,每个人都会遇到大大小小的学习困难。有的同学可能会觉得学习上的困难对自己是一种巨大的考验,甚至认为自己没有能力战胜这些困难。其实,从某种意义上来说,任何人的能力都是有限的,不可能人人都能获得成功,但一个人的能力毕竟有限,要想取得好的成绩,必须要有一个良好的心态,要正确看待自己在学习中的不足和缺陷,在学习中遇到困难或问题时应如何面对,如何用积极乐观的态度面对自己的学习困难。只有这样,我们才能真正理解什么是“学无止境”,什么是“学海无涯”。 [end of text]

llama_print_timings:        load time =     212.41 ms
llama_print_timings:      sample time =      47.52 ms /   161 runs   (    0.30 ms per token,  3387.76 tokens per second)
llama_print_timings: prompt eval time =      29.97 ms /     4 tokens (    7.49 ms per token,   133.45 tokens per second)
llama_print_timings:        eval time =    1356.25 ms /   160 runs   (    8.48 ms per token,   117.97 tokens per second)
llama_print_timings:       total time =    1491.44 ms /   164 tokens

Used this repo: https://huggingface.co/Qwen/Qwen1.5-1.8B/tree/main

Converted using this command:

python3 convert-hf-to-gguf.py ~/Data/huggingface/Qwen1.5-1.8B/ --outfile models/qwen-1.8b-v1.5/ggml-model-f16.gguf --outtype f16
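(One way to sanity-check a conversion like this is to compare how the converted GGUF and the original HF tokenizer split the same text. A sketch, assuming the llama-cpp-python bindings and the paths from the commands above:)

from llama_cpp import Llama
from transformers import AutoTokenizer

text = "我相信生命的意义在于"

# Tokenize with the converted GGUF; vocab_only loads only the tokenizer/vocab.
llm = Llama(model_path="models/qwen-1.8b-v1.5/ggml-model-f16.gguf", vocab_only=True)
gguf_ids = llm.tokenize(text.encode("utf-8"), add_bos=False)

# Tokenize the same text with the original HF tokenizer.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B")
hf_ids = tok.encode(text)

print("match:", gguf_ids == hf_ids)
print("gguf:", gguf_ids)
print("hf:  ", hf_ids)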

@sorasoras

> Looks to be working on my side (ps. I don't know Chinese): [...]

I think you need a longer prompt to trigger the bug.

@sorasoras

> Looks to be working on my side (ps. I don't know Chinese): [...]

I think I found the problem here.
The official repo of Qwen1.5 does have a working GGUF.
[Screenshot: side-by-side comparison of the two GGUF files]
The left one is the one I made with convert-hf-to-gguf.py.
The right one is the one from the official repo.
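(To pin down what differs between the two files, one can diff their metadata. A sketch, again assuming the gguf package from llama.cpp's gguf-py; both file names are placeholders:)

from gguf import GGUFReader

# Sketch: compare the metadata keys of a self-converted GGUF against the official one.
mine = GGUFReader("self-converted.gguf")
official = GGUFReader("official.gguf")

keys_mine, keys_official = set(mine.fields), set(official.fields)
print("only in mine:    ", sorted(keys_mine - keys_official))
print("only in official:", sorted(keys_official - keys_mine))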
