support qwen2 #5037

Merged 1 commit into ggerganov:master on Jan 19, 2024
Conversation

simonJJJ
Contributor

This PR adds support for the upcoming Qwen2 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen. @ggerganov

@ggerganov ggerganov merged commit 9b75cb2 into ggerganov:master Jan 19, 2024
36 of 45 checks passed
@JianbangZ

@simonJJJ is the tiktoken tokenizer being used? Last time I checked, although llama.cpp supports Qwen, it did not seem to use tiktoken.

@sorasoras

> @simonJJJ is the tiktoken tokenizer being used? Last time I checked, although llama.cpp supports Qwen, it did not seem to use tiktoken.

I think it's the same as Qwen1, which uses tiktoken as well.

@simonJJJ
Contributor Author

> @simonJJJ is the tiktoken tokenizer being used? Last time I checked, although llama.cpp supports Qwen, it did not seem to use tiktoken.

The open-sourced version will not use tiktoken.
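(For reference, you can check which tokenizer type a converted GGUF file actually records in its metadata. A minimal sketch, assuming the gguf Python package from llama.cpp's gguf-py directory; the file name is a placeholder and the exact string-decoding pattern follows gguf-py's reader layout, so treat it as an assumption:)

from gguf import GGUFReader

# Sketch: read the tokenizer type recorded in a GGUF file's metadata.
# The file name is a placeholder.
reader = GGUFReader("ggml-model-f16.gguf")

# "tokenizer.ggml.model" names the tokenizer type, e.g. "gpt2" for a BPE tokenizer.
field = reader.fields["tokenizer.ggml.model"]
print(field.parts[field.data[0]].tobytes().decode("utf-8"))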

@JianbangZ

> @simonJJJ is the tiktoken tokenizer being used? Last time I checked, although llama.cpp supports Qwen, it did not seem to use tiktoken.
>
> The open-sourced version will not use tiktoken.

Could you take a look at the related issue #4331?
In my case, when I use master llama.cpp and try out Qwen, it keeps outputting [PAD151643].
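(A quick way to see what token id 151643 is in Qwen's vocabulary is to decode it with the HF tokenizer. A sketch, using the base Qwen1.5 repo mentioned later in this thread; the id is taken from the [PAD151643] output above and should correspond to Qwen's end-of-text special token:)

from transformers import AutoTokenizer

# Sketch: look up token id 151643 (the id shown in the [PAD151643] output above).
tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B")
print(tok.convert_ids_to_tokens(151643))  # expected: Qwen's <|endoftext|> special token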

@sorasoras

[Screenshot: output with the GGUF conversion]
[Screenshot: output with transformers]

@simonJJJ Can you help with debugging this?
The output of Qwen1.5 is now incoherent in llama.cpp.
My guess is that this has something to do with convert-hf-to-gguf.py.

@ggerganov
Owner

Can you provide the HF repo with the model and the main command that you are using?

@sorasoras

> Can you provide the HF repo with the model and the main command that you are using?

https://huggingface.co/SakuraLLM/Sakura-1B8-Qwen2beta-v0.9/tree/main
This prompt is what this model is fine-tuned for:

<|im_start|>system
你是一个轻小说翻译模型,可以流畅通顺地以日本轻小说的风格将日文翻译成简体中文,并联系上下文正确使用人称代词,不擅自添加原文中没有的代词。<|im_end|>
<|im_start|>user
将下面的日文文本翻译成中文:"some Japanese"<|im_end|>
<|im_start|>assistant
"some Chinese"<|im_end|>

You can try it with something like:
"かくして、魔法使いの国は本当の姿を現しました。" ("And so, the country of mages revealed its true form.")
You should expect something like:
于是,魔法师国家真正的姿态在我眼前展开。 ("Thus, the mage nation's true form unfolded before my eyes.")
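(For reference, the system prompt above instructs the model to translate Japanese light-novel text into fluent Simplified Chinese without inventing pronouns. A minimal sketch of assembling this ChatML-style prompt and passing it to llama.cpp's ./main via its -f flag; the file name prompt.txt is a placeholder:)

# Sketch: build the ChatML prompt this fine-tune expects and write it to a file
# that can be passed to ./main with -f. The output file name is a placeholder.
system = (
    "你是一个轻小说翻译模型,可以流畅通顺地以日本轻小说的风格将日文翻译成简体中文,"
    "并联系上下文正确使用人称代词,不擅自添加原文中没有的代词。"
)
user_text = "かくして、魔法使いの国は本当の姿を現しました。"

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n将下面的日文文本翻译成中文:{user_text}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

with open("prompt.txt", "w", encoding="utf-8") as f:
    f.write(prompt)

# Then: ./main -m ggml-model-f16.gguf -f prompt.txt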

@ggerganov
Owner

This looks like a fine-tuned model - I need the original Qwen2 models

@sorasoras

sorasoras commented Feb 21, 2024

> This looks like a fine-tuned model - I need the original Qwen2 models

https://huggingface.co/Qwen/Qwen1.5-1.8B/tree/main
That would be the original.
P.S. this might be helpful: #5459 (comment)

@ggerganov
Owner

Looks to be working on my side (ps. I don't know Chinese):

make -j main && ./main -m ./models/qwen-1.8b-v1.5/ggml-model-f16.gguf -p "我相信生命的意义在于" -s 3 -ngl 99

...

我相信生命的意义在于创造。创造的途径有多种,但无论是哪种途径,都离不开两个要素,即:实践和创新。
我们今天要谈的话题是“如何正确看待学习中的困难”。在我们的日常生活中,每个人都会遇到大大小小的学习困难。有的同学可能会觉得学习上的困难对自己是一种巨大的考验,甚至认为自己没有能力战胜这些困难。其实,从某种意义上来说,任何人的能力都是有限的,不可能人人都能获得成功,但一个人的能力毕竟有限,要想取得好的成绩,必须要有一个良好的心态,要正确看待自己在学习中的不足和缺陷,在学习中遇到困难或问题时应如何面对,如何用积极乐观的态度面对自己的学习困难。只有这样,我们才能真正理解什么是“学无止境”,什么是“学海无涯”。 [end of text]

llama_print_timings:        load time =     212.41 ms
llama_print_timings:      sample time =      47.52 ms /   161 runs   (    0.30 ms per token,  3387.76 tokens per second)
llama_print_timings: prompt eval time =      29.97 ms /     4 tokens (    7.49 ms per token,   133.45 tokens per second)
llama_print_timings:        eval time =    1356.25 ms /   160 runs   (    8.48 ms per token,   117.97 tokens per second)
llama_print_timings:       total time =    1491.44 ms /   164 tokens

Used this repo: https://huggingface.co/Qwen/Qwen1.5-1.8B/tree/main

Converted using this command:

python3 convert-hf-to-gguf.py ~/Data/huggingface/Qwen1.5-1.8B/ --outfile models/qwen-1.8b-v1.5/ggml-model-f16.gguf --outtype f16
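(One way to sanity-check a conversion like this is to compare how the converted GGUF and the original HF tokenizer split the same text. A sketch, assuming the llama-cpp-python bindings and the paths from the commands above:)

from llama_cpp import Llama
from transformers import AutoTokenizer

text = "我相信生命的意义在于"

# Tokenize with the converted GGUF; vocab_only loads only the tokenizer/vocab.
llm = Llama(model_path="models/qwen-1.8b-v1.5/ggml-model-f16.gguf", vocab_only=True)
gguf_ids = llm.tokenize(text.encode("utf-8"), add_bos=False)

# Tokenize the same text with the original HF tokenizer.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B")
hf_ids = tok.encode(text)

print("match:", gguf_ids == hf_ids)
print("gguf:", gguf_ids)
print("hf:  ", hf_ids)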

@sorasoras

> Looks to be working on my side (ps. I don't know Chinese): [...]

I think you need a longer prompt to trigger the bug.

@sorasoras

> Looks to be working on my side (ps. I don't know Chinese): [...]

I think I found the problem here.
The official repo of Qwen1.5 does have a working GGUF.
[Screenshot: side-by-side comparison of the two GGUF files]
The left one is the one I made with convert-hf-to-gguf.py.
The right one is the one from the official repo.
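(To pin down what differs between the two files, one can diff their metadata. A sketch, again assuming the gguf package from llama.cpp's gguf-py; both file names are placeholders:)

from gguf import GGUFReader

# Sketch: compare the metadata keys of a self-converted GGUF against the official one.
mine = GGUFReader("self-converted.gguf")
official = GGUFReader("official.gguf")

keys_mine, keys_official = set(mine.fields), set(official.fields)
print("only in mine:    ", sorted(keys_mine - keys_official))
print("only in official:", sorted(keys_official - keys_mine))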
