
Add new Gradio web demo for Chinese-LLaMA-Alpaca #300

Merged (4 commits, May 15, 2023)

Conversation

sunyuhan19981208 (Contributor)

[Screenshot: web demo, 2023-05-10 7:27 PM]

I wrote a simple Gradio web demo that supports multi-round conversation and multi-card inference. Here is the shell command to start the web demo:

python gradio_demo.py \
    --base_model /home/sunyuhan/syh/sunyuhan/zju/llama-7b-hf/ \
    --lora_model /home/sunyuhan/syh/sunyuhan/zju/chinese-alpaca-lora-7b \
    --with_prompt \
    --gpus 4,5,6,7
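
(For reference: a --gpus flag like the one above is typically implemented by restricting the visible devices before the model is loaded and letting accelerate shard it across the remaining cards. Below is a minimal sketch under that assumption; apart from the flags shown in the command, the names and structure are guesses, not this demo's actual code.)

```python
# Hypothetical sketch of how --gpus can drive multi-card inference.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--base_model', required=True)
parser.add_argument('--lora_model', default=None)
parser.add_argument('--with_prompt', action='store_true')
parser.add_argument('--gpus', default='0', help="comma-separated GPU ids, e.g. '4,5,6,7'")
args = parser.parse_args()

# Restrict visible devices before torch touches CUDA.
os.environ['CUDA_VISIBLE_DEVICES'] = args.gpus

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained(args.base_model)
model = LlamaForCausalLM.from_pretrained(
    args.base_model,
    torch_dtype=torch.float16,
    device_map='auto',  # let accelerate shard layers across the visible GPUs
)
```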

@airaria (Contributor) commented May 10, 2023

Looks awesome! Thank you for your contribution.
But I have a question about loading the Alpaca Plus models. Since Plus requires two LoRA weights, does the demo support multi-LoRA loading? Alternatively, does it support loading only base_model without lora_model (since users can merge the LoRAs into the base model to get a single merged Plus weight file)?
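
(For context: merging a LoRA into the base model can be done offline with peft's merge_and_unload, which folds the low-rank deltas into the base weights so the demo would only need --base_model. A minimal sketch with placeholder paths; for a Plus model the second LoRA would be applied and merged the same way on the result.)

```python
# Sketch: fold LoRA weights into the base model to get a single weight file.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained('path/to/chinese-alpaca-lora-7b')
base = LlamaForCausalLM.from_pretrained('path/to/llama-7b-hf', torch_dtype=torch.float16)
base.resize_token_embeddings(len(tokenizer))  # Chinese-Alpaca ships an expanded vocab

merged = PeftModel.from_pretrained(base, 'path/to/chinese-alpaca-lora-7b')
merged = merged.merge_and_unload()  # adds scaling * B @ A into the original Linear weights
merged.save_pretrained('path/to/merged-model')
tokenizer.save_pretrained('path/to/merged-model')
```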

@sunyuhan19981208 (Contributor, Author) commented May 11, 2023

support base_model-only mode
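
(A base_model-only branch usually just skips the peft wrapping step. A minimal sketch of what such a branch might look like; the names are assumptions, not necessarily this PR's code.)

```python
# Sketch: only wrap with peft when --lora_model is given.
from typing import Optional

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

def load_model(base_model: str, lora_model: Optional[str]):
    # Take the tokenizer from the LoRA repo when present (expanded vocab), else the base.
    tokenizer = LlamaTokenizer.from_pretrained(lora_model or base_model)
    model = LlamaForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.float16, device_map='auto'
    )
    if model.get_input_embeddings().weight.size(0) != len(tokenizer):
        model.resize_token_embeddings(len(tokenizer))
    if lora_model is not None:
        model = PeftModel.from_pretrained(model, lora_model)  # attach LoRA adapters
    return tokenizer, model
```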

@airaria (Contributor) commented May 11, 2023

> support base_model-only mode

Thanks. We will perform some tests before confirming the merge.

@sunyuhan19981208 (Contributor, Author)

> > support base_model-only mode
>
> Thanks. We will perform some tests before confirming the merge.

Please let me know as soon as possible if you have any advice or have found any bugs. I would be grateful for your feedback, and I'm eager to improve the demo based on your suggestions. Thank you very much.

airaria self-assigned this and later unassigned it on May 15, 2023
ymcui merged commit 49ec61e into ymcui:main on May 15, 2023
@tluo-github

In this notebook:
https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/notebooks/convert_and_quantize_chinese_llama.ipynb?usp=sharing#scrollTo=EjkXqaqbmrVZ
the following command runs successfully:

!cd Chinese-LLaMA-Alpaca/ && python scripts/my_gradio_demo.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --lora_model 'ziqingyang/chinese-alpaca-lora-7b'

But in this notebook:
https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/notebooks/finetune_chinese_alpaca_lora.ipynb

!cd Chinese-LLaMA-Alpaca/ && python scripts/my_gradio_demo.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --lora_model '/content/output_model/peft_model'

after the demo starts, asking a question raises this error:
```
2023-05-15 08:58:49.995893: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Loading checkpoint shards: 100% 33/33 [00:18<00:00, 1.75it/s]
Vocab of the base model: 32000
Vocab of the tokenizer: 49954
Resize model embeddings to fit tokenizer
loading peft model
Running on local URL: http://127.0.0.1:7860/
Running on public URL: https://da6b02733e495413ee.gradio.live/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 414, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1320, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1048, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/content/Chinese-LLaMA-Alpaca/scripts/my_gradio_demo.py", line 132, in predict
generation_output = model.generate(
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 581, in generate
outputs = self.base_model.generate(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1524, in generate
return self.beam_search(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2810, in beam_search
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 358, in forward
result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Half but found Float
Keyboard interruption in main thread... closing server.
```

@tluo-github

Solved: change
load_type = torch.float16
to
load_type = torch.float32
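
(Background: the base model is loaded in float16 while freshly trained LoRA layers typically stay in float32, so F.linear inside the LoRA forward receives mixed dtypes, hence "expected scalar type Half but found Float". Loading everything as float32, as above, avoids the mismatch at the cost of memory. An alternative is to cast the LoRA weights down to half instead; the helper below is a hypothetical sketch, not code from this repo.)

```python
# Alternative sketch: keep float16 inference and cast LoRA weights to match.
import torch

def cast_lora_to_half(model: torch.nn.Module) -> None:
    """Cast float32 LoRA parameters to float16 so F.linear sees one dtype."""
    for name, param in model.named_parameters():
        if 'lora_' in name and param.dtype == torch.float32:
            param.data = param.data.half()
```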
