
Add new Gradio web demo for Chinese-LLaMA-Alpaca #300

Merged (4 commits, May 15, 2023)

Conversation

sunyuhan19981208 (Contributor)

[Screenshot: web demo, 2023-05-10 7:27 PM]

I wrote a simple Gradio web demo that supports multi-round conversation and multi-card inference. Here is the shell command to start the web demo:

python gradio_demo.py \
    --base_model /home/sunyuhan/syh/sunyuhan/zju/llama-7b-hf/ \
    --lora_model /home/sunyuhan/syh/sunyuhan/zju/chinese-alpaca-lora-7b \
    --with_prompt \
    --gpus 4,5,6,7
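
(For reference: a --gpus flag like the one above is typically implemented by restricting the visible devices before the model is loaded and letting accelerate shard it across the remaining cards. Below is a minimal sketch under that assumption; apart from the flags shown in the command, the names and structure are guesses, not this demo's actual code.)

```python
# Hypothetical sketch of how --gpus can drive multi-card inference.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--base_model', required=True)
parser.add_argument('--lora_model', default=None)
parser.add_argument('--with_prompt', action='store_true')
parser.add_argument('--gpus', default='0', help="comma-separated GPU ids, e.g. '4,5,6,7'")
args = parser.parse_args()

# Restrict visible devices before torch touches CUDA.
os.environ['CUDA_VISIBLE_DEVICES'] = args.gpus

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained(args.base_model)
model = LlamaForCausalLM.from_pretrained(
    args.base_model,
    torch_dtype=torch.float16,
    device_map='auto',  # let accelerate shard layers across the visible GPUs
)
```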

@airaria (Contributor) commented May 10, 2023

Looks awesome! Thank you for your contribution.
But I have a question about loading the Alpaca Plus models. Since Plus requires two LoRA weights, does the demo support multi-LoRA loading? Alternatively, does it support loading only base_model without lora_model (since users can merge the LoRAs into the base model to get a single merged Plus weight file)?
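
(For context: merging a LoRA into the base model can be done offline with peft's merge_and_unload, which folds the low-rank deltas into the base weights so the demo would only need --base_model. A minimal sketch with placeholder paths; for a Plus model the second LoRA would be applied and merged the same way on the result.)

```python
# Sketch: fold LoRA weights into the base model to get a single weight file.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained('path/to/chinese-alpaca-lora-7b')
base = LlamaForCausalLM.from_pretrained('path/to/llama-7b-hf', torch_dtype=torch.float16)
base.resize_token_embeddings(len(tokenizer))  # Chinese-Alpaca ships an expanded vocab

merged = PeftModel.from_pretrained(base, 'path/to/chinese-alpaca-lora-7b')
merged = merged.merge_and_unload()  # adds scaling * B @ A into the original Linear weights
merged.save_pretrained('path/to/merged-model')
tokenizer.save_pretrained('path/to/merged-model')
```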

@sunyuhan19981208 (Contributor, Author) commented May 11, 2023

support base_model-only mode
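
(A base_model-only branch usually just skips the peft wrapping step. A minimal sketch of what such a branch might look like; the names are assumptions, not necessarily this PR's code.)

```python
# Sketch: only wrap with peft when --lora_model is given.
from typing import Optional

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

def load_model(base_model: str, lora_model: Optional[str]):
    # Take the tokenizer from the LoRA repo when present (expanded vocab), else the base.
    tokenizer = LlamaTokenizer.from_pretrained(lora_model or base_model)
    model = LlamaForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.float16, device_map='auto'
    )
    if model.get_input_embeddings().weight.size(0) != len(tokenizer):
        model.resize_token_embeddings(len(tokenizer))
    if lora_model is not None:
        model = PeftModel.from_pretrained(model, lora_model)  # attach LoRA adapters
    return tokenizer, model
```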

@airaria (Contributor) commented May 11, 2023

> support base_model-only mode

Thanks. We will perform some tests before confirming the merge.

@sunyuhan19981208 (Contributor, Author)

> > support base_model-only mode
>
> Thanks. We will perform some tests before confirming the merge.

Please let me know as soon as possible if you have any advice or have found any bugs. I would be grateful for your feedback, and I'm eager to improve the demo based on your suggestions. Thank you very much.

airaria self-assigned this and later unassigned it on May 15, 2023
ymcui merged commit 49ec61e into ymcui:main on May 15, 2023
@tluo-github

In this notebook:
https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/notebooks/convert_and_quantize_chinese_llama.ipynb?usp=sharing#scrollTo=EjkXqaqbmrVZ
the following command runs successfully:

!cd Chinese-LLaMA-Alpaca/ && python scripts/my_gradio_demo.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --lora_model 'ziqingyang/chinese-alpaca-lora-7b'

But in this notebook:
https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/notebooks/finetune_chinese_alpaca_lora.ipynb

!cd Chinese-LLaMA-Alpaca/ && python scripts/my_gradio_demo.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --lora_model '/content/output_model/peft_model'

after the demo starts, asking a question raises this error:
```
2023-05-15 08:58:49.995893: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Loading checkpoint shards: 100% 33/33 [00:18<00:00, 1.75it/s]
Vocab of the base model: 32000
Vocab of the tokenizer: 49954
Resize model embeddings to fit tokenizer
loading peft model
Running on local URL: http://127.0.0.1:7860/
Running on public URL: https://da6b02733e495413ee.gradio.live/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 414, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1320, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1048, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/content/Chinese-LLaMA-Alpaca/scripts/my_gradio_demo.py", line 132, in predict
generation_output = model.generate(
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 581, in generate
outputs = self.base_model.generate(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1524, in generate
return self.beam_search(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2810, in beam_search
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 358, in forward
result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Half but found Float
Keyboard interruption in main thread... closing server.
```

@tluo-github

Solved: change
load_type = torch.float16
to
load_type = torch.float32
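
(Background: the base model is loaded in float16 while freshly trained LoRA layers typically stay in float32, so F.linear inside the LoRA forward receives mixed dtypes, hence "expected scalar type Half but found Float". Loading everything as float32, as above, avoids the mismatch at the cost of memory. An alternative is to cast the LoRA weights down to half instead; the helper below is a hypothetical sketch, not code from this repo.)

```python
# Alternative sketch: keep float16 inference and cast LoRA weights to match.
import torch

def cast_lora_to_half(model: torch.nn.Module) -> None:
    """Cast float32 LoRA parameters to float16 so F.linear sees one dtype."""
    for name, param in model.named_parameters():
        if 'lora_' in name and param.dtype == torch.float32:
            param.data = param.data.half()
```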
