[Bug]: FileNotFoundError: configuration file<config.json> or <model_config.json> not found #4738

Closed
1 task done
gcr1992 opened this issue Feb 10, 2023 · 15 comments · Fixed by #4758

gcr1992 commented Feb 10, 2023

Software environment

- paddlepaddle:
- paddlepaddle-gpu: 2.3.2.post111
- paddlenlp: 2.5.0.post0 and 2.5.0 (tried both)

Duplicate issues

  • I have searched the existing issues

Error description

https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/code_generation/codegen

I am trying out PaddleNLP codegen. After installing it, I tested it with the code below, and it keeps reporting FileNotFoundError: configuration file<config.json> or <model_config.json> not found. I have tried installing paddlenlp 2.5.0.post0 and 2.5.0.

I traced through the source, code_generation.py:
self._construct_tokenizer(model) downloads its files normally
self._construct_model(model) cannot download the configuration file

'https://bj.bcebos.com/paddlenlp/models/community/Salesforce/codegen-350M-mono/config.json'
'https://bj.bcebos.com/paddlenlp/models/community/Salesforce/codegen-350M-mono/model_config.json'

Steps to reproduce & code

1. Install
(1) Download develop or v2.5.0 from https://github.com/PaddlePaddle/PaddleNLP to a local Windows 10 machine
(2) pip uninstall -y paddlenlp
(3) Enter the extracted PaddleNLP directory and run python setup.py install
2. Test code
from paddlenlp import Taskflow
prompt = "def lengthOfLongestSubstring(self, s: str) -> int:"
codegen = Taskflow("code_generation", model="Salesforce/codegen-350M-mono", decode_strategy="greedy_search", repetition_penalty=1.0)
print(codegen(prompt))

3. Resulting log:

D:\Program Files\Python37\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-02-10 14:12:35,367] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/vocab.json and saved to C:\Users\gcr\.paddlenlp\models\Salesforce/codegen-350M-mono
[2023-02-10 14:12:35,512] [ INFO] - Downloading vocab.json from https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/vocab.json
100%|██████████| 779k/779k [00:01<00:00, 773kB/s]
[2023-02-10 14:12:36,900] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/merges.txt and saved to C:\Users\gcr\.paddlenlp\models\Salesforce/codegen-350M-mono
[2023-02-10 14:12:37,069] [ INFO] - Downloading merges.txt from https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/merges.txt
100%|██████████| 446k/446k [00:00<00:00, 562kB/s]
[2023-02-10 14:12:38,164] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/added_tokens.json and saved to C:\Users\gcr\.paddlenlp\models\Salesforce/codegen-350M-mono
[2023-02-10 14:12:38,305] [ INFO] - Downloading added_tokens.json from https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/added_tokens.json
100%|██████████| 0.98k/0.98k [00:00<?, ?B/s]
[2023-02-10 14:12:38,429] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/special_tokens_map.json and saved to C:\Users\gcr\.paddlenlp\models\Salesforce/codegen-350M-mono
[2023-02-10 14:12:38,561] [ INFO] - Downloading special_tokens_map.json from https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/special_tokens_map.json
100%|██████████| 90.0/90.0 [00:00<?, ?B/s]
[2023-02-10 14:12:38,708] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/tokenizer_config.json and saved to C:\Users\gcr\.paddlenlp\models\Salesforce/codegen-350M-mono
[2023-02-10 14:12:38,831] [ INFO] - Downloading tokenizer_config.json from https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/tokenizer_config.json
100%|██████████| 177/177 [00:00<?, ?B/s]
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
[2023-02-10 14:12:39,024] [ INFO] - Adding to the vocabulary
Traceback (most recent call last):
  File "F:/pythonProject/rpa/PaddleNLP/testCode.py", line 6, in <module>
    codegen = Taskflow("code_generation", model="Salesforce/codegen-350M-mono",decode_strategy="greedy_search", repetition_penalty=1.0)
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\taskflow\taskflow.py", line 591, in __init__
    model=self.model, task=self.task, priority_path=self.priority_path, from_hf_hub=from_hf_hub, **self.kwargs
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\taskflow\code_generation.py", line 59, in __init__
    self._construct_model(model)
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\taskflow\code_generation.py", line 65, in _construct_model
    self._model = CodeGenForCausalLM.from_pretrained(model)
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\transformers\model_utils.py", line 486, in from_pretrained
    pretrained_model_name_or_path, from_hf_hub=from_hf_hub, subfolder=subfolder, *args, **kwargs
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\transformers\model_utils.py", line 1328, in from_pretrained_v2
    **kwargs,
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 736, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 758, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 831, in _get_config_dict
    raise FileNotFoundError(f"configuration file<{CONFIG_NAME}> or <{LEGACY_CONFIG_NAME}> not found")
FileNotFoundError: configuration file<config.json> or <model_config.json> not found

@gcr1992 gcr1992 added the bug Something isn't working label Feb 10, 2023
@wawltor wawltor removed the triage label Feb 10, 2023
gongel (Member) commented Feb 10, 2023

Hello, I could not reproduce this issue on my side. Please confirm that the Python used by pip and the Python that runs your program are the same one.
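
One way to run the check suggested above is to ask the interpreter itself where it lives and where a package resolves from. This is only a minimal sketch: `describe_install` is a hypothetical helper name, not part of PaddleNLP, and it relies solely on the standard library.

```python
import importlib.util
import sys


def describe_install(package_name: str) -> dict:
    """Report the running interpreter and where a package would be imported from.

    Hypothetical helper for illustration; not a PaddleNLP API.
    """
    # find_spec returns None when a top-level package cannot be imported.
    spec = importlib.util.find_spec(package_name)
    return {
        "interpreter": sys.executable,
        "package": package_name,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # Run this with the same interpreter that runs the failing script.
    print(describe_install("paddlenlp"))
```

Comparing the printed interpreter path with the path reported by `python -m pip --version` shows whether pip installed into the environment the script actually uses.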

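gcr1992 (Author) commented Feb 10, 2023

> Hello, I could not reproduce this issue on my side. Please confirm that the Python used by pip and the Python that runs your program are the same one.

I have confirmed it is the same version. Running pip list in the PyCharm console shows the package, and it also appears in the project's dependencies. If a package were missing, an error would normally be reported.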

@caorushizi

You can patch it yourself. Change

community_url = os.path.join(COMMUNITY_MODEL_PREFIX, pretrained_model_name_or_path, CONFIG_NAME)

to
community_url = f"{COMMUNITY_MODEL_PREFIX}/{pretrained_model_name_or_path}/{CONFIG_NAME}"

Change

community_url = os.path.join(COMMUNITY_MODEL_PREFIX, pretrained_model_name_or_path, LEGACY_CONFIG_NAME)

to
community_url = f"{COMMUNITY_MODEL_PREFIX}/{pretrained_model_name_or_path}/{LEGACY_CONFIG_NAME}"

Change

community_model_file_path = os.path.join(

to
community_model_file_path = f"{COMMUNITY_MODEL_PREFIX}/{pretrained_model_name_or_path}/{cls.resource_files_names['model_state']}"

@caorushizi

os.path.join is used to build the URL. On Windows it joins the parts with '\', so the resulting requests call returns 404.
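
The difference can be reproduced on any operating system, because the standard library exposes the Windows flavour of os.path as ntpath and the POSIX flavour as posixpath. The sketch below is illustrative only: the prefix and model name are copied from the URLs quoted in this thread, and the variable names are local to the example rather than taken from PaddleNLP.

```python
import ntpath      # what os.path is on Windows
import posixpath   # what os.path is on Linux and macOS

# Values copied from the URLs quoted earlier in this thread.
COMMUNITY_MODEL_PREFIX = "https://bj.bcebos.com/paddlenlp/models/community"
MODEL_NAME = "Salesforce/codegen-350M-mono"
CONFIG_NAME = "config.json"

# os.path.join on Windows inserts a backslash between the parts.
windows_url = ntpath.join(COMMUNITY_MODEL_PREFIX, MODEL_NAME, CONFIG_NAME)

# os.path.join on POSIX happens to produce a valid URL, which hides the bug.
posix_url = posixpath.join(COMMUNITY_MODEL_PREFIX, MODEL_NAME, CONFIG_NAME)

# Explicit "/" separators give the same result on every platform.
portable_url = f"{COMMUNITY_MODEL_PREFIX}/{MODEL_NAME}/{CONFIG_NAME}"

if __name__ == "__main__":
    print("windows :", windows_url)
    print("posix   :", posix_url)
    print("portable:", portable_url)
```

This is why the failure appears only for Windows users: the same call yields a usable URL on Linux and macOS.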

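sijunhe (Collaborator) commented Feb 12, 2023

Thanks to @caorushizi and @gcr1992 for the feedback. The logic we wrote here is indeed wrong and causes errors for Windows users. It is fixed in #4758; once that is merged, the problem should be resolved.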

gongel (Member) commented Feb 14, 2023

OK. This problem should have been fixed earlier in #3640. Thanks, everyone, for the feedback!

iouen commented Feb 28, 2023

[image]

On a Mac with version 2.5.1, I hit the same problem when running:

from paddlenlp import Taskflow

# The default model is pai-painter-painting-base-zh
text_to_image = Taskflow("text_to_image")

iouen commented Feb 28, 2023

[image]

I made the change described above here, and it still raises the error.

JunnYu (Member) commented Feb 28, 2023

@iouen This PR is currently upgrading the pretrained config: #4992
If you want to try text-to-image generation, we suggest using ppdiffusers for a quick start: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/ppdiffusers

@northovo

D:\Python>python finetune.py --device cpu --logging_steps 5 --save_steps 25 --eval_steps 25 --seed 42 --model_name_or_path uie-x-base --output_dir ./document/model_best --train_path document/data/train.txt --dev_path /document/data/dev.txt --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --num_train_epochs 5 --learning_rate 1e-5 --label_names 'start_position' 'end_position' --do_train --do_eval --do_export --export_model_dir ./document/model_best --overwrite_output_dir --disable_tqdm True --metric_for_best_model eval_f1 --load_best_model_at_end True --save_total_limit 1
D:\Anacoda\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
[2023-03-12 00:08:19,796] [ WARNING] - evaluation_strategy reset to IntervalStrategy.STEPS for do_eval is True. you can also set evaluation_strategy='epoch'.
[2023-03-12 00:08:19,796] [ INFO] - The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2023-03-12 00:08:19,796] [ INFO] - ============================================================
[2023-03-12 00:08:19,796] [ INFO] - Model Configuration Arguments
[2023-03-12 00:08:19,796] [ INFO] - paddle commit id :0e92adceae06b6b7463f2dc7790ffb0601730009
[2023-03-12 00:08:19,796] [ INFO] - export_model_dir :./document/model_best
[2023-03-12 00:08:19,796] [ INFO] - model_name_or_path :uie-x-base
[2023-03-12 00:08:19,796] [ INFO] - multilingual :False
[2023-03-12 00:08:19,796] [ INFO] -
[2023-03-12 00:08:19,796] [ INFO] - ============================================================
[2023-03-12 00:08:19,796] [ INFO] - Data Configuration Arguments
[2023-03-12 00:08:19,796] [ INFO] - paddle commit id :0e92adceae06b6b7463f2dc7790ffb0601730009
[2023-03-12 00:08:19,796] [ INFO] - dev_path :/document/data/dev.txt
[2023-03-12 00:08:19,796] [ INFO] - dynamic_max_length :None
[2023-03-12 00:08:19,796] [ INFO] - max_seq_length :512
[2023-03-12 00:08:19,796] [ INFO] - train_path :document/data/train.txt
[2023-03-12 00:08:19,812] [ INFO] -
[2023-03-12 00:08:19,812] [ WARNING] - Process rank: -1, device: cpu, world_size: 1, distributed training: False, 16-bits training: False
[2023-03-12 00:08:19,812] [ INFO] - We are using <class 'paddlenlp.transformers.ernie_layout.tokenizer.ErnieLayoutTokenizer'> to load 'uie-x-base'.
[2023-03-12 00:08:19,812] [ INFO] - Already cached C:\Users\24210\.paddlenlp\models\uie-x-base\vocab.txt
[2023-03-12 00:08:19,812] [ INFO] - Already cached C:\Users\24210\.paddlenlp\models\uie-x-base\sentencepiece.bpe.model
[2023-03-12 00:08:20,378] [ INFO] - tokenizer config file saved in C:\Users\24210\.paddlenlp\models\uie-x-base\tokenizer_config.json
[2023-03-12 00:08:20,378] [ INFO] - Special tokens file saved in C:\Users\24210\.paddlenlp\models\uie-x-base\special_tokens_map.json
Traceback (most recent call last):
  File "D:\Python\finetune.py", line 244, in <module>
    main()
  File "D:\Python\finetune.py", line 134, in main
    model = UIE.from_pretrained(model_args.model_name_or_path)
  File "D:\Anacoda\lib\site-packages\paddlenlp\transformers\model_utils.py", line 484, in from_pretrained
    return cls.from_pretrained_v2(
  File "D:\Anacoda\lib\site-packages\paddlenlp\transformers\model_utils.py", line 1320, in from_pretrained_v2
    config, model_kwargs = cls.config_class.from_pretrained(
  File "D:\Anacoda\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 699, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Anacoda\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 722, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(
  File "D:\Anacoda\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 797, in _get_config_dict
    raise FileNotFoundError(f"configuration file<{CONFIG_NAME}> or <{LEGACY_CONFIG_NAME}> not found")
FileNotFoundError: configuration file<config.json> or <model_config.json> not found

@lili-changjiang

I have the same problem as above.


byy-git commented Jul 13, 2023

I got the same error:
raise FileNotFoundError(f"configuration file<{CONFIG_NAME}> or <{LEGACY_CONFIG_NAME}> not found")
FileNotFoundError: configuration file<config.json> or <model_config.json> not found


byy-git commented Jul 14, 2023

I got the same error: raise FileNotFoundError(f"configuration file<{CONFIG_NAME}> or <{LEGACY_CONFIG_NAME}> not found") FileNotFoundError: configuration file<config.json> or <model_config.json> not found

I found that the error was caused by running the wrong finetune.py: it should be ./document/finetune.py, not ./text/finetune.py.

@zhaoqf-cq

Has nobody noticed that the config URLs are
https://bj.bcebos.com/paddlenlp/models/community/uie-x-base/config.json
https://bj.bcebos.com/paddlenlp/models/community/uie-x-base/model_config.json
How would changing the local code help? The config simply cannot be fetched.
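
To tell a client-side URL bug apart from a file that is missing on the server, the two candidate URLs can be probed directly. The following is a rough sketch under stated assumptions: the prefix is copied from the URLs quoted above, `candidate_config_urls` and `http_status` are hypothetical helper names, and whether the server answers HEAD requests the same way as GET is not confirmed by this thread.

```python
import urllib.error
import urllib.request
from typing import List, Optional

# Assumed prefix, copied from the URLs quoted in this thread.
COMMUNITY_MODEL_PREFIX = "https://bj.bcebos.com/paddlenlp/models/community"
CONFIG_FILE_NAMES = ("config.json", "model_config.json")


def candidate_config_urls(model_name: str) -> List[str]:
    """Build the two config URLs mentioned in the error message."""
    return [f"{COMMUNITY_MODEL_PREFIX}/{model_name}/{name}" for name in CONFIG_FILE_NAMES]


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for a HEAD request, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except OSError:
        # DNS failure, refused connection, timeout and similar network errors.
        return None


# Example usage (performs network requests):
#     for url in candidate_config_urls("uie-x-base"):
#         print(http_status(url), url)
```

A 404 for both URLs would suggest the files are absent under that path, in which case editing the local URL-joining code would not help.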

@chenzaichun

Same problem here: the corresponding configuration file cannot be fetched.
