
pretrainedModel add gconfig #6915

Merged: 5 commits into PaddlePaddle:develop on Sep 5, 2023

Conversation

wtmlon (Collaborator) commented Sep 4, 2023

PR types

PR changes

Description

Directory structure reorganized; PretrainedModel adapted to gconfig.

paddle-bot bot commented Sep 4, 2023

Thanks for your contribution!

codecov bot commented Sep 4, 2023

Codecov Report

Merging #6915 (6a605ba) into develop (2f3eac3) will decrease coverage by 0.06%.
Report is 15 commits behind head on develop.
The diff coverage is 67.34%.

@@             Coverage Diff             @@
##           develop    #6915      +/-   ##
===========================================
- Coverage    59.92%   59.87%   -0.06%     
===========================================
  Files          547      552       +5     
  Lines        81009    81452     +443     
===========================================
+ Hits         48546    48770     +224     
- Misses       32463    32682     +219     
Files Changed                                             Coverage Δ
...enlp/experimental/transformers/generation_utils.py      0.00% <0.00%>   (ø)
paddlenlp/transformers/utils.py                            61.51% <17.77%>  (-5.63%) ⬇️
paddlenlp/generation/streamers.py                          25.00% <25.00%>  (ø)
paddlenlp/peft/prefix/prefix_model.py                      60.17% <50.00%>  (-0.28%) ⬇️
paddlenlp/generation/utils.py                              67.51% <72.91%>  (ø)
paddlenlp/generation/logits_process.py                     73.12% <73.12%>  (ø)
paddlenlp/generation/configuration_utils.py                81.06% <81.06%>  (ø)
paddlenlp/generation/stopping_criteria.py                  81.08% <81.08%>  (ø)
paddlenlp/transformers/model_utils.py                      70.93% <90.00%>  (+0.18%) ⬆️
paddlenlp/generation/__init__.py                          100.00% <100.00%> (ø)
... and 3 more

... and 4 files with indirect coverage changes

llm/predictor.py Outdated
@@ -41,6 +41,7 @@
PretrainedModel,
PretrainedTokenizer,
)
from paddlenlp.transformers.generation_utils import GenerationConfig
Collaborator: Prefer importing from the paddlenlp.generation path; paddlenlp.transformers.generation_utils will be gradually deprecated going forward.
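
A minimal sketch of the suggested change (GenerationConfig is re-exported from the paddlenlp.generation package, per the compatibility shim quoted below):

# preferred: the new canonical import path
from paddlenlp.generation import GenerationConfig

# legacy path, to be phased out
# from paddlenlp.transformers.generation_utils import GenerationConfig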

Comment on lines 15 to 19
from paddlenlp.generation.configuration_utils import * # noqa: F401, F403
from paddlenlp.generation.logits_process import * # noqa: F401, F403
from paddlenlp.generation.stopping_criteria import * # noqa: F401, F403
from paddlenlp.generation.streamers import * # noqa: F401, F403
from paddlenlp.generation.utils import * # noqa: F401, F403
Collaborator: Could we just deprecate transformers/generation_utils outright and import everything from transformers/generation going forward?
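
One way to retire the legacy module while keeping old imports working during a transition period (a sketch only, not what this PR implements): keep the wildcard re-exports but emit a DeprecationWarning when the module is imported.

# hypothetical body for paddlenlp/transformers/generation_utils.py
import warnings

# warn once at import time, pointing callers at the new package
warnings.warn(
    "paddlenlp.transformers.generation_utils is deprecated; "
    "import from paddlenlp.generation instead.",
    DeprecationWarning,
    stacklevel=2,
)

from paddlenlp.generation.configuration_utils import *  # noqa: F401, F403
from paddlenlp.generation.logits_process import *  # noqa: F401, F403
from paddlenlp.generation.stopping_criteria import *  # noqa: F401, F403
from paddlenlp.generation.streamers import *  # noqa: F401, F403
from paddlenlp.generation.utils import *  # noqa: F401, F403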

llm/utils.py Outdated
@@ -26,6 +26,7 @@
from sklearn.metrics import accuracy_score

from paddlenlp.datasets import InTokensIterableDataset
from paddlenlp.generation import GenerationConfig
Collaborator: No rush to change this one; it needs to stay backward compatible for a while, so keep the original API.

llm/utils.py Outdated
Comment on lines 204 to 213
generation_config=GenerationConfig(
max_new_token=self.data_args.tgt_length,
decode_strategy="sampling",
top_k=self.gen_args.top_k,
top_p=self.gen_args.top_p,
bos_token_id=self.tokenizer.bos_token_id,
eos_token_id=self.tokenizer.eos_token_id,
pad_token_id=self.tokenizer.pad_token_id,
use_cache=True,
),
Collaborator: Keep the original API.

Comment on lines 906 to 910
logger.warning("`max_length` will be deprecated in future, use" " `max_new_token` instead.")
generation_config.max_new_token = generation_config.max_length

if generation_config.min_length != 0 and generation_config.min_new_token == 0:
logger.warning("`min_length` will be deprecated in future, use" " `min_new_token` instead.")
Collaborator suggested change:

- logger.warning("`max_length` will be deprecated in future, use" " `max_new_token` instead.")
- generation_config.max_new_token = generation_config.max_length
- if generation_config.min_length != 0 and generation_config.min_new_token == 0:
-     logger.warning("`min_length` will be deprecated in future, use" " `min_new_token` instead.")
+ logger.warning("`max_length` will be deprecated in future releases, use `max_new_token` instead.")
+ generation_config.max_new_token = generation_config.max_length
+ if generation_config.min_length != 0 and generation_config.min_new_token == 0:
+     logger.warning("`min_length` will be deprecated in future releases, use `min_new_token` instead.")
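
The two versions behave identically at runtime, since Python concatenates adjacent string literals; the suggestion merges each message into a single literal for readability and says "future releases" rather than "future".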

@@ -46,6 +46,7 @@
from paddle.utils.download import is_url as is_remote_url
from tqdm.auto import tqdm

from paddlenlp.generation import GenerationConfig, GenerationMixin
Collaborator: Replace all of these with relative imports.
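
For instance, assuming this hunk is in a module directly under paddlenlp/transformers/ (the coverage report lists paddlenlp/transformers/model_utils.py as changed), the relative form of the import above would be:

from ..generation import GenerationConfig, GenerationMixin  # relative import within the paddlenlp package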

sijunhe merged commit e183825 into PaddlePaddle:develop on Sep 5, 2023