[MegatronBERT P0] add PretrainedConfig and unit test #4912
Conversation
Thanks for your contribution!

Hello, and welcome to the PaddlePaddle Hackathon. This PR is a good start; as a next step, please enable the test_modeling and test_tokenizer unit tests.
Good progress. Once your environment is set up, you can run the tests locally with `pytest tests/transformers/megatronbert/test_modeling.py` and `pytest tests/transformers/megatronbert/test_tokenizer.py`.
Hello, I found a bug in tokenizer.py: when megatronbert_tokenizer inherits from bert_tokenizer, some arguments are not passed through. This breaks the tokenizer's argument forwarding and causes test_tokenizer to fail.
```diff
 self.self = MegatronBertSelfAttention(
-    num_attention_heads=num_attention_heads,
-    attention_probs_dropout_prob=attention_probs_dropout_prob,
-    max_position_embeddings=max_position_embeddings,
-    position_embedding_type=position_embedding_type,
+    num_attention_heads=config.num_attention_heads,
+    attention_probs_dropout_prob=config.attention_probs_dropout_prob,
+    max_position_embeddings=config.max_position_embeddings,
+    position_embedding_type=config.position_embedding_type,
 )
```
This should be `self.self = MegatronBertSelfAttention(config)`.
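To make the suggested pattern concrete, here is a minimal sketch of the attention module's constructor once every sub-module takes only the config object. The exact class layout is assumed from the diffs in this review; `MegatronBertConfig` and the sub-modules are taken to be defined in the same file:

```python
import paddle.nn as nn


class MegatronBertAttention(nn.Layer):
    """Attention block built entirely from a MegatronBertConfig."""

    def __init__(self, config: MegatronBertConfig):
        super().__init__()
        # LayerNorm hyperparameters come straight from the config.
        self.layer_norm = nn.LayerNorm(config.hidden_size, epsilon=config.layer_norm_eps)
        # Sub-modules receive the whole config instead of unpacked kwargs.
        self.self = MegatronBertSelfAttention(config)
        self.output = MegatronBertSelfOutput(config)
```

This is the convention the PretrainedConfig refactor pushes toward: constructors read hyperparameters from one config object instead of long keyword lists.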
```diff
 super(MegatronBertAttention, self).__init__()
-self.layer_norm = nn.LayerNorm(hidden_size, epsilon=layer_norm_eps)
+self.layer_norm = nn.LayerNorm(config.hidden_size, epsilon=layer_norm_eps)
```
This should be `config.layer_norm_eps`.
```
max_position_embeddings=config.max_position_embeddings,
position_embedding_type=config.position_embedding_type,
)
self.output = MegatronBertSelfOutput(
```
This should be `self.output = MegatronBertSelfOutput(config)`.
```diff
 self.attention = MegatronBertAttention(
-    hidden_size=hidden_size,
-    num_attention_heads=num_attention_heads,
-    hidden_dropout_prob=hidden_dropout_prob,
-    attention_probs_dropout_prob=attention_probs_dropout_prob,
-    max_position_embeddings=max_position_embeddings,
-    position_embedding_type=position_embedding_type,
+    hidden_size=config.hidden_size,
+    num_attention_heads=config.num_attention_heads,
+    hidden_dropout_prob=config.hidden_dropout_prob,
+    attention_probs_dropout_prob=config.attention_probs_dropout_prob,
+    max_position_embeddings=config.max_position_embeddings,
+    position_embedding_type=config.position_embedding_type,
```
This should be `self.attention = MegatronBertAttention(config)`.
```diff
 )

-self.layer_norm = nn.LayerNorm(hidden_size, epsilon=layer_norm_eps)
+self.layer_norm = nn.LayerNorm(config.hidden_size, epsilon=layer_norm_eps)
```
This should be `config.layer_norm_eps`.
```diff
 self.intermediate = MegatronBertIntermediate(
-    hidden_size=hidden_size, intermediate_size=intermediate_size, hidden_act=hidden_act
+    hidden_size=config.hidden_size, intermediate_size=config.intermediate_size, hidden_act=config.hidden_act
 )
 self.output = MegatronBertOutput(
-    intermediate_size, hidden_dropout_prob=hidden_dropout_prob, hidden_size=hidden_size
+    config.intermediate_size, hidden_dropout_prob=config.hidden_dropout_prob, hidden_size=config.hidden_size
 )
```
All of these modules should take only `config` as their input argument.
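Putting the comments on this file together, a hedged sketch of the whole layer under the config-only convention (attribute names assumed from the diffs above):

```python
import paddle.nn as nn


class MegatronBertLayer(nn.Layer):
    """Transformer layer whose sub-modules all take only the config."""

    def __init__(self, config: MegatronBertConfig):
        super().__init__()
        self.attention = MegatronBertAttention(config)
        self.layer_norm = nn.LayerNorm(config.hidden_size, epsilon=config.layer_norm_eps)
        self.intermediate = MegatronBertIntermediate(config)
        self.output = MegatronBertOutput(config)
```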
```diff
-    position_embedding_type=None,
-    num_hidden_layers=24,
-):
+def __init__(self, config: MegatronBertConfig):
     super(MegatronBertEncoder, self).__init__()
     self.layer = nn.LayerList(
         [
             MegatronBertLayer(
```
The input argument should be `config`.
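A minimal sketch of the encoder constructor after this change, assuming `num_hidden_layers` is read from the config as the diff above implies:

```python
import paddle.nn as nn


class MegatronBertEncoder(nn.Layer):
    """Stack of MegatronBertLayer blocks sized by config.num_hidden_layers."""

    def __init__(self, config: MegatronBertConfig):
        super(MegatronBertEncoder, self).__init__()
        # Every layer is constructed from the same config object.
        self.layer = nn.LayerList(
            [MegatronBertLayer(config) for _ in range(config.num_hidden_layers)]
        )
```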
```diff
-self.initializer_range = initializer_range
+self.num_hidden_layers = config.num_hidden_layers
+self.pad_token_id = config.pad_token_id
+self.initializer_range = config.initializer_range
 self.embeddings = MegatronBertEmbeddings(
```
The input argument should be `config`.
```
    type_vocab_size=config.type_vocab_size,
    max_position_embeddings=config.max_position_embeddings,
    hidden_dropout_prob=config.hidden_dropout_prob,
    position_embedding_type=config.position_embedding_type,
)
self.encoder = MegatronBertEncoder(
```
The input argument should be `config`.
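For completeness, a sketch of how the model constructor reads once both sub-modules take the config. The base-class name is an assumption; only attributes visible in the diffs above are shown:

```python
class MegatronBertModel(MegatronBertPretrainedModel):  # base class assumed
    """Bare MegatronBERT model: embeddings plus encoder, both config-driven."""

    def __init__(self, config: MegatronBertConfig):
        super(MegatronBertModel, self).__init__(config)
        self.num_hidden_layers = config.num_hidden_layers
        self.pad_token_id = config.pad_token_id
        self.initializer_range = config.initializer_range
        # Both sub-modules take the config object directly.
        self.embeddings = MegatronBertEmbeddings(config)
        self.encoder = MegatronBertEncoder(config)
```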
No problem. This model's code hasn't been maintained in a while, so bugs are expected; just fix them in this PR along with the rest.

Hello, with this done, is the model now complete?
Codecov Report
```
@@            Coverage Diff             @@
##           develop    #4912      +/-   ##
===========================================
+ Coverage    46.32%   47.56%   +1.23%
===========================================
  Files          448      453       +5
  Lines        64616    65421     +805
===========================================
+ Hits         29936    31116    +1180
+ Misses       34680    34305     -375
```
Nice work! Just address the small issues in my comments and this can be merged.
```diff
 self.decoder_weight = self.create_parameter(
-    shape=[vocab_size, hidden_size], dtype=self.transform.weight.dtype, is_bias=False
+    shape=[config.vocab_size, config.hidden_size], dtype=nn.Embedding(1, 1)._dtype, is_bias=False
```
Here the dtype should remain `dtype=self.transform.weight.dtype`.
Hello, the reason I changed this part is that it raised `'MegatronBertPredictionHeadTransform' object has no attribute 'weight'`: in the original modeling file, MegatronBertPredictionHeadTransform has no weight attribute.
Understood. In that case, try `dtype=self.transform.dense.weight.dtype` instead.
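A hedged sketch of the head after that fix; the class name and surrounding layout are assumptions based on this thread, not taken from the PR itself:

```python
import paddle.nn as nn


class MegatronBertLMPredictionHead(nn.Layer):  # class name assumed
    """LM head whose decoder weight dtype follows the transform's dense layer."""

    def __init__(self, config: MegatronBertConfig):
        super().__init__()
        self.transform = MegatronBertPredictionHeadTransform(config)
        # The transform has no `weight` attribute itself, but its inner
        # dense layer does, so read the dtype from there.
        self.decoder_weight = self.create_parameter(
            shape=[config.vocab_size, config.hidden_size],
            dtype=self.transform.dense.weight.dtype,
            is_bias=False,
        )
```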
```python
@parameterized_class(
    ("return_dict", "use_labels"),
    [
        [False, False],
        [False, True],
        [True, False],
        [True, True],
    ],
)
```
The current model does not implement `return_dict` or `labels`, so this `parameterized_class` block is unnecessary and can be deleted.
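In other words (the test-class name below is illustrative, not from this PR), the file would keep just the undecorated test class:

```python
import unittest


class MegatronBertModelTest(unittest.TestCase):  # name assumed
    # No @parameterized_class decorator: this model does not implement
    # return_dict or labels, so there is nothing to parameterize over.
    ...
```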
"megatronbert-cased": "http://bj.bcebos.com/paddlenlp/models/transformers/" | ||
"megatron-bert/megatronbert-cased/model_state.pdparams", | ||
"megatronbert-uncased": "http://bj.bcebos.com/paddlenlp/models/transformers/" | ||
"megatron-bert/megatronbert-cased/model_state.pdparams", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"megatronbert-cased": "http://bj.bcebos.com/paddlenlp/models/transformers/" | |
"megatron-bert/megatronbert-cased/model_state.pdparams", | |
"megatronbert-uncased": "http://bj.bcebos.com/paddlenlp/models/transformers/" | |
"megatron-bert/megatronbert-cased/model_state.pdparams", | |
"megatronbert-cased": "http://bj.bcebos.com/paddlenlp/models/transformers/megatron-bert/megatronbert-cased/model_state.pdparams", | |
"megatronbert-uncased": "http://bj.bcebos.com/paddlenlp/models/transformers/megatron-bert/megatronbert-uncased/model_state.pdparams", |
Very close now. Just address the small issues in my comments and this can be merged.
LGTM. Thank you for completing this model upgrade!