[Unittest]add tinybert unittest #2992

wj-Mcat · 2022-08-08T10:18:33Z

PR types

New features

PR changes

Models

Description

add tinybert unittest

guoshengCS · 2022-08-09T07:21:38Z

paddlenlp/transformers/tinybert/modeling.py

@@ -330,7 +346,7 @@ def forward(self, input_ids, token_type_ids=None, attention_mask=None):
        if attention_mask is None:
            attention_mask = paddle.unsqueeze(
                (input_ids == self.pad_token_id).astype(
-                    self.pooler.dense.weight.dtype) * -1e4,
+                    self.pooler.dense.weight.dtype) * 0e4,


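Please leave this unchanged for now. When attention_mask is None, our behavior does differ from HF, but we still need to keep it for the time being. The attention_mask handling will also be addressed in #2005.

Sorry, this was most likely caused by an accidental vim keyboard shortcut. I have already reverted it.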
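For readers following the hunk above, here is a minimal pure-Python sketch of why the multiplier matters. It does not use paddle; `additive_padding_mask` and `softmax` are hypothetical helpers written only for this illustration, and the token ids and scores are made up.

```python
import math


def additive_padding_mask(input_ids, pad_token_id, fill=-1e4):
    """Return one additive bias per token: `fill` at padding positions, 0.0 elsewhere."""
    return [[fill if tok == pad_token_id else 0.0 for tok in seq]
            for seq in input_ids]


def softmax(scores):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative values: the last two positions are padding (id 0).
input_ids = [[101, 2023, 102, 0, 0]]
raw_scores = [1.0, 2.0, 0.5, 3.0, 3.0]

# With -1e4 the padding positions receive (almost) no attention.
good_bias = additive_padding_mask(input_ids, pad_token_id=0, fill=-1e4)[0]
masked = softmax([s + b for s, b in zip(raw_scores, good_bias)])

# With 0e4 (which is just 0.0) the mask is a no-op and padding is attended to.
bad_bias = additive_padding_mask(input_ids, pad_token_id=0, fill=0e4)[0]
unmasked = softmax([s + b for s, b in zip(raw_scores, bad_bias)])
```

With the accidental `0e4`, the bias is zero everywhere, so the padding tokens keep their full share of the attention weights.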

guoshengCS · 2022-08-09T07:26:17Z

paddlenlp/transformers/tinybert/modeling.py

+        Args:
+            embedding (nn.Embedding): the new embedding value
+        """
+        self.tinybert.embeddings.word_embeddings = embedding


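Following the base-class implementation, shouldn't this be changed on TinyBERTModel rather than on these downstream task models? See BERT in HF for reference.

On this point, I read through the code. The PreTrainedModel class does implement get_input_embeddings, and at its core it calls the get_input_embeddings method of base_model.

I will also update the other unit tests shortly.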
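The delegation pattern described above can be sketched roughly as follows. This is not the PaddleNLP or HF source; every class name ending in `Sketch` is hypothetical, and plain strings stand in for `nn.Embedding` objects so the example runs without paddle.

```python
class PretrainedModelSketch:
    """Hypothetical stand-in for a PreTrainedModel-style base class."""

    base_model_prefix = ""

    @property
    def base_model(self):
        # Task models store the backbone under `base_model_prefix`;
        # the backbone itself has no such attribute and returns `self`.
        return getattr(self, self.base_model_prefix, self)

    def get_input_embeddings(self):
        base = self.base_model
        if base is self:
            raise NotImplementedError("the backbone must override this method")
        return base.get_input_embeddings()

    def set_input_embeddings(self, value):
        base = self.base_model
        if base is self:
            raise NotImplementedError("the backbone must override this method")
        base.set_input_embeddings(value)


class TinyBertModelSketch(PretrainedModelSketch):
    """Backbone: the only place that touches the embedding directly."""

    base_model_prefix = "tinybert"

    def __init__(self):
        self.word_embeddings = "original-embedding"

    def get_input_embeddings(self):
        return self.word_embeddings

    def set_input_embeddings(self, value):
        self.word_embeddings = value


class TinyBertForSequenceClassificationSketch(PretrainedModelSketch):
    """Task model: inherits the delegating accessors unchanged."""

    base_model_prefix = "tinybert"

    def __init__(self):
        self.tinybert = TinyBertModelSketch()


head = TinyBertForSequenceClassificationSketch()
head.set_input_embeddings("resized-embedding")
```

Under this layout, only the backbone needs the getter and setter; the downstream task models pick them up through the base class.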

guoshengCS · 2022-08-09T07:52:26Z

tests/transformers/tinybert/test_modeling.py

+import unittest
+from typing import Optional, Tuple
+from dataclasses import dataclass, fields, Field
+from dataclasses_json import dataclass_json


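Please do not introduce additional dependencies here. This also appears to be the cause of the CI unit test failure.

Yes, I noticed that too. I have adjusted the other unit tests as well.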
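One possible way to get JSON round-tripping for a test config without dataclasses_json is to rely on the standard library only. The sketch below is an assumption about how that could look, not the code that landed in the PR; `TinyBertTestConfig` and its fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class TinyBertTestConfig:
    """Hypothetical test config; field names and defaults are illustrative."""

    vocab_size: int = 99
    hidden_size: int = 32
    num_hidden_layers: int = 2

    def to_json(self) -> str:
        # asdict() converts the dataclass to a plain dict that json can dump.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "TinyBertTestConfig":
        data = json.loads(text)
        # Ignore unknown keys so older or newer payloads still load.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


config = TinyBertTestConfig(hidden_size=64)
restored = TinyBertTestConfig.from_json(config.to_json())
```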

guoshengCS · 2022-08-09T07:53:22Z

tests/transformers/tinybert/test_modeling.py

+class TinyBertModelIntegrationTest(unittest.TestCase):
+
+    # @slow
+    def test_inference_no_attention(self):


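Why was the slow decorator removed here and on the test below?

I will add it back. When I ran the tests locally they were skipped outright, so I commented the decorator out to verify that the tests actually work.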
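A `slow`-style decorator is typically a thin wrapper around `unittest.skipUnless` that reads an environment variable, which is why such tests are skipped locally unless the variable is set. The sketch below shows that general pattern; the variable name `RUN_SLOW_TEST` is an assumption for illustration and may not match the name the repository's test utilities use.

```python
import os
import unittest


def slow(test_case):
    """Skip `test_case` unless slow tests are explicitly enabled.

    The environment variable name is an assumption made for this sketch.
    """
    enabled = os.environ.get("RUN_SLOW_TEST", "").lower() in ("1", "true", "yes")
    return unittest.skipUnless(
        enabled, "slow test; set RUN_SLOW_TEST=1 to run it")(test_case)


class IntegrationTestSketch(unittest.TestCase):
    """Hypothetical test case showing where the decorator goes."""

    @slow
    def test_inference_no_attention(self):
        # A real test would load pretrained weights and compare outputs.
        self.assertTrue(True)
```

With a decorator of this kind, setting the variable for one local run exercises the test without commenting the decorator out.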

guoshengCS · 2022-08-09T08:10:58Z

tests/transformers/tinybert/test_tokenizer.py

+                self.assertListEqual(tokens_without_spe_char_p, expected_tokens)
+                '''
+
+    def test_pretrained_model_lists(self):


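What is this unit test intended to check? I don't see it in HF.

It checks that pretrained_vocab_files_map and max_model_input_sizes are configured correctly in the tokenizer. HF has this test in the base class (test_tokenizer_common.py), whereas I wrote it in the derived class.

The reason for putting it in the derived class: tinybert has no max_model_input_sizes attribute (in fact, the vast majority of tokenizers lack it), which makes the unit test fail, so I had to override the method and adjust the test logic.

On reflection, this logic is not really meaningful. I think we could delete it, or simply skip the test when the max_model_input_sizes attribute is missing. What do you think? @guoshengCS

max_model_input_sizes can be added. Most HF tokenizers do define max_model_input_sizes.

OK, I will add it right away.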
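The check discussed above, including the skip-when-missing fallback, can be sketched as follows. `FakeTinyBertTokenizer`, the model name `tinybert-demo`, and the URL are hypothetical placeholders; the attribute names follow the ones mentioned in the comments and may differ from the actual tokenizer class.

```python
import unittest


class FakeTinyBertTokenizer:
    """Hypothetical tokenizer exposing only the two class attributes under test."""

    pretrained_vocab_files_map = {
        "vocab_file": {"tinybert-demo": "https://example.invalid/vocab.txt"},
    }
    max_model_input_sizes = {"tinybert-demo": 512}


class PretrainedModelListsTestSketch(unittest.TestCase):
    tokenizer_class = FakeTinyBertTokenizer

    def test_pretrained_model_lists(self):
        sizes = getattr(self.tokenizer_class, "max_model_input_sizes", None)
        if sizes is None:
            self.skipTest("tokenizer does not define max_model_input_sizes")
        # Every resource map must list exactly the models that have a max size.
        for name_to_url in self.tokenizer_class.pretrained_vocab_files_map.values():
            self.assertSetEqual(set(name_to_url), set(sizes))
```

Once max_model_input_sizes is defined on the tokenizer, the skip branch is no longer taken and the key sets are compared directly.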

wj-Mcat · 2022-08-10T05:50:33Z

ping @guoshengCS

update tinybert unittest

dd555ad

wj-Mcat changed the title ~~[Unittest]update tinybert unittest~~ [Unittest]add tinybert unittest Aug 8, 2022

guoshengCS reviewed Aug 9, 2022

wj-Mcat added 3 commits August 9, 2022 09:25

remove dataclass-json & fix get_input_embedding

64372a5

Merge branch 'develop' of github.com:wj-Mcat/PaddleNLP into add-tiny-…

7d63939

…bert-test

update tinybert

2270b18

wj-Mcat and others added 3 commits August 11, 2022 07:17

add max_model_input_size feature

d8518d7

fix relative importing

5e745cc

Merge branch 'develop' into add-tiny-bert-test

4e44d29

guoshengCS approved these changes Aug 11, 2022

guoshengCS merged commit 6952b91 into PaddlePaddle:develop Aug 11, 2022

wj-Mcat mentioned this pull request Aug 24, 2022

PaddleNLP 2.3.6 Release Note Candidate #3122

Closed
