[Unittest] Add unittest for RoBERTa, ALBERT and ERNIE #2972
Conversation
Also, some of the newly added test files look empty; is that because they are not finished yet?
```
@@ -406,6 +407,8 @@ def __init__(
        bpe_merges = [tuple(merge.split()) for merge in bpe_data]
        self.bpe_ranks = dict(zip(bpe_merges, range(len(bpe_merges))))
        self.cache = {}
        self.add_prefix_space = add_prefix_space
```
This attribute doesn't appear to be used anywhere. Compared with HF, check whether an override of the `prepare_for_tokenization` method is missing here.
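For reference, this is roughly how HF's GPT-2 tokenizer consumes `add_prefix_space` via that hook (adapted from huggingface/transformers; whether PaddleNLP's base tokenizer exposes the same `prepare_for_tokenization` hook with this signature is an assumption here):

```python
def prepare_for_tokenization(self, text, is_split_into_words=False, **kwargs):
    # Consume add_prefix_space: prepend a space so the first word is
    # BPE-encoded the same way as a mid-sentence word.
    add_prefix_space = kwargs.pop("add_prefix_space", self.add_prefix_space)
    if is_split_into_words or add_prefix_space:
        text = " " + text
    return (text, kwargs)
```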
Done
```
@@ -363,6 +363,7 @@ def __init__(
        merges_file,
        errors='replace',
        max_len=None,
        add_prefix_space=False,
```
Also, please move this parameter after the special tokens: first, it matches HF; second, it better preserves backward compatibility for positional arguments.
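In other words, something like the following ordering (signature pieces assembled from the diffs in this PR; the trailing `**kwargs` is an assumption):

```python
def __init__(
    self,
    vocab_file,
    merges_file,
    errors='replace',
    unk_token="<unk>",
    pad_token="<pad>",
    mask_token="<mask>",
    add_prefix_space=False,  # moved after the special tokens, matching HF
    max_len=None,
    special_tokens=None,
    **kwargs,
):
    ...
```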
Done
unk_token="<unk>", | ||
pad_token="<pad>", | ||
mask_token="<mask>", | ||
add_prefix_space=False, | ||
max_len=None, | ||
special_tokens=None, |
`max_len` and `special_tokens` don't appear to be used, and HF doesn't have them either; if they're unused, please remove them.
Done
```
    AlbertModel,
)
from tests.transformers.test_modeling_common import ids_tensor, random_attention_mask, ModelTesterMixin
from tests.testing_utils import slow
```
Per the convention we agreed on last time, use relative imports for everything under `tests`.
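Assuming this test module lives directly under `tests/transformers/` (the exact path is an assumption; a deeper nesting would need one more leading dot), the relative form would be:

```python
from .test_modeling_common import ids_tensor, random_attention_mask, ModelTesterMixin
from ..testing_utils import slow
```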
Done
```
                for t in text)))
            return self.convert_tokens_to_ids(tokens)
        else:
            return self.convert_tokens_to_ids(text)
```
This was left unchanged with a TODO before, apparently because of compatibility concerns. Keep an eye on whether any CI fails here; previously the SKEP sequence-labeling task raised errors during training and prediction, see #2063.
```
                 **tokenizer.added_tokens_encoder)
    vocab = tokenizer.get_vocab()
    # vocab = dict(tokenizer.vocab._token_to_idx,
    #              **tokenizer.added_tokens_encoder)
```
If this commented-out block is no longer needed, please delete it; the `get_vocab()` call above already does the same thing.
Done
Looking at the CI, the artifact details show only one artifact; is that expected? @zjjlivein
PR types
unittest
PR changes
Models
Description
Add unit tests for RoBERTa and ALBERT.
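As a rough illustration of the test pattern the new imports suggest (the tiny config values and the test class below are illustrative assumptions, not the PR's actual test code; the constructor keyword names follow PaddleNLP's pre-config-class API):

```python
import unittest

import paddle
from paddlenlp.transformers import AlbertModel


class AlbertModelSmokeTest(unittest.TestCase):
    def test_forward_shape(self):
        # Tiny, illustrative hyperparameters to keep the test fast.
        model = AlbertModel(
            vocab_size=99,
            embedding_size=16,
            hidden_size=32,
            num_hidden_layers=2,
            num_attention_heads=4,
            intermediate_size=37,
        )
        model.eval()
        # Random token ids in [0, vocab_size) for a batch of 2, seq_len 7.
        input_ids = paddle.randint(low=0, high=99, shape=[2, 7], dtype="int64")
        with paddle.no_grad():
            outputs = model(input_ids)
        # Older PaddleNLP versions return (sequence_output, pooled_output).
        sequence_output = outputs[0] if isinstance(outputs, tuple) else outputs
        # Hidden states should be [batch, seq_len, hidden_size].
        self.assertEqual(list(sequence_output.shape), [2, 7, 32])


if __name__ == "__main__":
    unittest.main()
```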