fix compatibility for ppnimilm, skep, unifiedtransformer, gpt2 #2063
Conversation
        return self.convert_tokens_to_ids(tokens)
    else:
        return self.convert_tokens_to_ids(text)
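The branch in the diff dispatches on the input type: a pre-tokenized list is mapped to ids directly, while raw text is tokenized first. A minimal standalone sketch of that pattern (the `SimpleTokenizer` class and its whitespace tokenizer are hypothetical, for illustration only, not PaddleNLP's actual implementation):

```python
class SimpleTokenizer:
    """Hypothetical tokenizer illustrating the str-vs-list dispatch."""

    def __init__(self, vocab):
        self.vocab = vocab  # token -> id mapping

    def tokenize(self, text):
        # Stand-in for a real subword tokenizer.
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        # Accept a single token string or a list of tokens.
        if isinstance(tokens, str):
            return self.vocab[tokens]
        return [self.vocab[t] for t in tokens]

    def __call__(self, text):
        # Mirrors the patched branch: a list is assumed to be
        # pre-tokenized; a string is tokenized first.
        if isinstance(text, list):
            return self.convert_tokens_to_ids(text)
        tokens = self.tokenize(text)
        return self.convert_tokens_to_ids(tokens)


tok = SimpleTokenizer({"hello": 0, "world": 1})
print(tok("hello world"))  # [0, 1]
print(tok(["world"]))      # [1]
```

Supporting both input shapes in one entry point is what keeps callers that pass pre-split tokens compatible with callers that pass raw text.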
Please add a TODO or NOTE here as well, explaining the reason for this.
done
Please describe in the PR description what impacts this currently causes.
Also double-check the code style; it looks like this CI run failed.
Additionally, @yingyibiao please take a look as well, mainly at the TODOs and NOTEs left in the tokenizer.
@@ -127,6 +127,7 @@ def __init__(
        **kwargs  # The token of newline.
    ):
        self._model_file = model_file
        self.eol_token = eol_token
Please add a TODO here to explain this: our GPT tokenizer implementation is somewhat special in that it has an eol_token.
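The one-line fix in the diff stores the end-of-line token on the instance. A minimal sketch of why that matters (the `GPTStyleTokenizer` class and the `"\u010a"` default are assumptions for illustration, not PaddleNLP's actual API): without the assignment in `__init__`, any later access to `tokenizer.eol_token` raises `AttributeError`.

```python
class GPTStyleTokenizer:
    """Hypothetical GPT-style tokenizer showing the eol_token fix."""

    def __init__(self, model_file, eol_token="\u010a", **kwargs):
        self._model_file = model_file
        # The fix from the diff: persist the newline token so that
        # tokenizer.eol_token no longer raises AttributeError.
        self.eol_token = eol_token


tok = GPTStyleTokenizer("vocab.bpe")
print(tok.eol_token)  # prints the stored newline token
```

Keeping the constructor argument as an attribute is the simplest way to make the token inspectable by downstream code that reads `tokenizer.eol_token`.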
PR types
Bug fixes
PR changes
APIs
Description
fix compatibility for ppnimilm, skep, unifiedtransformer, gpt2
ppnimilm: saving and loading with the upgraded tokenizer raises an error
skep: training and prediction fail for the sequence labeling task
unifiedtransformer: training and prediction fail when padding is used
gpt2: calling tokenizer.eol_token raises an error