Support other extraction tasks besides the ASO task and fix SKEP-based sentiment analysis #4357
Thanks for your contribution!
Codecov Report

| | develop | #4357 | +/- |
|---|---:|---:|---:|
| Coverage | 39.65% | 39.65% | |
| Files | 433 | 433 | |
| Lines | 60936 | 60936 | |
| Hits | 24167 | 24167 | |
| Misses | 36769 | 36769 | |
@@ -146,6 +147,11 @@ def convert_example_to_feature_cls(example, tokenizer, label2id, max_seq_len=512
    return encoded_inputs


def remove_blanks(example):
    example["text"] = re.sub(" +", "", example["text"])
I'm curious: why modify the original input text here?
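This removes the spaces from the original text. The current tokenizer ignores spaces during encoding, so the length of input_ids no longer equals the length of the original text, which causes problems when matching positions.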
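To illustrate the length mismatch described above, here is a minimal sketch. `toy_char_tokenize` is a hypothetical stand-in, assuming a character-level tokenizer that silently drops whitespace; it is not the real tokenizer's API. `remove_blanks` mirrors the function in the diff.

```python
import re


def toy_char_tokenize(text):
    # Hypothetical stand-in for the real tokenizer (assumption): one token
    # per character, with whitespace characters silently dropped.
    return [ch for ch in text if not ch.isspace()]


def remove_blanks(example):
    # Same logic as the diff: delete runs of ASCII spaces (U+0020) only;
    # tabs and full-width spaces are left untouched.
    example["text"] = re.sub(" +", "", example["text"])
    return example


raw = {"text": "屏幕 很好"}  # "the screen is great", with one ASCII space
print(len(toy_char_tokenize(raw["text"])), len(raw["text"]))  # prints: 4 5
# The character offset of "很" is 3 but its token index is 2,
# so character-based span labels would point at the wrong tokens.

clean = remove_blanks(dict(raw))
print(len(toy_char_tokenize(clean["text"])), len(clean["text"]))  # prints: 4 4
# After removing spaces, character offset i maps to token index i.
```

Note that the regex only targets ASCII spaces, so text containing tabs or full-width spaces could still misalign under this toy tokenizer.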
def remove_blanks(example):
    example["text"] = re.sub(" +", "", example["text"])
    return example
Same as above.
Same as above.
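- ``negative_ratio``: Maximum negative-example ratio. Only applies to extraction-type tasks; constructing a suitable number of negative examples can improve model performance. The number of negatives depends on the actual number of labels: maximum number of negatives = negative_ratio * number of positives. Only applies to the training set; defaults to 5. To keep the evaluation metrics accurate, all negatives are constructed for the validation and test sets by default.
- ``is_shuffle``: Whether to randomly shuffle the dataset. Defaults to True.
- ``seed``: Random seed. Defaults to 1000.
Here, ``negative_ratio`` means that for each sample, at most ``negative_ratio`` negative samples are generated for each subtask (aspect-level opinion extraction and aspect-level sentiment classification). If an aspect synonym table or an implicit-opinion extraction vocabulary is additionally provided, both sources are combined to generate more negative samples, strengthening aspect aggregation and implicit opinion extraction.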
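The per-sample, per-subtask rule can be sketched as below. This is a simplified illustration under stated assumptions, not the PaddleNLP implementation: `sample_negatives` and `build_examples` are hypothetical names, labels are plain strings, and the synonym-table and implicit-opinion enrichment is omitted. It only shows how `negative_ratio`, `is_shuffle`, and `seed` might interact, and that evaluation sets keep all negatives.

```python
import random


def sample_negatives(positive_labels, label_pool, negative_ratio, is_train, rng):
    """Pick negative labels for ONE sample and ONE subtask (hypothetical helper).

    positive_labels: labels (e.g. aspects) actually present in the sample.
    label_pool: every label this subtask could be prompted with.
    """
    # Sort so that sampling is reproducible for a given seed.
    candidates = sorted(set(label_pool) - set(positive_labels))
    if not is_train:
        # Validation/test: every non-positive label becomes a negative.
        return candidates
    # Training: at most `negative_ratio` negatives per sample per subtask.
    return rng.sample(candidates, min(negative_ratio, len(candidates)))


def build_examples(samples, label_pool, negative_ratio=5, is_shuffle=True,
                   seed=1000, is_train=True):
    """Expand (text, positive_labels) pairs into (text, label, is_positive) rows."""
    rng = random.Random(seed)  # 1000 mirrors the documented default seed
    rows = []
    for text, positives in samples:
        rows.extend((text, label, True) for label in positives)
        negatives = sample_negatives(positives, label_pool, negative_ratio,
                                     is_train, rng)
        rows.extend((text, label, False) for label in negatives)
    if is_shuffle:
        rng.shuffle(rows)
    return rows


# 屏幕 = screen, 电池 = battery, 外观 = appearance
rows = build_examples([("屏幕很好", ["屏幕"])], ["屏幕", "电池", "外观"],
                      negative_ratio=1)
print(len(rows))  # prints: 2
```

For a real multi-subtask setup, the sampling step would run once per subtask, each with its own label pool.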
Regarding the parameter descriptions here: should the ones related to label_studio_file and task_type be kept?
Added.
"sentiment_prompt_prefix": "情感倾向",
"separator": "##",
"not_mentioned_option": "未提及",
"options": "正向,负向,未提及",
The options are hard-coded here. If a user wants custom options such as "正向,负向,中性" (positive, negative, neutral), is that customization capability lost?
As discussed, only options is exposed externally for now, which keeps sentiment classification customizable.
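Since only `options` is exposed, a user-side override might look like the sketch below. The four keys and default values are copied from the config snippet in this PR; `parse_options` and `make_sentiment_config` are hypothetical helpers, and whether the source separates options with an ASCII or a full-width comma is an assumption, so both are accepted.

```python
import re

# Defaults copied from the config snippet in this PR.
# 情感倾向 = "sentiment polarity"; 未提及 = "not mentioned";
# 正向 / 负向 = "positive" / "negative".
DEFAULT_CONFIG = {
    "sentiment_prompt_prefix": "情感倾向",
    "separator": "##",
    "not_mentioned_option": "未提及",
    "options": "正向,负向,未提及",
}


def parse_options(options):
    """Split an options string on ASCII or full-width commas.

    Whitespace is stripped; empty and duplicate entries are dropped
    while the original order is kept.
    """
    parts = (p.strip() for p in re.split(r"[,，]", options))
    return list(dict.fromkeys(p for p in parts if p))


def make_sentiment_config(options=None):
    """Hypothetical helper: copy the defaults and override only `options`."""
    config = dict(DEFAULT_CONFIG)
    if options is not None:
        config["options"] = options
    return config


# 中性 = "neutral": the customization raised in review.
cfg = make_sentiment_config("正向,负向,中性")
print(parse_options(cfg["options"]))  # prints: ['正向', '负向', '中性']
```

Copying the defaults before overriding keeps `DEFAULT_CONFIG` unchanged across calls, so one user's custom options cannot leak into another configuration.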
LGTM
PR types
Function optimization & Bug fixes
PR changes
APIs & Docs
Description