support other ext tasks except aso task and fix sentiment analysis based on SKEP #4357

Merged
merged 152 commits into from
Jan 10, 2023

1649759610
Contributor

PR types

Function optimization & Bug fixes

PR changes

APIs & Docs

Description

  1. Further support aspect-level extraction tasks such as aspect, aspect-sentiment, and aspect-opinion extraction, and connect the full workflow from annotation to visualization.
  2. Fix the problem in SKEP-based sentiment analysis caused by the tokenizer update.
  3. Optimize the project's log output.
  4. Refine the Label Studio and sentiment analysis READMEs so that users can understand the project more easily.

@paddle-bot

paddle-bot bot commented Jan 5, 2023

Thanks for your contribution!

@codecov

codecov bot commented Jan 5, 2023

Codecov Report

Merging #4357 (1792554) into develop (0716ead) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           develop    #4357   +/-   ##
========================================
  Coverage    39.65%   39.65%           
========================================
  Files          433      433           
  Lines        60936    60936           
========================================
  Hits         24167    24167           
  Misses       36769    36769           


@@ -146,6 +147,11 @@ def convert_example_to_feature_cls(example, tokenizer, label2id, max_seq_len=512
    return encoded_inputs


def remove_blanks(example):
    example["text"] = re.sub(" +", "", example["text"])
Collaborator

I'm curious about this: why modify the original input text?

Contributor Author

This removes the spaces from the original text. The current tokenizer ignores spaces during encoding, so the length of input_ids != the length of the original text, which causes some alignment problems.
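
The length mismatch described in this reply can be reproduced with a minimal sketch. `toy_char_tokenize` is a hypothetical stand-in that emits one token per character and skips spaces; it is not the SKEP tokenizer, and the sample sentence is invented for illustration.

```python
import re


def remove_blanks(example):
    # Same substitution as in the diff: delete every run of ASCII spaces.
    example["text"] = re.sub(" +", "", example["text"])
    return example


def toy_char_tokenize(text):
    # Hypothetical stand-in: one token per character, spaces silently skipped.
    # This only imitates the behavior described in the reply above.
    return [ch for ch in text if ch != " "]


raw = {"text": "房间 很 干净"}

# Before cleaning: the token count is smaller than the character count,
# so character offsets no longer line up with token positions.
tokens_before = toy_char_tokenize(raw["text"])
length_mismatch = len(tokens_before) != len(raw["text"])

# After cleaning: one token per remaining character, so offsets line up.
clean = remove_blanks(dict(raw))
tokens_after = toy_char_tokenize(clean["text"])
aligned = len(tokens_after) == len(clean["text"])
```

Under these assumptions, stripping spaces before encoding keeps character offsets and token positions in one-to-one correspondence, which is the property span matching relies on.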

def remove_blanks(example):
    example["text"] = re.sub(" +", "", example["text"])
    return example

Collaborator

Same as above.

Contributor Author

Same as above.

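- ``negative_ratio``: Maximum ratio of negative examples. This parameter applies only to extraction-type tasks; constructing a suitable number of negative examples can improve model performance. The number of negatives depends on the actual number of labels: maximum number of negatives = negative_ratio * number of positives. This parameter applies only to the training set and defaults to 5. To keep the evaluation metrics accurate, the validation and test sets are built with all negative examples by default.
- ``is_shuffle``: Whether to shuffle the dataset randomly. Defaults to True.
- ``seed``: Random seed. Defaults to 1000.
Here, the ``negative_ratio`` parameter means that, for each sample, at most ``negative_ratio`` negative samples are generated for each subtask (aspect-level opinion extraction and aspect-level sentiment classification). If an aspect synonym vocabulary or an implicit opinion vocabulary is also provided, information from both is combined to generate more negative samples, strengthening aspect aggregation and implicit opinion extraction.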
Collaborator

For the parameter descriptions here, should the ones for label_studio_file and task_type be kept?

Contributor Author

Added.
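
For readers of the parameter list reviewed in this thread, the cap stated in the ``negative_ratio`` description (maximum negatives = negative_ratio * number of positives) can be sketched as follows. `cap_negatives` and the candidate names are hypothetical and only illustrate the arithmetic; this is not the project's data conversion code.

```python
import random


def cap_negatives(positives, negative_pool, negative_ratio=5, seed=1000):
    # Upper bound taken from the parameter description:
    # max negatives = negative_ratio * number of positives.
    max_negatives = negative_ratio * len(positives)
    if len(negative_pool) <= max_negatives:
        # Fewer candidates than the cap: keep them all.
        return list(negative_pool)
    # Otherwise draw a reproducible random subset of the allowed size.
    rng = random.Random(seed)
    return rng.sample(negative_pool, max_negatives)


positives = ["房间"]
negative_pool = [f"candidate_{i}" for i in range(8)]

# One positive and negative_ratio=5 keeps at most 5 of the 8 candidates.
kept = cap_negatives(positives, negative_pool)
```

A fixed seed keeps the sampled subset stable across runs, which matches the purpose of the ``seed`` parameter in the list.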

"sentiment_prompt_prefix": "情感倾向",
"separator": "##",
"not_mentioned_option": "未提及",
"options": "正向,负向,未提及",
Collaborator

The options here are hard-coded. If a user wants to define custom options such as "正向,负向,中性" (positive, negative, neutral), is that customization ability lost?

Contributor Author

As discussed, only options is currently exposed externally, which preserves the ability to customize sentiment classification.
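
One way to picture keeping `options` user-configurable is the sketch below. It assumes the classification prompt is formed as the prefix followed by the bracketed option list; that format and the helper `build_cls_prompt` are assumptions for illustration, not the code merged in this PR.

```python
def build_cls_prompt(config, options=None):
    # Use the caller's comma-separated options when given; otherwise fall
    # back to the default string from the config.
    raw = options if options is not None else config["options"]
    option_list = [opt.strip() for opt in raw.split(",") if opt.strip()]
    # Assumed prompt layout: <prefix>[<opt1>,<opt2>,...]
    return f'{config["sentiment_prompt_prefix"]}[{",".join(option_list)}]'


# Default values copied from the snippet under review.
default_config = {
    "sentiment_prompt_prefix": "情感倾向",
    "separator": "##",
    "not_mentioned_option": "未提及",
    "options": "正向,负向,未提及",
}

default_prompt = build_cls_prompt(default_config)
custom_prompt = build_cls_prompt(default_config, options="正向,负向,中性")
```

With this shape, the defaults stay in one place while a user-supplied option string overrides them per call.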

Collaborator

@wawltor wawltor left a comment

LGTM

@1649759610 1649759610 merged commit f340ff5 into PaddlePaddle:develop Jan 10, 2023
2 participants