support other ext tasks except aso task and fix sentiment analysis based on SKEP #4357

1649759610 · 2023-01-05T07:55:08Z

PR types

Function optimization & Bug fixes

PR changes

APIs & Docs

Description

Further support aspect-level extraction tasks, such as aspect, aspect-sentiment, and aspect-opinion extraction, and connect the full workflow from annotation to visualization.
Fix the problem in SKEP-based sentiment analysis caused by the tokenizer update.
Optimize the project's log output.
Refine the label-studio and sentiment analysis READMEs so that users can understand the project more easily.


paddle-bot · 2023-01-05T07:55:12Z

Thanks for your contribution!

codecov · 2023-01-05T08:09:08Z

Codecov Report

Merging #4357 (1792554) into develop (0716ead) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           develop    #4357   +/-   ##
========================================
  Coverage    39.65%   39.65%           
========================================
  Files          433      433           
  Lines        60936    60936           
========================================
  Hits         24167    24167           
  Misses       36769    36769


wawltor · 2023-01-05T11:37:48Z

applications/sentiment_analysis/ASO_analysis/deploy/predict.py

@@ -146,6 +147,11 @@ def convert_example_to_feature_cls(example, tokenizer, label2id, max_seq_len=512
    return encoded_inputs


+def remove_blanks(example):
+    example["text"] = re.sub(" +", "", example["text"])


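I'm curious about this: why modify the original input text?

This removes the spaces from the original text. The current tokenizer ignores spaces during encoding, so the length of input_ids != the length of the original text, which causes alignment problems when matching results back to the text.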
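
The length mismatch described in this reply can be illustrated with a minimal sketch. `toy_encode` is a hypothetical stand-in for a tokenizer that drops spaces; it is not the SKEP tokenizer and only mimics the behavior described above. `remove_blanks` mirrors the helper added in this PR.

```python
import re


def toy_encode(text):
    """Hypothetical character-level tokenizer that silently drops spaces."""
    return [ord(ch) for ch in text if ch != " "]


def remove_blanks(example):
    # Same idea as the helper in this PR: delete runs of spaces before encoding.
    example["text"] = re.sub(" +", "", example["text"])
    return example


raw = {"text": "服务 很好"}

# Without cleaning, the text has 5 characters but only 4 ids are produced,
# so position i in the ids no longer points at character i in the text.
mismatch = len(toy_encode(raw["text"])) != len(raw["text"])

# After cleaning, ids and characters line up one to one.
cleaned = remove_blanks(dict(raw))
aligned = len(toy_encode(cleaned["text"])) == len(cleaned["text"])
```

Removing the spaces up front keeps predicted token positions usable as character offsets, at the cost of changing the text that downstream code sees.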

wawltor · 2023-01-05T11:45:27Z

applications/sentiment_analysis/ASO_analysis/predict.py

+def remove_blanks(example):
+    example["text"] = re.sub(" +", "", example["text"])
+    return example
+


wawltor · 2023-01-05T11:46:48Z

applications/sentiment_analysis/unified_sentiment_extraction/README.md

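- ``negative_ratio``: Maximum negative-example ratio. This parameter only applies to extraction-type tasks; constructing a suitable number of negative examples can improve model performance. The number of negatives depends on the actual number of labels: maximum number of negatives = negative_ratio * number of positives. This parameter only applies to the training set and defaults to 5. To keep the evaluation metrics accurate, the validation and test sets are built with all negative examples by default.
- ``is_shuffle``: Whether to randomly shuffle the dataset; defaults to True.
- ``seed``: Random seed; defaults to 1000.
+Here, the parameter ``negative_ratio`` means that, for each sample, at most ``negative_ratio`` negative examples are generated for each subtask (aspect-level opinion extraction, aspect-level sentiment classification). If an aspect synonym table or an implicit-opinion vocabulary is additionally provided, information from both is combined to generate more negative examples, strengthening aspect aggregation and implicit opinion extraction.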


For the parameter descriptions here, should the ones related to label_studio_file and task_type be kept?
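
As a rough illustration of the `negative_ratio` cap quoted from the README above (maximum negatives = negative_ratio * number of positives), the following sketch samples negatives from a candidate pool. The function name `sample_negatives`, the candidate list, and the sampling strategy are assumptions made for illustration; the PR's actual construction logic is not shown on this page and may differ, for example by capping per subtask.

```python
import random


def sample_negatives(positives, candidates, negative_ratio=5, seed=1000):
    """Pick at most negative_ratio * len(positives) negatives (hypothetical helper)."""
    rng = random.Random(seed)
    # Anything that is already a positive label cannot be a negative.
    pool = [c for c in candidates if c not in positives]
    limit = min(len(pool), negative_ratio * len(positives))
    return rng.sample(pool, limit)


positives = ["taste"]
candidates = ["taste", "service", "price", "ambience",
              "location", "portion", "packaging", "speed"]

# One positive and negative_ratio=5 gives at most 5 negatives.
negatives = sample_negatives(positives, candidates, negative_ratio=5)

# A very large ratio is bounded by the size of the pool (7 here).
all_negatives = sample_negatives(positives, candidates, negative_ratio=100)
```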

wawltor · 2023-01-05T11:53:36Z

applications/sentiment_analysis/unified_sentiment_extraction/label_studio.py

+    "sentiment_prompt_prefix": "情感倾向",
+    "separator": "##",
+    "not_mentioned_option": "未提及",
+    "options": "正向,负向,未提及",


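The options here are hard-coded. If a user wants to define custom options such as "正向,负向,中性" (positive, negative, neutral), is that customization no longer possible?

As discussed, only options is exposed externally for now, which preserves the ability to customize sentiment classification.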
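
One way to picture what exposing `options` allows is the sketch below, which builds a classification prompt from a comma-separated option string. The helper `build_sentiment_prompt` and the bracketed prompt layout are assumptions for illustration only; the default values are taken from the diff above, but the code in `label_studio.py` may assemble its prompts differently.

```python
DEFAULT_PREFIX = "情感倾向"
DEFAULT_OPTIONS = "正向,负向,未提及"


def build_sentiment_prompt(prefix=DEFAULT_PREFIX, options=DEFAULT_OPTIONS):
    """Hypothetical helper: format a prompt from a comma-separated option string."""
    # Tolerate stray whitespace and empty entries in the user-supplied string.
    choices = [opt.strip() for opt in options.split(",") if opt.strip()]
    return f"{prefix}[{','.join(choices)}]"


# Defaults reproduce the three hard-coded labels from the diff.
default_prompt = build_sentiment_prompt()

# A caller can swap in its own label set, e.g. with a neutral class.
custom_prompt = build_sentiment_prompt(options="正向, 负向, 中性")
```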

wawltor

LGTM

1649759610 and others added 30 commits June 24, 2022 13:43

initial commit

37d27f9

refine readme

d65208e

refine codestyle

ac4d644

refine readme

3f433b9

refine readme

d3f6ada

fix model saving bug

54ed34b

Merge branch 'develop' into develop

63b0a76

initial commit

4669194

Merge branch 'PaddlePaddle:develop' into develop

f6f93e1

Merge branch 'develop' of https://github.com/1649759610/PaddleNLP int…

fb3ade1

…o develop

initial commit

a83a902

initial commit

7bd988a

Merge branch 'PaddlePaddle:develop' into develop

700810a

Merge branch 'PaddlePaddle:develop' into develop

68e025a

use common metric instead of eval_metrics.py and remove unuseful code

02a997b

Merge branch 'develop' of https://github.com/1649759610/PaddleNLP int…

1500e5f

…o develop

Merge branch 'develop' into develop

faaf5f5

Merge branch 'PaddlePaddle:develop' into develop

6a512a0

Merge branch 'PaddlePaddle:develop' into develop

a99fc68

Merge branch 'PaddlePaddle:develop' into develop

4b5fa30

mv stage project to ASO_analysis

b837b67

add unified sentiment analysis

f415740

Merge branch 'PaddlePaddle:develop' into develop

41b020d

Merge branch 'develop' of https://github.com/1649759610/PaddleNLP int…

252860d

…o develop

refine readme

aed2d64

refine readme

8899a12

refnie readme

425a273

add unified sentiment analysis

acd9add

refine readme

4796016

Merge branch 'PaddlePaddle:develop' into develop

e857c6a

1649759610 and others added 14 commits December 30, 2022 12:03

fix running time for skep and uie

c344b8c

Merge branch 'PaddlePaddle:develop' into develop

df1ad55

fix bug to solve tokenizer updating problem

7f00796

Merge branch 'develop' of https://github.com/1649759610/PaddleNLP int…

1ea8757

…o develop

refine label-studio readme

3123147

refine label-studio readme

88060da

Merge branch 'PaddlePaddle:develop' into develop

ab18fc8

Merge branch 'develop' of https://github.com/1649759610/PaddleNLP int…

fc52a00

…o develop

refine label-studio readme

ac89d49

optimize example construction for a, o, as, ao extraction task

579631b

Merge branch 'PaddlePaddle:develop' into develop

b4364cf

Merge branch 'develop' of https://github.com/1649759610/PaddleNLP int…

30ea32f

…o develop

add the labeling method for ext task: a, as, ao and so on.

4bfab1c

add note for visual_analysis.py

091e216

wawltor reviewed Jan 5, 2023


1649759610 and others added 10 commits January 6, 2023 06:29

change link for downloading data and refine log output

0a5b991

refine log output

070bfda

refine readme

c34bedb

expose options interface

e5a94c2

refine readme

e624418

modify typos

a021936

expose options for customing sentiment analysis

b754bb1

README.md

561f607

Merge branch 'PaddlePaddle:develop' into develop

7a2b5ca

Merge branch 'PaddlePaddle:develop' into develop

1792554

wawltor approved these changes Jan 10, 2023


1649759610 merged commit f340ff5 into PaddlePaddle:develop Jan 10, 2023

1649759610 mentioned this pull request Jan 12, 2023

PaddleNLP 2.5.0 Release Note Candidate #4439

Closed
