Add retrieval based classification #2836

Merged: 9 commits, Aug 5, 2022

Conversation

w5688414
Contributor

@w5688414 w5688414 commented Jul 19, 2022

PR types

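In earlier classification setups, labels are represented as one-hot encodings that carry no meaning of their own and are independent of one another, which can discard the semantic information in the labels. This approach converts the labels of a text classification task into semantic vectors, turning text classification into a vector retrieval and matching task. This suits scenarios where the set of class labels is not fixed or where new classes need to be added frequently. In addition, for new but related classification tasks, the model does not need to be retrained, and no new model architecture has to be designed to fit the new task. Overall, retrieval-based text classification scales well, makes use of the semantic information contained in the labels, and does not require retraining.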
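
The retrieval idea in the description above can be sketched roughly as follows. This is a minimal illustration and not the code added in this PR: the three-dimensional vectors are made-up stand-ins for the output of a semantic encoder, and the helper names `cosine` and `classify` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(text_vec, label_vecs):
    """Return the label whose embedding is most similar to text_vec."""
    return max(label_vecs, key=lambda label: cosine(text_vec, label_vecs[label]))


# Toy embeddings standing in for encoder output (assumed values).
label_vecs = {
    "sports": [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.0],
}
predicted = classify([0.8, 0.2, 0.1], label_vecs)

# A new class only adds one vector to the recall set; nothing is retrained.
label_vecs["medicine"] = [0.0, 0.1, 0.9]
predicted_new = classify([0.1, 0.0, 0.7], label_vecs)
```

Because classification is a nearest-neighbor lookup over label vectors, supporting a new class in this sketch amounts to encoding its label and adding it to the dictionary.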

  • New features
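| Model | Accuracy | Strategy summary |
| --- | --- | --- |
| ernie-3.0-medium-zh | 50.580 | ernie-3.0-medium-zh multi-class classification, 5 epochs; new classes require retraining |
| In-batch Negatives + RocketQA | 49.755 | Supervised training with in-batch negatives; the labels serve as the recall set; new classes do not require retraining |
| In-batch Negatives + RocketQA + voting | 51.756 | Supervised training with in-batch negatives; the training set serves as the recall set; a new class needs at least one example added to the recall library |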
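
The voting variant in the last row can be pictured with the sketch below. It assumes a plain majority vote over the top-k most similar training examples; the PR description does not say how votes are counted or which k is used, so `classify_by_vote`, `k=3`, and the two-dimensional vectors are illustrative assumptions only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_vote(query_vec, recall_library, k=3):
    """Majority vote over the labels of the k most similar stored examples."""
    ranked = sorted(
        recall_library,
        key=lambda item: cosine(query_vec, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Recall library built from labeled training examples (assumed toy vectors).
recall_library = [
    ([0.9, 0.1], "sports"),
    ([0.8, 0.3], "sports"),
    ([0.2, 0.9], "finance"),
    ([0.1, 0.8], "finance"),
    ([0.6, 0.6], "finance"),
]
result = classify_by_vote([0.85, 0.2], recall_library, k=3)
```

In this setup a new class becomes predictable once at least one labeled example for it is appended to `recall_library`, which matches the requirement stated in the table.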

PR changes

Others

Description

  • Add retrieval based classification

@w5688414 w5688414 self-assigned this Jul 19, 2022
@tianxin1860 tianxin1860 requested a review from wawltor July 19, 2022 12:50
@w5688414 w5688414 changed the title from "Add retrieval basd classification" to "Add retrieval based classification" Jul 19, 2022
@lugimzzz
Contributor

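The document format should ideally be aligned with hierarchical/README.md so that the text applications stay consistent. The updated hierarchical/README.md has been submitted in a new PR that has not been merged yet: #2868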

@lugimzzz
Contributor

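If there is no real difference between using the bash script and running the Python script directly, pick one of them so that users have fewer choices to make.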

@Daemon-ser

Which dataset did you test this on?

@w5688414
Contributor Author

w5688414 commented Aug 2, 2022

> Which dataset did you test this on?

On a Baidu Baike dataset.


@tianxin1860 tianxin1860 left a comment

LGTM

@w5688414 w5688414 merged commit 36ccdcd into PaddlePaddle:develop Aug 5, 2022
4 participants