Add text similarity task for Taskflow #1345
docs/model_zoo/taskflow.md
Outdated
@@ -174,6 +176,20 @@ senta("作为老的四星酒店,房间依然很整洁,相当不错。机场
>>> [{'text': '作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。', 'label': 'positive', 'score': 0.984320878982544}]
```

### 文本匹配
"Text similarity computation" (文本相似度计算) would be a more intuitive heading.
Fixed.
docs/model_zoo/taskflow.md
Outdated
@@ -11,6 +11,7 @@
- [文本纠错](#文本纠错)
- [句法分析](#句法分析)
- [情感分析](#情感分析)
- [文本匹配](#文本匹配)
Use "text similarity" (文本相似度) here.
Fixed.
paddlenlp/taskflow/text_matching.py
Outdated
usage = r"""
from paddlenlp import Taskflow
Would renaming the task to text similarity be more expressive?
Fixed.
docs/model_zoo/taskflow.md
Outdated
similarity = Taskflow("text_similarity")
similarity([["世界上什么东西最小", "世界上什么东西最小?"]])
>>> [{'query': '世界上什么东西最小', 'title': '世界上什么东西最小?', 'similarity': 0.992725}]
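Using text1 and text2 as the input keys would probably be more accurate. With query and title, readers tend to assume the task matches a short text against a long text.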
Fixed.
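The key naming discussed above can be illustrated with a small sketch. This is not the PaddleNLP implementation: `toy_similarity` is a stand-in scorer built on `difflib`, and `format_results` is a hypothetical helper. Only the `text1`, `text2`, and `similarity` keys come from the example in this thread.

```python
from difflib import SequenceMatcher


def toy_similarity(text1, text2):
    """Stand-in scorer (character overlap), NOT the model used by Taskflow."""
    return SequenceMatcher(None, text1, text2).ratio()


def format_results(pairs):
    """Build one record per input pair with symmetric key names.

    Hypothetical helper: the keys mirror the output shown in this thread.
    """
    results = []
    for text1, text2 in pairs:
        results.append({
            "text1": text1,
            "text2": text2,
            "similarity": toy_similarity(text1, text2),
        })
    return results


records = format_results([["世界上什么东西最小", "世界上什么东西最小?"]])
print(records[0]["text1"], records[0]["text2"])
```

Because neither key implies a role such as "query" or "title", the record reads the same whichever text is longer.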
[{'text1': '世界上什么东西最小', 'text2': '世界上什么东西最小?', 'similarity': 0.992725}]
'''

similarity = Taskflow("text_similarity", batch_size=2)
Does batch_size need to be configured manually here? Could it be derived automatically from the input size?
Does this mean that with batch_size=1 two samples cannot be passed in at once? Or is this batch size a key parameter of the predictor?
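batch_size is currently configured manually and defaults to 1; the idea is to let users set it according to their own hardware.
batch_size is a key parameter of the predictor.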
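To make the answer above concrete, here is a minimal sketch of how a fixed `batch_size` can chunk an arbitrary number of inputs. `iter_batches` is a hypothetical helper, not PaddleNLP code, and it assumes the predictor consumes inputs in consecutive slices of at most `batch_size` items. The second and third input pairs are made up for illustration.

```python
def iter_batches(examples, batch_size=1):
    """Yield consecutive slices of at most `batch_size` examples.

    Hypothetical helper for illustration; the default of 1 mirrors the
    default mentioned in this thread.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


pairs = [
    ["世界上什么东西最小", "世界上什么东西最小?"],
    ["first text A", "second text A"],  # made-up pair
    ["first text B", "second text B"],  # made-up pair
]

# With the default batch_size=1, all three pairs are still processed,
# one pair per batch.
print([len(batch) for batch in iter_batches(pairs)])                # [1, 1, 1]
# batch_size=2 only changes how the same inputs are grouped.
print([len(batch) for batch in iter_batches(pairs, batch_size=2)])  # [2, 1]
```

Under this assumption, the number of input pairs and `batch_size` are independent: a larger batch trades memory for fewer predictor calls, but does not limit how many samples can be passed in.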
I suggest using text1 and text2 consistently throughout the internal code as well, rather than text1 on the outside and query on the inside.
self.input_handles[1].copy_from_cpu(t_segment_ids)
self.predictor.run()
vecs_title = self.output_handle[1].copy_to_cpu()
I suggest using text1 and text2 throughout the internal code.
Fixed.
docs/model_zoo/taskflow.md
Outdated
similarity([["世界上什么东西最小", "世界上什么东西最小?"]])
>>> [{'text1': '世界上什么东西最小', 'text2': '世界上什么东西最小?', 'similarity': 0.992725}]

similarity = Taskflow("text_similarity", batch_size=2)
We still need to tell developers what batch_size=2 is for.
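The other examples also need the same improved API parameter descriptions. Otherwise readers may misunderstand and think batch_size must be set to 2 in order to pass in two samples.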
Fixed. I updated the code example here and added a description of the configurable parameters.
LGTM
PR types
New features
PR changes
APIs
Description
1. Add text similarity task for Taskflow