add bad case analysis for text classification #3385

Merged
merged 6 commits into from
Nov 1, 2022

Contributor

@lugimzzz lugimzzz commented Sep 28, 2022

PR types

Others

PR changes

Others

Description

Adds a bad case analysis workflow:
- Token-level trustworthiness analysis (LIME, GradShap, InterGrad)
- Feature-level trustworthiness analysis (FeatureSimilarity)

@@ -3,13 +3,16 @@
**Table of Contents**
* [Introduction to the analysis module](#analysis模块介绍)
Member

This name would look more standard with a capital A: "Analysis".

Contributor Author

Fixed.

@@ -3,13 +3,16 @@
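**Table of Contents**
* [Introduction to the analysis module](#analysis模块介绍)
* [Model evaluation](#模型评估)
* [Bad case analysis](#错误样例分析)
* [Sparse data selection](#稀疏数据筛选方案)
* [Dirty data cleaning](#脏数据清洗方案)
* [Data augmentation strategies](#数据增强策略方案)

## Introduction to the analysis module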
Member

Analysis

Contributor Author

Fixed.


**Install TrustAI**
```shell
pip install trustai==0.1.7
```

Member

I suggest using `>=` to constrain the version.

Member

Also, please ask the TrustAI team to keep releases backward compatible as far as possible.

Contributor Author

Fixed.
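The difference between the `==` pin in the diff and the `>=` constraint the reviewer asks for can be sketched in a few lines. This is only an illustration under stated assumptions: `parse_version` and `meets_minimum` are hypothetical helpers, they handle plain dotted numeric versions only, and pip's real specifier handling is considerably richer. On the command line the requested form would presumably look like `pip install "trustai>=0.1.7"`.

```python
def parse_version(version):
    """Turn a dotted numeric version such as "0.1.7" into a tuple of ints.

    Hypothetical helper for illustration; it does not handle pre-release
    or local version suffixes.
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum="0.1.7"):
    """Return True if `installed` satisfies a `>=minimum` style constraint."""
    return parse_version(installed) >= parse_version(minimum)


# An exact pin accepts only 0.1.7; a minimum constraint also accepts later releases.
for candidate in ["0.1.6", "0.1.7", "0.1.10", "0.2.0"]:
    print(candidate, meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.1.10" sorts before "0.1.7", which would wrongly reject a newer release.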

**Install interpretdl** (optional): if you use the GradShap method for word-level interpretability analysis, you need to install interpretdl
Member

"Install InterpretDL": please make sure product and brand names are capitalized correctly.

Contributor Author

Changed to InterpretDL.


Configurable parameters:

* `device`: the device used for training; choose cpu, gpu, xpu, or npu; defaults to "gpu".
Member

Add "可" so that the phrase reads "可选择cpu、gpu、xpu、npu;" ("options are cpu, gpu, xpu, npu;").

Contributor Author

Fixed in all of the docs.

parser.add_argument("--top_k", type=int, default=3, help="Top K important training data.")
parser.add_argument("--train_file", type=str, default="train.txt", help="Train dataset file name")
parser.add_argument("--interpret_file", type=str, default="bad_case.txt", help="interpretation file name")
parser.add_argument("--interpreted_file", type=str, default="sent_interpret.txt", help="interpreted file name")
Member

The help text for these two files is hard to understand and hard to tell apart.

Contributor Author

Renamed to `interpret_input_file` and `interpret_result_file`.
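For readers following the thread, here is a minimal sketch of how the renamed arguments might look. Only the two new names come from the author's reply. The defaults are assumed to carry over from the reviewed lines, the help strings are illustrative wording, and `build_parser` is a hypothetical wrapper that is not part of the PR.

```python
import argparse


def build_parser():
    """Hypothetical wrapper around the arguments shown in the reviewed diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--top_k", type=int, default=3, help="Top K important training data.")
    parser.add_argument("--train_file", type=str, default="train.txt", help="Train dataset file name")
    # Renamed from --interpret_file; default assumed unchanged, help text is illustrative.
    parser.add_argument("--interpret_input_file", type=str, default="bad_case.txt",
                        help="Input file containing the examples to be interpreted")
    # Renamed from --interpreted_file; default assumed unchanged, help text is illustrative.
    parser.add_argument("--interpret_result_file", type=str, default="sent_interpret.txt",
                        help="Output file where sentence-level interpretation results are saved")
    return parser


# Parse an empty argument list so the defaults are visible without a command line.
args = build_parser().parse_args([])
print(args.interpret_input_file, args.interpret_result_file)
```

Naming one argument for the input and the other for the result makes the direction of data flow visible in the flag itself, which is the distinction the reviewer found missing.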

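* `top_k`: number of supporting training evidence examples to select; defaults to 3.
* `train_file`: name of the training set file in the local dataset; defaults to "train.txt".
* `interpret_file`: name of the file to be analyzed in the local dataset; defaults to "bad_case.txt".
* `interpreted_file`: name of the file in which sentence-level interpretability results are saved; defaults to "sent_interpret.txt".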
Member

Please reconsider the names of these two args; they are not easy to tell apart.
Could they be distinguished with more direct names such as interpret_input / interpret_result?

Contributor Author

Renamed to `interpret_input_file` and `interpret_result_file`.

@lugimzzz lugimzzz requested a review from wawltor October 12, 2022 12:01
Collaborator

@wawltor wawltor left a comment

LGTM

@lugimzzz lugimzzz merged commit bdd659d into PaddlePaddle:develop Nov 1, 2022
@lugimzzz lugimzzz deleted the badcase branch November 1, 2022 01:43