
sentiment analysis translation (#270) #444

Closed
wants to merge 23 commits into from

Conversation

westeast (Contributor):

Translated the sentiment analysis section into Chinese.

@coveralls

Coverage Status

Coverage decreased (-0.004%) to 62.901% when pulling 476e94f on westeast:master into ef5e483 on baidu:develop.

@qingqing01 (Contributor) left a comment:

Also, the link after the sentiment analysis entry in Paddle/doc_cn/demo/index.rst needs to be updated.


### IMDB Data Preparation

In this example, we use only the labeled training and test sets, and by default build the dictionary on the test set rather than using imdb.vocab from the IMDB dataset as the dictionary. The training set has been randomly shuffled, while the test set has not. The script `tokenizer.perl` from the Moses toolkit is used to tokenize words and punctuation. Run the following command to preprocess the data.
Contributor:

This sentence is mistranslated: "默认在测试集上构建字典" ("build the dictionary on the test set by default") -> "默认在训练集上构建字典" ("build the dictionary on the training set by default").

```
dict.txt labels.list test.list test_part_000 train.list train_part_000
```
* test\_part\_000 and train\_part\_000: all the labeled training and test set data. The training set has been randomly shuffled.
Contributor:

Swap the order of "训练集和测试集" ("training and test sets") so it matches the test_xx and train_xx order above.
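For illustration, the dictionary-building step that produces a `dict.txt`-style vocabulary from the tokenized reviews can be sketched as follows. This is a minimal Python sketch, not the tutorial's actual preprocessing script; the input lines and the frequency cutoff are assumptions.

```python
from collections import Counter

def build_dict(tokenized_lines, cutoff=1):
    """Build a (word, frequency) list from tokenized text lines.

    Each line is assumed to be whitespace-separated tokens, as produced
    by a tokenizer such as Moses' tokenizer.perl.
    """
    counter = Counter()
    for line in tokenized_lines:
        counter.update(line.split())
    # Keep words at or above the frequency cutoff, most frequent first.
    return [(w, c) for w, c in counter.most_common() if c >= cutoff]

lines = ["this movie is great", "this movie is terrible"]
vocab = build_dict(lines)
```

Writing one `word<TAB>count` pair per line of `vocab` would then yield a dictionary file in the spirit of `dict.txt`.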


## Training the Model

In this task, we use the LSTM architecture of recurrent neural networks (RNNs) to train a sentiment analysis model. LSTM was introduced mainly to overcome the vanishing gradient problem. An LSTM network is similar to a standard recurrent neural network with a hidden layer, but each ordinary node in the hidden layer is replaced by a memory cell. Each memory cell contains four main elements: an input gate, a neuron with a self-recurrent connection, a forget gate, and an output gate. More details can be found in the literature [4]. The biggest advantage of the LSTM architecture is that it can memorize information over long time intervals without the loss of short-term memory. At each time step when a new word arrives, the historical information stored in the memory cell block is updated, so that the model iteratively learns a reasonable sequence representation of the words.
Contributor:

"标循环现神经网络" is a typo; it should be "标准循环神经网络" ("standard recurrent neural network").

<center>![LSTM](../../../doc/demo/sentiment_analysis/lstm.png)</center>
<center>Figure 1. LSTM [3]</center>
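The memory-cell mechanics described above (input gate, forget gate, output gate, and a self-recurrent cell state) can be made concrete with a single LSTM time step. This is an illustrative NumPy sketch with randomly initialized weights, not PaddlePaddle's implementation; the stacked parameter layout is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W, U, b hold the stacked parameters for the input, forget, and
    output gates plus the candidate cell update (4 * hidden rows).
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # pre-activations, shape (4*hidden,)
    i = sigmoid(z[0:hidden])            # input gate
    f = sigmoid(z[hidden:2 * hidden])   # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])  # candidate update
    c = f * c_prev + i * g              # new cell state (long-term memory)
    h = o * np.tanh(c)                  # new hidden state (output)
    return h, c

rng = np.random.default_rng(0)
hidden, inputs = 3, 2
W = rng.normal(size=(4 * hidden, inputs))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden), W, U, b)
```

The additive update `c = f * c_prev + i * g` is what lets gradients flow over long time intervals, which is the vanishing-gradient remedy the paragraph refers to.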

Sentiment analysis is one of the most typical problems in natural language understanding. It aims to predict the sentiment attitude expressed in a sequence. Usually, only a few keywords, such as adjectives and adverbs, play the main role in predicting the sentiment of a sequence or paragraph. However, some review contexts are very long, such as those in the IMDB dataset. We use LSTM for this task because of its improved design with gating mechanisms. First, it can summarize representations from the character level to the context level with variable context length (adapted via gate values). Second, it can exploit extensible context at the sentence level, while most other methods only use n-gram-level knowledge. Third, it learns the paragraph representation directly rather than composing context-level information.
Contributor:

  1. "word level" should be translated as "词级别"; "首先,它能够从字级" ("First, it can ... from the character level") needs to change here.
  2. "(其通过门值来适配)" ("adapted via gate values") is an inaccurate translation; the parenthetical can be dropped.


#### Bidirectional LSTM

Figure 2 shows the bidirectional LSTM network, composed of a fully connected layer and a softmax layer.
Contributor:

"由全连接层和softmax层组成" ("composed of a fully connected layer and a softmax layer") -> "后面连全连接层和softmax层" ("followed by a fully connected layer and a softmax layer").

Contributor (Author):

Changed to "后面连全连接层和softmax层" ("followed by a fully connected layer and a softmax layer").

- CurrentCost=xx: the current cost over the latest log_period batches.
- Eval: classification\_error\_evaluator=xx: the classification error from batch 0 to the current batch.
- CurrentEval: classification\_error\_evaluator: the classification error over the latest log_period batches.
- Pass=0: going through the whole training set once is called a pass; 0 means the first pass through the training set.
Contributor:

Change every "批" above to "batch". Write log_period as-is; don't translate it as "日志周期".
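To make the field list above concrete, a log line carrying these key=value fields could be parsed like this. The line shown is a hypothetical example in the spirit of the fields above, not PaddlePaddle's exact trainer output format.

```python
import re

def parse_train_log(line):
    """Extract numeric key=value fields (e.g. CurrentCost=0.52,
    classification_error_evaluator=0.31) from one trainer log line.
    """
    return {k: float(v) for k, v in re.findall(r"(\w+)=([\d.]+)", line)}

line = ("Pass=0 Batch=100 CurrentCost=0.52 "
        "Eval: classification_error_evaluator=0.31")
fields = parse_train_log(line)
```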

- Pass=0: going through the whole training set once is called a pass; 0 means the first pass through the training set.

By default, we use the `stacked_lstm_net` network, which converges faster than `bidirectional_lstm_net` when the same number of samples are passed. If you want to use bidirectional LSTM, If you want to use bidirectional LSTM, just remove the comment in the last line and comment out `stacked_lstm_net`.

Contributor:

Remove the stray "If you want to use bidirectional LSTM,".

```
2>&1 | tee 'test.log'
```

The function `get_best_pass` obtains the best model for testing by computing the classification error rate. In this example, we use the IMDB test dataset as validation by default. Unlike training, here you need to specify `--job=test` and the model path, i.e. `--model_list=$model_list`. If it runs successfully, the log will be saved to `demo/sentiment/test.log`. For example, in our test the best model is `model_output/pass-00002`, with a classification error of 0.115645, as follows:
Contributor:

"通过计算分类错误率" ("by computing the classification error rate") -> "依据分类错误率" ("according to the classification error rate").
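The selection logic of `get_best_pass` can be sketched as follows. This is a hedged reimplementation, not the tutorial's actual shell function: it scans log lines pairing a model path with a classification error and keeps the pass with the lowest error. The line format shown is an assumption.

```python
def get_best_pass(log_lines):
    """Return (model_path, error) for the lowest classification error.

    Each line is assumed to look like:
        "<model_path> classification error: <float>"
    (an illustrative format, not the exact test.log layout).
    """
    best = None
    for line in log_lines:
        path, err = line.rsplit(":", 1)
        path = path.split()[0]
        err = float(err)
        if best is None or err < best[1]:
            best = (path, err)
    return best

logs = [
    "model_output/pass-00000 classification error: 0.201470",
    "model_output/pass-00001 classification error: 0.136030",
    "model_output/pass-00002 classification error: 0.115645",
]
best_model, best_err = get_best_pass(logs)
```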

* -d data/pre-imdb/dict.txt: set the dictionary file.
* -i data/aclImdb/test/pos/10014_7.txt: set one example file to predict.

Note that you should make sure the default model path `model_output/pass-00002` exists, or change it to another model path.
Contributor:

Remove "你" ("you").

:glob:

Training Locally <sentiment_analysis.md>
internal/cluster_train.md
Contributor:

Remove internal/cluster_train.md.

@coveralls

Coverage Status

Coverage increased (+0.04%) to 62.945% when pulling 56e31f7 on westeast:master into ef5e483 on baidu:develop.

On the other hand, crawling users' reviews of products and analyzing their sentiment helps us understand users' preferences for different companies, different products, and even competitors' products.

This tutorial will guide you through training a Long Short-Term Memory (LSTM) network to classify the sentiment of sentences from the [Large Movie Review Dataset](http://ai.stanford.edu/~amaas/data/sentiment/) (sometimes known as the [Internet Movie Database (IMDB)](http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf)). This dataset contains movie reviews along with their associated binary sentiment polarity labels, namely positive and negative.

Contributor:

"相关联的二进制情绪极性标签" ("associated binary sentiment polarity labels") -> "类别标签" ("class labels").

```

* **Data definition**:
* get\_config\_arg(): get the command-line arguments set via `--config_args=xx` i.
Contributor:

Remove the stray "i" before "设置".
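The `--config_args=xx` flag passes comma-separated key=value pairs into the network config. A minimal sketch of such a lookup follows; `parse_config_args` is a hypothetical helper for illustration, not Paddle's actual `get_config_arg` implementation.

```python
def parse_config_args(config_args, name, default=None):
    """Look up `name` in a comma-separated "k1=v1,k2=v2" string,
    mimicking how a --config_args=... flag might be consumed.

    Returns `default` when the key is absent or the string is empty.
    """
    if not config_args:
        return default
    pairs = dict(item.split("=", 1) for item in config_args.split(","))
    return pairs.get(name, default)

batch = parse_config_args("batch_size=128,is_test=1", "batch_size", "64")
```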

* -n $config: set the network configuration.
* -w $model: set the model path.
* -b $label: set the label-class dictionary, which maps integer labels to string labels.
* -d data/pre-imdb/dict.txt: set the field file.
Contributor:

"字段" ("field") -> "字典" ("dictionary").

@coveralls

Coverage Status

Coverage increased (+0.03%) to 62.933% when pulling 3fbdd49 on westeast:master into ef5e483 on baidu:develop.


@coveralls

Coverage Status

Coverage increased (+0.07%) to 62.974% when pulling 36d60e3 on westeast:master into ef5e483 on baidu:develop.

@coveralls

Coverage Status

Coverage increased (+0.06%) to 62.965% when pulling 7984320 on westeast:master into ef5e483 on baidu:develop.

Revert "sentiment  analysis translation detail fix"

This reverts commit 0f00419.

sentiment  analysis translation  fix errors

sentiment  analysis translation  fix errors

sentiment  analysis translation  fix errors

sentiment analysis translation fix errors

sentiment  analysis doc_cn update for qingqing01 review

sentiment  analysis translation  fix index link

sentiment analysis doc_cn update for qingqing01 review

sentiment  analysis translation  update link

sentiment  analysis translation  update link
@coveralls

Coverage Status

Coverage increased (+0.06%) to 62.97% when pulling 20185fd on westeast:master into ef5e483 on baidu:develop.

@westeast westeast closed this Nov 15, 2016