Soul of the Machine: How Chatbots Work (机器之魂:聊天机器人是怎么工作的) #2011

Merged
merged 3 commits into from
Aug 14, 2017

Conversation

@lsvih (Member) commented Aug 6, 2017

close #2008

@lsvih (Member Author) commented Aug 6, 2017

#2008

@lsvih (Member Author) commented Aug 6, 2017

Looks like the issue has to be referenced in the PR description itself XD

@lsvih (Member Author) commented Aug 6, 2017

Looks like the issue can only be referenced when the PR is created; adding the reference afterwards is not recognized XD

@TobiasLee

@sqrthree I'll take the proofreading.

@linhe0x0 (Member)

@lileizhenshuai Sure thing 🍺

@TobiasLee left a comment

First proofreading pass complete.


In addition to striking a note, the movement of the drum can cause other automation, such as a moving figurine. Either way, the fundamental machinery of the music box remains the same.
除了发出音符之外,圆桶的转动还可以造成一些其它的动作,例如移动小雕像等。不管怎样,这个音乐盒的基本机械结构是不会变的。

圆桶的转动还可以造成一些其它的动作 ->
圆桶的转动还可以附加一些其它的动作


We learned about classification early in elementary science: a chimpanzee is in the class “mammals”, a blue jay is in the class “birds”, the earth is in the class “planets” and so on. Simple.
你可以将分类器看成是将一段数据(一句话)分成多个分类(即意图)的一种方式。输入一句话“how are you?”,将被分类成一种意图,然后将其与一种回应(例如“I’m good”或者更好的“I am well”)联系起来。

一段数据(一句话)分成多个分类(即意图)的一种方式 ->
一段数据(一句话)分成几个分类中的一种(即某种意图)的一种方式


### **Chatbot text classification approaches**
一般来说,文本分类有 3 种不同的方法。可以将它们看做是为了一些特定目的制造的软件机械,就如同音乐盒的鼓一样。

就如同音乐盒的鼓一样 ->
就如同音乐盒的圆柱桶一样
This keeps the wording consistent with the earlier text.


It knows who a physicist is only because his or her name has an associated pattern. Likewise it responds to anything solely because of an authored pattern. Given hundreds or thousands of patterns you might see a chatbot “persona” emerge.
它之所以知道别人问的是哪个物理学家,只是靠着与他或者她名字相关联的模式匹配。同样的,它靠着创作者模式可以对任何意图进行回应。在给予它成千上万种模式之后,你终将能看到一个“类人”的聊天机器人出现。

它靠着创作者模式可以对任何意图进行回应 ->
它靠着创作者预设的模式可以对任何意图进行回应
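
For anyone checking the wording of this passage against the mechanism it describes, here is a minimal sketch of authored-pattern matching. It assumes a tiny hand-written pattern table: the Einstein pattern and reply come from the AIML snippet in this PR, while the fallback `WHO IS ...` pattern and the names `PATTERNS`, `normalize`, and `respond` are hypothetical and only for illustration.

```python
import re

# Hypothetical pattern table: each entry pairs an authored pattern with a reply
# template, in the spirit of AIML's <pattern>/<template> elements.
PATTERNS = [
    (re.compile(r"^WHO IS ALBERT EINSTEIN$"),
     "Albert Einstein was a German physicist."),
    # Invented fallback pattern (not in the article), shown only as an example.
    (re.compile(r"^WHO IS (?P<name>.+)$"),
     "I do not know who {name} is."),
]


def normalize(text):
    """Uppercase the input and drop punctuation before matching."""
    return re.sub(r"[^A-Z0-9 ]", "", text.upper()).strip()


def respond(text):
    """Return the reply of the first matching pattern, or None if nothing matches."""
    sentence = normalize(text)
    for pattern, template in PATTERNS:
        match = pattern.match(sentence)
        if match:
            return template.format(**match.groupdict())
    return None


print(respond("Who is Albert Einstein?"))
print(respond("What is the weather like?"))
```

The bot "knows" Einstein only because someone authored that pattern; any input without an authored pattern gets no reply, which is the point the sentence under review is making.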


Notice that the classification for “What’s it like outside” found a term in another class but the term similarities to the desired class produced a higher score. By using an equation we are looking for word matches given some sample sentences for each class, and we avoid having to identify every pattern.
这种分类器通过标定分类分值(计算词频)的方法给出最匹配语句的分类,但是它仍然有局限性。分值与概率不同,它仅仅能告诉我们句子的意图最有可能是哪个分类,而不能告诉我们它的所有匹配分类的可能性。因此,很难去给出一个阈值来判定是接受这个得分结果还是不接受这个结果。这种类型的算法给出的最高分仅仅能作为判断相关性的基础,本质上作为分类器还是比较差的。此外,这个算法不能接受 *is not* 类型的句子,因为它仅仅计算了 *it* 可能是啥。也就是说这种方法不适合做为包含 *not* 的否定句的分类。

本质上作为分类器还是比较差的 ->
它本质上作为分类器的效果还是比较差的
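
As a reference for this paragraph, a rough sketch of a score-based classifier of the kind being described. The training sentences, the class names, and the weighting (each matched word counts as 1 divided by its frequency in the whole training set) are assumptions made for illustration; the article's exact formula is not quoted in this PR.

```python
import re
from collections import Counter

# Hypothetical sample sentences per class (not taken from the article).
TRAINING = {
    "weather": ["is it nice outside", "what is the weather like today",
                "will it rain outside"],
    "greeting": ["how are you", "hello there", "what is up"],
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


# Set of words seen in each class, and word counts over the whole training set.
CLASS_WORDS = {
    label: {w for s in sentences for w in tokenize(s)}
    for label, sentences in TRAINING.items()
}
CORPUS_COUNTS = Counter(
    w for sentences in TRAINING.values() for s in sentences for w in tokenize(s)
)


def score(sentence, label):
    """Sum 1/frequency for every input word that also occurs in the class."""
    return sum(1.0 / CORPUS_COUNTS[w]
               for w in tokenize(sentence) if w in CLASS_WORDS[label])


def classify(sentence):
    """Return the best-scoring class together with all class scores."""
    scores = {label: score(sentence, label) for label in TRAINING}
    return max(scores, key=scores.get), scores


print(classify("What's it like outside?"))
```

With this sample data, "what" also occurs in the greeting class, but the weather class still scores higher (2.5 against 0.5). The scores are plain sums rather than probabilities, which is why the translated paragraph says it is hard to pick an acceptance threshold.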


![](https://cdn-images-1.medium.com/max/1600/1*QckgibgJ74BhMaqinqwSDw.png)

The trained neural network is less code than an comparable algorithm but it requires a potentially large matrix of “weights”. In a relatively small sample, where the training sentences have 150 unique words and 30 classes this would be a matrix of 150x30. Imagine multiplying a matrix of this size 100,000 times to establish a sufficiently low error rate. This is where processing speed comes in.
训练好的神经网络模型的代码量其实很小,不过它需要一个很大的潜在权重矩阵。举个相对较小的样例,它的训练句子包括了 150 个单词、30 种分类,这可能产生一个 150x30 大小的矩阵;你可以想象一下,为了降低错误率,这么大的一个矩阵需要反复的进行 10 万次矩阵乘法。这也是为啥说需要高性能处理器的原因。

这也是为啥说 ->
这也是为什么说
This reads a little more formal.


If the neural network sounds magnificently sophisticated, relax, it boils down to [matrix multiplication](https://www.khanacademy.org/math/precalculus/precalc-matrices/multiplying-matrices-by-matrices/v/matrix-multiplication-intro) and a [formula for reducing values](https://en.wikipedia.org/wiki/Sigmoid_function) between -1 and 1 or some other minimal range. A middle-school math student could learn this in a few hours. The hard work is achieving clean training data.
神经网络之所以能够做到既复杂又稀疏,归结于[矩阵乘法](https://www.khanacademy.org/math/precalculus/precalc-matrices/multiplying-matrices-by-matrices/v/matrix-multiplication-intro)和一种[缩小值至 -1,1 区间的公式](https://en.wikipedia.org/wiki/Sigmoid_function)(即激活函数,这里指的是 Sigmoid),一个中学生也能在几小时内学会它。其实真正困难的工作是清洗训练数据。_

一个中学生也能在几小时内学会它。其实真正困难的工作是清洗训练数据。_ ->
一个中学生也能在几小时内学会它。其实真正困难的工作是清洗训练数据。
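
To make the "matrix multiplication plus a squashing formula" claim in this passage concrete, here is a small sketch of one forward pass in plain Python. The three-word input and the 3x2 weight matrix are made-up numbers; the article's 150x30 case would have the same shape, only larger. Note that the logistic sigmoid used here maps values into the range (0, 1).

```python
import math


def sigmoid(x):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, weights):
    """Multiply an input vector by a weight matrix and squash each result.

    inputs:  list of n values (for example a bag-of-words vector)
    weights: n x m matrix given as a list of n rows, one column per class
    """
    n_classes = len(weights[0])
    return [
        sigmoid(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(n_classes)
    ]


# Hypothetical example: 3 vocabulary words, 2 classes.
bag_of_words = [1.0, 0.0, 1.0]
weights = [
    [2.0, -1.0],
    [0.5, 0.5],
    [1.0, -2.0],
]
print(forward(bag_of_words, weights))
```

Each output is a value between 0 and 1 per class; training only changes the numbers in `weights`, not this code.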


Chatbot machinery is looking for patterns in collections of terms, each term is reduced to a token. In this machine* words have no meaning* except for their patterned existence within training data. The label “artificial intelligence” applied to such machinery [is mostly BS](https://medium.com/@gk_/the-ai-label-is-bullshit-559b171867ff#.3tlhftemt).
聊天机器人实质上就是寻找短语集合中摸的模式,每个短语还能再分割成单个单词。在聊天机器人内部,除了它们存在的模式以及训练数据之外的**单词其实并没有意义**。为这样的“机器人”贴上“人工智能”的标签其实[也很糟糕](https://medium.com/@gk_/the-ai-label-is-bullshit-559b171867ff#.3tlhftemt)

聊天机器人实质上就是寻找短语集合中摸的模式 ->
聊天机器人实质上就是寻找短语集合中的模式


In addition to striking a note, the movement of the drum can cause other automation, such as a moving figurine. Either way, the fundamental machinery of the music box remains the same.
除了发出音符之外,圆桶的转动还可以造成一些其它的动作,例如移动小雕像等。不管怎样,这个音乐盒的基本机械结构是不会变的。

I looked up the difference between “圆筒” and “圆桶”: “桶” seems to emphasize a container that holds things and has a bottom, while “筒” is a hollow, bamboo-like structure.
So maybe 圆筒 would be better here (matching the last paragraph)?


But how do these machines work? First, wind back time and explore an earlier — yet similar — technology.
但是这些“机械”是如何运作的呢?首先,让我们回溯过去,探寻一种原始,但典型的技术。

首先,让我们回溯过去,探寻一种原始,但典型的技术。 ->
首先,让我们回溯过去,探寻一种原始,但相似的技术。
I think the core of the article is that music boxes and chatbots work on similar principles, so 相似 (“similar”) may fit better here?

@TobiasLee

@sqrthree First proofreading pass complete.

@jasonxia23

@sqrthree I'll take the second proofreading~

@linhe0x0 (Member)

@jasonxia23 All set 🍻

@lsvih (Member Author) commented Aug 12, 2017

@jasonxia23 Hi, please proofread this when you have a moment.

@jasonxia23

@lsvih I got through half of it last night. We have a team-building event today, but it will be done shortly, hang on.

@lsvih (Member Author) commented Aug 12, 2017

@jasonxia23 No worries = = it was just a reminder. Even if it is revised now, it has to wait until 根号三 (@sqrthree) is back at work to be published -.-

@jasonxia23 left a comment

@sqrthree @lsvih Proofreading complete~

<aiml version = "1.0.1" encoding = "UTF-8"?>
<category>
<pattern> WHO IS ALBERT EINSTEIN </pattern>
<template>Albert Einstein was a German physicist.</template>
<pattern> WHO IS ALBERT EINSTEIN </pattern>

Hmm, is the indentation off here?~

</category>
</aiml>
```
````

Haha, is there one backtick too many here? The code blocks after this one seem to have problems too, so please check them.


Artificial neural networks, invented in the 1940’s, are a way of calculating an output from an input (a classification) using weighted connections (“synapses”) that are calculated from repeated iterations through training data. Each pass through the training data alters the weights such that the neural network produces the output with greater “accuracy” (lower error rate).
人工神经网络发明于 19 世纪 40 年代,它通过迭代计算训练数据得到连接的加权值(“突触”),然后用于对输入数据进行分类。通过一次次使用训练数据计算改变加权值以使得神经网络的输出得到更高的“准确率”(低错误率)。

19 世纪 (19th century)
=>
20 世纪 (20th century)


Artificial neural networks, invented in the 1940’s, are a way of calculating an output from an input (a classification) using weighted connections (“synapses”) that are calculated from repeated iterations through training data. Each pass through the training data alters the weights such that the neural network produces the output with greater “accuracy” (lower error rate).
人工神经网络发明于 19 世纪 40 年代,它通过迭代计算训练数据得到连接的加权值(“突触”),然后用于对输入数据进行分类。通过一次次使用训练数据计算改变加权值以使得神经网络的输出得到更高的“准确率”(低错误率)。

(“突触”),
=>
(“突触”),


As in the prior method, each class is given with some number of example sentences. Once again each sentence is broken down by word (stemmed) and each word becomes an input for the neural network. The synaptic weights are then calculated by iterating through the training data thousands of times, each time adjusting the weights slightly to greater accuracy. By recalculating back across multiple layers (“back-propagation”) the weights of all synapses are calibrated while the results are compared to the training data output. These weights are like a ‘strength’ measure, in a neuron the synaptic weight is what causes something to be more memorable than not. You remember a thing more because you’ve seen it more times: each time the ‘weight’ increases slightly.
在前面的方法里,每个分类都会给定一些例句。接着,根据词干进行分句,将所有单词作为神经网络的输入。然后遍历数据,进行成千上万次迭代计算,每次迭代都通过改变突触权重来得到更高的准确率。接着通过反过来通过对训练集输出值和神经网络计算结果的对比,对各层重新进行计算权重(反向传播)。这个“权重”可以类比成神经突触想记住某个东西的“力度”,你能记住某个东西是因为你曾多次见过它,在每次见到它的时候这个“权重”都会轻微地上升。

接着通过反过来通过
“通过” is repeated here.
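
For checking the translation of this paragraph against the procedure it describes, here is a simplified training sketch. It is deliberately smaller than what the article describes: a single layer of weights updated with a gradient step after each example, so there is no hidden layer and therefore no multi-layer back-propagation. The sentences, class names, learning rate, and epoch count are all assumptions, and stemming is skipped.

```python
import math

# Hypothetical training data: (sentence, class label) pairs.
TRAINING = [
    ("how are you", "greeting"),
    ("hello there", "greeting"),
    ("is it nice outside", "weather"),
    ("will it rain today", "weather"),
]
VOCAB = sorted({w for sentence, _ in TRAINING for w in sentence.split()})
CLASSES = sorted({label for _, label in TRAINING})


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode(sentence):
    """Bag of words: 1.0 for each vocabulary word present in the sentence."""
    words = set(sentence.lower().split())
    return [1.0 if w in words else 0.0 for w in VOCAB]


def predict(weights, sentence):
    """One output per class: sigmoid of the weighted sum of the inputs."""
    x = encode(sentence)
    return [
        sigmoid(sum(x[i] * weights[i][j] for i in range(len(VOCAB))))
        for j in range(len(CLASSES))
    ]


def train(epochs=1000, rate=0.5):
    """Nudge the weights a little on every pass through the training data."""
    weights = [[0.0] * len(CLASSES) for _ in VOCAB]
    errors = []  # mean absolute error per epoch
    for _ in range(epochs):
        total = 0.0
        for sentence, label in TRAINING:
            x = encode(sentence)
            out = predict(weights, sentence)
            for j, cls in enumerate(CLASSES):
                err = (1.0 if cls == label else 0.0) - out[j]
                total += abs(err)
                # Gradient of the squared error through the sigmoid.
                delta = err * out[j] * (1.0 - out[j])
                for i in range(len(VOCAB)):
                    weights[i][j] += rate * delta * x[i]
        errors.append(total / (len(TRAINING) * len(CLASSES)))
    return weights, errors


weights, errors = train()
print(errors[0], errors[-1])
print(dict(zip(CLASSES, predict(weights, "how are you"))))
```

The error printed for the last epoch is lower than for the first, which is the "each pass adjusts the weights slightly toward greater accuracy" behavior the paragraph describes.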

@lsvih (Member Author) commented Aug 13, 2017

@sqrthree Revised according to the proofreading comments.

@linhe0x0 linhe0x0 merged commit 799c024 into xitu:master Aug 14, 2017
@linhe0x0 (Member)

@lsvih Merged~ Please publish it to the Juejin column soon and send me the link so I can add your points promptly.

cdadar pushed a commit to cdadar/gold-miner that referenced this pull request Dec 8, 2017