Deep Learning Series 1: Setting Up AWS & Image Recognition #1967

Merged
merged 2 commits into from
Aug 7, 2017

TobiasLee

Issue: #1926

@Tina92

Tina92 commented Aug 4, 2017

@sqrthree Claiming this one for proofreading.

@linhe0x0
Member

linhe0x0 commented Aug 4, 2017

@Tina92 Sounds good 🍺

@Tina92 Tina92 left a comment

@sqrthree Proofreading done.


*This post is part of a series on deep learning. Check out part 2 [here](https://medium.com/@r.ruizendaal/deep-learning-2-f81ebe632d5c) and part 3 [here](https://medium.com/@r.ruizendaal/deep-learning-3-more-on-cnns-handling-overfitting-2bd5d99abe5d).*
**这篇文章是深度学习系列的第一部分。你可以在[这里](https://medium.com/@r.ruizendaal/deep-learning-2-f81ebe632d5c)查看第二部分,以及[这里](https://medium.com/@r.ruizendaal/deep-learning-3-more-on-cnns-handling-overfitting-2bd5d99abe5d)查看第三部分。**

The images should still be included in the translation.


This week: classifying images of cats and dogs
Welcome to the first entry in this series on practical deep learning. In this entry I will set up an Amazon Web Services (AWS) instance and use a pre-trained model to classify images of cats and dogs.
欢迎阅读本系列第一篇关于实战深度学习的文章。在本文中,我将设置 Amazon Web Services(AWS)实例,并使用预先训练的模型对猫和狗的图像进行分类。

设置 => 创建 (use "create" rather than "set up")


In this series I will be blogging about my progress through the first part of the Fast AI deep learning course. This course was first given at the Data Institute at the University of San Francisco and is now available as a MOOC. Recently the authors taught part 2 of the course, which will become available online in a couple of months. The main reason I am following this course is my extreme interest in deep learning. I have found many online resources on machine learning, but practical courses on deep learning seem to be a rarity. Deep learning seems to be an exclusive field that is just a little harder to get into. The first thing you need to get started with deep learning is a GPU. In this course we use the p2 instance from AWS. Let’s get that set up.
在这个完整的系列里,我会记录下我在 Fast AI 深度学习课程的第一部分内容的进度。这门课程是由旧金山大学数据研究所提供的并且现在能够在 MOOC 上观看。最近,这门课的作者提供了第二部分的内容,并且在接下来的几个月都可以在网上观看。我上这门课的主要是因为我对深度学习有着强烈的兴趣。我在网上发现了许多关于机器学习的课程,但有关深度学习的实战课程还是比较少见的。深度学习似乎因为进入门槛略高一点,而被单独列出。开始深度学习之前我们首先需要一个 GPU,在这门课程里我们会使用 AWS 的 p2 实例。现在让我们一起来准备它。

Suggested change (adds 最初, "originally"):

  • 这门课程是由旧金山大学数据研究所提供的

  • 这门课程最初是由旧金山大学数据研究所提供的


The first week of this course really focused on the setup. Getting your deep learning setup right can take a while and it is important to get everything working correctly. This includes setting up AWS, creating and configuring the GPU instance, setting up the process of ssh-ing into the server and managing your directories.
这门课程第一周,我们会把重点放在准备工作上。做好深度学习的准备会花上一点时间,但把所有准备工作做好,并且正确运行时很重要的。这包括了设置 AWS,创建和配置 GPU 实例,设置 ssh 连接服务器以及管理你的目录。

Suggested rewording:

  • 做好深度学习的准备会花上一点时间,但把所有准备工作做好,并且正确运行时很重要的。

  • 正确的准备深度学习需要一点时间,但这对一切能正确运行很重要。

[Here](https://gist.github.com/LeCoupa/122b12050f5fb267e75f) is a Bash cheat sheet for anyone who is not familiar with Bash. I highly recommend it, since you will need Bash to interact with your instance.

- The Anaconda install. The video mentions that you should install Anaconda before installing Cygwin. This can be a bit confusing, as you need to use the Cygwin Python to run the pip commands there, not a local Anaconda distribution.
- Anaconda 的安装。视频中提到你需要在安装 Cygwin 之前先安装 Anaconda,你可能感到有些疑惑,因为你需要用“Cygwin python”来运行 pip 命令而不是一个本地的 Anaconda 分发版。
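
A quick way to confirm which interpreter (and therefore which pip) you are actually using is to ask Python itself. This is a minimal sketch, assuming you run it from the Cygwin terminal; it uses only the standard `sys` module, where Cygwin builds of Python report `sys.platform == "cygwin"` and native Windows builds, Anaconda included, report `"win32"`. The helper name `running_under_cygwin` is hypothetical.

```python
import sys


def running_under_cygwin() -> bool:
    """Return True if the current interpreter is a Cygwin build of Python."""
    # Cygwin Python reports "cygwin"; a native Windows Anaconda Python reports "win32".
    return sys.platform == "cygwin"


if __name__ == "__main__":
    # The executable path shows which installation `python -m pip` would install into.
    print("Interpreter:", sys.executable)
    print("Cygwin Python:", running_under_cygwin())
```

Running pip as `python -m pip install ...` with the interpreter printed above ties the install to that interpreter, rather than to whichever `pip` happens to come first on `PATH`.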

Punctuation: 安装 Anaconda, => 安装 Anaconda。 (end with a full stop, not a comma)

In this first lesson the goal is to use a pre-trained model, namely Vgg16, to classify images of cats and dogs. Vgg16 is a lighter variant of the VGG architecture, which competed in the 2014 ImageNet challenge and took first place in localization and second place in classification. This is a yearly challenge and probably the biggest one in computer vision. We can take this pre-trained model and apply it to our dataset of cat and dog images. The course authors have edited our dataset to make sure it is in the right format for our model. The original dataset can be found on [Kaggle](https://www.kaggle.com/c/dogs-vs-cats). When this competition was originally run in 2013, the state of the art was 80% accuracy. Our simple model already achieves 97% accuracy. Mind-blowing, right? This is how some of the pictures and their predicted labels look:
神经网络是通过模仿人脑而设计的。根据通用近似定理,它理论上能拟合任何函数。神经网络通过反向传播算法来训练,这使得我们能够调整模型的参数来适应不同的函数。最后一个原因,也是深度学习近期取得众多成就的主要原因。因为游戏行业的进步和 GPU 计算能力的强劲发展,现在我们以非常快速和可扩展的方式来训练深层的神经网络。

在第一节课里,我们的目标是使用一个叫做 Vgg16 的预先训练好的模型,来对猫和狗的图片进行分类。Vgg16 是 2014 年赢得 Imagenet 比赛模型的一个轻量级版本。这是一个年度的比赛并且可能是计算机视觉方面最大的一个比赛。我们可以利用这预先训练好的模型,并且把它应用到我们的猫和狗的图片数据集上。我们的数据集已经被课程的作者编辑过了,以确保它的格式正确。原始的数据集可以在[Kaggle](https://www.kaggle.com/c/dogs-vs-cats)上找到。这场比赛最初是在2013年进行的,那是最高的准确率是 80%。而我们的简单模型已经能够达到97%的准确度。大脑现在还清醒吧?下面是一些照片和他们被预测的标记:

Add a space before and after Kaggle.

那是最高的准确率 => 那时的准确率 (那是, "that is", should be 那时, "at that time")

Member

Numbers also need a space before and after them.

Author

Oh, OK, got it.

In order to run our model we use the Keras library. This library sits on top of the popular deep learning libraries Theano and TensorFlow. Keras makes it more intuitive to code your network, so you can focus more on the structure of the network and worry less about the underlying Theano or TensorFlow API. Keras determines which class a picture belongs to from the directory it is stored in, so it is important to move the images into the correct directories. The Bash commands needed to do this can be run directly from the Jupyter Notebook where we do all our coding. [This](https://www.cyberciti.biz/faq/mv-command-howto-move-folder-in-linux-terminal/) link contains additional information on these commands.
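
Keras's `ImageDataGenerator.flow_from_directory` treats each subdirectory of the folder you point it at as one class. The course's dataset already comes in that layout; the sketch below shows one way to build it yourself from the original Kaggle download. It assumes (check this against your own download) that the raw images sit in one flat folder with names like `cat.123.jpg` and `dog.456.jpg`, and that you want `train/cats`, `train/dogs`, `valid/cats` and `valid/dogs`. The default of 2,000 validation images matches the split described in this post. The function names `split_into_class_dirs` and `count_images` are hypothetical, and the sketch uses Python's standard library instead of the Bash `mv` commands mentioned above.

```python
import os
import random
import shutil
import tempfile


def split_into_class_dirs(src_dir, dest_dir, n_valid=2000, seed=0):
    """Move Kaggle-style images (cat.123.jpg, dog.456.jpg) from a flat folder
    into dest_dir/{train,valid}/{cats,dogs}/, one subdirectory per class."""
    files = sorted(f for f in os.listdir(src_dir) if f.endswith(".jpg"))
    random.Random(seed).shuffle(files)  # seeded, so the split is reproducible
    valid_files = set(files[:n_valid])

    for name in files:
        label = name.split(".")[0]  # "cat" or "dog", taken from the filename
        split = "valid" if name in valid_files else "train"
        class_dir = os.path.join(dest_dir, split, label + "s")
        os.makedirs(class_dir, exist_ok=True)
        shutil.move(os.path.join(src_dir, name), os.path.join(class_dir, name))


def count_images(dest_dir):
    """Count files per split and class, keyed like "train/cats"."""
    counts = {}
    for split in ("train", "valid"):
        for cls in ("cats", "dogs"):
            path = os.path.join(dest_dir, split, cls)
            counts[f"{split}/{cls}"] = len(os.listdir(path)) if os.path.isdir(path) else 0
    return counts


if __name__ == "__main__":
    # Demo on empty placeholder files in temporary folders.
    src, dest = tempfile.mkdtemp(), tempfile.mkdtemp()
    for i in range(10):
        for animal in ("cat", "dog"):
            open(os.path.join(src, f"{animal}.{i}.jpg"), "w").close()
    split_into_class_dirs(src, dest, n_valid=4)
    print(count_images(dest))
```

Moving (rather than copying) keeps a single copy of each image, which matters on a small EBS volume; the fixed seed means rerunning the notebook produces the same validation split.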

One epoch, which is a full pass through the dataset, takes 10 minutes on my Amazon p2 instance. Here the dataset is the training set, which consists of 23,000 images; the other 2,000 images are in the validation set. I decided to use 3 epochs. The accuracy on the validation set is around 98%. After training the model, we can look at some of the correctly classified images. For these we use the probability that the image is a cat: 1.0 means full confidence that the image is a cat, and 0.0 full confidence that it is a dog.
一个 epoch,也就是在数据集完整地跑一遍,在我的 Amazon p2 实例上花费了10分钟时间。在这个例子里数据集是包含 23000 张图片的训练数据集,另外的 2000 张图片被保留下来作为验证数据集。在这里我决定使用 3 个 epoch。在验证数据集上的准确度在 98% 左右。训练好模型之后,我们可以看一些被正确分类的图片。在这个例子里,我们用图片中是一只猫的概率作为结果。1.0 表示模型非常自信地认为图片中是一只猫,而 0.0 则表示图片z中是一只狗。
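
With the figures in this paragraph, 3 epochs at about 10 minutes each come to roughly 30 minutes of training. Because the model outputs a single cat probability per image, turning it into a label needs a cut-off. The sketch below assumes a 0.5 threshold (the post does not name one) and shows how thresholded predictions become an accuracy figure like the roughly 98% validation accuracy quoted above. The function names and the four example probabilities are hypothetical, not output from the post's model.

```python
def label_from_cat_probability(p_cat, threshold=0.5):
    """Map the probability that an image is a cat to a label.

    1.0 means full confidence in "cat", 0.0 full confidence in "dog".
    """
    if not 0.0 <= p_cat <= 1.0:
        raise ValueError(f"probability out of range: {p_cat}")
    return "cat" if p_cat >= threshold else "dog"


def accuracy(probabilities, true_labels, threshold=0.5):
    """Fraction of images whose thresholded prediction matches the true label."""
    predictions = [label_from_cat_probability(p, threshold) for p in probabilities]
    correct = sum(pred == truth for pred, truth in zip(predictions, true_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Hypothetical model outputs for four validation images.
    probs = [0.99, 0.02, 0.70, 0.40]
    truth = ["cat", "dog", "cat", "cat"]
    print(accuracy(probs, truth))  # 3 of 4 correct -> 0.75
```

Sorting images by how far their probability sits from the threshold is also a simple way to pick out the most confident correct predictions and the most uncertain ones to inspect.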

Remove the stray "z" (in 图片z中).

Member

@linhe0x0 linhe0x0 left a comment

There are still a few small issues to fix.


I had a similar issue with the aws-alias.sh script. Adding a `'` at the end of line 7 fixed it. Here is line 7 before and after:
我在 aws-alias.sh 这个脚本上也遇到了同样的问题,在第七行的末尾加上 `'` 能够解决这个问题。下面是修改前和修改后的第七行:

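```bash
# Before: the single quote opened after aws-state= is never closed
alias aws-state='aws ec2 describe-instances --instance-ids $instanceId --query "Reservations[0].Instances[0].State.Name"
# After
alias aws-state='aws ec2 describe-instances --instance-ids $instanceId --query "Reservations[0].Instances[0].State.Name"'
```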
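
Why the original line 7 breaks: inside single quotes, double quotes are just literal characters, so the quote opened after `aws-state=` runs to the end of the line without being closed. Python's standard `shlex` module follows POSIX shell quoting rules closely enough to flag this. This is a rough check, not a full Bash parser, and `has_balanced_quotes` is a hypothetical helper.

```python
import shlex

# Line 7 of aws-alias.sh before the fix (no closing single quote) ...
BEFORE = (
    "alias aws-state='aws ec2 describe-instances --instance-ids $instanceId "
    "--query \"Reservations[0].Instances[0].State.Name\""
)
# ... and after appending the missing ' at the end.
AFTER = BEFORE + "'"


def has_balanced_quotes(line):
    """Return True if the line tokenizes under POSIX shell quoting rules."""
    try:
        shlex.split(line)
    except ValueError:  # shlex raises ValueError("No closing quotation")
        return False
    return True


if __name__ == "__main__":
    print(has_balanced_quotes(BEFORE))  # False
    print(has_balanced_quotes(AFTER))   # True
```

With the fix, the whole `aws ec2 ...` command, double quotes included, becomes the alias body, and `$instanceId` is only expanded when the alias is used.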

The formatting here didn't render properly on GitHub; please adjust it.


$ pip install kaggle-cli
训练数据集中有 25000 张已经被标记为猫或是的狗的图片,测试数据集中则包含 12500 张未被标记的图片。为了调整参数,我们还通过占用训练集的一小部分来创建验证数据集。设置一个完整数据集的“样本”也很有用,可以用来快速检查你的模型在构建过程中是否正常工作。
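
The line above says (translated): the training set has 25,000 images labeled cat or dog and the test set 12,500 unlabeled images; a validation set is carved out of the training set, and a small "sample" of the full dataset is handy for quickly checking that a model works while you build it. One way to build such a sample is sketched below. It assumes the images are already sorted into `<data_dir>/<split>/<class>/` folders and that `sample_dir` lies outside `data_dir`; `make_sample` and `per_class` are hypothetical names.

```python
import os
import random
import shutil
import tempfile


def make_sample(data_dir, sample_dir, per_class=8, seed=0):
    """Copy up to `per_class` random images from each data_dir/<split>/<class>/
    folder into the matching sample_dir/<split>/<class>/ folder."""
    rng = random.Random(seed)  # seeded, so the same sample is drawn each time
    for split in sorted(os.listdir(data_dir)):
        split_dir = os.path.join(data_dir, split)
        if not os.path.isdir(split_dir):
            continue
        for cls in sorted(os.listdir(split_dir)):
            cls_dir = os.path.join(split_dir, cls)
            if not os.path.isdir(cls_dir):
                continue
            files = sorted(f for f in os.listdir(cls_dir)
                           if os.path.isfile(os.path.join(cls_dir, f)))
            chosen = rng.sample(files, min(per_class, len(files)))
            out_dir = os.path.join(sample_dir, split, cls)
            os.makedirs(out_dir, exist_ok=True)
            for name in chosen:
                # Copy rather than move, so the full dataset is left untouched.
                shutil.copy2(os.path.join(cls_dir, name), os.path.join(out_dir, name))


if __name__ == "__main__":
    # Demo: 20 placeholder images per class in train/, 5 per class in valid/.
    data, sample = tempfile.mkdtemp(), tempfile.mkdtemp()
    for split, n in (("train", 20), ("valid", 5)):
        for cls in ("cats", "dogs"):
            os.makedirs(os.path.join(data, split, cls))
            for i in range(n):
                open(os.path.join(data, split, cls, f"{i}.jpg"), "w").close()
    make_sample(data, sample, per_class=8)
    print(len(os.listdir(os.path.join(sample, "train", "cats"))))  # 8
    print(len(os.listdir(os.path.join(sample, "valid", "cats"))))  # 5
```

Pointing the same training code at the sample folder first lets you catch path and shape errors in seconds before committing to a 10-minute epoch on the full set.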

Add a blank line between this line and the previous one.


If you liked this post, be sure to recommend it so others can see it. You can also follow this profile to keep up with my progress in the Fast AI course. See you there!
同时,感谢所有更新 Github 脚本的人,这可帮了大忙!另外也要感谢所有参与 Fast AI 论坛的人,你们太棒了。
Member

『Github』=>『GitHub』 (capitalization)

Member

『10分钟』 (in the epoch paragraph): add a space on each side of the number.

@TobiasLee
Author

@sqrthree Revised according to the proofreading comments.

@linhe0x0 linhe0x0 merged commit 19ad80b into xitu:master Aug 7, 2017
@linhe0x0
Member

linhe0x0 commented Aug 7, 2017

Merged~ Please publish it to the Juejin column right away and send me the link, so I can add your points promptly.

@TobiasLee TobiasLee deleted the translation/deep-learning-1-setting-up-aws-image-recognition branch August 7, 2017 05:12

cdadar pushed a commit to cdadar/gold-miner that referenced this pull request Dec 8, 2017