Automated Feature Engineering in Python (使用 Python 进行自动化特征工程) #4262

SimonMing47 · 2018-08-07T02:20:21Z

Translation complete, resolve #3990

SimonMing47 · 2018-08-07T02:20:33Z

@leviding @fanyijihua Translation complete.

yqian1991 · 2018-08-07T21:16:23Z

Claiming the proofreading.

fanyijihua · 2018-08-07T21:16:26Z

@yqian1991 Sure thing 🍺

yqian1991

@mingxing47 @leviding Proofreading done. The translation reads smoothly; there are just some formatting issues.

yqian1991 · 2018-08-08T13:16:29Z

TODO1/automated-feature-engineering-in-python.md

 > * 校对者：

-# Automated Feature Engineering in Python
+# Python 中的自动特征工程


The translation is fine; just a personal suggestion: “Python 中的特征工程自动化”.

yqian1991 · 2018-08-08T13:25:02Z

TODO1/automated-feature-engineering-in-python.md


 ![](https://cdn-images-1.medium.com/max/1000/1*lg3OxWVYDsJFN-snBY7M5w.jpeg)

-Machine learning is increasingly moving from hand-designed models to automatically optimized pipelines using tools such as [H20](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html), [TPOT](https://epistasislab.github.io/tpot/), and [auto-sklearn](https://automl.github.io/auto-sklearn/stable/). These libraries, along with methods such as [random search](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf), aim to simplify the model selection and tuning parts of machine learning by finding the best model for a dataset with little to no manual intervention. However, feature engineering, an [arguably more valuable aspect](https://www.featurelabs.com/blog/secret-to-data-science-success/) of the machine learning pipeline, remains almost entirely a human labor.
+机器学习正在利用诸如 [H20](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html)、[TPOT](https://epistasislab.github.io/tpot/) 和 [auto-sklearn](https://automl.github.io/auto-sklearn/stable/) 等工具越来越多地从手工设计模型向自动化优化管道迁移。以上这些类库，连同如 [random search](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf) 等方法一起，目的都是在通过找到适合于几乎不需要人工干预的数据集的最佳模型来简化机器学习的模型选择和调优部分。然而，特征工程，作为机器学习管道中一个[可以说是更有价值的方面](https://www.featurelabs.com/blog/secret-to-data-science-success/)，几乎全部是手工活。


“目的都是通过找到适合于几乎不需要人工干预的数据集的最佳模型来简化机器学习的模型选择和调优部分”
=>
“目的是在不需要人工干预的情况下找到适合于数据集的最佳模型，以此来简化机器学习的模型选择和调优部分”

Splitting this into shorter sentences might make it easier to understand.

yqian1991 · 2018-08-08T13:25:29Z

TODO1/automated-feature-engineering-in-python.md


-[Feature engineering](https://en.wikipedia.org/wiki/Feature_engineering), also known as feature creation, is the process of constructing new features from existing data to train a machine learning model. This step can be more important than the actual model used because a machine learning algorithm only learns from the data we give it, and creating features that are relevant to a task is absolutely crucial (see the excellent paper [“A Few Useful Things to Know about Machine Learning”](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)).
+[特征工程](https://en.wikipedia.org/wiki/Feature_engineering)，也成为特征创建，是从已有数据中创建出新特征并且用于训练机器学习模型的过程。这个步骤可能要比实际使用的模型更加重要，因为机器学习算法仅仅从我们提供给他的数据中进行学习，创建出与任务相关的特征是非常关键的（可以参照这篇文章 ["A Few Useful Things to Know about Machine Learning"](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf) —— 《了解机器学习的一些有用的事》，译者注）。


“成为” => “称为” (homophone typo; 称为 means “called”)

yqian1991 · 2018-08-08T13:29:42Z

TODO1/automated-feature-engineering-in-python.md


-[Feature engineering](https://www.datacamp.com/community/tutorials/feature-engineering-kaggle) means building additional features out of existing data which is often spread across multiple related tables. Feature engineering requires extracting the relevant information from the data and getting it into a single table which can then be used to train a machine learning model.
+[特征工程](https://www.datacamp.com/community/tutorials/feature-engineering-kaggle)意味着从分布在多个相关表格中的现有数据集中构建出额外的特性。特征工程需要从数据中提取相关信息，并且将其放入一个单独的表中，然后可以用来训练机器学习模型。


“特性” => “特征” (the standard term for an ML “feature”)

yqian1991 · 2018-08-08T13:32:05Z

TODO1/automated-feature-engineering-in-python.md


-The process of constructing features is very time-consuming because each new feature usually requires several steps to build, especially when using information from more than one table. We can group the operations of feature creation into two categories: **transformations** and **aggregations**. Let’s look at a few examples to see these concepts in action.
+构建特征的过程非常耗时，因为每获取一项新的特征都需要很多步骤才能构建出来，尤其是当需要从多于一张表格中获取信息时。我们可以把特征创建的操作分成两类：**转换**和**聚集**。让我们通过几个例子的实战来看看这些概念。


聚集 => 聚合 (the standard term for “aggregation”)
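The hunk above splits feature creation into transformations and aggregations. Below is a minimal plain-Python sketch of the difference. It assumes hypothetical toy `clients` and `loans` tables; the rows and column names such as `joined` and `income` are illustrative only. A transformation derives new columns from one table, row by row. An aggregation summarises many child rows for each parent row.

```python
import math
from datetime import date
from statistics import mean

# Hypothetical toy data: one row per client, possibly many loans per client.
clients = [
    {"client_id": 1, "joined": date(2018, 1, 15), "income": 50000},
    {"client_id": 2, "joined": date(2018, 3, 2), "income": 72000},
]
loans = [
    {"loan_id": 10, "client_id": 1, "loan_amount": 1000},
    {"loan_id": 11, "client_id": 1, "loan_amount": 3000},
    {"loan_id": 12, "client_id": 2, "loan_amount": 5000},
]

# Transformation: new columns computed from a single table, row by row.
for client in clients:
    client["joined_month"] = client["joined"].month
    client["log_income"] = math.log(client["income"])

# Aggregation: many child rows (loans) summarised for each parent row (client).
for client in clients:
    amounts = [loan["loan_amount"] for loan in loans if loan["client_id"] == client["client_id"]]
    client["loan_count"] = len(amounts)
    client["mean_loan_amount"] = mean(amounts) if amounts else None
    client["max_loan_amount"] = max(amounts) if amounts else None
```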

yqian1991 · 2018-08-08T14:07:14Z

TODO1/automated-feature-engineering-in-python.md


-For more information on featuretools, including advanced usage, check out the [online documentation](https://docs.featuretools.com/). To see how featuretools is used in practice, read about the [work of Feature Labs](https://www.featurelabs.com/), the company behind the open-source library.
+要获取更多关于特征工具的信息，包括这些工具的高级用法，可以查阅[在线文档](https://docs.featuretools.com/)。要查看特征工具如何在实践中应用，可以参见 [Feature Labs 的工作成果](https://www.featurelabs.com/)，这是一个开源库背后的公司。


这是一个开源库背后的公司
=>
这就是开发 featuretools 这个开源库的公司 (“this is the company that develops the open-source featuretools library”)

yqian1991 · 2018-08-08T14:09:13Z

TODO1/automated-feature-engineering-in-python.md


-This process involves grouping the loans table by the client, calculating the aggregations, and then merging the resulting data into the client data. Here’s how we would do that in Python using the [language of Pandas](https://pandas.pydata.org/pandas-docs/stable/index.html).
+这个过程包括了根据客户进行贷款表格分组、计算聚合、然后把计算结果数据合并到客户数据中。如下代码展示了我们如果使用 Python 中的 [language of Pandas](https://pandas.pydata.org/pandas-docs/stable/index.html)库进行计算的过程：


Python 中的 language of Pandas库进行计算的过程
=>
Python 中的 language of Pandas 库进行计算的过程
Add a space before 库.
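The group, aggregate and merge process described in this hunk can be sketched with pandas roughly as follows. This is an illustration under assumptions, not the article's own code. The toy tables and the `loan_amount_` column prefix are hypothetical.

```python
import pandas as pd

# Hypothetical toy tables: clients (parent) and loans (child).
clients = pd.DataFrame({"client_id": [1, 2, 3], "income": [50000, 72000, 61000]})
loans = pd.DataFrame({
    "loan_id": [10, 11, 12],
    "client_id": [1, 1, 2],
    "loan_amount": [1000, 3000, 5000],
})

# 1. Group the loans table by client and compute several aggregations.
stats = (
    loans.groupby("client_id")["loan_amount"]
    .agg(["mean", "max", "min", "sum", "count"])
    .add_prefix("loan_amount_")  # keep the origin of each new column visible
    .reset_index()               # turn client_id back into a regular column
)

# 2. Merge the aggregated features into the client table.
features = clients.merge(stats, on="client_id", how="left")
# A client with no loans has zero loans rather than an unknown count.
features["loan_amount_count"] = features["loan_amount_count"].fillna(0).astype(int)
print(features)
```

Using `how="left"` keeps every client in the result. In this toy data, client 3 has no loans, so its statistics stay NaN and its count is filled in as 0.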

yqian1991 · 2018-08-08T14:12:16Z

TODO1/automated-feature-engineering-in-python.md


-The best way to think of a **relationship** between two tables is the [analogy of parent to child](https://stackoverflow.com/questions/7880921/what-is-a-parent-table-and-a-child-table-in-database). This is a one-to-many relationship: each parent can have multiple children. In the realm of tables, a parent table has one row for every parent, but the child table may have multiple rows corresponding to multiple children of the same parent.
+考虑两个表之间的**关系**的最佳方式是[父亲与孩子的类比](https://stackoverflow.com/questions/7880921/what-is- par-table -and- child-table-in-database)。这是一对多的关系:每个父亲可以有多个孩子。在表领域中，父亲在每个父表中都有一行，但是子表中可能有多个行对应于同一个父亲的多个孩子。


[父亲与孩子的类比](https://stackoverflow.com/questions/7880921/what-is- par-table -and- child-table-in-database)
=>
父亲与孩子的类比

This formatting is wrong, please take a look.
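The parent/child analogy can be made concrete with a small check. This sketch is in plain Python and assumes both tables are lists of dicts that share a `client_id` key; the helper `count_children` is hypothetical. The check enforces three rules: the parent key must be unique, every child must point at an existing parent, and a parent may have any number of children, including zero.

```python
from collections import Counter

def count_children(parents, children, key):
    """Count child rows per parent for a one-to-many relationship on `key`."""
    parent_ids = [row[key] for row in parents]
    # Parent table: exactly one row per parent.
    if len(parent_ids) != len(set(parent_ids)):
        raise ValueError(f"{key} is not unique in the parent table")
    # Child table: every row must reference an existing parent.
    known = set(parent_ids)
    orphans = [row for row in children if row[key] not in known]
    if orphans:
        raise ValueError(f"{len(orphans)} child rows reference unknown parents")
    counts = Counter(row[key] for row in children)
    return {pid: counts[pid] for pid in parent_ids}

# Hypothetical toy data.
clients = [{"client_id": 1}, {"client_id": 2}, {"client_id": 3}]
loans = [
    {"loan_id": 10, "client_id": 1},
    {"loan_id": 11, "client_id": 1},
    {"loan_id": 12, "client_id": 2},
]
counts = count_children(clients, loans, "client_id")
print(counts)  # {1: 2, 2: 1, 3: 0}
```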

yqian1991 · 2018-08-08T14:13:13Z

TODO1/automated-feature-engineering-in-python.md


-To [formalize a relationship in featuretools](https://docs.featuretools.com/loading_data/using_entitysets.html#adding-a-relationship), we only need to specify the variable that links two tables together. The `clients` and the `loans` table are linked via the `client_id` variable and `loans` and `payments` are linked with the `loan_id`. The syntax for creating a relationship and adding it to the entityset are shown below:
+要[在 featuretools 中格式化关系](https://docs.featuretools.com/loading_data/using_entitysets.html#add -a-relationship)，我们只需指定将两个表链接在一起的变量。 `clients` 和 `loans` 表通过 `loan_id` 变量链接， `loans` 和 `payments` 通过 `loan_id` 联系在一起。创建关系并将其添加到实体集的语法如下所示:


[在 featuretools 中格式化关系](https://docs.featuretools.com/loading_data/using_entitysets.html#add -a-relationship)
=>
在 featuretools 中格式化关系

The formatting is wrong.
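The following sketch shows what "specifying the linking variable" amounts to. It models a relationship as four names: parent table, parent key, child table and child key. The `Relationship` class and the `children` helper are hypothetical stand-ins written for illustration, not featuretools classes. The links themselves follow the English source paragraph: clients to loans via `client_id`, and loans to payments via `loan_id`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relationship:
    """Hypothetical stand-in: the column that links a parent table to a child table."""
    parent_table: str
    parent_key: str
    child_table: str
    child_key: str

# Hypothetical in-memory "entity set": table name -> list of rows.
tables = {
    "clients": [{"client_id": 1}, {"client_id": 2}],
    "loans": [
        {"loan_id": 10, "client_id": 1},
        {"loan_id": 11, "client_id": 1},
        {"loan_id": 12, "client_id": 2},
    ],
    "payments": [
        {"payment_id": 100, "loan_id": 10, "payment_amount": 200},
        {"payment_id": 101, "loan_id": 10, "payment_amount": 300},
        {"payment_id": 102, "loan_id": 12, "payment_amount": 500},
    ],
}

relationships = [
    Relationship("clients", "client_id", "loans", "client_id"),
    Relationship("loans", "loan_id", "payments", "loan_id"),
]

def children(rel, parent_row):
    """Return the child rows linked to one parent row through `rel`."""
    value = parent_row[rel.parent_key]
    return [row for row in tables[rel.child_table] if row[rel.child_key] == value]
```

If you use featuretools itself, the 1.x releases declare the same link with `es.add_relationship("clients", "client_id", "loans", "client_id")`, to the best of our knowledge. Older 0.x releases, which the original article targeted, built a `ft.Relationship` object first. Check the documentation for your installed version.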

yqian1991 · 2018-08-08T14:14:51Z

TODO1/automated-feature-engineering-in-python.md


-New features are created in featuretools using these primitives either by themselves or stacking multiple primitives. Below is a list of some of the feature primitives in featuretools (we can also [define custom primitives](https://docs.featuretools.com/guides/advanced_custom_primitives.html)):
+新特性是在 featruetools 中创建的，使用这些特征基元本身或叠加多个特征基元。下面是 featuretools 中的一些特征基元列表(我们还可以[定义自定义特征基元](https://docs.featuretools.com/guides/advanced_custom_basics .html)：


[定义自定义特征基元](https://docs.featuretools.com/guides/advanced_custom_basics .html)
=>
定义自定义特征基元

The formatting is wrong. These formatting problems are probably because you changed the link URLs.
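Stacking means feeding the output of one primitive into another. The plain-Python sketch below assumes hypothetical toy `loans` and `payments` tables. SUM is first applied to payments per loan. MAX is then applied to those per-loan sums for each client. The result is a feature that could be written as MAX(loans.SUM(payments.payment_amount)). The `aggregate` helper is hypothetical and only illustrates the idea.

```python
# Hypothetical toy data: clients -> loans (via client_id) -> payments (via loan_id).
loans = [
    {"loan_id": 10, "client_id": 1},
    {"loan_id": 11, "client_id": 1},
    {"loan_id": 12, "client_id": 2},
]
payments = [
    {"loan_id": 10, "payment_amount": 200},
    {"loan_id": 10, "payment_amount": 300},
    {"loan_id": 11, "payment_amount": 900},
    {"loan_id": 12, "payment_amount": 500},
]

def aggregate(rows, group_key, value_key, func):
    """Apply an aggregation primitive `func` to `value_key`, grouped by `group_key`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {key: func(values) for key, values in groups.items()}

# One primitive by itself: SUM(payments.payment_amount) for each loan.
sum_per_loan = aggregate(payments, "loan_id", "payment_amount", sum)

# Attach the loan-level feature to each loan (0 if a loan has no payments) ...
loans_with_sum = [
    {**loan, "sum_payments": sum_per_loan.get(loan["loan_id"], 0)} for loan in loans
]

# ... then stack a second primitive on top: MAX over each client's loans.
max_of_sum_per_client = aggregate(loans_with_sum, "client_id", "sum_payments", max)
print(max_of_sum_per_client)  # {1: 900, 2: 500}
```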

leviding · 2018-08-08T15:46:16Z

@mingxing47 You can go ahead with the revisions now.

SimonMing47 · 2018-08-09T02:55:27Z

@yqian1991 Thanks for proofreading.

Revisions done.

SimonMing47 · 2018-08-09T03:08:18Z

@fanyijihua @yqian1991 @leviding Proofreading revisions done.

ghost · 2018-08-09T04:41:08Z

Claiming the proofreading.

fanyijihua · 2018-08-09T04:41:12Z

@park-ma All set 🍻

ghost

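Suggest adding code highlighting.
Also, the translation quality is high and the previous proofreader was very thorough; there is almost nothing left to change.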

ghost · 2018-08-09T10:50:20Z

TODO1/automated-feature-engineering-in-python.md


-Before we can quite get to deep feature synthesis, we need to understand [feature primitives](https://docs.featuretools.com/automated_feature_engineering/primitives.html). We already know what these are, but we have just been calling them by different names! These are simply the basic operations that we use to form new features:
+在深入了解特性合成之前，我们需要了解[特征基元](https://docs.featuretools.com/automated_feature_engineering/primartives.html)。我们已经知道它们是什么了，但是我们只是用不同的名字称呼它们！这些是我们用来形成新特征的基本操作:


Use the full-width colon “：”.
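Primitives are just basic operations, so new features can be generated mechanically by applying every primitive to every suitable column. The plain-Python sketch below works under stated assumptions. The `AGG_PRIMITIVES` registry, the `generate_features` helper and the toy `loans` rows are hypothetical. The feature names only mimic the `PRIMITIVE(table.column)` notation.

```python
from statistics import mean

# Hypothetical registry: each aggregation primitive maps a list of child
# values to a single value for the parent row.
AGG_PRIMITIVES = {"MEAN": mean, "MAX": max, "MIN": min, "SUM": sum, "COUNT": len}

# Hypothetical toy child table.
loans = [
    {"client_id": 1, "loan_amount": 1000, "rate": 2.5},
    {"client_id": 1, "loan_amount": 3000, "rate": 3.5},
    {"client_id": 2, "loan_amount": 5000, "rate": 4.0},
]

def generate_features(child_rows, parent_key, child_name, columns):
    """Apply every aggregation primitive to every listed column, per parent."""
    grouped = {}
    for row in child_rows:
        grouped.setdefault(row[parent_key], []).append(row)
    features = {}
    for parent_id, rows in grouped.items():
        feats = {}
        for column in columns:
            values = [row[column] for row in rows]
            for name, func in AGG_PRIMITIVES.items():
                feats[f"{name}({child_name}.{column})"] = func(values)
        features[parent_id] = feats
    return features

features = generate_features(loans, "client_id", "loans", ["loan_amount", "rate"])
print(features[1]["MEAN(loans.loan_amount)"])  # 2000
```

With 5 primitives and 2 columns, this yields 10 features per client. The number of candidate features grows quickly as primitives, columns and tables are added.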

ghost · 2018-08-09T10:55:15Z

@mingxing47 @leviding Proofreading done.

fix bug

Add Python code highlighting

SimonMing47 · 2018-08-09T11:56:17Z

Thanks to @park-ma for proofreading.
@leviding Proofreading revisions done; added Python code highlighting.

leviding · 2018-08-09T15:29:21Z

TODO1/automated-feature-engineering-in-python.md

@@ -2,109 +2,109 @@
 > * 原文作者：[William Koehrsen](https://towardsdatascience.com/@williamkoehrsen?source=post_header_lockup)
 > * 译文出自：[掘金翻译计划](https://github.com/xitu/gold-miner)
 > * 本文永久链接：[https://github.com/xitu/gold-miner/blob/master/TODO1/automated-feature-engineering-in-python.md](https://github.com/xitu/gold-miner/blob/master/TODO1/automated-feature-engineering-in-python.md)
-> * 译者：
+> * 译者：[mingxing47](https://github.com/mingxing47)
 > * 校对者：


The proofreader information needs to be filled in.

Proofreader information has been added.

Add proofreader information

leviding

Awesome!

leviding · 2018-08-10T06:04:32Z

@mingxing47 It's merged! Please publish it on Juejin soon and send me the link so your points can be added promptly.

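The Juejin Translation Project also has its own Zhihu column that you are welcome to submit to; a handy plugin is recommended for this.
Column: https://zhuanlan.zhihu.com/juejinfanyi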

leviding · 2018-08-10T06:04:57Z

@mingxing47 When you publish it, put it in the AI category.

SimonMing47 · 2018-08-11T09:28:45Z

@leviding Hi, it has been published on Juejin: https://juejin.im/post/5b6ea0e4e51d4519044adff0
Zhihu column: https://zhuanlan.zhihu.com/p/41809504

明星 and others added 6 commits June 16, 2018 17:33

Initial commit

75106bf

update

1b01a9b

update

b8bb1e9

update

9dbeb57

update article

9ebe796

complete

0b64a3f

fanyijihua added the 校对认领 (open for proofreading claims) label Aug 7, 2018

leviding added the 后端 (backend) label Aug 7, 2018

leviding changed the title ~~Translate automated feature engineering in python~~ Python 中的自动特征工程 Aug 7, 2018

leviding mentioned this pull request Aug 7, 2018

使用 Python 进行自动化特征工程 #3990

Closed

leviding changed the title ~~Python 中的自动特征工程~~ 使用 Python 进行自动化特征工程 Aug 7, 2018

fanyijihua added the 正在校对 (under proofreading) label Aug 7, 2018

yqian1991 suggested changes Aug 8, 2018


Proofreading revisions complete

8665cff

Revisions complete

fanyijihua removed the 校对认领 (open for proofreading claims) label Aug 9, 2018

ghost reviewed Aug 9, 2018


leviding added the enhancement 等待译者修改 (awaiting translator revisions) label Aug 9, 2018

SimonMing47 added 2 commits August 9, 2018 19:52

fix

70a51c0

fix bug

Add code highlighting

1bf272b

Add Python code highlighting

leviding added the 标注待管理员 Review (marked for admin review) label Aug 9, 2018

leviding removed the enhancement 等待译者修改 (awaiting translator revisions) label Aug 9, 2018

leviding reviewed Aug 9, 2018


SimonMing47 added 2 commits August 10, 2018 09:52

Add proofreader information

89b2cc9

Add proofreader information

Update automated-feature-engineering-in-python.md

3c15a0a

leviding added AI and removed 后端 (backend) labels Aug 10, 2018

leviding approved these changes Aug 10, 2018


leviding merged commit 08ae3a8 into xitu:master Aug 10, 2018

leviding added 翻译完成 (translation complete) and removed 标注待管理员 Review (marked for admin review) labels Aug 10, 2018

leviding removed the 正在校对 (under proofreading) label Aug 10, 2018
