<!DOCTYPE html>
<html>
<head>
<title>ML_project_checklist</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<style>
/**
* prism.js Github theme based on GitHub's theme.
* @author Sam Clarke
*/
code[class*="language-"],
pre[class*="language-"] {
color: #333;
background: none;
font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace;
text-align: left;
white-space: pre;
word-spacing: normal;
word-break: normal;
word-wrap: normal;
line-height: 1.4;
-moz-tab-size: 8;
-o-tab-size: 8;
tab-size: 8;
-webkit-hyphens: none;
-moz-hyphens: none;
-ms-hyphens: none;
hyphens: none;
}
/* Code blocks */
pre[class*="language-"] {
padding: .8em;
overflow: auto;
/* border: 1px solid #ddd; */
border-radius: 3px;
/* background: #fff; */
background: #f5f5f5;
}
/* Inline code */
:not(pre) > code[class*="language-"] {
padding: .1em;
border-radius: .3em;
white-space: normal;
background: #f5f5f5;
}
.token.comment,
.token.blockquote {
color: #969896;
}
.token.cdata {
color: #183691;
}
.token.doctype,
.token.punctuation,
.token.variable,
.token.macro.property {
color: #333;
}
.token.operator,
.token.important,
.token.keyword,
.token.rule,
.token.builtin {
color: #a71d5d;
}
.token.string,
.token.url,
.token.regex,
.token.attr-value {
color: #183691;
}
.token.property,
.token.number,
.token.boolean,
.token.entity,
.token.atrule,
.token.constant,
.token.symbol,
.token.command,
.token.code {
color: #0086b3;
}
.token.tag,
.token.selector,
.token.prolog {
color: #63a35c;
}
.token.function,
.token.namespace,
.token.pseudo-element,
.token.class,
.token.class-name,
.token.pseudo-class,
.token.id,
.token.url-reference .token.variable,
.token.attr-name {
color: #795da3;
}
.token.entity {
cursor: help;
}
.token.title,
.token.title .token.punctuation {
font-weight: bold;
color: #1d3e81;
}
.token.list {
color: #ed6a43;
}
.token.inserted {
background-color: #eaffea;
color: #55a532;
}
.token.deleted {
background-color: #ffecec;
color: #bd2c00;
}
.token.bold {
font-weight: bold;
}
.token.italic {
font-style: italic;
}
/* JSON */
.language-json .token.property {
color: #183691;
}
.language-markup .token.tag .token.punctuation {
color: #333;
}
/* CSS */
code.language-css,
.language-css .token.function {
color: #0086b3;
}
/* YAML */
.language-yaml .token.atrule {
color: #63a35c;
}
code.language-yaml {
color: #183691;
}
/* Ruby */
.language-ruby .token.function {
color: #333;
}
/* Markdown */
.language-markdown .token.url {
color: #795da3;
}
/* Makefile */
.language-makefile .token.symbol {
color: #795da3;
}
.language-makefile .token.variable {
color: #183691;
}
.language-makefile .token.builtin {
color: #0086b3;
}
/* Bash */
.language-bash .token.keyword {
color: #0086b3;
}html body{font-family:"Helvetica Neue",Helvetica,"Segoe UI",Arial,freesans,sans-serif;font-size:16px;line-height:1.6;color:#333;background-color:#fff;overflow:initial;box-sizing:border-box;word-wrap:break-word}html body>:first-child{margin-top:0}html body h1,html body h2,html body h3,html body h4,html body h5,html body h6{line-height:1.2;margin-top:1em;margin-bottom:16px;color:#000}html body h1{font-size:2.25em;font-weight:300;padding-bottom:.3em}html body h2{font-size:1.75em;font-weight:400;padding-bottom:.3em}html body h3{font-size:1.5em;font-weight:500}html body h4{font-size:1.25em;font-weight:600}html body h5{font-size:1.1em;font-weight:600}html body h6{font-size:1em;font-weight:600}html body h1,html body h2,html body h3,html body h4,html body h5{font-weight:600}html body h5{font-size:1em}html body h6{color:#5c5c5c}html body strong{color:#000}html body del{color:#5c5c5c}html body a:not([href]){color:inherit;text-decoration:none}html body a{color:#08c;text-decoration:none}html body a:hover{color:#00a3f5;text-decoration:none}html body img{max-width:100%}html body>p{margin-top:0;margin-bottom:16px;word-wrap:break-word}html body>ul,html body>ol{margin-bottom:16px}html body ul,html body ol{padding-left:2em}html body ul.no-list,html body ol.no-list{padding:0;list-style-type:none}html body ul ul,html body ul ol,html body ol ol,html body ol ul{margin-top:0;margin-bottom:0}html body li{margin-bottom:0}html body li.task-list-item{list-style:none}html body li>p{margin-top:0;margin-bottom:0}html body .task-list-item-checkbox{margin:0 .2em .25em -1.8em;vertical-align:middle}html body .task-list-item-checkbox:hover{cursor:pointer}html body blockquote{margin:16px 0;font-size:inherit;padding:0 15px;color:#5c5c5c;border-left:4px solid #d6d6d6}html body blockquote>:first-child{margin-top:0}html body blockquote>:last-child{margin-bottom:0}html body hr{height:4px;margin:32px 0;background-color:#d6d6d6;border:0 none}html body table{margin:10px 0 15px 
0;border-collapse:collapse;border-spacing:0;display:block;width:100%;overflow:auto;word-break:normal;word-break:keep-all}html body table th{font-weight:bold;color:#000}html body table td,html body table th{border:1px solid #d6d6d6;padding:6px 13px}html body dl{padding:0}html body dl dt{padding:0;margin-top:16px;font-size:1em;font-style:italic;font-weight:bold}html body dl dd{padding:0 16px;margin-bottom:16px}html body code{font-family:Menlo,Monaco,Consolas,'Courier New',monospace;font-size:.85em !important;color:#000;background-color:#f0f0f0;border-radius:3px;padding:.2em 0}html body code::before,html body code::after{letter-spacing:-0.2em;content:"\00a0"}html body pre>code{padding:0;margin:0;font-size:.85em !important;word-break:normal;white-space:pre;background:transparent;border:0}html body .highlight{margin-bottom:16px}html body .highlight pre,html body pre{padding:1em;overflow:auto;font-size:.85em !important;line-height:1.45;border:#d6d6d6;border-radius:3px}html body .highlight pre{margin-bottom:0;word-break:normal}html body pre code,html body pre tt{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;word-wrap:normal;background-color:transparent;border:0}html body pre code:before,html body pre tt:before,html body pre code:after,html body pre tt:after{content:normal}html body p,html body blockquote,html body ul,html body ol,html body dl,html body pre{margin-top:0;margin-bottom:16px}html body kbd{color:#000;border:1px solid #d6d6d6;border-bottom:2px solid #c7c7c7;padding:2px 4px;background-color:#f0f0f0;border-radius:3px}@media print{html body{background-color:#fff}html body h1,html body h2,html body h3,html body h4,html body h5,html body h6{color:#000;page-break-after:avoid}html body blockquote{color:#5c5c5c}html body pre{page-break-inside:avoid}html body table{display:table}html body img{display:block;max-width:100%;max-height:100%}html body pre,html body 
code{word-wrap:break-word;white-space:pre}}.markdown-preview{width:100%;height:100%;box-sizing:border-box}.markdown-preview .pagebreak,.markdown-preview .newpage{page-break-before:always}.markdown-preview pre.line-numbers{position:relative;padding-left:3.8em;counter-reset:linenumber}.markdown-preview pre.line-numbers>code{position:relative}.markdown-preview pre.line-numbers .line-numbers-rows{position:absolute;pointer-events:none;top:1em;font-size:100%;left:0;width:3em;letter-spacing:-1px;border-right:1px solid #999;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none}.markdown-preview pre.line-numbers .line-numbers-rows>span{pointer-events:none;display:block;counter-increment:linenumber}.markdown-preview pre.line-numbers .line-numbers-rows>span:before{content:counter(linenumber);color:#999;display:block;padding-right:.8em;text-align:right}.markdown-preview .mathjax-exps .MathJax_Display{text-align:center !important}.markdown-preview:not([for="preview"]) .code-chunk .btn-group{display:none}.markdown-preview:not([for="preview"]) .code-chunk .status{display:none}.markdown-preview:not([for="preview"]) .code-chunk .output-div{margin-bottom:16px}.scrollbar-style::-webkit-scrollbar{width:8px}.scrollbar-style::-webkit-scrollbar-track{border-radius:10px;background-color:transparent}.scrollbar-style::-webkit-scrollbar-thumb{border-radius:5px;background-color:rgba(150,150,150,0.66);border:4px solid rgba(150,150,150,0.66);background-clip:content-box}html body[for="html-export"]:not([data-presentation-mode]){position:relative;width:100%;height:100%;top:0;left:0;margin:0;padding:0;overflow:auto}html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{position:relative;top:0}@media screen and (min-width:914px){html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{padding:2em calc(50% - 457px)}}@media screen and (max-width:914px){html body[for="html-export"]:not([data-presentation-mode]) 
.markdown-preview{padding:2em}}@media screen and (max-width:450px){html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{font-size:14px !important;padding:1em}}@media print{html body[for="html-export"]:not([data-presentation-mode]) #sidebar-toc-btn{display:none}}html body[for="html-export"]:not([data-presentation-mode]) #sidebar-toc-btn{position:fixed;bottom:8px;left:8px;font-size:28px;cursor:pointer;color:inherit;z-index:99;width:32px;text-align:center;opacity:.4}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] #sidebar-toc-btn{opacity:1}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc{position:fixed;top:0;left:0;width:300px;height:100%;padding:32px 0 48px 0;font-size:14px;box-shadow:0 0 4px rgba(150,150,150,0.33);box-sizing:border-box;overflow:auto;background-color:inherit}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar{width:8px}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar-track{border-radius:10px;background-color:transparent}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar-thumb{border-radius:5px;background-color:rgba(150,150,150,0.66);border:4px solid rgba(150,150,150,0.66);background-clip:content-box}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc a{text-decoration:none}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc ul{padding:0 1.6em;margin-top:.8em}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc li{margin-bottom:.8em}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc ul{list-style-type:none}html 
body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{left:300px;width:calc(100% - 300px);padding:2em calc(50% - 457px - 150px);margin:0;box-sizing:border-box}@media screen and (max-width:1274px){html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{padding:2em}}@media screen and (max-width:450px){html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{width:100%}}html body[for="html-export"]:not([data-presentation-mode]):not([html-show-sidebar-toc]) .markdown-preview{left:50%;transform:translateX(-50%)}html body[for="html-export"]:not([data-presentation-mode]):not([html-show-sidebar-toc]) .md-sidebar-toc{display:none}
/* Please visit the URL below for more information: */
/* https://shd101wyy.github.io/markdown-preview-enhanced/#/customize-css */
</style>
</head>
<body for="html-export">
<div class="mume markdown-preview ">
<div><h1 class="mume-header" id="machine-learning-project-checklist">Machine Learning Project Checklist</h1>
<p>This checklist can guide you through your Machine Learning projects. There are eight main steps:</p>
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Frame the problem and look at the big picture.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Get the data.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Explore the data to gain insights.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Prepare the data to better expose the underlying data patterns to Machine Learning algorithms.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Explore many different models and short-list the best ones.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Fine-tune your models and combine them into a great solution.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Present your solution.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Launch, monitor, and maintain your system.</li>
</ul>
<h2 class="mume-header" id="frame-the-problem-and-look-at-the-big-picture">Frame the Problem and Look at the Big Picture</h2>
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Define the objective in business terms.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> How will your solution be used?</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> What are the current solutions/workarounds (if any)?</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> How should you frame this problem (supervised/unsupervised, online/offline, etc.)?</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> How should performance be measured?</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Is the performance measure aligned with the business objective?</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> What would be the minimum performance needed to reach the business objective?</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> What are comparable problems? Can you reuse experience or tools?</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Is human expertise available?</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> How would you solve the problem manually?</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> List the assumptions you (or others) have made so far.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Verify assumptions if possible.</li>
</ul>
<h2 class="mume-header" id="get-the-data">Get the Data</h2>
<p>Note: automate as much as possible so you can easily get fresh data.</p>
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> List the data you need and how much you need.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Find and document where you can get that data.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Check how much space it will take.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Check legal obligations, and get authorization if necessary.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Get access authorizations.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Create a workspace (with enough storage space).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Get the data.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Convert the data to a format you can easily manipulate (without changing the data itself).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Ensure sensitive information is deleted or protected (e.g., anonymized).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Check the size and type of data (time series, sample, geographical, etc.).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Sample a test set, put it aside, and never look at it (no data snooping!).</li>
</ul>
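<p>The last item above (set a test set aside and keep it stable) can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes every row carries a unique, immutable identifier, and the names <code>is_in_test_set</code>, <code>split_by_id</code>, and the toy <code>rows</code> list are hypothetical. Hashing the identifier means a row never migrates between the training and test sets when fresh data is appended.</p>

```python
import zlib


def is_in_test_set(identifier, test_ratio):
    """Return True if the identifier's checksum falls in the lowest test_ratio share."""
    checksum = zlib.crc32(str(identifier).encode("utf-8"))
    return test_ratio * 2**32 > checksum


def split_by_id(rows, id_key, test_ratio=0.2):
    """Split rows into (train, test) based only on each row's identifier."""
    train, test = [], []
    for row in rows:
        if is_in_test_set(row[id_key], test_ratio):
            test.append(row)
        else:
            train.append(row)
    return train, test


# Hypothetical toy dataset: one dict per row, "id" is the stable identifier.
rows = [{"id": i, "value": i * 1.5} for i in range(1000)]
train_set, test_set = split_by_id(rows, "id")
print(len(train_set), len(test_set))
```

<p>Because the decision depends only on the identifier, rerunning the split after new rows arrive leaves every previously held-out row in the test set.</p>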
<h2 class="mume-header" id="explore-the-data">Explore the Data</h2>
<p>Note: try to get insights from a field expert for these steps.</p>
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Create a copy of the data for exploration (sampling it down to a manageable size if necessary).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Create a Jupyter notebook to keep a record of your data exploration.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Study each attribute and its characteristics:
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Name</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Type (categorical, int/float, bounded/unbounded, text, structured, etc.)</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> % of missing values</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Noisiness and type of noise (stochastic, outliers, rounding errors, etc.)</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Possibly useful for the task?</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Type of distribution (Gaussian, uniform, logarithmic, etc.)</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> For supervised learning tasks, identify the target attribute(s).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Visualize the data.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Study the correlations between attributes.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Study how you would solve the problem manually.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Identify the promising transformations you may want to apply.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Identify extra data that would be useful (go back to "Get the Data").</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Document what you have learned.</li>
</ul>
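<p>As a rough sketch of the per-attribute study above, the snippet below computes the percentage of missing values, the median, and the Pearson correlation with a target. It uses only the Python standard library; the toy <code>data</code> dictionary, the <code>price</code> target, and the helper names are assumptions made for illustration. In a real project a data-frame library would likely be more convenient.</p>

```python
import math
import statistics


def missing_percentage(values):
    """Percentage of entries that are None."""
    return 100.0 * sum(v is None for v in values) / len(values)


def pearson_correlation(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mean_x = statistics.mean([x for x, _ in pairs])
    mean_y = statistics.mean([y for _, y in pairs])
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy dataset: column name -> values, with None marking a missing entry.
data = {
    "rooms": [3, 4, None, 5, 2, 6],
    "area": [70.0, 95.0, 80.0, 120.0, None, 150.0],
    "price": [210.0, 300.0, 250.0, 390.0, 150.0, 480.0],
}

target = data["price"]
for name, column in data.items():
    present = [v for v in column if v is not None]
    missing = missing_percentage(column)
    median = statistics.median(present)
    corr = pearson_correlation(column, target)
    print(f"{name}: missing={missing:.1f}% median={median} corr_with_price={corr:.3f}")
```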
<h2 class="mume-header" id="prepare-the-data">Prepare the Data</h2>
<p>Notes:</p>
<ul>
<li>Work on copies of the data (keep the original dataset intact).</li>
<li>Write functions for all data transformations you apply, for five reasons:
<ul>
<li>So you can easily prepare the data the next time you get a fresh dataset</li>
<li>So you can apply these transformations in future projects</li>
<li>To clean and prepare the test set</li>
<li>To clean and prepare new data instances once your solution is live</li>
<li>To make it easy to treat your preparation choices as hyperparameters</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Data cleaning:
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Fix or remove outliers (optional).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Fill in missing values (e.g., with zero, mean, median…) or drop their rows (or columns).</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Feature selection (optional):
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Drop the attributes that provide no useful information for the task.</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Feature engineering, where appropriate:
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Discretize continuous features.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Decompose features (e.g., categorical, date/time, etc.).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Aggregate features into promising new features.</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Feature scaling: standardize or normalize features.</li>
</ul>
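<p>The advice to write functions for every transformation can be illustrated with a small sketch. It assumes a single numeric column with <code>None</code> marking missing values; the function names are hypothetical. The point it tries to show is that the median, mean, and standard deviation are learned from the training data once and then reused unchanged on the test set and on new instances.</p>

```python
import statistics


def fit_median_imputer(values):
    """Learn the fill value (the median) from the non-missing training entries."""
    return statistics.median([v for v in values if v is not None])


def impute(values, fill_value):
    """Replace missing entries with a previously learned fill value."""
    return [fill_value if v is None else v for v in values]


def fit_standard_scaler(values):
    """Learn mean and standard deviation; fall back to 1.0 for constant columns."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return mean, std or 1.0


def standardize(values, mean, std):
    """Apply standardization with previously learned statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical columns; None marks a missing value.
train_col = [1.0, None, 3.0, 5.0, None, 7.0]
test_col = [None, 4.0, 9.0]

# Fit on the training column only.
median = fit_median_imputer(train_col)
train_filled = impute(train_col, median)
mean, std = fit_standard_scaler(train_filled)

# Reuse the same learned values for both sets.
train_ready = standardize(train_filled, mean, std)
test_ready = standardize(impute(test_col, median), mean, std)
print(train_ready)
print(test_ready)
```

<p>Keeping the "fit" and "apply" steps separate also makes it straightforward to swap the imputation strategy later and treat that choice as a hyperparameter.</p>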
<h2 class="mume-header" id="short-list-promising-models">Short-List Promising Models</h2>
<p>Notes:</p>
<ul>
<li>If the data is huge, you may want to sample smaller training sets so you can train many different models in a reasonable time (be aware that this penalizes complex models such as large neural nets or Random Forests).</li>
<li>Once again, try to automate these steps as much as possible.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Train many quick and dirty models from different categories (e.g., linear, naive Bayes, SVM, Random Forests, neural net, etc.) using standard parameters.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Measure and compare their performance.
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> For each model, use N-fold cross-validation and compute the mean and standard deviation of the performance measure on the N folds.</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Analyze the most significant variables for each algorithm.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Analyze the types of errors the models make.
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> What data would a human have used to avoid these errors?</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Have a quick round of feature selection and engineering.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Have one or two more quick iterations of the five previous steps.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Short-list the top three to five most promising models, preferring models that make different types of errors.</li>
</ul>
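<p>The N-fold cross-validation step above can be sketched in plain Python as follows. This is an illustration under stated assumptions: the two models (a mean predictor and a one-feature least-squares line), the synthetic data, and the use of mean absolute error are all placeholders chosen to keep the example self-contained, not recommendations.</p>

```python
import random
import statistics


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(fit, predict, xs, ys, n_folds=5):
    """Return the mean absolute error measured on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), n_folds):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [abs(predict(model, xs[i]) - ys[i]) for i in held_out]
        scores.append(statistics.mean(errors))
    return scores


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    return statistics.mean(ys)


def predict_mean(model, x):
    return model


def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Synthetic data: a noisy line, generated with a fixed seed.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

results = {}
for name, fit, predict in [("mean", fit_mean, predict_mean),
                           ("line", fit_line, predict_line)]:
    scores = cross_validate(fit, predict, xs, ys)
    results[name] = scores
    print(f"{name}: mean={statistics.mean(scores):.3f} std={statistics.stdev(scores):.3f}")
```

<p>Reporting the standard deviation alongside the mean helps judge whether a difference between two models is larger than the fold-to-fold variation.</p>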
<h2 class="mume-header" id="fine-tune-the-system">Fine-Tune the System</h2>
<p>Notes:</p>
<ul>
<li>You will want to use as much data as possible for this step, especially as you move toward the end of fine-tuning.</li>
<li>As always, automate what you can.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Fine-tune the hyperparameters using cross-validation.
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Treat your data transformation choices as hyperparameters, especially when you are not sure about them (e.g., should I replace missing values with zero or with the median value? Or just drop the rows?).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Unless there are very few hyperparameter values to explore, prefer random search over grid search. If training is very long, you may prefer a Bayesian optimization approach.</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Try ensemble methods. Combining your best models will often perform better than running them individually.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Once you are confident about your final model, measure its performance on the test set to estimate the generalization error.</li>
</ul>
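<p>A minimal sketch of random search, including a data-preparation choice treated as a hyperparameter, might look like the following. Everything here is hypothetical: the search space, the parameter names, and especially <code>validation_error</code>, which is an invented stand-in for a real cross-validated error so that the example runs on its own.</p>

```python
import math
import random

# Hypothetical search space: each entry draws one value from a seeded generator.
SEARCH_SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, 0),  # log-uniform
    "n_estimators": lambda rng: rng.randint(10, 500),
    "imputation": lambda rng: rng.choice(["zero", "median", "drop_rows"]),
}


def validation_error(params):
    """Invented stand-in for a cross-validated error; replace with a real evaluation."""
    penalty = {"zero": 0.3, "median": 0.0, "drop_rows": 0.1}[params["imputation"]]
    return ((math.log10(params["learning_rate"]) + 2) ** 2
            + abs(params["n_estimators"] - 200) / 1000
            + penalty)


def random_search(n_trials, seed=0):
    """Sample n_trials random configurations and keep the one with the lowest error."""
    rng = random.Random(seed)
    best_params, best_error = None, math.inf
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in SEARCH_SPACE.items()}
        error = validation_error(params)
        if best_error > error:
            best_params, best_error = params, error
    return best_params, best_error


best_params, best_error = random_search(n_trials=50)
print(best_params, round(best_error, 4))
```

<p>Unlike a grid, each trial draws a fresh value for every hyperparameter, so the budget is not spent re-testing the same few values of a parameter that turns out to matter little.</p>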
<h2 class="mume-header" id="present-your-solution">Present Your Solution</h2>
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Document what you have done.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Create a nice presentation.
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Make sure you highlight the big picture first.</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Explain why your solution achieves the business objective.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Don't forget to present interesting points you noticed along the way.
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Describe what worked and what did not.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> List your assumptions and your system's limitations.</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Ensure your key findings are communicated through beautiful visualizations or easy-to-remember statements (e.g., "the median income is the number-one predictor of housing prices").</li>
</ul>
<h2 class="mume-header" id="launch">Launch!</h2>
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Get your solution ready for production (plug into production data inputs, write unit tests, etc.).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Write monitoring code to check your system's live performance at regular intervals and trigger alerts when it drops.
<ul>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Beware of slow degradation too: models tend to "rot" as data evolves.</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Measuring performance may require a human pipeline (e.g., via a crowdsourcing service).</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Also monitor your inputs' quality (e.g., a malfunctioning sensor sending random values, or another team's output becoming stale). This is particularly important for online learning systems.</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox"> Retrain your models on a regular basis on fresh data (automate as much as possible).</li>
</ul>
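<p>The monitoring item above could be prototyped along these lines. This is a sketch under assumptions: it presumes ground-truth labels eventually become available for live predictions, it uses accuracy as the performance measure, and the class name, baseline, tolerance, and window size are all hypothetical values chosen for the example.</p>

```python
from collections import deque


class PerformanceMonitor:
    """Track accuracy over a sliding window and flag drops below a baseline."""

    def __init__(self, baseline, tolerance=0.05, window=100):
        self.baseline = baseline      # accuracy measured before launch
        self.tolerance = tolerance    # allowed drop before alerting
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, label):
        """Store whether one live prediction matched its true label."""
        self.outcomes.append(prediction == label)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing was recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        """Alert only once the window is full, to avoid noisy early alarms."""
        if len(self.outcomes) != self.outcomes.maxlen:
            return False
        return self.baseline - self.accuracy() > self.tolerance


monitor = PerformanceMonitor(baseline=0.90, tolerance=0.05, window=10)

# Ten correct predictions fill the window: no alert expected.
for _ in range(10):
    monitor.record(1, 1)
healthy_alert = monitor.should_alert()

# Four wrong predictions push out four correct ones: accuracy drops to 0.6.
for _ in range(4):
    monitor.record(1, 0)
degraded_alert = monitor.should_alert()
print(healthy_alert, degraded_alert, monitor.accuracy())
```

<p>A fixed window reacts to sudden drops; catching the slow degradation mentioned above would additionally need a comparison over a longer horizon, which this sketch does not attempt.</p>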
</div>
</div>
</body>
</html>