z_1 = r \cos(\phi+ \theta), \,\,
z_2 = r \sin(\phi + \theta)
\]
Using the trigonometric addition formulas, and writing \(x_1 = r\cos\phi\) and \(x_2 = r\sin\phi\), we can express the rotated coordinates in terms of the original ones:
\[
\begin{aligned}
z_1 &= r \cos\phi\cos\theta - r\sin\phi\sin\theta = x_1 \cos(\theta) - x_2 \sin(\theta)\\
z_2 &= r \cos\phi\sin\theta + r\sin\phi\cos\theta = x_1 \sin(\theta) + x_2 \cos(\theta)
\end{aligned}
\]
Now we can rotate each point in the dataset by simply applying the formula above to each pair \((x_{i,1}, x_{i,2})^\top\). Here is what the twin standardized heights look like after rotating each point by \(-45\) degrees:
If we define:
theta <- 2*pi*-45/360 # convert to radians
A <- matrix(c(cos(theta), -sin(theta), sin(theta), cos(theta)), 2, 2)
We can write code implementing a rotation by any angle \(\theta\) using linear algebra:
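The rotate function is used in the chunks below but its definition is not shown here; a minimal sketch, assuming the data matrix has two columns and the angle is given in degrees (not necessarily the book's exact code), is:
rotate <- function(x, theta){
  theta <- 2*pi*theta/360 # convert degrees to radians
  A <- matrix(c(cos(theta), -sin(theta), sin(theta), cos(theta)), 2, 2)
  x %*% A # apply the rotation to each row of x
}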
Because \(\cos^2\theta + \sin^2\theta = 1\), multiplying out the entries shows that \(\mathbf{A} \mathbf{A}^\top = \mathbf{I}\), and therefore that \(\mathbf{A}^\top\) is the inverse of \(\mathbf{A}\). This also implies that all the information in \(\mathbf{X}\) is included in the rotation \(\mathbf{Z}\), and it can be retrieved via a linear transformation. A consequence is that for any rotation the distances are preserved. Here is an example for a 30 degree rotation, although it works for any angle:
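A sketch of such a check, assuming the two-column matrix x and the rotate function defined above:
d <- dist(x) # pairwise distances between rows of the original data
d_rotated <- dist(rotate(x, 30)) # pairwise distances after a 30 degree rotation
max(abs(d - d_rotated)) # should be essentially 0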
Note that if \(\mathbf{A} \mathbf{A} ^\top= \mathbf{I}\), then the distance between the \(h\)th and \(i\)th rows is the same for the original and transformed data.
We refer to transformations with the property \(\mathbf{A} \mathbf{A}^\top = \mathbf{I}\) as orthogonal transformations. These are guaranteed to preserve the distance between any two points.
We previously demonstrated our rotation has this property. We can confirm using R:
theta <- -45
z <- rotate(x, theta) # works for any theta
sum(x^2)
sum(z^2) # the total sum of squares is preserved
This can be interpreted as a consequence of the fact that an orthogonal transformation guarantees that all the information is preserved.
However, although the total is preserved, the sum of squares for the individual columns changes. Here we compute the proportion of TSS attributed to each column, referred to as the variance explained or variance captured by each column, for \(\mathbf{X}\):
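As a sketch, assuming the two-column matrix x from above, this proportion can be computed with:
colSums(x^2)/sum(x^2) # proportion of the total sum of squares in each column
The next chunk repeats this calculation for rotations across a range of angles, to find the angle that concentrates the most variability in the first column: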
angles <- seq(0, -90)
v <- sapply(angles, function(angle) colSums(rotate(x, angle)^2))
variance_explained <- v[1,]/sum(x^2)
plot(angles, variance_explained, type = "l")
We find that a -45 degree rotation appears to achieve the maximum, with over 98% of the total variability explained by the first dimension. We denote this rotation matrix with \(\mathbf{V}\):
theta <- 2*pi*-45/360 # convert to radians
V <- matrix(c(cos(theta), -sin(theta), sin(theta), cos(theta)), 2, 2)
The following animation further illustrates how different rotations affect the variability explained by the dimensions of the rotated data:
We also notice that the two groups, adults and children, can be clearly observed with the one-number summary, better than with either of the two original dimensions.
The square root of the variance of each column of the rotated data is stored in the pca$sdev component. This implies we can compute the variance explained by each PC using:
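A sketch of that computation, assuming a fitted pca object returned by prcomp:
pca$sdev^2/sum(pca$sdev^2) # proportion of variance explained by each PC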
If we apply PCA, we should be able to approximate this distance with just two dimensions, compressing the highly correlated dimensions. Using the summary function, we can see the variability explained by each PC:
pca <- prcomp(x)
summary(pca)
#> Importance of components:
The first two dimensions account for almost 98% of the variability. Thus, we should be able to approximate the distance very well with two dimensions. We confirm this by computing the distance from the first two dimensions and comparing it to the original:
d_approx <- dist(pca$x[, 1:2])
plot(d, d_approx)
abline(0, 1, col = "red")
22.6.2 MNIST example
The written digits example has 784 features. Is there any room for data reduction? We will use PCA to answer this.
If not already loaded, let’s begin by loading the data:
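A sketch of the loading step, assuming the data come from the dslabs package via read_mnist:
library(dslabs)
if (!exists("mnist")) mnist <- read_mnist() # large object; download once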
Because the pixels are so small, we expect pixels close to each other on the grid to be correlated, meaning that dimension reduction should be possible.
Let’s compute the PCs. This will take a few seconds as it is a rather large matrix:
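For example, something along these lines (a sketch; it assumes the predictors are stored in mnist$train$images):
pca <- prcomp(mnist$train$images) # 784 columns, so this takes a few seconds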
We hope to get an idea of which observations are close to each other, but the predictors are 500-dimensional so plotting is difficult. Plot the first two principal components with color representing tissue type.
Many of the analyses we perform with high-dimensional data relate directly or indirectly to distance. For example, most machine learning techniques rely on being able to define distances between observations, using features or predictors. Clustering algorithms, for instance, search for observations that are similar. But what does this mean mathematically?
To define distance, we introduce another linear algebra concept: the norm. Recall that a point in two dimensions can be represented in polar coordinates as:
\[
x_1 = r \cos\phi, \,\,
x_2 = r \sin\phi
\]
where \(r\), the distance from the point to the origin, is what we call the norm.
Consider two observations, \(\mathbf{x}_1\) and \(\mathbf{x}_2\). We can define how similar they are by simply using the Euclidean distance:
\[
||\mathbf{x}_1 - \mathbf{x}_2|| = \sqrt{\sum_{j=1}^{p} (x_{1,j} - x_{2,j})^2}
\]
with \(p\) the number of features. For example, we can take three observations from the predictor matrix:
x_1 <- x[6,]
x_2 <- x[17,]
x_3 <- x[16,]
We can compute the distances between each pair using the definitions we just learned:
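A sketch of these computations, written with crossprod as discussed next (the square root of \(\mathbf{a}^\top\mathbf{a}\) is the norm of \(\mathbf{a}\)):
sqrt(crossprod(x_1 - x_2))
sqrt(crossprod(x_1 - x_3))
sqrt(crossprod(x_2 - x_3))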
Note that crossprod takes a matrix as the first argument. As a result, the vectors used here are being coerced into single-column matrices. Also, note that crossprod(x, y) multiplies t(x) by y.
We can see that the distance is smaller between the first two. This agrees with the fact that the first two are 2s and the third is a 7.
We can also compute all the distances at once relatively quickly using the function dist, which computes the distance between each row and produces an object of class dist:
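For example (a sketch assuming the predictor matrix x from above):
d <- dist(x)
class(d)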
There are several machine learning related functions in R that take objects of class dist as input. To access the entries using row and column indices, we need to coerce it into a matrix. We can see the distance we calculated above like this:
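A sketch, using the row indices 6, 17, and 16 from above:
as.matrix(d)[c(6, 17, 16), c(6, 17, 16)]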
If we order this distance matrix by the labels, we can see yellowish squares near the diagonal. This is because observations from the same digit tend to be closer to each other than to observations from different digits:
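One way to produce such an image, assuming the labels are stored in a vector y aligned with the rows of x:
image(as.matrix(d)[order(y), order(y)])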
1. Generate two matrices, A and B, containing randomly generated and normally distributed numbers. The dimensions of these two matrices should be \(4 \times 3\) and \(3 \times 6\), respectively. Confirm that C <- A %*% B produces the same results as:
m <- nrow(A)
p <- ncol(B)
C <- matrix(0, m, p)
for (i in 1:m) for (j in 1:p) C[i, j] <- sum(A[i, ] * B[, j]) # entry-by-entry computation
The histogram below shows there are three types of users: those that love mob movies and hate romance movies, those that don’t care, and those that love romance movies and hate mob movies.
Note that if we look at the correlation structure of the movies for which we simulated data in the previous sections, we see structure as well:
#> Godfather, The Godfather: Part II, The
#> Godfather, The 1.000 0.842
#> Godfather: Part II, The 0.842 1.000
Unfortunately, we can’t fit this model with prcomp due to the missing values. We introduce the missMDA package that provides an approach to fit such models when matrix entries are missing, a very common occurrence in movie recommendations, through the function imputePCA. Also, because there are small sample sizes for several movie pairs, it is useful to regularize the \(p\)s. The imputePCA function also permits regularization.
We use the estimates for \(\mu\), the \(\alpha\)s and \(\beta\)s from the previous chapter, and estimate two factors (ncp = 2). We fit the model to movies rated more than 25 times, but include Scent of a Woman, which does not meet this criterion, because we previously used it as an example. Finally, we use regularization by setting the parameter coeff.ridge to the same value used to estimate the \(\beta\)s.
By looking at the highest and lowest values for the first principal component, we see a meaningful pattern. The first PC shows the difference between Hollywood blockbusters on one side and critically acclaimed films on the other. Here are the films at one extreme:
#> [1] "2001: A Space Odyssey" "American Psycho"
#> [3] "Royal Tenenbaums, The" "Harold and Maude"
#> [5] "Apocalypse Now" "Fear and Loathing in Las Vegas"
In R, we can obtain the SVD using the function svd. To see the connection to PCA, notice that:
x <- matrix(rnorm(1000), 100, 10)
pca <- prcomp(x, center = FALSE)
s <- svd(x)
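A sketch of the comparison one might run (sign flips between the two decompositions are possible, hence the absolute values):
all.equal(abs(pca$rotation), abs(s$v), check.attributes = FALSE) # V matches the PCA rotation
all.equal(abs(pca$x), abs(s$u %*% diag(s$d)), check.attributes = FALSE) # UD matches the principal components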
24.5 Exercises
In this exercise set, we use the singular value decomposition (SVD) to estimate factors in an example related to the first application of factor analysis: finding factors related to student performance in school.
We construct a dataset that represents grade scores for 100 students in 24 different subjects. The overall average has been removed so this data represents the percentage points each student received above or below the average test score. So a 0 represents an average grade (C), a 25 is a high grade (A+), and a -25 represents a low grade (F). You can simulate the data like this:
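A sketch of one way to simulate data with this structure (three correlated subject areas with eight tests each; the exact constants are illustrative, not necessarily the book's):
set.seed(1987)
n <- 100
k <- 8
Sigma <- 64 * matrix(c(1, .75, .5, .75, 1, .5, .5, .5, 1), 3, 3) # correlated subject-area effects
m <- MASS::mvrnorm(n, rep(0, 3), Sigma)
m <- m[order(rowMeans(m), decreasing = TRUE), ]
y <- m %x% matrix(rep(1, k), nrow = 1) + matrix(rnorm(n * k * 3), n, k * 3)
colnames(y) <- c(paste(rep("Math", k), 1:k, sep = "_"),
                 paste(rep("Science", k), 1:k, sep = "_"),
                 paste(rep("Arts", k), 1:k, sep = "_"))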
Our goal is to describe the student performances as succinctly as possible. For example, we want to know if these test results are all simply random independent numbers. Are all students just about as good? Does being good in one subject imply one will be good in another? How does the SVD help with all this? We will go step by step to show that with just three relatively small pairs of vectors, we can explain much of the variability in this \(100 \times 24\) dataset.
You can visualize the 24 test scores for the 100 students by plotting an image:
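The exercises below use a plotting helper called my_image; a minimal stand-in based on base R's image function (an assumption, not necessarily the book's exact helper) is:
my_image <- function(x, zlim = range(x), ...){
  image(1:ncol(x), 1:nrow(x), t(x[nrow(x):1, , drop = FALSE]),
        xlab = "", ylab = "", zlim = zlim, ...)
}
my_image(y)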
Use the sweep function to compute \(UD\) without constructing diag(s$d) and without using matrix multiplication.
8. We know that \(\mathbf{u}_1 d_{1,1}\), the first column of \(\mathbf{UD}\), has the most variability of all the columns of \(\mathbf{UD}\). Earlier we saw an image of \(Y\):
my_image(y)
in which we can see that the student to student variability is quite large and that it appears that students that are good in one subject are good in all. This implies that the average (across all subjects) for each student should explain a lot of the variability. Compute the average score for each student and plot it against \(\mathbf{u}_1 d_{1,1}\), and describe what you find.
\[
\mathbf{Y} \approx d_{1,1} \mathbf{u}_1 \mathbf{v}_1^{\top}
\]
We know it explains s$d[1]^2/sum(s$d^2) * 100 percent of the total variability. Our approximation only explains the observation that good students tend to be good in all subjects. Another aspect of the original data that our approximation does not explain is the higher similarity we observed within subjects. We can see this by computing the difference between our approximation and the original data and then computing the correlations. You can see this by running this code:
resid <- y - with(s, (u[, 1, drop = FALSE] * d[1]) %*% t(v[, 1, drop = FALSE]))
my_image(cor(resid), zlim = c(-1, 1))
axis(side = 2, 1:ncol(y), rev(colnames(y)), las = 2)
In Chapter 29, we provide a formal discussion of the mean squared error.
This approach will have our desired effect: when our sample size \(n_i\) is very large, we obtain a stable estimate and the penalty \(\lambda\) is effectively ignored since \(n_i+\lambda \approx n_i\). Yet when \(n_i\) is small, the estimate \(\hat{\beta}_i(\lambda)\) is shrunken towards 0. The larger the \(\lambda\), the more we shrink.
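As a concrete sketch, if \(\hat{\beta}_i\) would otherwise be estimated as an average of \(n_i\) residuals \(r_{u,i}\) (an assumption about the setup, stated here only to illustrate the shrinkage), the penalized estimate takes the form:
\[
\hat{\beta}_i(\lambda) = \frac{1}{n_i + \lambda} \sum_{u=1}^{n_i} r_{u,i} = \frac{n_i}{n_i + \lambda} \, \hat{\beta}_i,
\]
so the usual estimate is multiplied by a factor that approaches 1 when \(n_i\) is large and shrinks toward 0 when \(n_i\) is small.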
But how do we select \(\lambda\)? In Chapter 29, we describe an approach to do this. Here we will simply compute the RMSE for different values of \(\lambda\) to illustrate the effect:
Moneyball: The Art of Winning an Unfair Game by Michael Lewis focuses on the Oakland Athletics (A’s) baseball team and its general manager, Billy Beane, the person tasked with building the team.
Traditionally, baseball teams use scouts to help them decide what players to hire. These scouts evaluate players by observing them perform, tending to favor athletic players with observable physical abilities. For this reason, scouts generally agree on who the best players are and, as a result, these players are often in high demand. This in turn drives up their salaries.
From 1989 to 1991, the A’s had one of the highest payrolls in baseball. They were able to buy the best players and, during that time, were one of the best teams. However, in 1995, the A’s team owner changed and the new management cut the budget drastically, leaving then general manager, Sandy Alderson, with one of the lowest payrolls in baseball. He could no longer afford the most sought-after players. As a result, Alderson began using a statistical approach to find inefficiencies in the market. Alderson was a mentor to Billy Beane, who succeeded him in 1998 and fully embraced data science, as opposed to relying exclusively on scouts, as a method for finding low-cost players that data predicted would help the team win. Today, this strategy has been adapted by most baseball teams. As we will see, regression plays a significant role in this approach.
As motivation for this part of the book, let’s imagine it is 2002, and attempt to build a baseball team with a limited budget, just like the A’s had to do. To appreciate what you are up against, note that in 2002 the Yankees’ payroll of $125,928,583 more than tripled the Oakland A’s payroll of $39,679,746:
Statistics have been used in baseball since its beginnings. The dataset we will be using, included in the Lahman library, goes back to the 19th century. For example, a summary statistic we will describe soon, the batting average, has been used for decades to summarize a batter’s success. Other statistics1, such as home runs (HR), runs batted in (RBI), and stolen bases (SB), are reported for each player in the game summaries included in the sports section of newspapers, with players rewarded for high numbers. Although summary statistics such as these were widely used in baseball, data analysis per se was not. These statistics were arbitrarily chosen without much thought as to whether they actually predicted anything or were related to helping a team win.
This changed with Bill James2. In the late 1970s, this aspiring writer and baseball fan started publishing articles describing more in-depth analysis of baseball data. He named the approach of using data to determine which outcomes best predicted if a team would win sabermetrics3. Yet until Billy Beane made sabermetrics the center of his baseball operation, Bill James’ work was mostly ignored by the baseball world. Currently, sabermetrics’ popularity is no longer limited to just baseball, with other sports also adopting this approach.
To simplify the exercise, we will focus on scoring runs and ignore the two other important aspects of the game: pitching and fielding. We will see how regression analysis can help develop strategies to build a competitive baseball team with a constrained budget. The approach can be divided into two separate data analyses. In the first, we determine which recorded player-specific statistics predict runs. In the second, we examine if players were undervalued based on the predictions from our first analysis.
15.1.1 Baseball basics
To understand how regression helps us find undervalued players, we don’t need to delve into all the details of the game of baseball, which has over 100 rules. Here, we distill the sport to the basic knowledge necessary for effectively addressing the data science problem.
The goal of a baseball game is to score more runs (points) than the other team. Each team has 9 batters that have an opportunity to hit a ball with a bat in a predetermined order. After the 9th batter has had their turn, the first batter bats again, then the second, and so on. Each time a batter has an opportunity to bat, we call it a plate appearance (PA). At each PA, the other team’s pitcher throws the ball and the batter tries to hit it. The PA ends with a binary outcome: the batter either makes an out (failure) and returns to the bench, or the batter doesn’t (success) and can run around the bases, potentially scoring a run (reaching all 4 bases). Each team gets nine tries, referred to as innings, to score runs, and each inning ends after three outs (three failures).
Here is a video showing a success: https://www.youtube.com/watch?v=HL-XjMCPfio. And here is one showing a failure: https://www.youtube.com/watch?v=NeloljCx-1g. In these videos, we see how luck is involved in the process. When at bat, the batter wants to hit the ball hard. If the batter hits it hard enough, it is a HR, the best possible outcome as the batter gets at least one automatic run. But sometimes, due to chance, the batter hits the ball very hard and a defender catches it, resulting in an out. In contrast, sometimes the batter hits the ball softly, but it lands in just the right place. The fact that there is chance involved hints at why probability models will be involved.
Now, there are several ways to succeed. Understanding this distinction will be important for our analysis. When the batter hits the ball, the batter wants to pass as many bases as possible. There are four bases, with the fourth one called home plate. Home plate is where batters start by trying to hit, so the bases form a cycle.
A batter who goes around the bases and arrives home, scores a run.
We are simplifying a bit, but there are five ways a batter can succeed, meaning not make an out:
Bases on balls (BB): The pitcher fails to throw the ball through a predefined area considered to be hittable (the strike zone), so the batter is permitted to go to first base.
Single: The batter hits the ball and gets to first base.
Double (2B): The batter hits the ball and gets to second base.
Triple (3B): The batter hits the ball and gets to third base.
Home Run (HR): The batter hits the ball and goes all the way home and scores a run.
Here is an example of a HR: https://www.youtube.com/watch?v=xYxSZJ9GZ-w. If a batter reaches a base, the batter still has a chance of reaching home and scoring a run if the next batter succeeds with a hit. While the batter is on base, the batter can also try to steal a base (SB). If a batter runs fast enough, the batter can try to advance from one base to the next without the other team tagging the runner. Here is an example of a stolen base: https://www.youtube.com/watch?v=JSE5kfxkzfk.
All these events are tracked throughout the season and are available to us through the Lahman package. Now, we can begin discussing how data analysis can help us determine how to use these statistics to evaluate players.
15.1.2 No awards for BB
Historically, the batting average has been considered the most important offensive statistic. To define this average, we define a hit (H) and an at bat (AB). Singles, doubles, triples, and home runs are hits. The fifth way to be successful, BB, is not a hit. An AB is the number of times in which you either get a hit or make an out; BBs are excluded. The batting average is simply H/AB and is considered the main measure of a success rate. Today, this success rate ranges from 20% to 38%. We refer to the batting average in thousands so, for example, if your success rate is 28%, we call it batting 280.
One of Bill James’ first important insights is that the batting average ignores BB, but a BB is a success. Instead of batting average, James proposed the use of the on-base percentage (OBP), which he defined as (H+BB)/(AB+BB), or simply the proportion of plate appearances that don’t result in an out, a very intuitive measure. He noted that a player that accumulates many more BB than the average player might go unrecognized if the batter does not excel in batting average. But is this player not helping produce runs? No award is given to the player with the most BB. However, bad habits are hard to break and baseball did not immediately adopt OBP as an important statistic. In contrast, total stolen bases were considered important and an award8 was given to the player with the most. But players with high totals of SB also made more outs as they did not always succeed. Does a player with a high SB total help produce runs? Can we use data science to determine if it’s better to pay for players with high BB or SB?
15.1.3 Base on balls or stolen bases?
One of the challenges in this analysis is that it is not obvious how to determine if a player produces runs because so much depends on his teammates. Although we keep track of the number of runs scored by a player, remember that if player X bats right before someone who hits many HRs, batter X will score many runs. Note that these runs don’t necessarily happen if we hire player X but not his HR-hitting teammate.
15.1.4 Regression applied to baseball statistics
Can we use regression with these data? First, notice that the HR and Run data, shown above, appear to be bivariate normal. Specifically, the qqplots confirm that the normal approximation for each HR strata is useful here:
The ratio \(\frac{\mbox{AB}}{\mbox{PA}}\) will change from team to team. To assess its variability, compute and plot this quantity for each team for each year since 1962. Then plot it again, but instead of computing it for every team, compute and plot the ratio for the entire year. Then, once you are convinced that there is not much of a time or team trend, report the overall average.
17. So now we know that the formula for OPS is proportional to \(0.91 \times \mbox{BB} + \mbox{singles} + 2 \times \mbox{doubles} + 3 \times \mbox{triples} + 4 \times \mbox{HR}\). Let’s see how these coefficients compare to those obtained with regression. Fit a regression model to the data after 1962, as done earlier, using per-game statistics for each year for each team. After fitting this model, report the coefficients as weights relative to the coefficient for singles.
18. We see that our linear regression model coefficients follow the same general trend as those used by OPS, but with slightly less weight for metrics other than singles. For each team in years after 1962, compute the OPS and the predicted runs from the regression model, then compute the correlation between the two, as well as the correlation of each with runs per game.
Because we assume the standard deviation of the errors is constant, if we plot the absolute value of the residuals, it should appear constant.
We prefer plots rather than summaries based on, for example, correlation because, as noted in Section @ascombe, correlation is not always the best summary of association. The function plot applied to an lm object automatically plots these.
This function can produce six different plots, and the argument which lets you specify which you want to see. You can learn more by reading the plot.lm help file. However, some of the plots are based on more advanced concepts beyond the scope of this book. To learn more, we recommend an advanced book on regression analysis.
In fact, the proportion of players that have a lower batting average during their sophomore year is 0.6981132.
So is it “jitters” or “jinx”? To answer this question, let’s turn our attention to all the players that played the 2013 and 2014 seasons and batted more than 130 times (minimum to win Rookie of the Year).
The same pattern arises when we look at the top performers: batting averages go down for most of the top performers.
But these are not rookies! Also, look at what happens to the worst performers of 2013:
We introduced the kNN algorithm in Section 29.1. In Section 29.7.1, we noted that \(k=31\) provided the highest accuracy in the test set. Using \(k=31\), we obtain an accuracy of 0.825, an improvement over regression. A plot of the estimated conditional probability shows that the kNN estimate is flexible enough and does indeed capture the shape of the true conditional probability.
12. We can see what the data looks like if we add 1s to our 2 or 7 examples using this code:
Fit QDA using the qda function in the MASS package, then create a confusion matrix for predictions on the test set. Which of the following best describes the confusion matrix:
a. It is a two-by-two table.
b. Because we have three classes, it is a two-by-three table.
c. Because we have three classes, it is a three-by-three table.
d. Confusion matrices only make sense when the outcomes are binary.
The tissue_gene_expression dataset includes a matrix, tissue_gene_expression$x, with the gene expression measured on 500 genes for 189 biological samples representing seven different tissues. The tissue type is stored in tissue_gene_expression$y.
Fit a random forest using the randomForest function in the package randomForest. Then use the varImp function to see which are the top 10 most predictive genes. Make a histogram of the reported importance to get an idea of the distribution of the importance values.
Here is a plot of precision as a function of prevalence when both TPR and TNR are 95%:
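The plot itself is not reproduced here; a sketch of the calculation behind it, using Bayes’ theorem for the positive predictive value, is:
prevalence <- seq(0.01, 0.99, length.out = 100)
tpr <- 0.95 # sensitivity
tnr <- 0.95 # specificity
precision <- tpr * prevalence / (tpr * prevalence + (1 - tnr) * (1 - prevalence))
plot(prevalence, precision, type = "l")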
26.7 ROC and precision-recall curves
When comparing the two methods (guessing versus using a height cutoff), we looked at accuracy and \(F_1\). The second method clearly outperformed the first. However, while we considered several cutoffs for the second method, for the first we only considered one approach: guessing with equal probability. Be aware that guessing Male with higher probability would give us higher accuracy due to the bias in the sample:
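A sketch of such a guess (the objects test_set and its sex column are assumed from the earlier examples in this chapter):
n <- nrow(test_set)
y_hat <- sample(c("Male", "Female"), n, replace = TRUE, prob = c(0.9, 0.1)) |>
  factor(levels = levels(test_set$sex))
mean(y_hat == test_set$sex) # higher accuracy simply because most observations are Male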
However, the estimate \(\hat{\text{MSE}}\) is a random variable. In fact, \(\text{MSE}\) and \(\hat{\text{MSE}}\) are often referred to as the true error and apparent error, respectively. Due to the complexity of some machine learning algorithms, it is difficult to derive the statistical properties of how well the apparent error estimates the true error. In Chapter 29, we introduce cross-validation, an approach to estimating the MSE.
We end this chapter by pointing out that there are loss functions other than the squared loss. For example, the Mean Absolute Error uses absolute values, \(|\hat{Y}_i - Y_i|\), instead of squared errors, \((\hat{Y}_i - Y_i)^2\). However, in this book we focus on minimizing squared loss since it is the most widely used.
26.9 Exercises
The reported_heights and heights datasets were collected from three classes taught in the Departments of Computer Science and Biostatistics, as well as remotely through the Extension School. The Biostatistics class was taught in 2016 along with an online version offered by the Extension School. On 2016-01-25 at 8:15 AM, during one of the lectures, the instructors asked students to fill in the sex and height questionnaire that populated the reported_heights dataset. The online students filled in the survey during the next few days, after the lecture was posted online. We can use this insight to define a variable, call it type, to denote the type of student: inclass or online:
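A sketch of one way to define this variable (the column time_stamp comes from dslabs::reported_heights; the exact date and time filtering is an assumption for illustration):
library(dslabs)
library(dplyr)
library(lubridate)
data("reported_heights")
dat <- reported_heights |>
  mutate(date_time = ymd_hms(time_stamp),
         type = ifelse(year(date_time) == 2016 & month(date_time) == 1 & day(date_time) == 25 &
                         hour(date_time) == 8, "inclass", "online"))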
p_rf <- predict(fit_rf, x_test[, col_index], type = "prob")
p_rf <- p_rf / rowSums(p_rf)
p_knn_pca <- predict(fit_knn_pca, newdata)
31.9 Exercises
1. In the exercises in Chapter 30 we saw that changing maxnodes or nodesize in the randomForest function improved our estimate. Let’s use the train function to help us pick these values. From the caret manual we see that we can’t tune the maxnodes parameter or the nodesize argument with randomForest, so we will use the Rborist package and tune the minNode argument. Use the train function to try values minNode <- seq(5, 250, 25). See which value minimizes the estimated RMSE.
Split the data into training and test sets, then use kNN to predict tissue type and see what accuracy you obtain. Try it for \(k = 1, 3, \dots, 11\).
3. We are going to apply LDA and QDA to the tissue_gene_expression dataset. We will start with simple examples based on this dataset and then develop a realistic example.
Create a dataset with just the classes cerebellum and hippocampus (two parts of the brain) and a predictor matrix with 10 randomly selected columns. Estimate the accuracy of LDA.
Note that the accuracy does not change, but see how it is easier to identify the predictors that differ more between groups in the plot made in exercise 4.
8. In the previous exercises, we saw that both approaches worked well. Plot the predictor values for the two genes with the largest differences between the two groups in a scatterplot to see how they appear to follow a bivariate distribution as assumed by the LDA and QDA approaches. Color the points by the outcome.
9. Now we are going to increase the complexity of the challenge slightly: we will consider all the tissue types.
21. Previously, we compared the conditional probability \(p(\mathbf{x})\) given two predictors \(\mathbf{x} = (x_1, x_2)^\top\) to the fit \(\hat{p}(\mathbf{x})\) obtained with a machine learning algorithm by making image plots. The following code can be used to make these images and include a curve at the values of \(x_1\) and \(x_2\) for which the function is \(0.5\):
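The referenced code is not shown above; a sketch of such a plotting helper, assuming the mnist_27$true_p data frame from dslabs with columns x_1, x_2, and p, is:
library(tidyverse)
plot_cond_prob <- function(p_hat = NULL){
  tmp <- mnist_27$true_p
  if (!is.null(p_hat)) tmp <- mutate(tmp, p = p_hat) # replace the true p with an estimate
  tmp |>
    ggplot(aes(x_1, x_2, z = p, fill = p)) +
    geom_raster(show.legend = FALSE) +
    scale_fill_gradientn(colors = c("#F8766D", "white", "#00BFC4")) +
    stat_contour(breaks = 0.5, color = "black") # curve where the function equals 0.5
}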
Fit a kNN model and make this plot for the estimated conditional probability. Hint: Use the argument newdata = mnist_27$train to obtain predictions for a grid of points.
22. Notice that, in the plot made in exercise 1, the boundary is somewhat wiggly. This is because kNN, like the basic bin smoother, does not use a kernel. To improve this we could try loess. By reading through the available models part of the caret manual, we see that we can use the gamLoess method. We need to install the gam package, if we have not done so already. We see that we have two parameters to optimize:
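A sketch of what the tuning might look like (the span grid is an assumption; gamLoess exposes span and degree as tuning parameters in caret):
library(caret)
grid <- expand.grid(span = seq(0.15, 0.65, len = 10), degree = 1)
fit_loess <- train(y ~ ., method = "gamLoess", tuneGrid = grid, data = mnist_27$train)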
Plot the estimate \(\hat{p}(x,y)\) resulting from the model fit in exercise 2. How does the accuracy compare to that of kNN? Comment on the difference between this estimate and the one obtained with kNN.
24. Use the mnist_27 training set to build a model with several of the models available from the caret package. For example, you can try these:
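For instance, an illustrative set of caret method names (the book’s exact list may differ):
models <- c("glm", "lda", "naive_bayes", "knn", "gamLoess", "qda", "rf")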
32. Note that each method can also produce an estimated conditional probability. Instead of majority vote, we can take the average of these estimated conditional probabilities. For most methods, we can use type = "prob" when predicting. Note that some of the methods require you to use the argument trControl = trainControl(classProbs = TRUE) when calling train. Also, these methods do not work if classes have numbers as names. Hint: change the levels like this:
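A sketch of that relabeling (object names assumed to match the mnist_27 example used above):
levels(mnist_27$train$y) <- c("two", "seven")
levels(mnist_27$test$y) <- c("two", "seven")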