-
Notifications
You must be signed in to change notification settings - Fork 7
/
stats.rmd
51 lines (35 loc) · 3.43 KB
/
stats.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
# Statistical and mathematical fundamentals
Phylogenetic biology draws on a variety of themes in statistics and mathematics. Given the diverse backgrounds of people coming to phylogenetic biology, one of the challenges of introducing people to this field is ensuring a foundation in these fundamentals. Many people have a solid working grasp of many of these topics, others need a refresher, and some have never seen them.
Throughout the text I have covered some of these fundamentals, but here I provide additional resources.
## Probability theory and General statistics {#stats-general}
I put together this short summary of statistical concepts commonly encountered in biology - https://bitbucket.org/caseywdunn/statistics/raw/master/statistics_in_biology.pdf. It is short and technical, with a focus on the relationship between concepts, such as the exponential family of distributions.
## Bayes theorem {#stats-bayes}
Grant Sanderson of 3Blue1Brown provides an excellent introduction to Bayes Theorem and conditional probability in this video - https://youtu.be/HZGCoVF3YvM. Note that he uses $E$ for Evidence rather than $D$ for Data as is common practice in phylogenetic applications.
This video provides additional perspective, including its application to medical testing - https://youtu.be/R13BD8qKeTg.
The MCMC robot by Paul Lewis is an excellent interactive introduction to MCMC sampling - https://phylogeny.uconn.edu/mcmc-robot/.
## Linear algebra {#linear-algebra}
Linear algebra is a rich field of math that springs from a surprisingly simple
starting point - linear combinations of
vectors, often represented as rows and columns of matrices. From basic
operations such as vector addition and scalar multiplication, one can build
matrix operations and many sophisticated data analyses. Many familiar data
analysis methods, such as regression, solving systems of equations,
Principal Component Analyses (PCA), and common clustering methods, are built from linear
algebra under the hood. So are many aspects of phylogenetic analysis, including
the models of evolution themselves and the representation of the phylogenetic
trees in matrix forms that computers can easily manupulate.
Though linear algebra is fundamental to many topics at the intersection of
science, mathematics, and computation, most science students have far less
exposure to this critical field than to statistics, calculus, and other
quantitative topics. This is too bad, as even a basic working knowledge of
linear algebra can provide great insight into the tools scientists use daily
and allow us to better apply, implement, and extend them.
The following resources are a good place to start in your own studies of
linear algebra:
- The Essence of Linear Algebra video series by Grant Sanderson (3Blue1Brown) - https://www.3blue1brown.com/topics/linear-algebra.
- Linear Algebra: Theory, Intuition, Code, by Mike X Cohen. This book is
accessible to scientists with little past math experience, and is very
reasonably priced. https://www.amazon.com/Linear-Algebra-Theory-Intuition-Code/dp/9083136604.
- Data Driven Science and Engineering, by Steve Brunton and Nathan Kutz - http://www.databookuw.com/ .
This book shows how linear algebra is applied to many powerful scientific data analyses.
The authors have excellent companion videos organized into playlists at https://www.youtube.com/c/Eigensteve/playlists, including singular value decomposition for Chapter 1, Fourier analyses for chapter 2.