HSEB 1730, 9AM on Mondays (dates below)
- Lecture 1 (Aaron Quinlan, May 20, 2019): Intro to Data Analysis in RStudio
- Lecture 2 (Aaron Quinlan, June 3, 2019): Data frames and Importing Data
  - slides
  - Inspired in part by: https://4va.github.io/biodatasci/r-rnaseq-airway.html
- Lecture 3 (Aaron Quinlan, June 10, 2019): More with data frames, precision vs. accuracy, very basic RNA-seq analysis
  - Inspired in part by: https://4va.github.io/biodatasci/r-rnaseq-airway.html
  - slides
  - video
- Lectures 4 and 5 (Javier Hernandez, June 17, 2019): R Packages, Data Types, Functions
- Lecture 6 (Tom Sasani, July 1, 2019): Intro to data visualization and ggplot2
- Lecture 7 (Charlie Murtaugh, July 8, 2019): Data Wrangling with "tidyverse"
- Lecture 8 (Aaron Quinlan, July 22, 2019): Intro to Probability
- Lecture 9 (Alan Rogers, August 5, 2019): Sum rule, Product rule, Conditional probability and Bayes' rule
- Lecture 10 (Alan Rogers, August 12, 2019): Conditional Probability and Bayes' Rule
- Lecture 11 (Aaron Quinlan, August 26, 2019): Poisson random variables for counting applications in biology
- Lecture 12 (Aaron Quinlan, September 16, 2019): Gaussian distributions and QQ plots
- Lecture 13 (Aaron Quinlan, October 14, 2019): Central Limit Theorem and Confidence Intervals
- Lecture 14 (Aaron Quinlan, November 4, 2019): The t-statistic, t-distribution, t-tests, and p-values
- Lecture 15 (Aaron Quinlan, November 25, 2019): Power calculations and sample size
- Lecture 16 (Tom Sasani, December 2, 2019): Intro to regression, part 1
- Lecture 17 (Tom Sasani, December 16, 2019): Intro to regression, part 2
- Lecture 18 (Tom Sasani, February 10, 2020): Intro to regression, part 3
Join the SLLOBS Slack group here:
The goal of the Salt Lake Learners of Biostats (SLLOBS) is to bring together people at all levels who are interested in learning (and teaching) basic concepts in data analysis and statistics for biological research. We will meet every other Monday morning at 9AM for one hour.
One or more people will work together to learn and present a topic at each meeting. The goal of each lecture is to:
- Give an accessible introduction to the topic
- Provide clear examples and explanations
- Demonstrate R code that conveys the topic
This will require real effort from each presenter, but the idea is that a large group of interested people will provide a large pool of both teachers and learners.
The vision is that if we all attend and put forward our best effort, we will learn together and build a shared foundation for future learning and discussion.
To be successful, SLLOBS members will need to:
- attend (almost) every meeting
- read any required material before each lecture
- make a concerted effort to contribute and present material for the group
If members meet these expectations, we will build a large corpus of teaching and learning material to refer back to. This material will also be the basis for a formal course in the future.
- multiple testing: FDR, Bonferroni, q-values (Storey)
- power analysis
- chi-squared, contingency tests
- batch effects
- survival curves
- r versus r^2
- Monte Carlo simulations
- Gibbs sampling
- MCMC
- MA plots:
- RStudio Primers
- https://lindeloev.github.io/tests-as-linear/
- https://www.huber.embl.de/msmb/
- https://datasciencelabs.github.io/pages/lectures.html
- https://towardsdatascience.com/a-guide-to-data-visualisation-in-r-for-beginners-ef6d41a34174
- https://www.datasciencecentral.com/profiles/blogs/3-types-of-regression-in-one-picture-baba-png
- Lecture 1: Goals of the group, Intro to R and RStudio
  - Goals and Motivation
  - Meeting frequency
  - Expectations
    - sharing material
    - sharing knowledge
  - What is R?
  - Why R?
  - Installing RStudio
  - RStudio
    - Calculator
    - Lists
- Lecture 2: Basics of R (I)
  - RMarkdown
  - Vectorization
  - Data types
  - Built-in datasets
  - Importing Data
    - broken data
- Lecture 3: Basics of R (II)
  - Installing packages
  - Basic Data Wrangling
    - tidyverse
- Lecture 4: Intro to plotting and data visualization
  - Why visualize data?
  - plot()
    - customizing
  - Scatter Plot
    - Regression Line (details later)
  - Barplot
  - Histograms
  - Boxplots and better versions thereof
- Lecture 1: Probability
  - Discrete Random Variables
    - Bernoulli trials
    - Binomial success counts
    - Poisson distributions
  - Continuous Random Variables
  - Descriptive Statistics
    - expected value
    - mean
    - median
    - mode
  - Basic Simulations
    - coin toss
    - importance of sample size
- Lecture 2: Inference (I)
  - Random variable probability distributions
  - Expected Values
  - Standard Error
  - Variance and standard deviation
- Lecture 3: Maximum Likelihood
  - Problem setup
  - Work through an example
- Lecture 4: Inference (II)
  - Estimates
  - Central Limit Theorem
  - Confidence Intervals
- Lecture 5: t-tests, p-values, multiple testing, q-values (tentative)
- Lecture 6: Inference (III)
  - Developing Models
  - Intro to Bayesian Inference
  - Bayesian Thinking
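A short simulation can illustrate the sampling ideas in this block: sample size, standard error, the Central Limit Theorem, and confidence intervals. The sketch below is a minimal, hedged illustration in Python using only the standard library; the group itself works in R, and this is not course material. It assumes an Exponential(rate = 1) population, chosen because it is skewed and has a true mean and standard deviation both equal to 1. The sample size of 50, the 2,000 replicates, the seeds, and the function names are arbitrary choices.

```python
import math
import random
import statistics

def sample_means(n, reps, seed=1):
    """Means of `reps` independent samples, each of n Exponential(rate=1) draws."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def ci_coverage(n, reps, z=1.96, true_mean=1.0, seed=2):
    """Fraction of intervals mean +/- z * s / sqrt(n) that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        m = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error of the mean
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / reps

n, reps = 50, 2000
means = sample_means(n, reps)
print("mean of sample means:", round(statistics.fmean(means), 3))  # near the true mean, 1
print("SD of sample means:  ", round(statistics.stdev(means), 3))  # near 1 / sqrt(50), about 0.141
print("95% CI coverage:     ", ci_coverage(n, reps))               # near 0.95, often slightly below
```

Try re-running with a smaller n, such as 5. The spread of the sample means grows roughly as 1/√n, and the coverage of this z-based interval falls further below 95%.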
Lecture 1: Motivation
- Examples
- Complexity
- What is a linear model?
- Basic correlation
- Least Squares
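The least-squares item above can be made concrete with the closed-form fit for a single predictor. The slope is Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is ȳ − slope · x̄; together they minimize the sum of squared vertical distances from the points to the line. Below is a minimal sketch in Python (standard library only, as a neutral illustration; the group itself works in R). The function names and the five (x, y) points are made up for illustration.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    """The quantity least squares minimizes."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
print(a, b)                                # approximately 2.2 and 0.6
print(sum_squared_residuals(x, y, a, b))   # approximately 2.4
```

Moving either coefficient away from the fitted values increases the sum of squared residuals; that is the sense in which the line is "least squares". In R, `lm(y ~ x)` fits the same line.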
Lecture 2: Intro to Regression
- Galton: Regression toward the mean
- Correlation
  - Pearson
  - Spearman
- Anscombe's Quartet
- Regression Line
- Stratification
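Pearson's r measures how closely the points follow a straight line. Spearman's ρ is Pearson's r computed on the ranks, so it measures any monotone relationship. Below is a minimal Python sketch (standard library only; tied values get the average of their ranks; helper names and data are illustrative). For y = x³ the relationship is perfectly monotone but curved, so Spearman's ρ is exactly 1 while Pearson's r falls below 1.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(round(pearson(x, y), 3))  # about 0.938
print(spearman(x, y))           # 1.0
```

Neither number describes the shape of the data by itself, which is the lesson of Anscombe's quartet, so plotting first remains the safest habit. In R, `cor(x, y, method = "spearman")` computes the rank version.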
Lecture 3: Linear Models
- lm
- interpretation of coefficients and p-values
- impact and handling of outliers
- confidence intervals
- Interpretation with Examples
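To illustrate the outlier item: least squares squares the residuals, so a single extreme point can pull the fitted slope far from the trend of the other points. The minimal Python sketch below (standard library only) uses made-up toy data plus one hypothetical outlier at (6, 30). It compares three slopes: least squares without the point, least squares with it, and the Theil-Sen slope (the median of all pairwise slopes). Theil-Sen is one common robust alternative, not necessarily the approach this lecture will take.

```python
import statistics
from itertools import combinations

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def theil_sen_slope(x, y):
    """Median of the slopes between all pairs of points with distinct x (Theil-Sen estimator)."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    return statistics.median(slopes)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_out, y_out = x + [6], y + [30]  # same data plus one hypothetical outlier at (6, 30)

print(ols_slope(x, y))                # about 0.6
print(ols_slope(x_out, y_out))        # about 4.06: one point drags the least-squares line
print(theil_sen_slope(x_out, y_out))  # 1.0: the median of pairwise slopes barely moves
```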
Lecture 4: Generalized Linear Models
- Why?
- How to design them
- Example: Sasani et al., de novo mutation (DNM) counts?
Lecture 6: Most statistical tests are really just linear models
- One mean tests:
  - One sample t-test and Wilcoxon signed-rank
  - Paired samples t-test and Wilcoxon matched pairs
Lecture 7: Most statistical tests are really just linear models (cont.)
- Two means tests:
  - Independent t-test
  - Mann-Whitney U
  - Welch's t-test
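The premise of Lectures 6 to 8 can be checked numerically for the two-sample case. Regressing the outcome on a 0/1 group indicator gives a slope equal to the difference in group means. The t statistic for that slope also equals the pooled-variance (Student's) independent t statistic. This exact equivalence holds for the equal-variance test, not for Welch's test. Below is a minimal Python sketch (standard library only; the measurements and function names are made up for illustration).

```python
import math
import statistics

def student_t(a, b):
    """Pooled-variance two-sample t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def indicator_regression_t(a, b):
    """Fit y = b0 + b1*x with x = 0 for group a and x = 1 for group b.
    Return (b1, t statistic for b1)."""
    x = [0.0] * len(a) + [1.0] * len(b)
    y = list(a) + list(b)
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # residual variance divided by Sxx
    return b1, b1 / se_b1

group_a = [4.1, 5.0, 5.5, 6.2]
group_b = [6.8, 7.1, 7.9, 8.4]
slope, t_lm = indicator_regression_t(group_a, group_b)
print(slope, statistics.fmean(group_b) - statistics.fmean(group_a))  # equal
print(t_lm, student_t(group_a, group_b))                             # equal
```

In R, `lm(y ~ group)` and `t.test(y ~ group, var.equal = TRUE)` report the same t, up to sign (the sign depends on the order of the factor levels).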
Lecture 8: Most statistical tests are really just linear models (cont.)
- Three or more means:
  - One-way ANOVA and Kruskal-Wallis
  - Two-way ANOVA
  - ANCOVA
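One-way ANOVA compares the variability between group means with the variability within groups. For k groups and N observations in total, F = [SS_between / (k − 1)] / [SS_within / (N − k)]. Below is a minimal Python sketch (standard library only; the three toy groups are made up).

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group sizes times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: squared distance of each value from its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
```

For two groups, this F equals the square of the pooled t statistic. In R, `anova(lm(y ~ group))` with `group` a factor reports the same F, which is the linear-model view of ANOVA.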
Lecture 9: Goodness of fit tests
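A goodness-of-fit test asks whether observed category counts are consistent with proportions fixed in advance. It uses χ² = Σ (O − E)² / E with (number of categories − 1) degrees of freedom. Below is a minimal Python sketch (standard library only). It assumes a hypothetical Aa × Aa cross with the Mendelian 1:2:1 genotype ratio and made-up counts of 30, 50, and 20. To avoid a statistics library, the p-value uses the closed-form chi-square upper tail, which exists only for even degrees of freedom. In R, `chisq.test(x, p = ...)` handles the general case.

```python
import math

def chi_square_gof(observed, expected_props):
    """Pearson chi-square statistic and degrees of freedom for fully specified proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

def chi_square_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

stat, df = chi_square_gof([30, 50, 20], [0.25, 0.5, 0.25])
p = chi_square_sf_even_df(stat, df)
print(stat, df, round(p, 3))  # 2.0 2 0.368
```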
- Lecture 1: ggplot
- Lecture 2: 1D data plots: barplots, boxplots, violin plots, beeswarm plots, density plots
- Lecture 3: 2D data plots: Scatterplots, hexbin
- Lecture 4: more than two dimensions: faceting, interactive graphics, color
- Lecture 1: Intro
  - Challenges of count data
  - RNA-seq
  - Modeling Count Data
  - Dispersion
  - Normalization
- Lecture 2:
  - Poisson noise
  - Biological signal
  - Biological and technical replicates
- Lecture 3: DESeq2
  - the method
  - analyses and examples
- Lecture 4: Misc
  - Outliers
  - Count data transformations
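Simulation shows the distinction between Poisson noise and biological signal in the lectures above. For pure Poisson (technical) noise, the variance of a count equals its mean. When each replicate's underlying rate also varies, the counts become overdispersed. Letting the rate follow a gamma distribution gives a negative binomial, the count model DESeq2 uses. Below is a minimal Python sketch (standard library only). The mean of 20, the gamma shape of 2, and the sample size are arbitrary illustration values, and the helper names are hypothetical.

```python
import math
import random
import statistics

def poisson_draw(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for modest lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson counts, larger when overdispersed."""
    return statistics.variance(counts) / statistics.fmean(counts)

rng = random.Random(42)
mean, shape, n = 20.0, 2.0, 5000
# Technical (Poisson) noise only: every replicate shares the same underlying rate.
poisson_counts = [poisson_draw(mean, rng) for _ in range(n)]
# Add biological variability: each replicate's rate is Gamma(shape, scale = mean / shape),
# which keeps the average rate at 20 but makes the counts negative binomial.
nb_counts = [poisson_draw(rng.gammavariate(shape, mean / shape), rng) for _ in range(n)]

print(round(dispersion_index(poisson_counts), 2))  # close to 1
print(round(dispersion_index(nb_counts), 2))       # far above 1 (theory: 1 + mean / shape = 11)
```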
- Lecture 1: Biological data is often multi-modal. How do we handle this?
  - generate mixtures of normal distributions
- Lecture 2: Expectation Maximization (EM) for reverse engineering the mixtures
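Below is a minimal sketch of EM for a two-component, one-dimensional Gaussian mixture, in Python with only the standard library. It simulates data from a hypothetical mixture: 70% from N(−2, 1) and 30% from N(3, 1). It then alternates two steps. The E-step computes each point's posterior probability of belonging to component 1. The M-step makes responsibility-weighted updates of the mixing weight, means, and standard deviations. The quartile-based start and the fixed 100 iterations are simplifications. Practical code usually monitors the log-likelihood for convergence and tries several starting points, because EM can settle on a local optimum.

```python
import random
import statistics
from statistics import NormalDist

def simulate_mixture(n, rng):
    """n draws from a hypothetical mixture: 70% N(-2, 1), 30% N(3, 1)."""
    return [rng.gauss(-2.0, 1.0) if rng.random() < 0.7 else rng.gauss(3.0, 1.0)
            for _ in range(n)]

def em_two_gaussians(data, iters=100):
    """EM for a two-component 1-D Gaussian mixture.
    Returns (weight of component 1, (mean1, sd1), (mean2, sd2))."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # crude start: lower and upper quartiles
    w, mu1, mu2 = 0.5, q1, q3
    sd1 = sd2 = statistics.stdev(data)
    n = len(data)
    for _ in range(iters):
        # E-step: posterior probability (responsibility) that each point came from component 1.
        d1, d2 = NormalDist(mu1, sd1), NormalDist(mu2, sd2)
        resp = []
        for x in data:
            p1 = w * d1.pdf(x)
            p2 = (1 - w) * d2.pdf(x)
            resp.append(p1 / (p1 + p2))
        # M-step: re-estimate parameters, weighting each point by its responsibilities.
        r1 = sum(resp)
        r2 = n - r1
        w = r1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / r1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / r2
        sd1 = (sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / r1) ** 0.5
        sd2 = (sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / r2) ** 0.5
    return w, (mu1, sd1), (mu2, sd2)

rng = random.Random(7)
data = simulate_mixture(1000, rng)
weight, comp1, comp2 = em_two_gaussians(data)
print(round(weight, 2), [round(v, 2) for v in comp1], [round(v, 2) for v in comp2])
# expect roughly 0.7, [-2, 1], [3, 1]
```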
- Lecture 1: Intro
  - Why do we cluster data?
  - Measuring similarity
  - k-means clustering
- Lecture 2: Clustering examples with flow cytometry data
  - Data preprocessing
  - Density-based clustering
- Lecture 3: Hierarchical clustering
- Lecture 4: Validating and choosing the number of clusters
- Lecture 5: Detecting Batch effects
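For the k-means item: Lloyd's algorithm alternates two steps until no point changes cluster. First it assigns each point to its nearest centroid, then it moves each centroid to the mean of its assigned points. Below is a minimal Python sketch (standard library only; `math.dist` needs Python 3.8+). The three blobs of points are made up, and the starting centroids are passed in and deliberately poor. Practical implementations use random or k-means++ starts with several restarts, because the result can depend on initialization.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm for 2-D points given as (x, y) tuples.
    Returns (centroids, labels), where labels[i] is the cluster index of points[i]."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance (ties go to the lower index).
        new_labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its points (leave it if it has none).
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(c)
        centroids = new_centroids
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),  # made-up blob near (0.5, 0.5)
          (8, 8), (8, 9), (9, 8), (9, 9),  # made-up blob near (8.5, 8.5)
          (0, 8), (0, 9), (1, 8), (1, 9)]  # made-up blob near (0.5, 8.5)
# Deliberately poor start: two of the three centroids begin in the same blob.
centroids, labels = kmeans(points, [(0, 0), (1, 1), (8, 8)])
print(sorted(centroids))  # [(0.5, 0.5), (0.5, 8.5), (8.5, 8.5)]
```

In R, `kmeans(x, centers = 3)` is the built-in routine. Its default algorithm is Hartigan-Wong, and `algorithm = "Lloyd"` is also available.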
- Lecture 1: Hypothesis testing
  - types of error
  - revisiting the t-test
  - permutation tests
- Lecture 2: P-value hacking
- Lecture 3: Multiple testing
  - Theory, Implications
  - Bonferroni correction
- Lecture 4: False discovery rate (FDR)
  - P-value histogram
  - Benjamini-Hochberg algorithm for limiting FDR
  - Local FDR
- Lecture 5: Other tests?
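Bonferroni and Benjamini-Hochberg (BH) control different things. Bonferroni rejects p ≤ α/m, which limits the chance of any false positive among m tests. BH sorts the p-values, finds the largest rank k whose k-th smallest p-value satisfies p(k) ≤ kα/m, and rejects the k smallest; this limits the expected false discovery rate. Below is a minimal Python sketch (standard library only; the ten p-values are made up).

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Indices rejected by Bonferroni: p <= alpha / m."""
    m = len(pvalues)
    return {i for i, p in enumerate(pvalues) if p <= alpha / m}

def bh_reject(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values, including any above their own threshold.
    return set(order[:k_max])

pvals = [0.2, 0.001, 0.034, 0.5, 0.012, 0.03, 0.9, 0.008, 0.033, 0.031]
print(sorted(bonferroni_reject(pvals)))  # [1]
print(sorted(bh_reject(pvals)))          # [1, 2, 4, 5, 7, 8, 9]
```

BH rejects 0.03, 0.031, and 0.033 even though each exceeds its own threshold, because a larger p-value further down the list (0.034 ≤ 7α/m) passes. This step-up behaviour distinguishes BH from checking each threshold separately. In R, `p.adjust(p, method = "BH")` returns adjusted p-values; those at or below α correspond to the same rejections.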
- Lecture 1: Different distributions
- Lecture 2: Fitting data to distributions, Q-Q plot
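A normal Q-Q plot places the sorted observations against the quantiles a standard normal distribution predicts for a sample of that size. Points that fall roughly on a straight line suggest normality. Below is a minimal Python sketch of the plot coordinates (standard library only). It uses the plotting positions (i − 0.5)/n, one of several conventions, and summarizes straightness with the correlation of the points instead of drawing them. Both samples are simulated for illustration, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def normal_qq_points(sample):
    """(theoretical N(0, 1) quantile, sorted observation) pairs, plotting positions (i - 0.5) / n."""
    n = len(sample)
    z = NormalDist()  # standard normal
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))

def qq_correlation(sample):
    """Correlation between theoretical and observed quantiles; near 1 when the Q-Q points are nearly linear."""
    pairs = normal_qq_points(sample)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(3)
normal_sample = [rng.gauss(10, 2) for _ in range(200)]      # hypothetical normal data
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]  # hypothetical right-skewed data
print(round(qq_correlation(normal_sample), 3))  # close to 1
print(round(qq_correlation(skewed_sample), 3))  # lower: the points bend away from a line
```

In R, `qqnorm(x)` followed by `qqline(x)` draws the plot itself.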
- Dimension reduction
  - PCA
- Sampling
  - Bootstrapping
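Bootstrapping estimates the sampling variability of a statistic by resampling the observed data with replacement. Below is a minimal Python sketch of a percentile bootstrap confidence interval for the median (standard library only). The 12 measurements, the 5,000 resamples, and the seed are made-up illustration values. The percentile method is the simplest variant; others, such as BCa intervals, adjust for bias and skew.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.median, reps=5000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample the data with replacement (same size),
    recompute the statistic each time, and keep the middle `level` fraction of the results."""
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    alpha = (1 - level) / 2
    lo = boot[round(alpha * (reps - 1))]
    hi = boot[round((1 - alpha) * (reps - 1))]
    return lo, hi

# Made-up, right-skewed measurements (note the single large value).
data = [2.1, 3.4, 1.9, 5.6, 2.8, 3.0, 12.5, 2.2, 4.1, 3.3, 2.7, 6.8]
print(statistics.median(data))  # 3.15
print(bootstrap_ci(data))       # an interval around the sample median
```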