title: Statistics Fundamentals
duration: 3 hr
creator: K. Nate Tucker (SF)

Statistics Review

DS | Lesson 4

LEARNING OBJECTIVES

After this lesson, you will be able to:

Explain the difference between causation vs. correlation
Test a hypothesis within a sample case study
Validate your findings using statistical analysis (p-values, confidence intervals)

STUDENT PRE-WORK

Before this lesson, you should already be able to:

Explain the difference between variance and bias
Use descriptive stats to understand your data

LESSON GUIDE

TIMING	TYPE	TOPIC
5 min	Opening	Lesson Objectives
30 min	Introduction	Confidence Intervals
30 min	Introduction	Hypothesis Testing
30 min	Demo	Hypothesis Testing: Case Study
5 min	Introduction	Validate your findings
20 min	Demo	P-values, CI: Case Study
35 min	Independent Practice	Practice with p-values and CI
15 min	Wrap-up	Review Guided Practice

Opening (5 min)

Review any questions from last session
Discuss Current Lesson Objectives
Review prior exit tickets

Data Source

Today we will use advertising data from an example in An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.

Intro: Hypothesis Testing (5 mins)

You'll remember from last time that we worked on descriptive statistics. How would we tell if there is a difference between our groups? How would we know if this difference was real or if our finding is simply due to chance?

These are the questions we often tackle when we are building out our models in the Refine & Build steps of our data science workflow.

For example, if we are working on sales data, how would we know if there was a difference between the buying patterns of men and women at Acme Inc? Hypothesis testing!

Hypothesis testing steps

Generally speaking, you start with a null hypothesis and an alternative hypothesis, which is the opposite of the null. Then you check whether the data support rejecting the null hypothesis or whether you fail to reject it.

Note that "failing to reject" the null is not the same as "accepting" the null hypothesis. Your alternative hypothesis may indeed be true, but you don't necessarily have enough data to show that yet.

This distinction is important to help you avoid overstating your findings. You should only state what your data and analysis can truly represent.

Here is an example of a conventional hypothesis test:

Null hypothesis: There is no relationship between Gender and Sales.
Alternative hypothesis: There is a relationship between Gender and Sales.
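As a rough illustration of how such a test can be carried out, here is a minimal sketch of a permutation test written with only the Python standard library. The sales figures and the `permutation_test` helper are made up for this example and are not part of the lesson's advertising data. The idea is that if the null hypothesis holds, the group labels are interchangeable, so we shuffle them many times and count how often a difference at least as large as the observed one appears.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels carry no information,
        # so any reshuffling of them is equally plausible.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p-value of 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Made-up sales figures, for illustration only.
sales_men = [21.0, 19.5, 23.1, 20.4, 22.8, 18.9, 21.7, 20.2]
sales_women = [24.3, 25.1, 22.9, 26.0, 23.8, 25.5, 24.7, 23.2]

observed_diff, p_value = permutation_test(sales_men, sales_women)
print(f"observed difference in means: {observed_diff:.2f}")
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value here would lead us to reject the null hypothesis of no relationship. A large one would mean we fail to reject it, which is not the same as showing that no relationship exists.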

Let's dive into this more with the demo.

Demo: Hypothesis Testing Case Study (30 mins)

Check: What is the null hypothesis? Why is it important?

Intro: Validate your findings (5 mins)

How do we tell if the association we observed is statistically significant?

A result is statistically significant when it would be unlikely to arise from random chance alone if the null hypothesis were true. Statistical hypothesis testing is traditionally employed to determine whether a result is statistically significant.

Typically, we use a cut point of 5%. In other words, we say that a finding is statistically significant if, assuming the null hypothesis is true, there is a less than 5% chance of seeing a result at least this extreme through chance alone.

When data scientists present results and say they found a significant result, they are almost always using this criterion. Let's dive in further to understand p-values and confidence intervals.
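To make the 5% cut point concrete, here is a small sketch that computes an exact p-value for a hypothetical coin-flip experiment (100 flips, null hypothesis: the coin is fair). The scenario and the `two_sided_binomial_p_value` helper are assumptions introduced for illustration and do not come from the case study.

```python
from math import comb

def two_sided_binomial_p_value(successes, trials):
    """Exact two-sided p-value for H0: success probability is 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided p-value is
    the probability of a count at least as far from trials / 2 as observed.
    """
    distance = abs(successes - trials / 2)
    favourable = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - trials / 2) >= distance
    )
    return favourable / 2 ** trials

ALPHA = 0.05  # the conventional 5% cut point

for heads in (60, 61):
    p = two_sided_binomial_p_value(heads, 100)
    verdict = "reject the null" if p < ALPHA else "fail to reject the null"
    print(f"{heads}/100 heads: p = {p:.4f} -> {verdict}")
```

In this setup, 60 heads lands just above the cut point and 61 lands just below it. The 5% threshold is a convention, so results on either side of it can be very similar.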

Demo: P-values & CI in the case study (20 mins)

Check: What does a 95% confidence interval indicate?
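One way to build intuition for this question is a small simulation. The sketch below assumes a normally distributed population with a known mean (the values 10.0 and 2.0 are arbitrary choices for illustration) and uses a normal-approximation interval for the mean. It draws many samples, builds an interval from each, and reports how often the interval contains the true mean.

```python
import random
import statistics

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

def mean_confidence_interval(sample, z=Z_95):
    """Normal-approximation confidence interval for the population mean."""
    center = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / len(sample) ** 0.5
    return center - z * standard_error, center + z * standard_error

def coverage(true_mean=10.0, true_sd=2.0, sample_size=50, n_repeats=2000, seed=0):
    """Fraction of repeated-sample intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        low, high = mean_confidence_interval(sample)
        hits += low <= true_mean <= high
    return hits / n_repeats

print(f"share of intervals containing the true mean: {coverage():.3f}")
```

The share should come out close to 0.95. The "95%" describes how often the procedure captures the true value across repeated samples. It is not the probability that any single computed interval contains it.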

Independent Practice (35 min)

For this exercise, you will look through a variety of analyses and interpret the findings.

You will be presented with a series of outputs (similar to the ones we will generate once we start regression) and tables from a published analysis.

For this lab you will be asked to read these outputs and tables and determine whether the findings are statistically significant.

You will also get practice looking at the output and understanding how the model was built (e.g. identifying predictor/exposure vs outcome).

Conclusion: Questions (15 mins)

Any questions?

BEFORE NEXT CLASS


UPCOMING PROJECTS	Unit Project 2
