- Bayesian vs Frequentist Thought
- Kolmogorov's Definition for Conditional Probability
- Law of Total Probability
- Bayes' Theorem
- Bayes Textbook Problems
Statistical inference in the past century has primarily relied upon a classical, or frequentist, approach, which makes inferences about populations from samples. This approach yields an explicit level of confidence via hypothesis tests and confidence intervals, using probability theory and probability distributions as the foundation for deciding whether a sample differs from a hypothesized value by chance, or because the population being sampled actually differs from that value.
Bayesian inference does not rely on a static model the way frequentist statistics does. Instead, the model takes in new information, and the underlying probabilistic model is adjusted based on that new data. The adjusted model is called the posterior distribution, and the end result is a model that evolves as data arrives. Essentially, new observations allow for “updating a belief” about the phenomenon being observed.
A conditional probability is a measure of the probability of one event occurring, given that another (different) event has occurred, or will occur.
- Implicitly, we presume, assume, assert, or see evidence that the given event has or will occur.
- This can be interpreted as limiting the sample space of the event we are measuring.
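In symbols, for two events $A$ and $B$ with $P(B) > 0$, Kolmogorov's definition of conditional probability is usually written as shown here. The names $A$ and $B$ are generic placeholders rather than events from a specific problem in these notes.

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Dividing by $P(B)$ is what restricts the sample space to the outcomes in which $B$ occurs.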
A survey was given to a number of students regarding their pets. 40% of respondents reported owning a dog, 45% reported owning a cat, and 25% reported owning both (students owning both were counted in all three categories). What is the probability that a student owns a cat, given that they own a dog?
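- Frame the question
$P(\text{cat} \mid \text{dog}) = \ ?$
- Write down known info
$P(\text{cat}) = 0.45$ $P(\text{dog}) = 0.40$ $P(\text{cat} \cap \text{dog}) = 0.25$
- Apply Kolmogorov's Definition
$P(\text{cat} \mid \text{dog}) = \frac{P(\text{cat} \cap \text{dog})}{P(\text{dog})} = \frac{0.25}{0.40} = 0.625$
Thus the probability of a student owning a cat given that they own a dog is 0.625.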
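The steps above can be checked numerically. The following is a minimal Python sketch that uses only the survey figures from the problem; the variable names are illustrative. It evaluates the conditional probability in both directions, since the two are easy to confuse.

```python
# Survey proportions from the problem statement
p_cat = 0.45
p_dog = 0.40
p_cat_and_dog = 0.25

# Kolmogorov's definition: P(cat | dog) = P(cat ∩ dog) / P(dog)
p_cat_given_dog = p_cat_and_dog / p_dog

# Conditioning the other way divides by P(cat) instead
p_dog_given_cat = p_cat_and_dog / p_cat

print(f"P(cat | dog) = {p_cat_given_dog:.4f}")  # 0.6250
print(f"P(dog | cat) = {p_dog_given_cat:.4f}")  # 0.5556
```

Note that the two results differ: the same intersection is divided by a different marginal in each case.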
By rearranging the definition to $P(A \cap B) = P(A \mid B)\,P(B)$, we can test whether two events are independent.
Dependence if
$$P(A \cap B) \neq P(A)\,P(B)$$
then the events are dependent.
Independence if
$$P(A \cap B) = P(A)\,P(B)$$
then the events are independent. Notice how this recalls the Multiplication Rule for independent events.
A survey of 400 vehicles driving on an afternoon found that 175 of the vehicles were red, 120 were trucks, and 37 were red trucks.
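The problem statement does not say what is being asked. Assuming the goal is to decide whether "red" and "truck" are independent events, a minimal Python sketch of the check might look like this (variable names are illustrative):

```python
import math

# Counts from the survey
total_vehicles = 400
red = 175
trucks = 120
red_trucks = 37

# Convert counts to probabilities
p_red = red / total_vehicles
p_truck = trucks / total_vehicles
p_red_and_truck = red_trucks / total_vehicles

# Independence requires P(red ∩ truck) == P(red) * P(truck)
product = p_red * p_truck
independent = math.isclose(p_red_and_truck, product)

# Conditional probability of red, given that the vehicle is a truck
p_red_given_truck = p_red_and_truck / p_truck

print(f"P(red ∩ truck)    = {p_red_and_truck:.4f}")
print(f"P(red) * P(truck) = {product:.4f}")
print(f"P(red | truck)    = {p_red_given_truck:.4f}")
print(f"Independent? {independent}")
```

With these counts the joint probability (0.0925) is lower than the product of the marginals (about 0.1313), so the events are dependent: trucks in this sample are red less often than vehicles overall.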
Given a sample space which is divided into a number of disjoint events, an overlapping event can be interpreted as the aggregation of that event in each subspace.
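Here, for an event $A$ and disjoint events $B_1, \dots, B_n$ that together cover the sample space,
$$P(A) = \sum_{i} P(A \cap B_i)$$
By rearranging
$$P(A \mid B_i) = \frac{P(A \cap B_i)}{P(B_i)}$$
to
$$P(A \cap B_i) = P(A \mid B_i)\,P(B_i)$$
we can rewrite
$$P(A) = \sum_{i} P(A \cap B_i)$$
as
$$P(A) = \sum_{i} P(A \mid B_i)\,P(B_i)$$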
Given two bags of marbles:
- Bag 1: 75 red, 25 blue
- Bag 2: 45 red, 37 blue
Calculate the total probability that a red marble will be chosen, if a bag is chosen at random (with equal probability), and then a marble from that bag is chosen at random.
A tree diagram, with one branch per bag and one sub-branch per marble color, can help organize the calculation.
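$P(\text{red}) = \frac{1}{2} \cdot \frac{75}{100} + \frac{1}{2} \cdot \frac{45}{82} \approx 0.6494$
Thus the overall probability of choosing a red marble is approximately 0.6494.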
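The marble calculation can be reproduced with a short Python sketch of the Law of Total Probability. The dictionary layout and names are illustrative choices; the counts come from the problem.

```python
# Marble counts per bag, from the problem statement
bags = {
    "Bag 1": {"red": 75, "blue": 25},
    "Bag 2": {"red": 45, "blue": 37},
}

# Each bag is equally likely to be chosen
p_bag = 1 / len(bags)

# Law of Total Probability: P(red) = sum over bags of P(red | bag) * P(bag)
p_red = 0.0
for name, counts in bags.items():
    p_red_given_bag = counts["red"] / sum(counts.values())
    p_red += p_red_given_bag * p_bag
    print(f"P(red | {name}) = {p_red_given_bag:.4f}")

print(f"P(red) = {p_red:.4f}")
```

Each bag contributes its own conditional probability of red, weighted by the probability of picking that bag.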
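Then, for an event $A$ and disjoint events $B_1, \dots, B_n$ that together cover the sample space, Bayes' Theorem states
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A)} = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j} P(A \mid B_j)\,P(B_j)}$$
Whereas the Law of Total Probability allows us to construct a probability from disjoint events, Bayes' Theorem gives us a more general approach for solving conditional probabilities and validating diagnostics based on population rates.
Note that the heavy lifting of Bayes problems is usually in applying the Law of Total Probability to the denominator, $P(A)$.
We often call $P(B_i)$ the prior, $P(A \mid B_i)$ the likelihood, and $P(B_i \mid A)$ the posterior.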
There are two bowls filled with cookies, as follows:
- Bowl #1 - 20 vanilla, 20 chocolate
- Bowl #2 - 30 vanilla, 10 chocolate
Given that a random cookie is taken from a random bowl, and each bowl has the same probability of being chosen, what is the probability that the cookie came from Bowl #2, given that it is vanilla?
$P(\text{Bowl 2} \mid \text{vanilla}) = \frac{0.75 \cdot 0.5}{0.50 \cdot 0.5 + 0.75 \cdot 0.5} = \frac{0.375}{0.625} = 0.6$
Thus the probability of having chosen Bowl #2, given a vanilla cookie in hand, is 0.6.
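The cookie problem follows a pattern that recurs in every Bayes problem: multiply each prior by its likelihood, then divide by the sum. A minimal Python sketch of that pattern is shown here. The helper name `posterior` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
def posterior(priors, likelihoods):
    """Return P(hypothesis | evidence) for each hypothesis.

    priors:      dict mapping hypothesis -> P(hypothesis)
    likelihoods: dict mapping hypothesis -> P(evidence | hypothesis)
    """
    # Numerators of Bayes' Theorem: P(evidence | h) * P(h)
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    # Denominator via the Law of Total Probability: P(evidence)
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}


# Each bowl is equally likely to be chosen
priors = {"bowl_1": 0.5, "bowl_2": 0.5}
# P(vanilla | bowl), from the cookie counts in the problem
p_vanilla = {"bowl_1": 20 / 40, "bowl_2": 30 / 40}

result = posterior(priors, p_vanilla)
print(f"P(bowl 2 | vanilla) = {result['bowl_2']:.2f}")  # 0.60
```

Returning the posterior for every hypothesis at once makes it easy to confirm that the results sum to 1.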
A rare disease affects 1 in 1,000 people in a given population. A diagnostic has been developed for it: the result is positive 99% of the time when it is given to someone with the disease, and positive 2% of the time when it is given to someone without the disease. Given that someone has tested positive on the diagnostic, what is the probability that they actually have the disease?
Solution: 0.04721 or 4.721%
Interpretation: This conclusion is not what most people expect intuitively. From the problem statement, the diagnostic sounds like a very good indicator. However, because the disease is so rare, the 2% of healthy people who test positive far outnumber the people who have the disease and test positive, so a positive result still leaves the probability of disease below 5%.
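A small Python sketch can make the diagnostic result more concrete. It computes the posterior with Bayes' Theorem, then restates it as expected counts in a hypothetical cohort of 100,000 people. The cohort size is an assumption chosen for readability, not part of the problem.

```python
# Figures from the problem statement
prevalence = 1 / 1000        # P(disease)
sensitivity = 0.99           # P(positive | disease)
false_positive_rate = 0.02   # P(positive | no disease)

# Joint probabilities of each way to test positive
p_true_positive = sensitivity * prevalence
p_false_positive = false_positive_rate * (1 - prevalence)

# Bayes' Theorem: P(disease | positive)
p_disease_given_positive = p_true_positive / (p_true_positive + p_false_positive)
print(f"P(disease | positive) = {p_disease_given_positive:.5f}")

# Same result as expected counts in a hypothetical cohort (assumed size)
cohort = 100_000
expected_true_pos = cohort * p_true_positive
expected_false_pos = cohort * p_false_positive
print(f"Expected true positives:  {expected_true_pos:.0f}")
print(f"Expected false positives: {expected_false_pos:.0f}")
```

In such a cohort, roughly 99 true positives are mixed in with roughly 1,998 false positives, which is why a positive result is still far more likely to come from a healthy person.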
Freddy remembers to take his umbrella with him 80% of the days. It rains on 30% of the days when he remembers to take his umbrella, and it rains on 60% of the days when he forgets to take his umbrella.
What is the probability that he remembers his umbrella when it rains?
The probability he remembers his umbrella when it rains is 2/3
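Because the umbrella answer is a simple fraction, it is convenient to verify it with exact arithmetic. This sketch uses Python's standard `fractions.Fraction`; the variable names are illustrative.

```python
from fractions import Fraction

# Figures from the problem statement, kept as exact fractions
p_remember = Fraction(80, 100)
p_rain_given_remember = Fraction(30, 100)
p_rain_given_forget = Fraction(60, 100)

# Complement: he either remembers or forgets
p_forget = 1 - p_remember

# Law of Total Probability: P(rain)
p_rain = p_rain_given_remember * p_remember + p_rain_given_forget * p_forget

# Bayes' Theorem: P(remember | rain)
p_remember_given_rain = p_rain_given_remember * p_remember / p_rain

print(p_rain)                 # 9/25
print(p_remember_given_rain)  # 2/3
```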
A glazier buys his glass from four different manufacturers - Superclear (10%), Seethrough (25%), BirdTrap (30%) and WeSellGlass (35%).
In the past, the glazier has found that 1% of Superclear product is cracked, 1.5% of Seethrough’s product is cracked, and 2% of BirdTrap’s and WeSellGlass’s products are cracked.
The glazier removes the protective covering from a sheet of glass without looking at the manufacturer's name - in other words, it's a random choice. He finds the glass is cracked. What is the probability it was made by BirdTrap?
Solution: 0.3380
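With four manufacturers, the denominator has four terms, and a loop keeps the bookkeeping manageable. The sketch below assumes the percentages in parentheses are each manufacturer's share of the glazier's purchases; the dictionary names are illustrative.

```python
# Assumed reading: share of the glazier's glass bought from each manufacturer
share = {
    "Superclear": 0.10,
    "Seethrough": 0.25,
    "BirdTrap": 0.30,
    "WeSellGlass": 0.35,
}
# P(cracked | manufacturer), from the problem statement
crack_rate = {
    "Superclear": 0.01,
    "Seethrough": 0.015,
    "BirdTrap": 0.02,
    "WeSellGlass": 0.02,
}

# Law of Total Probability: P(cracked)
p_cracked = sum(crack_rate[m] * share[m] for m in share)

# Bayes' Theorem for every manufacturer: P(manufacturer | cracked)
posteriors = {m: crack_rate[m] * share[m] / p_cracked for m in share}

print(f"P(cracked) = {p_cracked:.5f}")
for maker, p in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
    print(f"P({maker} | cracked) = {p:.4f}")
```

Computing all four posteriors shows that WeSellGlass, not BirdTrap, is the most likely source of a cracked sheet, because it has the same crack rate and a larger share.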
A company tests its fishing line for strength. The test gives a correct positive result with a probability of 0.85 when the fishing line is strong, but gives an incorrect positive result (a false positive) with a probability of 0.04 when the fishing line is in fact not strong. If 98% of the fishing lines are strong, and a fishing line chosen at random fails the test, what is the probability that it really is not strong enough?
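No solution is given for this problem. The following Python sketch shows one way to set it up. It assumes the test has only two outcomes, so that "fails the test" means "does not give a positive result", and that probabilities of failing are therefore complements of the stated probabilities of passing. Variable names are illustrative.

```python
# Figures from the problem statement
p_strong = 0.98
p_pass_given_strong = 0.85   # correct positive result
p_pass_given_weak = 0.04     # false positive

# Complements (assumes the test either passes or fails)
p_weak = 1 - p_strong
p_fail_given_strong = 1 - p_pass_given_strong
p_fail_given_weak = 1 - p_pass_given_weak

# Law of Total Probability: P(fail)
p_fail = p_fail_given_weak * p_weak + p_fail_given_strong * p_strong

# Bayes' Theorem: P(not strong | fail)
p_weak_given_fail = p_fail_given_weak * p_weak / p_fail

print(f"P(fail) = {p_fail:.4f}")
print(f"P(not strong | fail) = {p_weak_given_fail:.4f}")
```

Under these assumptions the sketch gives a probability of about 0.1155: most failed tests come from strong lines, because strong lines are so much more common.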