cookm346/README.md

Select projects

Data analysis

Mental health clustering analysis

I collect mental health data from over 1,800 participants and analyze it using k-means clustering


LOST transcript analysis

Analysis of transcripts from the hit TV show Lost. Which characters are known for their use of "dude", "aye", "ain't", and "bloody"?


Fiction text analysis

What words define fiction compared to other genres like nonfiction, newspapers, and magazines? I use a simple frequency-based approach to answer the question
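One possible frequency-based approach can be sketched as follows, assuming a smoothed relative-frequency ratio (the project's actual scoring may differ): count each word in the fiction texts and in the comparison texts, add one to every count, and rank words by how much more often they appear in fiction. The tokenizer, the function names, and the two toy corpora are hypothetical.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer).
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(target_docs, other_docs, top=5):
    """Rank words by smoothed relative frequency in target_docs vs. other_docs."""
    target = Counter(w for doc in target_docs for w in tokens(doc))
    other = Counter(w for doc in other_docs for w in tokens(doc))
    vocab = set(target) | set(other)
    # Add-one smoothing: every word gets +1 in both corpora, so no ratio divides by zero.
    target_total = sum(target.values()) + len(vocab)
    other_total = sum(other.values()) + len(vocab)
    score = {
        w: ((target[w] + 1) / target_total) / ((other[w] + 1) / other_total)
        for w in vocab
    }
    # Highest ratio first; ties broken alphabetically for reproducible output.
    return sorted(vocab, key=lambda w: (-score[w], w))[:top]


# Hypothetical toy corpora, not the project's data.
fiction = ["She whispered softly.", "He whispered and she smiled."]
news = ["The report said revenue rose.", "The council said the budget rose."]
print(distinctive_words(fiction, news, top=2))  # ['she', 'whispered']
```

The smoothing term keeps words that never occur in one corpus from producing infinite or zero ratios; with real corpora, a minimum-count filter is usually added so rare words do not dominate the ranking.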


Professional boxer analysis

In this analysis I use empirical Bayes techniques to better estimate a boxer's win rate. This technique is especially effective for boxers who have very few matches under their belt
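The shrinkage idea can be sketched with the common beta-binomial form of empirical Bayes, assuming win rates follow a Beta prior fitted by the method of moments (the repository may fit the prior differently, for example by maximum likelihood). A boxer's estimate becomes (wins + α) / (fights + α + β), which pulls short records toward the population average while leaving long records nearly unchanged. The function names and the records below are hypothetical placeholders.

```python
from statistics import mean, pvariance


def fit_beta_prior(rates):
    """Method-of-moments fit of Beta(alpha, beta) to observed win rates."""
    m = mean(rates)
    v = pvariance(rates)
    if v == 0 or v >= m * (1 - m):
        raise ValueError("variance incompatible with a Beta prior")
    common = m * (1 - m) / v - 1  # equals alpha + beta
    return m * common, (1 - m) * common


def shrunk_win_rate(wins, fights, alpha, beta):
    """Posterior mean win rate under the Beta(alpha, beta) prior."""
    return (wins + alpha) / (fights + alpha + beta)


# Hypothetical (wins, fights) records for boxers with long careers.
veterans = [(30, 40), (22, 35), (45, 50), (18, 30), (27, 33), (12, 25)]
alpha, beta = fit_beta_prior([w / n for w, n in veterans])
prior_mean = alpha / (alpha + beta)

rookie = shrunk_win_rate(1, 1, alpha, beta)      # raw rate 1.0 from a single fight
veteran = shrunk_win_rate(45, 50, alpha, beta)   # raw rate 0.9 from fifty fights
print(round(prior_mean, 3), round(rookie, 3), round(veteran, 3))
```

An undefeated 1-0 record lands just above the prior mean rather than at 100%, while the 45-5 record moves only slightly from its raw 0.9.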


Master angler analysis

I scrape 400,000+ trophy fish records from Manitoba, analyze fishing trends over time and women's involvement in sport fishing, and identify several hot fish/lake/season combos for catching big fish



Machine learning projects

Bob Ross IMDb rating predictions

I use several machine learning models to predict IMDb ratings for episodes of Bob Ross's The Joy of Painting


LOST IMDb episode rating prediction

I use several machine learning models to predict IMDb episode ratings from text descriptions of Lost episodes, and I analyze the defining words of each season



Probability simulation work

Birthday problem simulation

I solve the famous Birthday Problem via Monte Carlo simulation
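A Monte Carlo estimate can be checked against the exact answer, as in the following sketch. It assumes 365 equally likely birthdays and ignores leap years; the function names are illustrative, not taken from the project.

```python
import random


def exact_shared_birthday(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), via the complement."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct


def simulated_shared_birthday(n: int, trials: int = 20_000,
                              days: int = 365, seed: int = 0) -> float:
    """Monte Carlo estimate: draw n birthdays per trial, count trials with a repeat."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) < n:  # any duplicate shrinks the set
            hits += 1
    return hits / trials


print(exact_shared_birthday(23), simulated_shared_birthday(23))
```

With 23 people the exact probability is about 0.507, the classic "just over half" result, and the simulated value should fall close to it.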


Monty Hall simulation

I solve the Monty Hall problem through simulation, showing why you should always "switch"
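A minimal version of such a simulation might look like this, assuming the standard rules: the host always opens a goat door that the contestant did not pick, choosing at random when two such doors exist. The function names are illustrative.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is still closed and not already picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, trials: int = 50_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("stay:", win_rate(False), "switch:", win_rate(True))
```

Staying wins only when the first pick was the car (probability 1/3), so switching wins the remaining 2/3 of the time; the simulated rates should land near those values.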



Statistics simulation work

Median split simulations

I show why it is never beneficial to perform a median split (or any other split) on a continuous variable. The main issue is a loss of statistical power to detect real effects (i.e., more type II errors)
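The mechanism behind the power loss can be sketched by comparing the correlation of an outcome with a continuous predictor against its correlation with the median-split version of that predictor. This sketch assumes normally distributed data and a true correlation of 0.3; the sample size, replication count, and function names are arbitrary illustrative choices, not the project's settings.

```python
import random
from statistics import median


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(rho=0.3, n=200, reps=300, seed=1):
    """Average r for a continuous predictor vs. the same predictor median-split."""
    rng = random.Random(seed)
    cont, split = [], []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # y has correlation rho with x by construction.
        y = [rho * xi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for xi in x]
        m = median(x)
        x_split = [1.0 if xi > m else 0.0 for xi in x]  # "high" vs. "low" group
        cont.append(pearson_r(x, y))
        split.append(pearson_r(x_split, y))
    return sum(cont) / reps, sum(split) / reps


cont_r, split_r = simulate()
print(cont_r, split_r, split_r / cont_r)
```

For a normal predictor, a median split shrinks the correlation by a factor of about sqrt(2/π) ≈ 0.80. A weaker observed effect means a smaller test statistic at the same sample size, which is where the extra type II errors come from.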


t-test simulations

In this brief simulation, I demonstrate the dangers of using Student's t-test when the groups have unequal sample sizes and unequal variances. Welch's t-test also shows its impressive ability to correct for these assumption violations
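The difference between the two tests comes down to the standard error and the degrees of freedom, as this sketch of both statistics shows. It computes only the t statistics and degrees of freedom (p-values would need a t-distribution CDF, which SciPy provides via `scipy.stats.ttest_ind(x, y, equal_var=False)` for Welch's test). The data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(x, y):
    """Student's t: pools the two sample variances (assumes equal population variances)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Welch's t: separate variances plus Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical data: a small, noisy group vs. a large, tight group.
small_noisy = [0.0, 10.0, -10.0, 20.0, -20.0]
large_tight = [1.0, 1.1, 0.9] * 15
print(student_t(small_noisy, large_tight))
print(welch_t(small_noisy, large_tight))
```

Here the pooled variance (about 21) badly understates the small group's variance (250), so Student's standard error is too small and its t statistic is inflated; Welch's statistic is smaller and its degrees of freedom drop to about 4, which is what keeps its false-positive rate near the nominal level. With equal group sizes the two t statistics coincide and only the degrees of freedom differ.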



Additional projects

Computing Fibonacci numbers using eigenvectors

I show how to represent the recurrence that generates Fibonacci numbers as a matrix, then use a linear algebra method called eigendecomposition to compute any Fibonacci number without having to compute the preceding numbers in the series
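The matrix view can be sketched in plain Python. It assumes the standard matrix A = [[1, 1], [1, 0]], whose powers hold consecutive Fibonacci numbers; its eigenvalues are the golden ratio φ and ψ = 1 − φ, so A^n = P D^n P^-1 and only the diagonal matrix D needs to be raised to the n-th power. The names below are illustrative.

```python
import math

# The recurrence F(n+1) = F(n) + F(n-1) as a matrix:
#   [F(n+1), F(n)]^T = A @ [F(n), F(n-1)]^T,  A = [[1, 1], [1, 0]],
# so A^n = [[F(n+1), F(n)], [F(n), F(n-1)]].
SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2  # eigenvalue of A with eigenvector (PHI, 1)
PSI = (1 - SQRT5) / 2  # eigenvalue of A with eigenvector (PSI, 1)

# Eigendecomposition A = P D P^-1, with the eigenvectors as the columns of P.
P = [[PHI, PSI], [1.0, 1.0]]
DET = PHI - PSI  # determinant of P, equal to sqrt(5)
P_INV = [[1 / DET, -PSI / DET], [-1 / DET, PHI / DET]]
EIGENVALUES = (PHI, PSI)


def fib(n: int) -> int:
    """F(n) read off A^n = P D^n P^-1; only the eigenvalues are powered."""
    # Entry [1][0] of P @ D^n @ P^-1, which simplifies to (PHI**n - PSI**n) / sqrt(5).
    entry = sum(P[1][k] * EIGENVALUES[k] ** n * P_INV[k][0] for k in range(2))
    return round(entry)  # floating point: exact only for moderate n


print([fib(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The simplified form of that matrix entry is Binet's formula. Because the powers are computed in floating point, results eventually drift for large n; exact values there would need integer arithmetic instead, such as repeated squaring of A itself.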





Pinned

  1. bob_ross_imdb

    Machine learning to predict Bob Ross IMDb episode ratings

    HTML

  2. empirical_bayes_boxing

    Empirical Bayes analysis of boxer win rate

  3. lost_transcript_analysis

    Analysis of transcripts from the TV show Lost

    R

  4. master_angler_analysis

    Analysis of over 400,000 Manitoba master angler records

    HTML

  5. median_split_simulation

    Monte Carlo simulation of error rates for median splits with t-tests

    R

  6. monty_hall_simulation

    Simulation and explanation of the famous Monty Hall problem