This project works through a craft beer dataset to build some visuals, identify takeaways, run a few models, and then discuss the results over a beer.
- The most popular beer style is American IPA
- The average ABV of a beer is 5.8%
- The state with the most breweries is Colorado, with 47
- The city with the most breweries is Portland, with 17
- The average number of breweries per state is 10
- The average number of breweries per city is roughly 2
- Random forest is the best-performing model, with 48% accuracy on test data
- Initial Thoughts:
- look at the different beer types
- how many beers are there?
- where are the beers brewed?
- is there a most popular beer type?
- which state and city have the most breweries?
- download the beer and brewery CSV files to a local computer
- import the CSV files into a Jupyter notebook
- merge the CSV files into a new dataframe
- use the merged dataframe for the rest of the project
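
The import and merge steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the in-memory CSV text stands in for the downloaded files, and every column name (including the `brewery_id` join key) is an assumption.

```python
import io

import pandas as pd

# Tiny stand-ins for the two downloaded files. The rows and column
# names are made up for illustration, not taken from the real CSVs.
beers_csv = io.StringIO(
    "name,style,abv,brewery_id\n"
    "Hop Wagon,American IPA,0.065,0\n"
    "Night Shift,American Porter,0.058,1\n"
    "Pale Trail,American Pale Ale,0.052,0\n"
)
breweries_csv = io.StringIO(
    "brewery_id,brewery_name,city,state\n"
    "0,Summit Hops,Denver,CO\n"
    "1,River Bend,Portland,OR\n"
)

beers = pd.read_csv(beers_csv)
breweries = pd.read_csv(breweries_csv)

# A left join keeps every beer, even if its brewery row is missing.
df = beers.merge(breweries, on="brewery_id", how="left")
print(df.shape)
```

With real files, the `io.StringIO` objects would be replaced by the local paths of the two CSVs.
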
- check for nulls or missing values
- rename columns for readability
- create sub dataframes for exploration
- create model dataframe for modeling
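
One possible shape for the preparation steps above, as a hedged sketch. The sample frame, the `name_x`/`name_y` columns, and the choice to drop incomplete rows for modeling (rather than impute them) are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical merged frame; the column names are placeholders.
df = pd.DataFrame({
    "name_x": ["Hop Wagon", "Night Shift", "Pale Trail", "Mystery Ale"],
    "style": ["American IPA", "American Porter", "American IPA", None],
    "abv": [0.065, 0.058, None, 0.050],
    "name_y": ["Summit Hops", "River Bend", "Summit Hops", "River Bend"],
    "state": ["CO", "OR", "CO", "OR"],
})

# Count missing values per column before deciding how to handle them.
null_counts = df.isnull().sum()
print(null_counts)

# Rename the suffixed columns a merge can leave behind.
df = df.rename(columns={"name_x": "beer_name", "name_y": "brewery_name"})

# Exploration frame: descriptive columns, one row per beer.
explore_df = df[["beer_name", "style", "brewery_name", "state"]]

# Modeling frame: keep only rows with both the feature and the target.
# Dropping rows is an assumption; imputing would be another option.
model_df = df.dropna(subset=["abv", "style"])[["abv", "style"]]
```
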
- take a look at some beer stats
- plot most popular beers
- look at percentage of popular beers
- look at beer alcohol by volume
- look at breweries by state and city
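
The exploration steps above map onto a handful of pandas calls. The sketch below uses a made-up four-row frame and assumes `abv` is stored as a fraction (0.065 for 6.5%); the column names are illustrative, not confirmed by the data set.

```python
import pandas as pd

# Made-up sample rows; the real data set is much larger.
df = pd.DataFrame({
    "style": ["American IPA", "American IPA", "American Porter", "American IPA"],
    "abv": [0.065, 0.070, 0.058, 0.062],
    "brewery_name": ["Summit Hops", "River Bend", "River Bend", "Pine Street"],
    "city": ["Denver", "Portland", "Portland", "Portland"],
    "state": ["CO", "OR", "OR", "OR"],
})

# Most popular styles, as raw counts and as a percentage of all beers.
style_counts = df["style"].value_counts()
style_share = df["style"].value_counts(normalize=True) * 100

# Average alcohol by volume, converted from a fraction to a percentage.
mean_abv_pct = df["abv"].mean() * 100

# Distinct breweries per state and per city.
breweries_per_state = df.groupby("state")["brewery_name"].nunique()
breweries_per_city = df.groupby(["city", "state"])["brewery_name"].nunique()

print(style_counts.idxmax(), round(mean_abv_pct, 1))
print(breweries_per_state.sort_values(ascending=False))
```

If matplotlib is installed, `style_counts.head(10).plot(kind="barh")` would draw the most popular styles as a bar chart.
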
- Train accuracy: 51%, validate accuracy: 38%, test accuracy: 36%
- Train accuracy: 47%, validate accuracy: 45%, test accuracy: 44%
- Train accuracy: 51%, validate accuracy: 49%, test accuracy: 48%
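
Train, validate, and test accuracies like those above could be produced along these lines. The source does not say what the models predict or which features they use, so this sketch runs scikit-learn's `RandomForestClassifier` on synthetic data: the `abv` and `ibu` features, the style labels, the 60/20/20 split, and the hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: two made-up numeric features and a label
# derived from them. None of this comes from the real beer data.
rng = np.random.default_rng(42)
n = 600
abv = rng.uniform(0.03, 0.10, n)
ibu = rng.uniform(5, 100, n)
X = np.column_stack([abv, ibu])
y = np.where(ibu > 60, "IPA", np.where(abv > 0.065, "Stout", "Lager"))

# Hold out 20% for test, then split the rest 75/25 into train/validate.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

# Fit on the training split only.
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_train, y_train)

# Score the same fitted model on each split.
splits = {
    "train": (X_train, y_train),
    "validate": (X_val, y_val),
    "test": (X_test, y_test),
}
scores = {
    name: accuracy_score(y_part, model.predict(X_part))
    for name, (X_part, y_part) in splits.items()
}
for name, score in scores.items():
    print(f"{name} accuracy: {score:.0%}")
```

Comparing the train score against the validate score is a quick check for overfitting: a large gap, such as 51% against 38% in the first set of results, suggests the model is memorizing the training data.
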
- Beer is good
- Drink more beer
- The beer data was obtained from: Craft Beer Data