Project
Police Killings
Path	Module	Course	Date
Data Analyst	Intermediate Pandas and Python	Data Analysis With Pandas: Intermediate	July 20th, 2016
Description
In this guided project, a dataset containing information about citizens killed by police in 2015 is explored. Race and socioeconomic factors are analyzed. US state census data is merged with this dataset to create a rate statistic describing the frequency with which citizens were killed by police in each state. Finally, the top and bottom 10 states, ranked by this rate, are compared by their mean incomes and racial proportions. Libraries used: pandas, matplotlib, numpy
Datasets
police_killings.csv, state_population.csv. Datasets supplied by FiveThirtyEight and the US Census Bureau
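
The merge-and-rate step described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`state`, `population`), the toy values, and the per-million scaling are all assumptions.

```python
import pandas as pd

# Toy stand-ins for the two datasets; column names and values are hypothetical.
killings = pd.DataFrame({"state": ["AA", "AA", "BB", "AA", "BB"]})
population = pd.DataFrame(
    {"state": ["AA", "BB"], "population": [3_000_000, 1_000_000]}
)

# Count incidents per state, then merge the counts with the population table.
counts = killings.groupby("state").size().reset_index(name="killings")
merged = counts.merge(population, on="state", how="inner")

# Rate per million residents (the scaling factor is an illustrative choice).
merged["rate_per_million"] = merged["killings"] / merged["population"] * 1_000_000

# Rank states from highest to lowest rate.
ranked = merged.sort_values("rate_per_million", ascending=False).reset_index(drop=True)
print(ranked)
```

Taking `ranked.head(10)` and `ranked.tail(10)` would then give the two groups of states to compare.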

Project
Visualizing Pixar's Rollercoaster
Path	Module	Course	Date
Data Analyst	Intermediate Pandas and Python	Exploratory Data Visualization	July 22nd, 2016
Description
In this guided project, the financial and critical successes of Pixar movies released between 1995 and 2015 are explored. Graphs created with pandas' plotting methods are used to compare reviews from various movie review websites. Another graph displays the domestic and international shares of each film's revenue. Libraries used: pandas, matplotlib, seaborn
Datasets
PixarMovies.csv Dataset supplied by Paulo Vasconcellos

Project
Custom Data Visualization
Path	Module	Course	Date
Data Analyst	Intermediate Pandas and Python	Exploratory Data Visualization	July 23rd, 2016
Description
The purpose of this guided project is to apply matplotlib customization options. Using a dataset describing employment outcomes and gender of recent graduates from 173 different majors, a pair of graphs is created. Code is used to add and rotate labels, constrain the range of the graph, and create a figure with 4 subplots. Libraries used: pandas, matplotlib
Datasets
recent-grads.csv Dataset supplied by Dataquest.io

Project
Preparing Data For SQLite
Path	Module	Course	Date
Data Analyst	Working With Data Sources	SQL And Databases: Intermediate	August 13th, 2016
Description
This project is the first of a two-part SQL guided project. In part 1, a dataset of Academy Award winners is prepared and imported into a newly created SQL database. Libraries used: pandas, sqlite3
Datasets
academy_awards.csv Dataset supplied by AggData
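
As a rough sketch of the import step, the snippet below creates a table and loads a few rows with `sqlite3`. The table name, column names, and rows are hypothetical placeholders, and the in-memory database is used only for illustration; in the project, the cleaning was done with pandas beforehand.

```python
import sqlite3

# Hypothetical cleaned rows: (year, category, nominee, won).
rows = [
    (2010, "Best Actor", "Nominee A", 1),
    (2010, "Best Actor", "Nominee B", 0),
]

conn = sqlite3.connect(":memory:")  # in-memory database for illustration
conn.execute(
    """CREATE TABLE nominations (
           id INTEGER PRIMARY KEY,
           year INTEGER,
           category TEXT,
           nominee TEXT,
           won INTEGER
       )"""
)

# Parameterized inserts keep the data separate from the SQL text.
conn.executemany(
    "INSERT INTO nominations (year, category, nominee, won) VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()

winner_count = conn.execute(
    "SELECT COUNT(*) FROM nominations WHERE won = 1"
).fetchone()[0]
print(winner_count)
```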

Project
Creating Relations In SQLite
Path	Module	Course	Date
Data Analyst	Working With Data Sources	SQL And Databases: Intermediate	August 28th, 2016
Description
In part 2 of the SQL guided project, a new SQL table is created to store information about Academy Award ceremonies (namely, who hosted the event) from 2000 to 2010. A one-to-many connection is made between the ceremonies and nominations tables by adding a foreign key column to the nominations table. Next, a many-to-many connection is made by creating an actors table and a movies table, which are then connected by a join table. Libraries used: pandas, sqlite3
Datasets
academy_awards.csv Dataset supplied by AggData
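
The two kinds of relations described above might be laid out as in the sketch below. All table names, column names, and rows are hypothetical; the project's actual schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE ceremonies (id INTEGER PRIMARY KEY, year INTEGER, host TEXT);

-- One-to-many: each nomination points at exactly one ceremony.
CREATE TABLE nominations (
    id INTEGER PRIMARY KEY,
    nominee TEXT,
    ceremony_id INTEGER REFERENCES ceremonies(id)
);

CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);

-- Many-to-many: the join table pairs movie ids with actor ids.
CREATE TABLE movies_actors (
    movie_id INTEGER REFERENCES movies(id),
    actor_id INTEGER REFERENCES actors(id),
    PRIMARY KEY (movie_id, actor_id)
);
""")

conn.execute("INSERT INTO ceremonies VALUES (1, 2010, 'Host A')")
conn.executemany(
    "INSERT INTO nominations (nominee, ceremony_id) VALUES (?, 1)",
    [("Nominee A",), ("Nominee B",)],
)
conn.executemany("INSERT INTO actors VALUES (?, ?)", [(1, "Actor A"), (2, "Actor B")])
conn.executemany("INSERT INTO movies VALUES (?, ?)", [(1, "Movie X"), (2, "Movie Y")])
conn.executemany("INSERT INTO movies_actors VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])

# Resolve the join table back into (title, name) pairs.
cast = conn.execute("""
    SELECT movies.title, actors.name
    FROM movies_actors
    JOIN movies ON movies.id = movies_actors.movie_id
    JOIN actors ON actors.id = movies_actors.actor_id
    ORDER BY movies.title, actors.name
""").fetchall()
print(cast)
```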

Project
Investigating Airplane Accidents
Path	Module	Course	Date
Data Analyst	Advanced Python And Computer Science	Data Structures And Algorithms	September 20th, 2016
Description
In this guided project, a non-CSV dataset is imported and cleaned. A list of dictionaries is used to store the data rather than a pandas DataFrame. After the data is properly prepared, a pair of functions is written to perform a cursory exploration of the data. Libraries used: collections.Counter
Datasets
AirplaneAccidents.txt Dataset supplied by the National Transportation Safety Board
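
A minimal sketch of the list-of-dictionaries approach is shown below. The `" | "` delimiter, the field names, and the sample lines are assumptions made for illustration, and `parse` and `count_by` are hypothetical helper names rather than the project's own functions.

```python
from collections import Counter

# Hypothetical raw lines; the " | " delimiter and the field names are assumptions.
raw_lines = [
    "Event Id | Country | Year",
    "E1 | United States | 2015",
    "E2 | Canada | 2015",
    "E3 | United States | 2014",
]


def parse(lines, sep=" | "):
    """Turn delimited text lines into a list of dictionaries keyed by the header."""
    header = [h.strip() for h in lines[0].split(sep)]
    return [
        dict(zip(header, (v.strip() for v in line.split(sep))))
        for line in lines[1:]
    ]


def count_by(records, field):
    """Count how often each value of `field` appears across the records."""
    return Counter(rec[field] for rec in records)


records = parse(raw_lines)
by_country = count_by(records, "Country")
print(by_country.most_common(1))
```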

Project
Analyzing Movie Reviews
Path	Module	Course	Date
Data Analyst	Probability And Statistics	Probability And Statistics In Python: Beginner	September 28th, 2016
Description
A dataset containing the review scores from Metacritic, IMDb, Rotten Tomatoes, and Fandango for 146 films is analyzed. The data is normalized and rounded to create a common scale for comparison. Correlation and linear regression values are calculated while exploring the relationship between Metacritic and Fandango scores. Libraries used: pandas, matplotlib, numpy, scipy.stats
Dataset
fandango_score_comparison.csv Data provided by FiveThirtyEight from their article on Fandango movie review scores
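
The normalization, correlation, and regression steps can be illustrated with the sketch below. The scores are made up, and the statistics are computed by hand in plain Python to keep the example self-contained; the project itself used scipy.stats for these calculations.

```python
import math

# Hypothetical scores: Metacritic on a 0-100 scale, Fandango on a 0-5 star scale.
metacritic = [40, 60, 80, 100]
fandango = [3.0, 3.5, 4.5, 5.0]

# Normalize Metacritic to 0-5 and round to the nearest half star.
meta_norm = [round(m / 20 * 2) / 2 for m in metacritic]


def mean(xs):
    return sum(xs) / len(xs)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


r = pearson_r(meta_norm, fandango)
slope, intercept = fit_line(meta_norm, fandango)
print(r, slope, intercept)
```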

Project
Analyzing NYC High Schools
Path	Module	Course	Date
Data Analyst	Intermediate Pandas and Python	Data Cleaning	September 28th, 2016
Description
Datasets containing information about New York City schools, including class sizes, SAT scores, racial demographics, and survey results, are imported and cleaned. Correlations between SAT scores and all other numerical dataset values are calculated and visualized with a heatmap. Schools are grouped by district, and their reported safety scores are averaged and plotted onto a map with color-coded dots. Skewness seen on the map graphic is then visualized with a plot of a probability density function. Libraries used: pandas, re, numpy, seaborn, matplotlib, Basemap
Datasets
ap_2010.csv (Advanced Placement test data), class_size.csv, demographics.csv, graduation.csv, hs_directory.csv, sat_results.csv, survey_all.txt, survey_d75.txt. Datasets supplied by NYC Department of Education

Project
Predicting Bike Rentals
Path	Module	Course	Date
Data Scientist	Machine Learning	Decision Trees	September 29th, 2016
Description
Data from a bike sharing program is imported and briefly explored using correlations and a histogram. Some feature engineering is done to make the data more suitable for machine learning models. Next, the dataset is used to train three different machine learning models: linear regression, a decision tree, and a random forest. Their prediction error is compared using root mean squared error (RMSE). Libraries used: pandas, matplotlib, scikit-learn (linear_model.LinearRegression, tree.DecisionTreeRegressor, ensemble.RandomForestRegressor)
Dataset
bike_rental_hour.csv Capital Bikeshare data cleaned and combined by Dataquest.
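
A comparison of the three model types by RMSE might look like the sketch below. The data is synthetic (an assumed hour-of-day and temperature signal with noise), and the hyperparameters are illustrative choices, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the rental data: hour of day and a normalized temperature.
rng = np.random.default_rng(0)
hour = rng.integers(0, 24, size=400)
temp = rng.uniform(0, 1, size=400)
X = np.column_stack([hour, temp])
y = 100 + 80 * np.sin(hour / 24 * 2 * np.pi) + 50 * temp + rng.normal(0, 10, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(min_samples_leaf=5, random_state=1),
    "forest": RandomForestRegressor(
        n_estimators=50, min_samples_leaf=5, random_state=1
    ),
}

# Fit each model on the training split and score it on the held-out split.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, preds)))

print(rmse)
```

Lower RMSE means smaller typical prediction error, in the same units as the target.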
