Data Science Projects

Below you will find a collection of projects divided into four main categories:

Data Analysis and Visualisation
Machine Learning
Probability and Statistics
SQL

Data Analysis and Visualisation
- Clean and Analyse Employee Exit Surveys
  - Objective: To understand if workers have resigned due to some kind of dissatisfaction.
  - Tech Stack: Python, Pandas, NumPy, Matplotlib, Jupyter Notebook
  - Solution: Performed Exploratory Data Analysis, Data Cleaning and Data Wrangling techniques.
  - Key Achievement: Identified that employees with 7 or more years of service are more likely to resign due to some kind of dissatisfaction with the job.
- Exploring eBay Car Sales Data
  - Objective: To clean the data and analyse the included used car listings.
  - Tech Stack: Python, Pandas, NumPy, Jupyter Notebook
  - Solution: Performed Exploratory Data Analysis, Data Cleaning and Data Wrangling techniques.
  - Key Achievement: Found that German manufacturers represent more than 60% of the overall listings. Volkswagen is by far the most popular brand.
- Exploring Hacker News Posts
  - Objective: To determine which type of post and time receive the most comments on average.
  - Tech Stack: Python, Jupyter Notebook
  - Solution: Performed Exploratory Data Analysis, Data Cleaning and Data Wrangling techniques.
  - Key Achievement: Found that Ask HN posts created between 20:00 and 21:00 GMT receive the most comments on average.
- Finding Heavy Traffic Indicators on I-94
  - Objective: To determine a few indicators of heavy traffic on the I-94 Interstate highway.
  - Tech Stack: Python, Pandas, Matplotlib, Jupyter Notebook
  - Solution: Performed Exploratory Data Analysis, Data Cleaning/Wrangling and Data Visualisation techniques.
  - Key Achievement: Identified two types of heavy traffic indicators: time and weather.
- Popular Data Science Questions
  - Objective: To determine what content a data science education company should create.
  - Tech Stack: Python, Pandas, Matplotlib, Seaborn, SQL, Jupyter Notebook
  - Solution: Performed Exploratory Data Analysis, Data Cleaning/Wrangling and Data Visualisation techniques.
  - Key Achievement: Identified Deep Learning as the content to be created.
- Profitable App Profiles for the App Store and Google Play Markets
  - Objective: To find free mobile apps that are profitable for the App Store and Google Play markets.
  - Tech Stack: Python, Jupyter Notebook
  - Solution: Performed Exploratory Data Analysis, Data Cleaning and Data Wrangling techniques.
  - Key Achievement: Found that turning a popular book into an app could be profitable for both markets. However, special features should be added to the app.
- Star Wars Survey
  - Objective: To clean and explore the dataset in order to answer some questions about Star Wars fans.
  - Tech Stack: Python, Pandas, NumPy, Matplotlib, Jupyter Notebook
  - Solution: Performed Exploratory Data Analysis, Data Cleaning/Wrangling and Data Visualisation techniques.
  - Key Achievement: Found that "The Empire Strikes Back" is the most seen and liked Star Wars movie by the respondents.
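The Hacker News analysis above asks which post type and creation hour receive the most comments on average. A minimal sketch of that kind of grouping in plain Python follows. The rows, the tuple layout and the `%m/%d/%Y %H:%M` timestamp format are made up for illustration and are not taken from the project's dataset.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (title, number of comments, creation time).
posts = [
    ("Ask HN: How do you learn SQL?", 12, "8/4/2016 20:15"),
    ("Show HN: My weekend project", 4, "8/4/2016 20:40"),
    ("Ask HN: Best Python books?", 8, "8/5/2016 9:05"),
    ("Ask HN: Remote work tips?", 20, "8/6/2016 20:55"),
]

totals = defaultdict(int)   # hour -> total comments
counts = defaultdict(int)   # hour -> number of posts

for title, comments, created_at in posts:
    # Keep only "Ask HN" posts for this example.
    if not title.lower().startswith("ask hn"):
        continue
    hour = datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    totals[hour] += comments
    counts[hour] += 1

# Average comments per post for each creation hour.
average_by_hour = {hour: totals[hour] / counts[hour] for hour in totals}
best_hour = max(average_by_hour, key=average_by_hour.get)
```

With this toy data, `best_hour` is the hour whose Ask HN posts have the highest mean comment count; the real project may group and filter differently.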
Machine Learning
- Building A Handwritten Digits Classifier
  - Objective: To explore the effectiveness of deep, feedforward neural networks in image classification.
  - Tech Stack: Python, Pandas, NumPy, Scikit-Learn, Matplotlib, Jupyter Notebook
  - Solution: Applied Neural Networks with different hidden layers and numbers of neurons.
  - Key Achievement: Found that using more hidden layers increases the amount of overfitting.
- Predicting Bike Rentals
  - Objective: To create different machine learning models and evaluate their performances.
  - Tech Stack: Python, Pandas, NumPy, Scikit-Learn, Matplotlib, Jupyter Notebook
  - Solution: Applied Linear Regression, Decision Tree and Random Forest algorithms using their default settings and RMSE as the error metric.
  - Key Achievement: Obtained the lowest RMSE with the Random Forest model.
- Predicting Car Prices
  - Objective: To predict a car's market price using the k-nearest neighbors algorithm.
  - Tech Stack: Python, Pandas, NumPy, Scikit-Learn, Matplotlib, Jupyter Notebook
  - Solution: Created Univariate and Multivariate models using different values of the hyperparameter k.
  - Key Achievement: Obtained the best result with a Multivariate model using the two best features and k=2.
- Predicting House Sale Prices
  - Objective: To predict house sale prices using the Linear Regression algorithm.
  - Tech Stack: Python, Pandas, NumPy, Scikit-Learn, Jupyter Notebook
  - Solution: Performed Feature Engineering and Feature Selection techniques and evaluated the model using k-fold cross-validation.
  - Key Achievement: Obtained the best result using k-fold cross-validation with k=4.
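Several of the projects above score regression models with RMSE, and the house price project uses k-fold cross-validation. The sketch below shows the idea using only the standard library. It is not the projects' code, which relies on Scikit-Learn: the single-feature least-squares fit, the helper names (`fit_line`, `rmse`, `k_fold_rmse`) and the data are all assumptions made for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def k_fold_rmse(xs, ys, k=4, seed=1):
    """Average RMSE over k folds; every row is held out exactly once."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predictions = [slope * xs[i] + intercept for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], predictions))
    return sum(scores) / k


# Made-up, noise-free data: price = 100 * area + 5000.
areas = [50, 60, 70, 80, 90, 100, 110, 120]
prices = [100 * a + 5000 for a in areas]
average_error = k_fold_rmse(areas, prices, k=4)
```

Because the toy data is perfectly linear, `average_error` is effectively zero; on real data, comparing this average across models or values of `k` is what guides the choice.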
Probability and Statistics
- Building a Spam Filter with Naive Bayes
  - Objective: To create a spam filter that classifies new SMS messages as spam or non-spam.
  - Tech Stack: Python, Pandas, Jupyter Notebook
  - Solution: Used Conditional Probability concepts to build a function that works like the Multinomial Naive Bayes algorithm.
  - Key Achievement: Managed to build a spam filter for SMS messages with an accuracy of 98.74% on the test set.
- Finding the Best Two Markets to Advertise in
  - Objective: To find the best markets in which to advertise programming courses.
  - Tech Stack: Python, Pandas, Matplotlib, Jupyter Notebook
  - Solution: Performed Exploratory Data Analysis, Data Cleaning/Wrangling and Data Visualisation techniques.
  - Key Achievement: Found the US as the best market to advertise in.
- Investigating Fandango Movie Ratings
  - Objective: To find any difference between Fandango's ratings for popular movies in 2015 and 2016.
  - Tech Stack: Python, Pandas, NumPy, Matplotlib, Jupyter Notebook
  - Solution: Performed Descriptive Statistical Analysis and Data Visualisation techniques.
  - Key Achievement: On average, popular movies released in 2016 were rated lower on Fandango than popular movies released in 2015.
- Mobile App for Lottery Addiction
  - Objective: To practise applying probability and combinatorics concepts in a setting that simulates a real-world scenario.
  - Tech Stack: Python, Pandas, Jupyter Notebook
  - Solution: Built functions that calculate Factorials, Combinations and Probabilities.
  - Key Achievement: Wrote functions that calculate the probability of winning any prize with one or more tickets, as well as functions that check historical lottery data.
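The lottery project builds functions for factorials, combinations and probabilities. A minimal sketch of how such functions could fit together is shown below. The lottery format (6 numbers drawn from 49) and the function names are assumptions for illustration; the README does not state them.

```python
def factorial(n):
    """Product of the integers from 1 to n (1 when n is 0 or 1)."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def combinations(n, k):
    """Number of ways to choose k items from n without regard to order."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def probability_with_tickets(n_tickets, pool=49, picks=6):
    """Chance of matching all numbers when holding n_tickets distinct tickets.

    pool and picks describe an assumed 6-from-49 lottery.
    """
    return n_tickets / combinations(pool, picks)


total_outcomes = combinations(49, 6)          # 13983816
one_ticket = probability_with_tickets(1)
hundred_tickets = probability_with_tickets(100)
```

Since every ticket is an equally likely outcome, holding `n` distinct tickets multiplies the single-ticket probability by `n`.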
SQL
- Analysing CIA Factbook Data Using SQL
  - Objective: To query the database to fetch demographic information about countries around the world.
  - Tech Stack: SQL, Jupyter Notebook
  - Solution: Built SQL queries to compute Summary Statistics.
  - Key Achievement: Created queries and subqueries to answer demographic questions.
- Answering Business Questions Using SQL
  - Objective: To query the database and extract information for decision making.
  - Tech Stack: SQL, Jupyter Notebook
  - Solution: Built SQL queries to answer Business Questions.
  - Key Achievement: Found that the 'Rock' genre accounts for 53% of sales by itself.
- Customers and Products Analysis Using SQL
  - Objective: To analyse data from sales records and extract information for decision making.
  - Tech Stack: SQL, Jupyter Notebook
  - Solution: Built SQL queries to answer Business Questions.
  - Key Achievement: Found the priority products for restocking and how much to spend on acquiring new customers.
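The SQL projects above use queries and subqueries to compute summary statistics such as a genre's share of sales. The sketch below runs one such query from Python with the standard `sqlite3` module. The `sales` table, its columns and its rows are hypothetical and do not come from the projects' databases.

```python
import sqlite3

# In-memory database with a made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (genre TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Rock", 6), ("Jazz", 1), ("Metal", 2), ("Rock", 1)],
)

# The subquery supplies the grand total used to compute each genre's share.
query = """
    SELECT genre,
           SUM(quantity) AS units,
           ROUND(100.0 * SUM(quantity) / (SELECT SUM(quantity) FROM sales), 1) AS pct
    FROM sales
    GROUP BY genre
    ORDER BY units DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
```

Each row of `rows` holds a genre, its units sold and its percentage of all units, sorted from the best-selling genre down.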

Contact

Please feel free to contact me if you have any questions or comments.

thiago-cb/datascience
