
thiago-cb/datascience


Data Science Projects

Below you will find a collection of projects divided into four main categories:

  • Data Analysis and Visualisation
  • Machine Learning
  • Probability and Statistics
  • SQL

      • Objective: To determine whether employees resigned due to some kind of dissatisfaction.
      • Tech Stack: Python, Pandas, NumPy, Matplotlib, Jupyter Notebook
      • Solution: Performed Exploratory Data Analysis, Data Cleaning and Data Wrangling techniques.
      • Key Achievement: Identified that employees with 7 or more years of service are more likely to resign due to some kind of dissatisfaction with the job.
      • Objective: To clean and analyse a dataset of used car listings.
      • Tech Stack: Python, Pandas, NumPy, Jupyter Notebook
      • Solution: Performed Exploratory Data Analysis, Data Cleaning and Data Wrangling techniques.
      • Key Achievement: Found that German manufacturers account for more than 60% of all listings, with Volkswagen by far the most popular brand.
      • Objective: To determine which type of post, and which posting time, receive the most comments on average.
      • Tech Stack: Python, Jupyter Notebook
      • Solution: Performed Exploratory Data Analysis, Data Cleaning and Data Wrangling techniques.
      • Key Achievement: Found that Ask HN posts created between 20:00 and 21:00 GMT receive the most comments on average.
      • Objective: To determine a few indicators of heavy traffic on the I-94 Interstate highway.
      • Tech Stack: Python, Pandas, Matplotlib, Jupyter Notebook
      • Solution: Performed Exploratory Data Analysis, Data Cleaning/Wrangling and Data Visualisation techniques.
      • Key Achievement: Identified two types of heavy traffic indicators: time and weather.
      • Objective: To determine what content a data science education company should create.
      • Tech Stack: Python, Pandas, Matplotlib, Seaborn, SQL, Jupyter Notebook
      • Solution: Performed Exploratory Data Analysis, Data Cleaning/Wrangling and Data Visualisation techniques.
      • Key Achievement: Identified Deep Learning as the topic to create content about.
      • Objective: To find free mobile app profiles that could be profitable in both the App Store and Google Play markets.
      • Tech Stack: Python, Jupyter Notebook
      • Solution: Performed Exploratory Data Analysis, Data Cleaning and Data Wrangling techniques.
      • Key Achievement: Found that turning a popular book into an app could be profitable in both markets, provided the app includes special features.
      • Objective: To clean and explore the dataset in order to answer some questions about Star Wars fans.
      • Tech Stack: Python, Pandas, NumPy, Matplotlib, Jupyter Notebook
      • Solution: Performed Exploratory Data Analysis, Data Cleaning/Wrangling and Data Visualisation techniques.
      • Key Achievement: Found that "The Empire Strikes Back" is the Star Wars movie most seen and best liked by the respondents.
      • Objective: To explore the effectiveness of deep feedforward neural networks for image classification.
      • Tech Stack: Python, Pandas, NumPy, Scikit-Learn, Matplotlib, Jupyter Notebook
      • Solution: Trained neural networks with different numbers of hidden layers and neurons.
      • Key Achievement: Found that using more hidden layers increases the amount of overfitting.
      • Objective: To create different machine learning models and evaluate their performance.
      • Tech Stack: Python, Pandas, NumPy, Scikit-Learn, Matplotlib, Jupyter Notebook
      • Solution: Applied Linear Regression, Decision Tree and Random Forest algorithms using their default settings and RMSE as the error metric.
      • Key Achievement: Obtained the lowest RMSE with the Random Forest model.
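The regression projects compare models by RMSE. As a minimal standard-library sketch (not the notebook's own code, which lists Scikit-Learn in its stack), RMSE is the square root of the mean squared difference between actual and predicted values, so a lower value means a better fit. The function name is illustrative.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((actual - predicted) ** 2)). Lower is better."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and have the same length")
    squared_errors = [(actual - predicted) ** 2 for actual, predicted in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Errors are 1, 0 and 2, so RMSE = sqrt((1 + 0 + 4) / 3) ≈ 1.291
    print(rmse([3, 5, 2], [2, 5, 4]))
```

Because errors are squared before averaging, RMSE penalises a few large misses more heavily than many small ones, and it is expressed in the same units as the target (for example, price).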
      • Objective: To predict a car's market price using the k-nearest neighbors algorithm.
      • Tech Stack: Python, Pandas, NumPy, Scikit-Learn, Matplotlib, Jupyter Notebook
      • Solution: Created Univariate and Multivariate models using different values of the hyperparameter k.
      • Key Achievement: Obtained the best result with a Multivariate model using the two best features and k=2.
      • Objective: To predict house sale prices using the Linear Regression algorithm.
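A minimal sketch of the multivariate k-nearest neighbors idea behind the car price project, written in plain Python rather than the Scikit-Learn stack the project lists. It assumes the feature columns have already been normalised (kNN is sensitive to feature scale); the function name and the example values are hypothetical.

```python
import math
from typing import Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[float],
                query: Sequence[float], k: int = 2) -> float:
    """Predict a target as the mean target of the k training rows closest to `query`.

    Assumes every row in train_X and the query are already normalised and have
    the same number of features. Distance is Euclidean.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Pair each training row's distance to the query with its known target value
    distances = [(math.dist(row, query), target) for row, target in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_targets = [target for _, target in distances[:k]]
    return sum(nearest_targets) / k


if __name__ == "__main__":
    # Hypothetical two-feature rows (already normalised) and prices
    X = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]]
    y = [10.0, 20.0, 12.0, 100.0]
    print(knn_predict(X, y, [0.0, 0.05], k=2))  # 11.0, the mean of 10.0 and 12.0
```

Raising k averages over more neighbours, which smooths predictions but can blur local structure; the project compared several values of k before settling on one.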
      • Tech Stack: Python, Pandas, NumPy, Scikit-Learn, Jupyter Notebook
      • Solution: Performed Feature Engineering and Feature Selection techniques and evaluated the model using k-fold cross-validation.
      • Key Achievement: Obtained the best result using k-fold cross-validation with k=4.
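The sketch below illustrates k-fold cross-validation with k = 4 for a regression model: the rows are shuffled into k folds, each fold is held out once for testing while the model is fitted on the rest, and the per-fold RMSE values are averaged. To stay self-contained it fits a hypothetical single-feature least-squares line instead of the project's engineered multi-feature model; all names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def k_fold_splits(n: int, k: int, seed: int = 1) -> List[Tuple[List[int], List[int]]]:
    """Shuffle row indices, split them into k folds, return (train_idx, test_idx) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        # Every row not in the held-out fold is used for training
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b * x for a single feature; returns (a, b)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_val_rmse(x: Sequence[float], y: Sequence[float], k: int = 4, seed: int = 1) -> float:
    """Average held-out RMSE over k folds."""
    fold_rmses = []
    for train_idx, test_idx in k_fold_splits(len(x), k, seed):
        a, b = fit_simple_ols([x[i] for i in train_idx], [y[i] for i in train_idx])
        squared_errors = [(y[i] - (a + b * x[i])) ** 2 for i in test_idx]
        fold_rmses.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return sum(fold_rmses) / len(fold_rmses)


if __name__ == "__main__":
    xs = list(range(12))
    ys = [3 + 2 * v for v in xs]
    print(cross_val_rmse(xs, ys, k=4))  # close to 0.0 for perfectly linear data
```

Averaging over folds gives an error estimate that depends less on one particular train/test split than a single holdout would.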
      • Objective: To create a spam filter that classifies new SMS messages as spam or non-spam.
      • Tech Stack: Python, Pandas, Jupyter Notebook
      • Solution: Used Conditional Probability concepts to build a function that works like the Multinomial Naive Bayes algorithm.
      • Key Achievement: Managed to build a spam filter for SMS messages with an accuracy of 98.74% on the test set.
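The spam filter project describes a hand-built function that behaves like Multinomial Naive Bayes. A minimal sketch of that technique, assuming simple lowercase word tokenisation and Laplace smoothing with alpha = 1 (both are assumptions, not confirmed by the project): for each label it adds log P(label) to the sum of log P(word | label) over the message's words and picks the larger score. The labels "spam" and "ham", the function names, and the training messages are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

LABELS = ("spam", "ham")  # "ham" = non-spam (hypothetical label names)


def tokenize(message: str) -> List[str]:
    """Lowercase the text and keep runs of word characters (an assumed cleaning step)."""
    return re.findall(r"\w+", message.lower())


def train(messages: List[Tuple[str, str]], alpha: float = 1.0):
    """Estimate P(label) and Laplace-smoothed P(word | label) from (label, text) pairs."""
    word_counts: Dict[str, Counter] = {label: Counter() for label in LABELS}
    label_counts: Counter = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    priors = {label: label_counts[label] / len(messages) for label in LABELS}
    likelihoods = {}
    for label in LABELS:
        denominator = sum(word_counts[label].values()) + alpha * len(vocabulary)
        likelihoods[label] = {w: (word_counts[label][w] + alpha) / denominator for w in vocabulary}
    return priors, likelihoods


def classify(message: str, priors, likelihoods) -> str:
    """Return the label with the higher log-probability score."""
    scores = {}
    for label in LABELS:
        score = math.log(priors[label])
        for word in tokenize(message):
            if word in likelihoods[label]:  # words never seen in training are ignored
                score += math.log(likelihoods[label][word])
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    data = [("spam", "win money now"), ("spam", "win a free prize"),
            ("ham", "see you at lunch"), ("ham", "are you coming to lunch")]
    priors, likelihoods = train(data)
    print(classify("win free money", priors, likelihoods))  # spam
    print(classify("lunch with you", priors, likelihoods))  # ham
```

Summing logarithms instead of multiplying raw probabilities avoids numerical underflow on long messages, and smoothing keeps a word that appears under only one label from zeroing out the other label's score.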
      • Objective: To find the best markets in which to advertise programming courses.
      • Tech Stack: Python, Pandas, Matplotlib, Jupyter Notebook
      • Solution: Performed Exploratory Data Analysis, Data Cleaning/Wrangling and Data Visualisation techniques.
      • Key Achievement: Identified the US as the best market to advertise in.
      • Objective: To determine whether Fandango's ratings for popular movies changed between 2015 and 2016.
      • Tech Stack: Python, Pandas, NumPy, Matplotlib, Jupyter Notebook
      • Solution: Performed Descriptive Statistical Analysis and Data Visualisation techniques.
      • Key Achievement: On average, popular movies released in 2016 were rated lower on Fandango than popular movies released in 2015.
      • Objective: To practise applying probability and combinatorics concepts in a setting that simulates a real-world scenario.
      • Tech Stack: Python, Pandas, Jupyter Notebook
      • Solution: Built functions that calculate Factorials, Combinations and Probabilities.
      • Key Achievement: Wrote functions that calculate the probability of winning any prize with one or more tickets, as well as functions that check historical lottery data.
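Assuming a 6/49 lottery format (six numbers drawn from 49; the format is an assumption here, not stated above), the factorial and combination helpers the lottery project mentions can be sketched as follows. A ticket matches exactly k numbers when it picks k of the 6 winning numbers and 6 - k of the 43 losing ones. Function and constant names are hypothetical.

```python
from math import factorial

TOTAL_NUMBERS = 49  # assumed pool size
NUMBERS_DRAWN = 6   # assumed numbers per draw and per ticket


def combinations(n: int, k: int) -> int:
    """Number of ways to choose k items from n, ignoring order: n! / (k! * (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def jackpot_probability(n_tickets: int = 1) -> float:
    """Chance that one of n distinct tickets matches all drawn numbers."""
    total_outcomes = combinations(TOTAL_NUMBERS, NUMBERS_DRAWN)
    if not 1 <= n_tickets <= total_outcomes:
        raise ValueError("n_tickets must be between 1 and the number of possible tickets")
    return n_tickets / total_outcomes


def exactly_k_matches_probability(k: int) -> float:
    """Chance that a single ticket matches exactly k of the drawn numbers."""
    if not 0 <= k <= NUMBERS_DRAWN:
        raise ValueError("k must be between 0 and the number of drawn numbers")
    favourable = (combinations(NUMBERS_DRAWN, k)
                  * combinations(TOTAL_NUMBERS - NUMBERS_DRAWN, NUMBERS_DRAWN - k))
    return favourable / combinations(TOTAL_NUMBERS, NUMBERS_DRAWN)


if __name__ == "__main__":
    print(combinations(49, 6))            # 13983816 possible tickets
    print(jackpot_probability(10))        # ten distinct tickets
    print(exactly_k_matches_probability(5))
```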
      • Objective: To query the database to fetch demographic information about countries around the world.
      • Tech Stack: SQL, Jupyter Notebook
      • Solution: Built SQL queries to compute Summary Statistics.
      • Key Achievement: Created queries and subqueries to answer demographic questions.
      • Objective: To query the database and extract information for decision making.
      • Tech Stack: SQL, Jupyter Notebook
      • Solution: Built SQL queries to answer Business Questions.
      • Key Achievement: Found that the 'Rock' genre accounts for 53% of sales by itself.
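A genre's share of total sales can be computed with a single aggregate query that divides each genre's revenue by a subquery total. The sketch runs such a query through Python's built-in sqlite3 module against a tiny in-memory database; the table and column names (genre, track, invoice_line) and the demo rows are hypothetical stand-ins, not the project's actual schema or data.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with hypothetical genre/track/invoice_line tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE track (track_id INTEGER PRIMARY KEY, genre_id INTEGER);
        CREATE TABLE invoice_line (track_id INTEGER, unit_price REAL, quantity INTEGER);
        INSERT INTO genre VALUES (1, 'Rock'), (2, 'Jazz');
        INSERT INTO track VALUES (1, 1), (2, 1), (3, 2);
        INSERT INTO invoice_line VALUES (1, 1.0, 1), (2, 1.0, 1), (3, 1.0, 1);
    """)
    return conn


def genre_sales_share(conn: sqlite3.Connection) -> List[Tuple[str, float, float]]:
    """Return (genre, sales, percentage of all sales) rows, largest first."""
    query = """
        SELECT g.name AS genre,
               SUM(il.unit_price * il.quantity) AS sales,
               ROUND(100.0 * SUM(il.unit_price * il.quantity)
                     / (SELECT SUM(unit_price * quantity) FROM invoice_line), 1) AS pct_of_sales
        FROM invoice_line AS il
        JOIN track AS t ON t.track_id = il.track_id
        JOIN genre AS g ON g.genre_id = t.genre_id
        GROUP BY g.name
        ORDER BY sales DESC;
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in genre_sales_share(build_demo_db()):
        print(row)
```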
      • Objective: To analyse data from sales records and extract information for decision making.
      • Tech Stack: SQL, Jupyter Notebook
      • Solution: Built SQL queries to answer Business Questions.
      • Key Achievement: Identified the priority products for restocking and estimated how much to spend on acquiring new customers.

Contact

Please feel free to contact me if you have any questions or comments.