EDA-1 Assignment

Made by: Heril Changwal and Aashwin Sharma

Problem 1:

Task 1

Get the data from this link and perform Exploratory Data Analysis (EDA) on it. You must handle all the missing values present and visualise the data as much as you can using various plots, heatmaps, a correlation matrix, etc. You should present at least five informative insights that you were able to draw from the data. Your notebook should be presented in a well-organised manner.
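The original data link is not preserved, so the sketch below uses a small made-up DataFrame (the columns `age`, `income` and `city` are hypothetical). It shows one common starting point: count the missing values per column, fill numeric gaps with the median and categorical gaps with the mode, then compute a correlation matrix. Median/mode imputation is just one choice; dropping rows or model-based imputation may suit the real dataset better.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the assignment's dataset.
df = pd.DataFrame({
    "age": [22, np.nan, 35, 41, np.nan, 29],
    "income": [30000, 42000, np.nan, 61000, 38000, 45000],
    "city": ["Delhi", "Mumbai", None, "Delhi", "Pune", "Delhi"],
})

# 1. Inspect how many values are missing in each column.
print(df.isna().sum())

# 2. Impute: median for numeric columns, most frequent value (mode) for the rest.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        df[col] = df[col].fillna(df[col].mode()[0])

# 3. Correlation matrix of the numeric columns.
corr = df.select_dtypes("number").corr()
print(corr)
```

To visualise the matrix, `seaborn.heatmap(corr, annot=True)` draws it as an annotated heatmap.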

Task 2

"Algorithms like K-nearest and Naive Bayes support data with missing values."
Justify the above statement in your own words. You may skip Naive Bayes, but KNN is a must.
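One way to see why KNN can cope with missing values: it only needs a distance between points, and that distance can be computed over the features both points actually have, then rescaled to account for the skipped ones (the same idea as scikit-learn's `nan_euclidean_distances`). The NumPy sketch below implements a plain majority-vote KNN on that distance; the helper names and toy data are hypothetical. Note that many library classifiers, such as scikit-learn's `KNeighborsClassifier`, reject NaN inputs, so in practice this idea is often applied through imputation (e.g. `KNNImputer`).

```python
import numpy as np

def nan_euclidean(a, b):
    """Euclidean distance over coordinates present in both vectors,
    scaled up by (total coordinates / present coordinates)."""
    mask = ~(np.isnan(a) | np.isnan(b))
    n_present = mask.sum()
    if n_present == 0:
        return np.inf  # no shared features: treat the points as infinitely far apart
    sq = np.sum((a[mask] - b[mask]) ** 2)
    return np.sqrt(sq * a.size / n_present)

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training points nearest to x."""
    dists = np.array([nan_euclidean(row, x) for row in X_train])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy training set with gaps in both features.
X_train = np.array([
    [1.0, 1.0],
    [1.2, np.nan],
    [np.nan, 0.9],
    [5.0, 5.0],
    [5.2, 4.8],
    [np.nan, 5.1],
])
y_train = np.array([0, 0, 0, 1, 1, 1])

# The query itself is missing its second feature, yet a prediction is still possible.
print(knn_predict(X_train, y_train, np.array([1.1, np.nan]), k=3))
```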

Task 3

Go through this cool Pokemon dataset. Perform a detailed EDA on the data and answer the following questions:

  1. How many Pokemon are in each generation?
  2. Which is the most powerful Pokemon?
  3. In terms of dual types (Pokemon with two different types), what is the most common combination?
  4. Present the names of the Pokemon having the maximum and minimum values for each of the features attack, defense, sp_attack, sp_defense, hp, speed and catch_rate.
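The pandas sketch below outlines one approach to the four questions. The rows are made up and the column names (`name`, `generation`, `type_1`, `type_2`, `hp`, ...) are assumptions; rename them to match the real dataset and add `sp_attack`, `sp_defense` and `catch_rate` to the feature list. "Most powerful" is interpreted here as the highest sum of base stats, which is only one reasonable definition.

```python
import pandas as pd

# Made-up rows; column names are assumptions about the real dataset.
df = pd.DataFrame({
    "name":       ["Mon A", "Mon B", "Mon C", "Mon D", "Mon E"],
    "generation": [1, 1, 2, 2, 2],
    "type_1":     ["Grass", "Fire", "Grass", "Water", "Bug"],
    "type_2":     ["Poison", None, "Poison", "Ground", "Flying"],
    "hp":         [45, 39, 60, 55, 40],
    "attack":     [49, 52, 62, 85, 45],
    "defense":    [49, 43, 63, 70, 50],
    "speed":      [45, 65, 60, 30, 70],
})
stat_cols = ["hp", "attack", "defense", "speed"]

# Q1: number of Pokemon in each generation.
per_gen = df["generation"].value_counts().sort_index()

# Q2: "most powerful" taken as the largest sum of base stats (an assumption).
df["total"] = df[stat_cols].sum(axis=1)
strongest = df.loc[df["total"].idxmax(), "name"]

# Q3: most common dual-type pair, order-insensitive (Grass/Poison == Poison/Grass).
dual = df.dropna(subset=["type_2"])
combo = dual.apply(lambda r: " / ".join(sorted([r["type_1"], r["type_2"]])), axis=1)
top_combo = combo.value_counts().idxmax()

# Q4: (name with max, name with min) for every feature.
extremes = {
    col: (df.loc[df[col].idxmax(), "name"], df.loc[df[col].idxmin(), "name"])
    for col in stat_cols
}

print(per_gen, strongest, top_combo, extremes, sep="\n")
```

`idxmax`/`idxmin` return only the first row on ties, so on the real data it is worth checking whether several Pokemon share an extreme value.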

Task 4

Go through this short hands-on tutorial. Try to understand how the data is cleaned and prepared before training a model.
(Not compulsory) If you are able to code along and reach the end of the tutorial, you can make a submission on this link. Share the submission results if you made a successful submission!

PS: Don't forget to refer to the documentation of Matplotlib, Seaborn, Pandas, etc. while working and experimenting with the data.

Problem 2:

Task 1

Learn about the theory and implementation of handling Categorical Variables from this link. Try to complete this exercise to gain more clarity. Learn about the theory and implementation of various methods of Feature Selection from this link.
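As a quick illustration of the two most common encodings for categorical variables, the sketch below uses a hypothetical car-like table: an ordinal variable (`size`) is mapped to integers that preserve its order, while a nominal variable (`fuel`) is one-hot encoded with `pandas.get_dummies`. The column names and category order are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "fuel": ["petrol", "diesel", "petrol", "cng"],     # nominal: no natural order
    "size": ["small", "large", "medium", "small"],     # ordinal: has a natural order
})

# Ordinal encoding: integers that respect the category order.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_enc"] = df["size"].map(size_order)

# One-hot encoding: one 0/1 column per category of the nominal variable.
df = pd.get_dummies(df, columns=["fuel"], prefix="fuel", dtype=int)
print(df)
```

One-hot encoding avoids implying an order that does not exist, at the cost of one extra column per category; for high-cardinality variables, other schemes covered in the linked material may be preferable.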

Task 2

Download the Car Dataset and perform EDA (handling missing values, categorical encoding, etc.). Using different Filter methods, find the most relevant features, taking car price as the dependent variable.
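A simple filter method ranks each feature by the absolute Pearson correlation with the target and keeps those above a threshold. The sketch below applies this to synthetic data; the columns `horsepower`, `curb_weight` and `doors` and the 0.3 cut-off are hypothetical choices, not values from the car dataset. Other filter scores, such as scikit-learn's `f_regression` or `mutual_info_regression`, can be swapped in the same way.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the car data; price depends on two of the three features.
df = pd.DataFrame({
    "horsepower": rng.normal(120, 30, n),
    "curb_weight": rng.normal(2500, 400, n),
    "doors": rng.integers(2, 5, n),
})
df["price"] = 150 * df["horsepower"] + 10 * df["curb_weight"] + rng.normal(0, 2000, n)

# Filter: |Pearson correlation| of every feature with price, highest first.
corr_with_price = (
    df.drop(columns="price").corrwith(df["price"]).abs().sort_values(ascending=False)
)
selected = corr_with_price[corr_with_price > 0.3].index.tolist()

print(corr_with_price)
print("selected:", selected)
```

Correlation only captures linear, one-feature-at-a-time relationships, which is why it is worth comparing several filter methods as the task asks.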

Apply forward feature selection for a Linear Regression model for different values of k_features. Plot a graph of the different values of k_features against the corresponding accuracy (for a regression model, a score such as R²) and determine its best value.
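The parameter name k_features suggests mlxtend's `SequentialFeatureSelector` (with `forward=True`). The sketch below instead reimplements greedy forward selection in NumPy so it runs without that dependency: starting from no features, it repeatedly adds the feature that most improves held-out R² of an ordinary least-squares fit, for each k. The data, split sizes and helper names are hypothetical. Selecting and scoring on the same hold-out set is optimistic; cross-validation is the more careful choice on real data.

```python
import numpy as np

def r2_score(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_predict(X_tr, y_tr, X_te):
    # Ordinary least squares with an intercept column.
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_te = np.column_stack([np.ones(len(X_te)), X_te])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return A_te @ coef

def forward_selection(X_tr, y_tr, X_te, y_te, k):
    """Greedily add the feature that most improves held-out R^2 until k are chosen."""
    selected, remaining = [], list(range(X_tr.shape[1]))
    score = None
    for _ in range(k):
        best_feat, score = max(
            (
                (j, r2_score(y_te, fit_predict(X_tr[:, selected + [j]], y_tr,
                                               X_te[:, selected + [j]])))
                for j in remaining
            ),
            key=lambda t: t[1],
        )
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected, score

# Synthetic data: only features 0, 2 and 4 influence the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.5 * X[:, 4] + rng.normal(scale=0.5, size=300)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

scores = {}
for k in range(1, X.shape[1] + 1):
    feats, r2 = forward_selection(X_tr, y_tr, X_te, y_te, k)
    scores[k] = r2
    print(k, feats, round(r2, 3))

best_k = max(scores, key=scores.get)
```

For the required graph, `matplotlib.pyplot.plot(list(scores), list(scores.values()))` plots k against the score. When the curve flattens, the smallest k near the plateau is often preferred over the exact maximum.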

Link to the lecture resources:

https://hackmd.io/1ruiZhSkT3S0dfgiMMBiKg?view

How to Submit?

Submit a well-formatted Jupyter Notebook. Use Markdown cells to separate each question and for any explanation that you wish to provide.

Create your notebook inside EDA-1-Submissions/ and name it <your>-<name>.ipynb