Build a model that predicts whether an individual makes over $50,000 per year, based on anonymized census data.
The aim is to understand the factors influencing income inequality and potentially inform targeted social programs.
1- Deal with missing values.
2- Figure out why the data is missing.
3- Eliminate all extra variables.
4- Eliminate duplicates.
5- Detect and remove outliers (a box plot can confirm whether the data contain outliers).
6- Scaling and normalization.
7- Eliminate blank spaces or missing information (SimpleImputer can be used to handle missing values).
8- Arrange the data logically and sequentially so that it is easy to visualize.
9- Group the data in rows and columns (horizontally and vertically) to help with arrangement and visualization.
10- Deal with inconsistent data entry.
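Several of the cleaning steps above can be sketched in a few lines of pandas and scikit-learn. This is a minimal illustration, not the project's actual code: the toy table, its column names (`age`, `workclass`, `hours_per_week`), the use of "?" as a missing-value marker, and the 1.5 x IQR outlier rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for the census table (column names are assumptions).
raw = pd.DataFrame({
    "age": [25, 38, 38, 52, np.nan, 41, 29, 95],
    "workclass": [" Private", "Private ", "private", "?", "Self-emp",
                  "private", "Private", "Private"],
    "hours_per_week": [40, 50, 50, 45, 38, 40, 42, 40],
})


def clean(df):
    df = df.copy()
    # Inconsistent data entry: strip blank spaces, unify case,
    # and treat the assumed "?" placeholder as a missing value.
    df["workclass"] = df["workclass"].str.strip().str.lower().replace("?", np.nan)
    # Eliminate duplicates (rows that are identical after normalization).
    df = df.drop_duplicates().reset_index(drop=True)
    # Missing values: median for numeric columns via SimpleImputer,
    # most frequent value for the categorical column.
    num_cols = ["age", "hours_per_week"]
    df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
    df["workclass"] = df["workclass"].fillna(df["workclass"].mode()[0])
    # Outliers: keep rows whose age lies within 1.5 * IQR of the quartiles,
    # the same rule a box plot uses to draw its whiskers.
    q1, q3 = df["age"].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[keep].reset_index(drop=True)


cleaned = clean(raw)
```

Whether outliers should be dropped, capped, or kept depends on why they occur, so the IQR filter here is only one possible choice.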
How is one variable related to the other? What sort of relationship exists between two different variables? What kind of trend is the data following? Can a dataset be divided into smaller parts?
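As a rough illustration of how these questions can be approached with pandas, the sketch below computes a correlation between two numeric variables, a cross-tabulation of a categorical variable against the target, and per-group summaries. The small table and its column names are made up for the example.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 60],
    "hours_per_week": [20, 40, 45, 50, 38, 30],
    "education": ["HS", "Bachelors", "Masters", "Bachelors", "HS", "Masters"],
    "income": ["<=50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"],
})

# How is one numeric variable related to another? Pearson correlation matrix.
corr = df[["age", "hours_per_week"]].corr()

# What relationship exists between a categorical variable and the target?
# Each row shows the share of every income class within one education level.
by_education = pd.crosstab(df["education"], df["income"], normalize="index")

# Can the dataset be divided into smaller parts? Summarize each income group.
group_means = df.groupby("income")[["age", "hours_per_week"]].mean()
```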
Used basic visualization methods with Plotly and Cufflinks (rather than Matplotlib and Seaborn): 1- Line plots. 2- Area plots. 3- Histograms. 4- Bar charts. 5- Pie charts. 6- Box plots. 7- Scatter plots. 8- Bubble plots.
Dimensionality Reduction (PCA) / Encoding (One-Hot, Normal) / Scaling
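One way to chain these three steps is a scikit-learn pipeline, sketched below under several assumptions: the toy columns are invented, `StandardScaler` stands in for whichever scaler was actually used, only one-hot encoding is shown, and keeping two principal components is an arbitrary choice.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature table; column names and values are assumptions.
X = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 60],
    "hours_per_week": [20, 40, 45, 50, 38, 30],
    "education": ["HS", "Bachelors", "Masters", "Bachelors", "HS", "Masters"],
})

preprocess = ColumnTransformer(
    [
        # Scaling: zero mean and unit variance for numeric columns.
        ("num", StandardScaler(), ["age", "hours_per_week"]),
        # Encoding: one binary column per category.
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["education"]),
    ],
    sparse_threshold=0.0,  # always return a dense array, which PCA requires
)

pipeline = Pipeline([
    ("prep", preprocess),
    # Dimensionality reduction: project onto the first two principal components.
    ("pca", PCA(n_components=2)),
])

X_reduced = pipeline.fit_transform(X)
explained = pipeline.named_steps["pca"].explained_variance_ratio_
```

Scaling before PCA matters because PCA is driven by variance, so an unscaled column with a large range would otherwise dominate the components.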
7 models evaluated using different metrics, such as Accuracy, Precision, Recall, and ROC:
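A comparison of this kind can be sketched as a loop over candidate models. The example below is an assumption-laden illustration: it uses synthetic data from `make_classification` instead of the census data, and the two classifiers are placeholders rather than the models actually evaluated. ROC is summarized here as the area under the ROC curve.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the census features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Placeholder models; any estimator with predict and predict_proba fits this loop.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard class labels
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, y_score),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

On an imbalanced target such as income above $50,000, accuracy alone can look good for a model that rarely predicts the positive class, which is why precision, recall, and ROC are reported alongside it.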