Machine Learning Process

The machine learning process involves the following steps:

1- Data Preparation: Collect, clean, and preprocess data.
2- Data Visualization and Analysis: Visualize and analyze data to identify patterns and relationships.
3- Feature Engineering: Select and transform relevant variables in the data.
4- Model Selection: Choose a model suited to the problem and the data.
5- Model Training: Feed data into the model and adjust parameters to minimize error.
6- Hyperparameter Tuning: Search for the hyperparameter values that give the best validation performance.
7- Model Evaluation: Measure accuracy, precision, recall, and other performance metrics.
8- Model Deployment: Integrate the model into an application and set up a pipeline to feed new data.
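
As a rough illustration of how steps 1 to 7 fit together, here is a minimal sketch using Scikit-learn. The synthetic dataset, the choice of logistic regression, and all variable names are assumptions made for this example and are not taken from the notebooks. Deployment (step 8) is left out.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data preparation: a synthetic binary classification dataset (illustrative only).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Hold out a test set so evaluation uses data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Feature engineering and model selection: scale the features,
# then use a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model training: fit the pipeline on the training split.
model.fit(X_train, y_train)

# Model evaluation: accuracy on the held-out test split.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling statistics tied to the training split, so nothing from the test data leaks into training.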

Machine Learning Tutorial

This tutorial covers Machine Learning Basics using Python.

The repository includes Python notebooks, reference guides, and cheatsheets for the entire Machine Learning process:

1- Data preprocessing and analysis: clean and transform data into a format suitable for analysis using NumPy and Pandas.
2- Data visualization: understand and explore data visually using Matplotlib and Seaborn.
3- Machine learning: explore various algorithms in Scikit-learn such as regression, classification, and clustering.
4- Feature engineering: feature encoding, feature scaling, feature selection, etc.
5- Model selection: comparison of ML algorithms, how to choose an ML algorithm, etc.
6- Hyperparameter tuning: Grid Search, Random Search, and Bayesian Optimization.
7- Model evaluation: validation methods, evaluation metrics, etc.
8- Model explainability: feature importance, interpretable models, etc.
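
To make item 6 more concrete, the sketch below runs Grid Search and Random Search over the same small parameter space with Scikit-learn. The Iris dataset, the k-nearest neighbors classifier, and the parameter values are assumptions chosen for illustration. Bayesian Optimization is not shown here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical search space: 5 x 2 = 10 candidate combinations.
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9],
    "weights": ["uniform", "distance"],
}

# Grid Search: evaluates every combination with 5-fold cross-validation.
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X, y)

# Random Search: evaluates only n_iter randomly sampled combinations.
random_search = RandomizedSearchCV(
    KNeighborsClassifier(), param_grid, n_iter=4, cv=5, random_state=0
)
random_search.fit(X, y)

print("Grid Search best:", grid.best_params_, round(grid.best_score_, 3))
print("Random Search best:", random_search.best_params_,
      round(random_search.best_score_, 3))
```

Grid Search is exhaustive, so its cost grows with the product of the list lengths. Random Search trades completeness for a fixed budget, which tends to matter more as the search space grows.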

The repository also includes notebooks for two popular introductory Machine Learning examples:

Classification - Titanic Survival Prediction: Predict whether a Titanic passenger survived based on features such as age, gender, ticket class, and cabin location (notebook).
Regression - Boston House Price Prediction: Predict the median value of houses in Boston neighborhoods based on features such as crime rate, number of rooms, proximity to employment centers, and accessibility to highways (notebook).

The final sections of the tutorial provide resources and links for practicing and advancing in Machine Learning:

The most popular ML dataset platforms.
The most popular ML competition platforms.
A guide to tackling ML competitions (PDF).

Requirements

Tools:

Python 3
Jupyter Notebook: web-based interactive computing platform
Google Colab: cloud-based Jupyter Notebook environment

Concepts:

Mathematics (refresher)
Python programming (refresher, notebook, GDSC guide)
Data Structures (refresher)

Python libraries:

NumPy: A library for efficient numerical operations and multidimensional arrays, widely used in scientific computing and data analysis.
Pandas: A data manipulation and analysis library, providing data structures and functions to easily handle and process structured data.
Matplotlib: A popular plotting library used for creating static, animated, and interactive visualizations.
Seaborn: A data visualization library based on Matplotlib, providing high-level functions for creating attractive statistical graphics.
Scikit-learn: A data analysis and modeling library, including ML algorithms for various tasks: classification, regression, clustering, etc.
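
For a first feel of the two data-handling libraries, here is a small sketch of a NumPy array operation and a Pandas aggregation. The values and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic on a 2-D array.
matrix = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
column_means = matrix.mean(axis=0)   # mean of each column

# Pandas: labeled tabular data with grouping and aggregation.
df = pd.DataFrame(
    {"group": ["a", "b", "a", "b"], "value": [1.0, 2.0, 3.0, 4.0]}
)
group_means = df.groupby("group")["value"].mean()

print(column_means)  # column means: 1.5, 2.5, 3.5
print(group_means)   # group "a": 2.0, group "b": 3.0
```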

Structure of the tutorial

1- Machine learning basic concepts
2- Read input data in Python
3- Data preprocessing and analysis: NumPy and Pandas
4- Data visualization: Matplotlib and Seaborn
5- Machine learning: Scikit-learn
6- Feature engineering
7- Model selection and parameter tuning
8- Model evaluation and explainability
9- Practice: Machine learning datasets
10- Practice: Machine learning competitions

Content of the tutorial

1- Machine learning basic concepts

Presentation on Machine learning basic concepts (PDF)

2- Read input data in Python

Tutorial to read various sources in a DataFrame (notebook)
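
As a quick taste of what reading data into a DataFrame looks like, the sketch below loads the same records from CSV text and from JSON text with Pandas. The in-memory strings stand in for real files and are assumptions for this example. In practice a file path or URL would usually be passed instead.

```python
import io

import pandas as pd

# Stand-ins for real files: the same two records as CSV and as JSON.
csv_text = "name,age\nAda,36\nGrace,45\n"
json_text = '[{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]'

# Both readers accept file paths or file-like objects.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```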

3- Data preprocessing and analysis: NumPy and Pandas

NumPy cheatsheet (PDF)
NumPy tutorial (notebook)
Pandas cheatsheet (PDF)
Pandas tutorial (notebook)
Data preprocessing tutorial (notebook)

4- Data visualization: Matplotlib and Seaborn

Chart chooser (PDF)
Matplotlib cheatsheet (PDF)
Matplotlib tutorial (WEB)
Seaborn tutorial (WEB)
Data visualization tutorial (notebook)
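
The following sketch shows two common exploratory plots with Matplotlib: a histogram for one variable and a scatter plot for the relationship between two. The random data is generated only for illustration. The "Agg" backend and the in-memory buffer are assumptions that let the script run without a display. In a notebook the figure would normally be shown inline.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed

import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: distribution of a single variable.
ax_hist.hist(x, bins=20)
ax_hist.set_title("Distribution of x")

# Scatter plot: relationship between two variables.
ax_scatter.scatter(x, y, s=10)
ax_scatter.set_title("y versus x")

fig.tight_layout()

# Render to PNG in memory; a file name could be passed instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```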

5- Machine learning: Scikit-learn

Machine learning map (PDF)
Scikit-learn cheatsheet (PDF)
Scikit-learn tutorial (notebook)
Machine learning tutorial (notebook)
Classification: Titanic Survival Prediction (notebook)
Regression: Boston House Price Prediction (notebook)

6- Feature engineering

Data cleaning guide (PDF)
Data preparation cheatsheet (PDF)
Feature engineering (PDF)
Feature engineering tutorial (notebook)
Feature selection methods (IMG)
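
Two of the most common feature engineering steps, feature encoding and feature scaling, can be sketched with Scikit-learn as follows. The tiny DataFrame and its column names are hypothetical values made up for this example. Feature selection is not covered in this sketch.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one categorical column and one numeric column.
df = pd.DataFrame(
    {
        "ticket_class": ["first", "third", "second", "third"],
        "age": [38.0, 22.0, 54.0, 30.0],
    }
)

preprocess = ColumnTransformer(
    [
        # Feature encoding: one binary column per category.
        ("encode", OneHotEncoder(handle_unknown="ignore"), ["ticket_class"]),
        # Feature scaling: zero mean and unit variance.
        ("scale", StandardScaler(), ["age"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

features = preprocess.fit_transform(df)
print(features.shape)  # 3 one-hot columns + 1 scaled column
```

A `ColumnTransformer` applies each step only to the listed columns and concatenates the results, which keeps the preprocessing reusable on new data through `preprocess.transform`.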

7- Model selection and parameter tuning

Comparison of ML algorithms 1 (PDF)
Comparison of ML algorithms 2 (IMG)
Comparison of ML algorithms 3 (IMG)
How to choose an ML algorithm (IMG)
Hyperparameter tuning (WEB)
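
One simple way to compare candidate algorithms is to score each with the same cross-validation splits. The sketch below does this with Scikit-learn. The Iris dataset and the two candidate models are assumptions picked for illustration, not a recommendation from the resources above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical candidates to compare.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
mean_scores = {
    name: cross_val_score(estimator, X, y, cv=5).mean()
    for name, estimator in candidates.items()
}

best_name = max(mean_scores, key=mean_scores.get)
for name, score in mean_scores.items():
    print(f"{name}: {score:.3f}")
print("Best by mean accuracy:", best_name)
```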

8- Model evaluation and explainability

Evaluation metrics cheatsheet (PDF)
Evaluation metrics in Python (WEB)
Model explainability cheatsheet (PDF)
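
To show how the main classification metrics relate to each other, here is a small worked example with Scikit-learn. The labels and predictions are made up for illustration: they contain 3 true positives, 2 true negatives, 2 false positives, and 1 false negative.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical ground truth and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
matrix = confusion_matrix(y_true, y_pred)

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 5 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4

print(matrix)
print(accuracy, precision, recall)
```

Precision and recall differ here because the classifier makes more false positive errors than false negative ones, which accuracy alone would not reveal.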

9- Practice: Machine learning datasets

UCI Machine Learning Repository: https://archive.ics.uci.edu/
Kaggle datasets: https://www.kaggle.com/datasets
Awesome Public Datasets: https://github.com/awesomedata/awesome-public-datasets
Google Dataset Search: https://datasetsearch.research.google.com/
OpenML Datasets: https://www.openml.org/
Papers With Code: https://paperswithcode.com/datasets

10- Practice: Machine learning competitions

Kaggle: https://www.kaggle.com/competitions
DrivenData: https://www.drivendata.org
Zindi Africa: https://zindi.africa/competitions
Guide to tackling ML competitions (PDF)

SamBelkacem/Machine-Learning-Basics
