This is my repository for learning Machine Learning from scratch. If you want to check out my repository on Deep Learning with TensorFlow, click here 👉: TensorFlow-Deep-Learning
Number | Notebook | Description | Extras |
---|---|---|---|
00 | Basic ML Intuition | What is ML? Bias and variance | |
01 | Data Preprocessing Template | Data preprocessing template | |
02 | Regression | Simple linear regression, multiple linear regression, polynomial regression, ... | |
03 | Classification | Logistic regression, K-NN, SVM, ... | |
04 | Clustering | K-Means clustering, hierarchical clustering | |
05 | Association Rule Learning | Apriori, Eclat | |
06 | Reinforcement Learning | UCB, Thompson Sampling | |
07 | NLP | Introduction to NLP | |
08 | Dimensionality Reduction | PCA, Kernel PCA, LDA | |
## | Model Selection | Model selection: regression, classification | |
## | Case Study | Case study | |
- Bias and Variance
- Confusion Matrix, Accuracy, Precision, Recall, F1-score
- L1 and L2 Regularization (a minimal sketch follows this list)
- StatQuest: L1 Regularization
- StatQuest: L2 Regularization
- Regularization and Cross Validation
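For a quick feel of the difference between the two penalties, here is a minimal sketch (not from the notebooks; assumes scikit-learn and synthetic data): Lasso (L1) drives some coefficients exactly to zero, while Ridge (L2) only shrinks them toward zero.

```python
# Minimal sketch contrasting L1 and L2 regularization with scikit-learn.
# The dataset is synthetic; only 3 of the 10 features are informative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=42)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: sparse coefficients
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: small but nonzero coefficients

print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Zeroed out by L1:", int(np.sum(lasso.coef_ == 0)))
```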
Regression Model | Pros | Cons |
---|---|---|
Linear Regression | Works on any size of dataset, gives information about the relevance of features. | The linear regression assumptions must be met. |
Polynomial Regression | Works on any size of dataset, works very well on nonlinear problems. | Need to choose the right polynomial degree for a good bias-variance tradeoff. |
SVR | Easily adaptable, works very well on nonlinear problems, not biased by outliers. | Feature scaling is compulsory; less well documented; more difficult to understand. |
Decision Tree Regression | Interpretability, no need for feature scaling, works on both linear and nonlinear problems. | Poor results on very small datasets; overfitting can easily occur. |
Random Forest Regression | Powerful and accurate, good performance on many problems, including nonlinear ones. | Poor results on very small datasets; overfitting can easily occur. |
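The SVR row above notes that feature scaling is compulsory. A minimal sketch of how that is typically handled (assumes scikit-learn and synthetic data; not from the notebooks) is to wrap the scaler and the regressor in a single pipeline:

```python
# Minimal sketch: SVR needs standardized features, so scaling lives
# inside the pipeline and is applied consistently to train and test data.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_train, y_train)
print("R^2 on test set:", model.score(X_test, y_test))
```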
Classification Model | Pros | Cons |
---|---|---|
Logistic Regression | Probabilistic approach, gives information about the statistical significance of features. | The logistic regression assumptions must be met. |
K-NN | Simple to understand, fast and efficient. | Need to choose the number of neighbours K. |
SVM | Performant, not biased by outliers, not sensitive to overfitting. | Not appropriate for nonlinear problems, not the best choice for a large number of features. |
Kernel SVM | High performance on nonlinear problems, not biased by outliers, not sensitive to overfitting. | Not the best choice for a large number of features, more complex. |
Naive Bayes | Efficient, not biased by outliers, works on nonlinear problems, probabilistic approach. | Based on the assumption that features are statistically independent. |
Decision Tree Classification | Interpretability, no need for feature scaling, works on both linear and nonlinear problems. | Poor results on very small datasets; overfitting can easily occur. |
Random Forest Classification | Powerful and accurate, good performance on many problems, including nonlinear ones. | No interpretability; overfitting can easily occur; need to choose the number of trees. |
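As a rough illustration of how such models are compared in practice, here is a minimal sketch (assumes scikit-learn and synthetic data; not from the notebooks) that fits two of the classifiers above and prints the confusion matrix that the metrics in the earlier bullet list are derived from:

```python
# Minimal sketch: compare logistic regression and kernel SVM on one split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for name, clf in [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("Kernel SVM (RBF)", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(name, "accuracy:", accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))  # rows: true class, cols: predicted
```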
Number | Notebook | Extras |
---|---|---|
01 | K-Means | StatQuest: K-Means Clustering, WCSS and the elbow method |
02 | Hierarchical | StatQuest: Hierarchical Clustering, the dendrogram method |
Clustering Model | Pros | Cons |
---|---|---|
K-Means | Simple to understand, easily adaptable, works well on small or large datasets, fast, efficient and performant. | Need to choose the number of clusters. |
Hierarchical Clustering | The optimal number of clusters can be obtained by the model itself, practical visualization with the dendrogram. | Not appropriate for large datasets. |
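The WCSS/elbow method linked in the K-Means row above can be sketched in a few lines (assumes scikit-learn and synthetic blobs; not from the notebooks): fit K-Means for several values of k and look for the "elbow" where WCSS stops dropping sharply.

```python
# Minimal sketch of the elbow method: inertia_ is the WCSS
# (within-cluster sum of squares) for the fitted clustering.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(f"k={k}: WCSS = {km.inertia_:.1f}")  # the drop flattens around k=4
```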
Number | Notebook | Extras |
---|---|---|
01 | Apriori | Apriori Algorithm |
02 | Eclat | |
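Apriori scores candidate rules by support, confidence, and lift. A minimal from-scratch sketch of those three quantities (the toy transactions and the rule {bread} → {butter} are made up for illustration):

```python
# Minimal sketch: support, confidence, and lift for one association rule.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter", "eggs"},
]

n = len(transactions)
support_bread = sum("bread" in t for t in transactions) / n
support_butter = sum("butter" in t for t in transactions) / n
support_both = sum({"bread", "butter"} <= t for t in transactions) / n

confidence = support_both / support_bread  # P(butter | bread)
lift = confidence / support_butter         # > 1 means positive association

print(f"support={support_both:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```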
Number | Notebook | Extras |
---|---|---|
01 | Upper Confidence Bound | Confidence Bounds, UCB and the Multi-armed bandit problem |
02 | Thompson Sampling | Thompson Sampling |
- ritvikmath: The Multi-Armed Bandit Strategies
- ritvikmath: The strategies and the UCB approach
- ritvikmath: The Thompson sampling algorithm
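As a rough sketch of the UCB idea from the videos above (plain Python; the arm probabilities are made up for illustration), each round plays the arm with the highest average reward plus an exploration bonus:

```python
# Minimal sketch of UCB1 on a toy multi-armed bandit.
import math
import random

random.seed(0)
true_ctr = [0.1, 0.3, 0.5]            # hypothetical reward probability per arm
counts = [0] * len(true_ctr)          # times each arm was played
sums = [0.0] * len(true_ctr)          # total reward collected per arm

for t in range(1, 1001):
    if 0 in counts:                   # play every arm once first
        arm = counts.index(0)
    else:                             # then pick the highest upper bound
        arm = max(range(len(true_ctr)),
                  key=lambda a: sums[a] / counts[a]
                  + math.sqrt(2 * math.log(t) / counts[a]))
    reward = 1 if random.random() < true_ctr[arm] else 0
    counts[arm] += 1
    sums[arm] += reward

print("plays per arm:", counts)  # the best arm (index 2) dominates
```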
Number | Notebook | Extras |
---|---|---|
01 | Introduction to NLP | |
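The classic intro-NLP pipeline is bag-of-words plus a Naive Bayes classifier. A minimal sketch (assumes scikit-learn; the reviews and labels are made up for illustration):

```python
# Minimal sketch: vectorize text into term counts, then classify sentiment.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reviews = ["loved the food", "great service", "terrible food", "awful experience"]
labels = [1, 1, 0, 0]  # hypothetical labels: 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)   # sparse term-count matrix

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["the food was great"])))
```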
Number | Notebook | Extras |
---|---|---|
01 | Principal Component Analysis | setosa-PCA example, StatQuest-PCA, plotly-PCA visualization |
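A minimal PCA sketch (assumes scikit-learn and its built-in Iris dataset; not from the notebook): standardize the features, project onto two principal components, and check how much variance they explain.

```python
# Minimal sketch: PCA is sensitive to scale, so standardize first.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_)
```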
Number | Notebooks | Extras |
---|---|---|
01 | Regression | |
02 | Classification | The Accuracy paradox, AUC-ROC and CAP curves, Precision, Recall and F1 score |
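Model selection typically relies on k-fold cross-validation rather than a single train/test split. A minimal sketch (assumes scikit-learn and synthetic data; not from the notebooks):

```python
# Minimal sketch: 10-fold CV estimate plus a grid search over SVM hyperparameters.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=7)

# Averaging over 10 folds is less optimistic than one lucky split.
print("CV accuracy:", cross_val_score(SVC(), X, y, cv=10).mean())

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=10)
grid.fit(X, y)
print("best params:", grid.best_params_, "best score:", grid.best_score_)
```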
Number | Notebooks | Extras |
---|---|---|
01 | Logistic Regression | Breast Cancer classifier |
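A minimal sketch of the same idea as the case study (assumes scikit-learn and its built-in breast cancer dataset; not the notebook itself):

```python
# Minimal sketch: scaled logistic regression as a breast cancer classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
print(confusion_matrix(y_test, model.predict(X_test)))
```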
- Thanks to Kirill Eremenko and Hadelin de Ponteves for creating such an awesome online course about machine learning.
- Thanks to Josh Starmer, aka StatQuest, for your brilliant videos about machine learning; they helped me a lot in understanding the math behind the ML algorithms.
- Thanks to Mr. Vũ Hữu Tiệp for your brilliant blog posts about machine learning; they helped me a lot from the days when I didn't even know what machine learning was.
- Thanks to Mr. Phạm Đình Khánh for your blog posts about machine learning and deep learning.