Repository for machine learning projects and notes
- HMM
- Regression
- Probability
- Embedding
The distinction is based on whether the system can learn incrementally from a stream of data.
- Batch learning (offline learning): training is done on all available data at once; it cannot be done incrementally. The model is trained first and then moved to production for prediction or classification.
- Online learning: training is done on a sequence of data, or on data in the form of mini-batches. It can be thought of as incremental learning, and each update step requires only limited computing resources.
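A minimal sketch of the online setting (plain Python, toy 1-D linear model with made-up noiseless data): the model is updated one mini-batch at a time as the stream arrives, rather than being retrained on the full dataset.

```python
# Online (incremental) learning sketch: fit y ≈ w*x + b with SGD,
# consuming the data as a stream of mini-batches instead of all at once.

def sgd_step(w, b, batch, lr=0.01):
    """One gradient step on a mini-batch of (x, y) pairs (squared loss)."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b

# Simulated stream: y = 3x + 1, arriving in mini-batches of 4 samples.
stream = [[(i / 10, 3 * (i / 10) + 1) for i in range(j, j + 4)]
          for j in range(0, 40, 4)]

w, b = 0.0, 0.0
for epoch in range(200):      # in true online learning each batch is seen once;
    for batch in stream:      # we loop here only so the toy example converges
        w, b = sgd_step(w, b, batch)

print(round(w, 2), round(b, 2))  # approaches w=3, b=1
```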
- k-Nearest Neighbors
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees and Random Forests
- Neural networks
- Clustering
- K-Means
- DBSCAN
- Hierarchical Cluster Analysis (HCA)
- Anomaly detection and novelty detection
- One-class SVM
- Isolation Forest
- Visualization and dimensionality reduction
- Principal Component Analysis (PCA)
- Kernel PCA
- Locally Linear Embedding (LLE)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Association rule learning
- Apriori
- Eclat
- Binary classification
- Multiclass classification
- Multiclass, multilabel (each instance may receive several labels)
- Multiclass, single label (each instance receives exactly one label)
- Accuracy
- What fraction of predictions was correct
- Warning! Not a suitable measure for class-imbalance problems
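A quick illustration of that warning (plain Python, made-up labels): on a 1%-positive dataset, a model that always predicts the majority class looks excellent by accuracy while being useless.

```python
# Why accuracy misleads on imbalanced classes: a classifier that always
# predicts the majority class scores 99% accuracy yet finds no positives.

y_true = [1] * 10 + [0] * 990          # 1% positive class
y_pred = [0] * 1000                    # "always predict negative"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(accuracy)  # 0.99 — looks great
print(recall)    # 0.0  — misses every single positive
```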
- Sensitivity or Recall
- Sensitivity = (True Positive)/(True Positive + False Negative)
- Specificity
- Specificity = (True Negative)/(True Negative + False Positive)
- Receiver Operating Characteristic (ROC) curve [Tutorial]
- Area under ROC [Tutorial]
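The area under the ROC curve can be read as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties counting half). A small plain-Python illustration of that rank formulation, with made-up labels and scores:

```python
# Area under the ROC curve via its rank interpretation:
# AUC = P(score of a random positive > score of a random negative),
# with ties counted as 1/2.

def roc_auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly: 8/9
```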
- Youden index or Youden's J statistic
- It combines sensitivity and specificity into a single statistic, for settings that emphasize both, with a value that ranges from 0 to 1.
- J = Sensitivity + Specificity − 1
| Outcome | Wolf story |
|---|---|
| True Positive | "We correctly called the wolf! We saved the town." |
| False Positive | "We called wolf falsely! Everyone is mad at us." |
| False Negative | "There was a wolf and we didn't spot it. It ate all our chickens." |
| True Negative | "No wolf, no alarm. Everyone is fine." |
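With hypothetical counts for the four outcomes (the numbers below are made up), sensitivity, specificity, and Youden's J fall out directly:

```python
# Sensitivity, specificity, and Youden's J from confusion-matrix counts.
# The counts are hypothetical, chosen only for illustration.

tp, fp, fn, tn = 40, 10, 5, 45

sensitivity = tp / (tp + fn)   # recall: wolves we actually caught
specificity = tn / (tn + fp)   # quiet nights we correctly left alone
j = sensitivity + specificity - 1

print(round(sensitivity, 3), round(specificity, 3), round(j, 3))
```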
- Class Imbalance (CI)
- Measures the imbalance in the number of members between classes (facets)
- Normalized range: [-1, +1]
- Difference in Proportion of Labels (DPL)
- Measures the imbalance of positive outcomes between different classes (facets)
- Range for normalized binary & multicategory facet labels: [-1, +1]; range for continuous labels: (-∞, +∞)
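These two measures match the pre-training bias metrics used by tools such as SageMaker Clarify; a sketch assuming those standard definitions, with made-up facet counts:

```python
# Sketch of the two imbalance measures above, assuming the common
# pre-training bias definitions (e.g. SageMaker Clarify):
#   CI  = (n_a - n_d) / (n_a + n_d)   -- membership imbalance between facets
#   DPL = q_a - q_d                   -- difference in positive-label rates

def class_imbalance(n_a, n_d):
    return (n_a - n_d) / (n_a + n_d)

def dpl(pos_a, n_a, pos_d, n_d):
    return pos_a / n_a - pos_d / n_d

# Hypothetical data: facet a has 900 members, 450 with positive labels;
# facet d has 100 members, 20 with positive labels.
print(class_imbalance(900, 100))   # 0.8 -> membership heavily skewed toward a
print(dpl(450, 900, 20, 100))      # 0.3 -> positive labels also favor a
```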
- Cross validation
- Leave-one-out CV
- K-fold CV
- Randomized CV
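K-fold splitting can be sketched in a few lines (leave-one-out CV is the special case k = n; shuffling the indices first gives the randomized variant):

```python
# K-fold cross-validation sketch: shuffle the sample indices, split them
# into k folds, then yield (train, test) index lists where each fold is
# held out exactly once.

import random

def kfold_indices(n, k, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Every sample appears in exactly one test fold across the k splits.
n, k = 10, 5
seen = []
for train, test in kfold_indices(n, k):
    assert len(train) + len(test) == n
    seen.extend(test)
print(sorted(seen))  # each index 0..9 held out exactly once
```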
- SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model
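SHAP rests on the Shapley value from cooperative game theory: a feature's attribution is its average marginal contribution over all orderings of the features. A plain-Python illustration on a tiny made-up value function (not the SHAP library's own algorithm, which approximates this far more efficiently):

```python
# Exact Shapley values for a hypothetical 3-feature "game": feature a
# contributes 10, b contributes 20, and a+b together add an extra 5.

from itertools import permutations

def v(S):
    """Toy value function: model output when only features in S are present."""
    out = 0
    if "a" in S: out += 10
    if "b" in S: out += 20
    if "a" in S and "b" in S: out += 5   # interaction term
    return out

players = ["a", "b", "c"]

def shapley(player):
    """Average marginal contribution of `player` over all orderings."""
    perms = list(permutations(players))
    total = 0.0
    for order in perms:
        before = set(order[:order.index(player)])
        total += v(before | {player}) - v(before)
    return total / len(perms)

phi = {p: shapley(p) for p in players}
print(phi)  # the a/b interaction is split evenly: a=12.5, b=22.5, c=0
# Efficiency property: attributions sum to the value of the full coalition.
assert abs(sum(phi.values()) - v(set(players))) < 1e-9
```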