mlmodelling: Experimental machine learning library in NumPy

mlmodelling is a Python package that provides readable yet efficient implementations of fundamental models used in machine learning, written using the NumPy scientific computing package.

Although fully usable, the models implemented in this library are not meant for production environments. They are intended as a way of understanding the internals and working principles of the implementations found in widely used machine learning libraries, such as scikit-learn and PyTorch.


Installation

Although not strictly necessary, it is highly recommended to install the package locally in a virtual environment. One can be created with the venv module as follows.

git clone https://github.com/rixsilverith/mlmodelling
cd mlmodelling

# optional: initialize a virtual environment using venv
python3 -m venv .venv
source .venv/bin/activate

# installation using pip
pip3 install .

Installation for development

For development, install the package in editable mode by passing the -e flag to pip, instead of running the usual pip install command:

pip3 install -e .

Requirements

mlmodelling depends on the following packages:

  • numpy - Efficient numerical computing library for Python.
  • matplotlib - Plotting library for Python.

Some examples

Logistic Regression model

$ python3 examples/logistic_regression.py

This example generates a synthetic dataset suited for binary classification, fits a logistic regression model to it using the Stochastic Gradient Descent optimizer, and plots the estimated decision boundary. Running it prints a summary of the model configuration:

+------------------------------+
| LogisticRegressionClassifier |
+------------------------------+
phi (activation): Sigmoid
optimizer: StochasticGradientDescent
 └── learning_rate: 0.01, momentum: 0.0, nesterov: False
loss: BinaryCrossEntropy
regularizer: L2Ridge
 └── alpha: 0.01
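The configuration printed above (sigmoid activation, binary cross-entropy loss, L2 penalty, plain SGD without momentum) can be illustrated with a minimal NumPy sketch. This is not the library's API: the function names `sigmoid`, `fit_logistic_sgd` and `predict` are hypothetical, the synthetic data is made up for illustration, and the penalty is assumed to take the form (alpha / 2) * ||w||^2, which may differ from how L2Ridge is defined in mlmodelling. Plotting is omitted.

```python
import numpy as np

def sigmoid(z):
    # Clip the input so that np.exp does not overflow for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def fit_logistic_sgd(X, y, learning_rate=0.01, alpha=0.01, epochs=50, seed=0):
    """Fit weights w and bias b with per-sample SGD (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Visit the samples in a fresh random order every epoch.
        for i in rng.permutation(n_samples):
            error = sigmoid(X[i] @ w + b) - y[i]
            # Gradient of binary cross-entropy plus the assumed
            # (alpha / 2) * ||w||^2 penalty; the bias is not penalized.
            w -= learning_rate * (error * X[i] + alpha * w)
            b -= learning_rate * error
    return w, b

def predict(X, w, b):
    # Threshold the predicted probability at 0.5.
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Synthetic binary classification data: two Gaussian blobs (illustrative only).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(100, 2)),
    rng.normal(2.0, 1.0, size=(100, 2)),
])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = fit_logistic_sgd(X, y)
accuracy = np.mean(predict(X, w, b) == y)
print(f"training accuracy: {accuracy:.3f}")
```

The decision boundary of such a model is the line where `X @ w + b` equals zero, which is what the example script plots.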


List of implemented models

The following is a list of the currently implemented models.

Linear models

| Model | Implementation | Used for |
| --- | --- | --- |
| Logistic Regression | LogisticRegressionClassifier | Binary classification |
| Linear Regression | LinearRegressor | Regression |
| Polynomial Regression | PolynomialRegressor | Regression |
| Lasso (L1) Regression | LassoRegressor | Regression |
| Ridge (L2) Regression | RidgeRegressor | Regression |
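
To give a feel for what the Ridge (L2) Regression entry computes, here is a small NumPy sketch based on the closed-form solution of the regularized normal equations. It is an illustration under assumptions, not the RidgeRegressor implementation: the library may fit its linear models with a gradient-based optimizer instead, and the helper name `fit_ridge`, the unpenalized intercept, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Solve (Xb^T Xb + alpha * I) theta = Xb^T y (hypothetical helper)."""
    # Prepend a column of ones so that theta[0] acts as the intercept.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # assumption: the intercept is not penalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

# Synthetic regression data with known coefficients (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ true_w + rng.normal(scale=0.1, size=200)

theta_ols = fit_ridge(X, y, alpha=0.0)     # alpha = 0 reduces to least squares
theta_ridge = fit_ridge(X, y, alpha=10.0)  # larger alpha shrinks the weights

print("least squares:", theta_ols)
print("ridge        :", theta_ridge)
```

Setting alpha to zero recovers ordinary linear regression, while increasing alpha shrinks the norm of the non-intercept coefficients.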

Tree-based models

| Model | Implementation | Used for |
| --- | --- | --- |
| Classification Decision Tree | DecisionTreeClassifier | Classification |
| Regression Decision Tree | DecisionTreeRegressor | Regression |
| Random Forest Classifier | RandomForestClassifier | Classification |
| Random Forest Regressor | RandomForestRegressor | Regression |

Neighbor-based models

| Model | Implementation | Used for |
| --- | --- | --- |
| K-Nearest Neighbors Classifier | KNeighborsClassifier | Classification |
| K-Nearest Neighbors Regressor | KNeighborsRegressor | Regression |
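
As a rough illustration of the idea behind the neighbor-based models, the following NumPy sketch classifies query points by majority vote among their k nearest training points. It is not the KNeighborsClassifier implementation: the function name `knn_predict`, the choice of Euclidean distance, and the toy data are assumptions made for this example.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Majority-vote k-nearest-neighbors prediction (hypothetical helper).

    Assumes y_train holds non-negative integer class labels.
    """
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training points for each query point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    neighbor_labels = y_train[nearest]
    # The most frequent label among the neighbors wins the vote.
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])

# Toy data: two well-separated clusters (illustrative only).
X_train = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.05, 0.05], [5.0, 5.1]])

predictions = knn_predict(X_train, y_train, X_query, k=3)
print(predictions)
```

A regressor along the same lines would average the neighbors' target values instead of taking a vote.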

License

mlmodelling is licensed under the MIT License. See the LICENSE file distributed with the code for more information.


References

  • Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge University Press.
  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
  • Ruder, S. (2016). An overview of gradient descent optimization algorithms. arxiv.org/pdf/1609.04747.pdf.
  • Sutskever, I., Martens, J., Dahl, G. E., & Hinton, G. E. (2013). On the importance of initialization and momentum in deep learning. ICML-13, Vol. 28, pp. 1139-1147.
  • Sutskever, I. (2013). Training Recurrent Neural Networks. PhD Thesis. University of Toronto.
  • Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Y. W. Teh & M. Titterington (eds.), Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249-256.
  • Breiman, L. (2001). Random Forests. Machine Learning, 45, 5-32. doi: 10.1023/A:1010933404324