symbolic-learn is a sklearn-compatible package that implements a symbolic regression model.
Symbolic regression is a type of regression model that combines mathematical blocks to find the function that best fits the data. Here each function is represented as a binary tree like this one:
The model initially generates a random population of such functions. It then uses genetic programming techniques on it to find out the function that best fits our dataset. As this model is based on scikit-learn's base estimator, it can be used the same way you would use any sklearn model. Thus, you can use it in pipelines or apply fine-tuning techniques such as GridSearchCV on it.
Symbolic regression is best used when you want to take a naive approach to solving a regression problem. Unlike most existing models, it does not come with an a priori specification of a model. Therefore it is a good idea to use it when you want to find out and understand the mathematical structures in your data.
Here is how to instantiate and train a symbolic regression model:
from sblearn.models import SymbolicRegressor model = SymbolicRegressor() model.fit(X_train, y_train)
After training your model, you can use access the fitted function's formula and function tree through the model's attributes formulas and trees. Read the doc for more information.
In order to install the package, use this command:
pip install symbolic-learn
Note for Windows users: Microsoft Visual C++ 2014 or higher is required!