Merge pull request #24 from jordanvolz/jav/feature-pipeline

jav/feature-pipeline
jordanvolz · Sep 11, 2023 · 0b9b8b9
2 parents 77c3a11 + d0179e3
commit 0b9b8b9
Showing 62 changed files with 1,918 additions and 120 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,9 @@
 # lolpop
 A software engineering framework to jump start your Machine Learning projects
 
+![Meet Larry, the lolpop dragon.](docs/src/assets/lolpop.png)
+
+Full documentation can be accessed [here](https://lolpop.readthedocs.io). 
 ## Installing 
 
 You can install lolpop from PyPI using `pip`: 
@@ -13,7 +16,7 @@ If you're working in dev mode, you can clone this repo and install lolpop by `cd
 
 ```bash
 poetry install 
-``` 
+```
 
 Welcome to lolpop!
 
@@ -127,7 +130,7 @@ runner = MyRunner(conf=config_file)
 model = runner.train.train_model(data)
 
 ... 
-``` 
+```
 
 or via the lolpop cli: 
 

diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
@@ -7,6 +7,7 @@ edit_url: docs/src/
 repo_name: lolpop
 theme: 
   name: material
+  logo: assets/lolpop.png
   features: 
     - navigation.instant
     - navigation.tracking
@@ -126,6 +127,11 @@ nav:
         - Postgres: postgres_data_transformer.md
         - Redshift: redshift_data_transformer.md 
         - Snowflake: snowflake_data_transformer.md
+      - Feature Transformers: 
+        - BaseFeatureTransformer: base_feature_transformer.md
+        - Feature Engine: feature_engine_feature_transformer.md
+        - Local: local_feature_transformer.md
+        - scikit-learn: sklearn_feature_transformer.md
       - Generative AI Chatbots: 
         - BaseGenAIChatbot: base_genai_chatbot.md
         - OpenAI: openai_chatbot.md 

diff --git a/docs/src/assets/lolpop.png b/docs/src/assets/lolpop.png
diff --git a/docs/src/assets/lolpop_logo.png b/docs/src/assets/lolpop_logo.png
diff --git a/docs/src/base_feature_transformer.md b/docs/src/base_feature_transformer.md
@@ -0,0 +1,86 @@
+## Overview
+
+A `feature_transformer` is a component that transforms data into features for an ML model. This typically consists of encoding or scaling values to make them better suited for model training. Contrast this with a `data_transformer`, which follows more of a data engineering-style workflow of reshaping or creating new data. 
+
+Feature transformers can either be set at the `train` pipeline level, or at the `model_trainer` component level. If set at the pipeline level, the transformer will apply to every model created in the pipeline (e.g. if you are doing hyperparameter tuning across multiple experiments and wish to use the same transformer for each). Setting a feature transformer at the `model_trainer` level will apply only to that model trainer. This can be useful if you wish to override the pipeline feature transformer for a particular model type. 
+
+## Attributes
+
+`BaseFeatureTransformer` contains no default attributes. 
+
+## Configuration
+
+`BaseFeatureTransformer` contains the following required components: 
+
+- `metadata_tracker`
+- `resource_version_control`
+
+
+## Interface
+
+The following methods are part of `BaseFeatureTransformer` and should be implemented in any class that inherits from this base class: 
+
+### fit 
+
+Fits the feature transformer to the provided data. 
+
+```python
+def fit(self, data, *args, **kwargs) -> Any
+```
+
+**Arguments**: 
+
+- `data` (object): The source data to fit the feature transformer on. This should be something like a local python object (pandas.DataFrame).
+
+**Returns**:
+
+- `transformer` (Any): Returns a fitted feature transformer.
+
+
+### transform
+
+Transforms data using the feature transformer. 
+
+```python
+def transform(self, data, *args, **kwargs) -> Any
+```
+
+**Arguments**: 
+
+- `data` (object): The data to transform with the fitted feature transformer. This could be something like a local python object (pandas.DataFrame).
+
+**Returns**:
+
+- `data_out` (Any): Returns a data object, such as a `pandas` Dataframe, which has been transformed by the feature transformer. 
+
+### fit_transform 
+Fits the transformer to the provided data, and then transforms that data using the fitted feature transformer. 
+
+```python
+def fit_transform(self, data, *args, **kwargs) -> Any
+```
+
+**Arguments**: 
+
+- `data` (object): The data to fit and transform with the fitted feature transformer. This could be something like a local python object (pandas.DataFrame).
+
+**Returns**:
+
+- `data_out` (Any): Returns a data object, such as a `pandas` Dataframe, which has been transformed by the feature transformer. 
+
+
+## Default Methods 
+
+The following methods are implemented in the base class. You may need to override them as you implement your own feature transformers.
+### save 
+Saves the feature transformer into a resource version control system. 
+
+```python
+def save(self, experiment, *args, **kwargs) -> Any
+```
+
+**Arguments**: 
+
+- `experiment` (object): The experiment in which to save the feature transformer. This object should be created by the `metadata_tracker`.
+
+**Returns**:
+
+- Nothing. 
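
To make the `fit` / `transform` / `fit_transform` interface documented in `docs/src/base_feature_transformer.md` concrete, here is a minimal sketch. It deliberately does not subclass lolpop's `BaseFeatureTransformer` or use pandas: `MinMaxFeatureTransformer` and the dict-of-lists data layout are hypothetical, chosen only so the example runs on its own.

```python
from typing import Any


class MinMaxFeatureTransformer:
    """Hypothetical transformer: rescales each numeric column to [0, 1]."""

    def __init__(self):
        self.ranges = {}  # column name -> (min, max), learned in fit()

    def fit(self, data: dict, *args, **kwargs) -> Any:
        # Learn the per-column minimum and maximum from the data.
        self.ranges = {col: (min(vals), max(vals)) for col, vals in data.items()}
        return self  # return the fitted transformer

    def transform(self, data: dict, *args, **kwargs) -> Any:
        out = {}
        for col, vals in data.items():
            lo, hi = self.ranges[col]
            span = (hi - lo) or 1.0  # avoid dividing by zero on constant columns
            out[col] = [(v - lo) / span for v in vals]
        return out

    def fit_transform(self, data: dict, *args, **kwargs) -> Any:
        # Fit on the data, then transform that same data.
        self.fit(data)
        return self.transform(data)


train = {"age": [20, 30, 40], "income": [10.0, 10.0, 10.0]}
transformer = MinMaxFeatureTransformer()
scaled = transformer.fit_transform(train)
```

Once fitted, the same object can be reused on unseen data through `transform`, which is the reason `fit` and `transform` are kept as separate methods.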
diff --git a/docs/src/base_hyperparameter_tuner.md b/docs/src/base_hyperparameter_tuner.md
@@ -77,15 +77,13 @@ def build_model(self, data, model_version, algo, params, trainer_config={}, *arg
 Version controls and saves the model object and any associated artifacts to the `resource_version_control` system and `metadata_tracker`.
 
 ```python
-def save_model(self, model, experiment, params, algo, *args, **kwargs)
+def save_model(self, model, experiment, *args, **kwargs)
 ```
 
 **Arguments**: 
 
 - `model` (object): The model object created during this experiment. 
 - `experiment` (experiment): The `metadata_tracker` experiment created for this experiment.  
-- `params` (dict): The training parameters used in the experiment
-- `algo` (str): The algorithm used in this experiment. 
 
 
 ### _build_training_grid

diff --git a/docs/src/base_model_trainer.md b/docs/src/base_model_trainer.md
@@ -8,6 +8,7 @@ A `model_trainer` is a component that essentially acts as a wrapper around a lib
 `BaseModelTrainer` contains the following default attributes: 
 
 - `model`: The trained model object. This should get set in the `fit` function. 
+- `feature_transformer`: The feature transformer used to transform data before passing it to the model. This is optional. The `feature_transformer` can be set for each `ModelTrainer` class used in a workflow, or at the `pipeline` level, in which case it is the default for every `ModelTrainer` class that does not override it. 
 - `mlflow_module`: The name of the MLFlow submodule which contains the proper `log_model` method for this trainer. This is only needed if you intend to use MLFlow as your model repository
 - `params`: The training parameters for the trained model. 
 
@@ -215,10 +216,173 @@ def rebuild_model(self, data, model_version, *args, **kwargs) -> tuple[Any, Any]
 
 **Arguments**: 
 
-- `data` (object): dictionary of training/test/valiadation data. 
+- `data` (object): dictionary of training/test/validation data. 
 - `model_version` (object): model version object
 
 **Returns**: 
 
 - `model`: the trained model
-- `exp`: experiment where the model was trained
+- `exp`: experiment where the model was trained
+
+
+### transform_and_fit
+
+Transforms data using a feature transformer and then fits the model to the transformed data. 
+
+```python
+def transform_and_fit(self, data_dict, *args, **kwargs)
+```
+
+**Arguments**: 
+
+- `data_dict` (object): dictionary of training/test/validation data. 
+
+**Returns**: 
+
+- `model`: the trained model
+
+
+### transform_and_predict
+
+Transforms data using a feature transformer and then creates predictions from the transformed data. 
+
+```python
+def transform_and_predict(self, data, *args, **kwargs)
+```
+
+**Arguments**: 
+
+- `data` (object): dictionary of training/test/validation data. 
+
+**Returns**: 
+
+- `predictions`: the predictions
+
+### transform_and_predict_df
+
+Transforms a single dataframe using a feature transformer and then creates predictions from the transformed dataframe. 
+
+```python
+def transform_and_predict_df(self, data, *args, **kwargs)
+```
+
+**Arguments**: 
+
+- `data` (object): dataframe 
+
+**Returns**: 
+
+- `predictions`: the predictions
+
+### transform_and_predict_proba_df
+
+Transforms a single dataframe using a feature transformer and then creates class probability predictions from the transformed dataframe. 
+
+```python
+def transform_and_predict_proba_df(self, data, *args, **kwargs)
+```
+
+**Arguments**: 
+
+- `data` (object): dataframe 
+
+**Returns**: 
+
+- `predictions`: the predictions
+
+
+### fit_transform_data
+
+Fits the feature transformer to data and then transforms that data using the fitted transformer. 
+
+```python
+def fit_transform_data(self, X_data, y_data, *args, **kwargs)
+```
+
+**Arguments**: 
+
+- `X_data` (object): Feature data to fit & transform 
+- `y_data` (object): Label data.
+
+**Returns**: 
+
+- `transformed_data`: the transformed data
+
+
+### fit_data
+
+Fits the feature transformer to data.
+
+```python
+def fit_data(self, X_data, y_data, *args, **kwargs)
+```
+
+**Arguments**: 
+
+- `X_data` (object): Feature data to fit 
+- `y_data` (object): Label data.
+
+**Returns**: 
+
+- `feature_transformer`: the fitted feature transformer
+
+
+### transform_data
+
+Transforms a single dataframe using a feature transformer.
+
+```python
+def transform_data(self, data, *args, **kwargs)
+```
+
+**Arguments**: 
+
+- `data` (object): dataframe 
+
+**Returns**: 
+
+- `transformed_data`: the transformed data
+
+### _transform_dict
+
+Transforms a dictionary of train/test/validation data sets. 
+
+```python
+def _transform_dict(self, data_dict, *args, **kwargs)
+```
+
+**Arguments**: 
+
+- `data_dict` (dictionary): dictionary of train/test/validation data sets.  
+
+**Returns**: 
+
+- `transformed_data_dict`: returns the same dictionary, now with transformed data 
+
+### _get_transformer
+
+Returns the model's feature transformer.
+
+```python
+def _get_transformer(self)
+```
+
+**Returns**: 
+
+- `self.feature_transformer`: the model's feature transformer
+
+### _set_transformer
+
+Sets the model's feature transformer.
+
+```python
+def _set_transformer(self, transformer)
+```
+
+**Arguments**: 
+
+- `transformer` (object): The feature transformer to set for the model trainer. 
+
+**Returns**: 
+
+- None
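
The `transform_and_*` methods added to `docs/src/base_model_trainer.md` describe a transform step followed by a fit or predict step. The sketch below shows one way that composition could work. It is not lolpop's implementation: `MaxScaler`, `SketchTrainer`, the `X_train` / `y_train` / `X_test` dictionary keys, and the choice to fit the transformer on the training split only are all assumptions made for illustration.

```python
class MaxScaler:
    """Hypothetical feature transformer: divides values by the training maximum."""

    def fit(self, data):
        self.max_ = max(data)
        return self

    def transform(self, data):
        return [x / self.max_ for x in data]


class SketchTrainer:
    """Hypothetical stand-in for a model trainer; not lolpop's implementation."""

    def __init__(self, feature_transformer=None):
        self.feature_transformer = feature_transformer
        self.model = None

    def _transform_dict(self, data_dict):
        # Assumption: fit on the training split only, then transform every X_* split.
        if self.feature_transformer is None:
            return data_dict
        self.feature_transformer.fit(data_dict["X_train"])
        return {
            key: self.feature_transformer.transform(val) if key.startswith("X_") else val
            for key, val in data_dict.items()
        }

    def fit(self, data_dict):
        # Toy model: least-squares slope through the origin.
        X, y = data_dict["X_train"], data_dict["y_train"]
        self.model = sum(a * b for a, b in zip(X, y)) / sum(a * a for a in X)
        return self.model

    def predict_df(self, data):
        return [self.model * x for x in data]

    def transform_and_fit(self, data_dict):
        return self.fit(self._transform_dict(data_dict))

    def transform_and_predict_df(self, data):
        # Apply the already-fitted transformer before predicting.
        if self.feature_transformer is not None:
            data = self.feature_transformer.transform(data)
        return self.predict_df(data)


data = {"X_train": [1.0, 2.0, 4.0], "y_train": [2.0, 4.0, 8.0], "X_test": [2.0]}
trainer = SketchTrainer(feature_transformer=MaxScaler())
slope = trainer.transform_and_fit(data)
predictions = trainer.transform_and_predict_df(data["X_test"])
```

Keeping the transformer on the trainer means prediction-time data passes through the same fitted transformation as the training data, without the caller having to track it separately.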
diff --git a/docs/src/base_resource_version_control.md b/docs/src/base_resource_version_control.md
@@ -83,4 +83,38 @@ def get_model(self, experiment, *args, **kwargs) -> Any
 
 **Returns**: 
 
-- `model`: The model object from the experiment. 
+- `model`: The model object from the experiment. 
+
+
+### version_feature_transformer
+
+Versions a feature transformer.   
+
+```python
+def version_feature_transformer(self, experiment, transformer, *args, **kwargs) -> dict[str, Any]
+```
+
+**Arguments**: 
+
+- `experiment` (object): The experiment being versioned
+- `transformer` (object): The feature transformer to version
+
+**Returns**: 
+
+- `dict`: Attributes returned from the resource version control system, such as a commit hash. The returned information should be sufficient to retrieve the object in the future and will likely be logged in the `metadata_tracker`.
+
+### get_feature_transformer
+
+Returns a feature transformer object from an experiment.   
+
+```python
+def get_feature_transformer(self, experiment, *args, **kwargs) -> Any
+```
+
+**Arguments**: 
+
+- `experiment` (object): The experiment to retrieve the feature_transformer from
+
+**Returns**: 
+
+- `feature_transformer`: The feature transformer object from the experiment.
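
The `version_feature_transformer` / `get_feature_transformer` pair documented in `docs/src/base_resource_version_control.md` amounts to a round trip: store the object, return enough information to find it again, and load it back later. A minimal in-memory sketch of that round trip follows. `InMemoryVersionControl` is hypothetical, the experiment is represented by a plain string id rather than a `metadata_tracker` object, and the `"hash"` key in the returned dictionary is an assumption.

```python
import hashlib
import pickle


class InMemoryVersionControl:
    """Hypothetical store; a real backend would persist outside the process."""

    def __init__(self):
        self._blobs = {}  # content hash -> pickled bytes
        self._index = {}  # experiment id -> content hash

    def version_feature_transformer(self, experiment, transformer, *args, **kwargs):
        # Serialize the transformer and address it by the hash of its content.
        blob = pickle.dumps(transformer)
        digest = hashlib.sha256(blob).hexdigest()
        self._blobs[digest] = blob
        self._index[experiment] = digest
        return {"hash": digest}

    def get_feature_transformer(self, experiment, *args, **kwargs):
        # Look up the hash recorded for the experiment and deserialize the object.
        return pickle.loads(self._blobs[self._index[experiment]])


rvc = InMemoryVersionControl()
info = rvc.version_feature_transformer("exp-1", {"age": (20, 40)})
restored = rvc.get_feature_transformer("exp-1")
```

The returned dictionary plays the role of the commit-hash-style attributes described above: it is what a caller would log so the object can be retrieved later.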