Commit

Merge pull request #24 from jordanvolz/jav/feature-pipeline
jav/feature-pipeline
jordanvolz authored Sep 11, 2023
2 parents 77c3a11 + d0179e3 commit 0b9b8b9
Showing 62 changed files with 1,918 additions and 120 deletions.
7 changes: 5 additions & 2 deletions README.md
@@ -1,6 +1,9 @@
# lolpop
A software engineering framework to jump start your Machine Learning projects

![Meet Larry, the lolpop dragon.](docs/src/assets/lolpop.png)

Full documentation can be accessed [here](https://lolpop.readthedocs.io).
## Installing

You can install lolpop from PyPI using `pip`:
@@ -13,7 +16,7 @@ If you're working in dev mode, you can clone this repo and install lolpop by `cd

```bash
poetry install
```

Welcome to lolpop!

@@ -127,7 +130,7 @@ runner = MyRunner(conf=config_file)
model = runner.train.train_model(data)

...
```

or via the lolpop cli:

6 changes: 6 additions & 0 deletions docs/mkdocs.yml
@@ -7,6 +7,7 @@ edit_url: docs/src/
repo_name: lolpop
theme:
  name: material
  logo: assets/lolpop.png
  features:
    - navigation.instant
    - navigation.tracking
@@ -126,6 +127,11 @@ nav:
  - Postgres: postgres_data_transformer.md
  - Redshift: redshift_data_transformer.md
  - Snowflake: snowflake_data_transformer.md
- Feature Transformers:
  - BaseFeatureTransformer: base_feature_transformer.md
  - Feature Engine: feature_engine_feature_transformer.md
  - Local: local_feature_transformer.md
  - scikit-learn: sklearn_feature_transformer.md
- Generative AI Chatbots:
  - BaseGenAIChatbot: base_genai_chatbot.md
  - OpenAI: openai_chatbot.md
Binary file added docs/src/assets/lolpop.png
Binary file added docs/src/assets/lolpop_logo.png
86 changes: 86 additions & 0 deletions docs/src/base_feature_transformer.md
@@ -0,0 +1,86 @@
## Overview

A `feature_transformer` is a component that transforms data into features for an ML model. This typically consists of encoding or scaling values to make them better suited for model training. Contrast this with a `data_transformer`, which follows more of a data-engineering workflow of reshaping or creating new data.

Feature transformers can either be set at the `train` pipeline level, or at the `model_trainer` component level. If set at the pipeline level, the transformer will apply to every model created in the pipeline (e.g. if you are doing hyperparameter tuning across multiple experiments and wish to use the same transformer for each). Setting a feature transformer at the `model_trainer` level will apply only to that model trainer. This can be useful if you wish to override the pipeline feature transformer for a particular model type.
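
The precedence described above can be sketched in a few lines of Python. This is only an illustration: `resolve_feature_transformer`, the plain-dict configs, and the class names in them are hypothetical stand-ins, not lolpop's actual configuration handling.

```python
from typing import Any, Optional


def resolve_feature_transformer(
    pipeline_conf: dict[str, Any], trainer_conf: dict[str, Any]
) -> Optional[str]:
    """Pick the transformer for one model trainer (hypothetical helper).

    A transformer set on the model trainer wins; otherwise the trainer
    falls back to the one set on the train pipeline, if any.
    """
    return trainer_conf.get("feature_transformer") or pipeline_conf.get(
        "feature_transformer"
    )


# Hypothetical class names, used purely for illustration.
pipeline_conf = {"feature_transformer": "PipelineLevelTransformer"}
default_trainer_conf = {}  # inherits the pipeline-level transformer
special_trainer_conf = {"feature_transformer": "TrainerLevelTransformer"}

inherited = resolve_feature_transformer(pipeline_conf, default_trainer_conf)
overridden = resolve_feature_transformer(pipeline_conf, special_trainer_conf)
```

Here `inherited` resolves to the pipeline-level transformer, while `overridden` resolves to the one set on the model trainer.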

## Attributes

`BaseFeatureTransformer` contains no default attributes.

## Configuration

`BaseFeatureTransformer` contains the following required components:

- `metadata_tracker`
- `resource_version_control`


## Interface

The following methods are part of `BaseFeatureTransformer` and should be implemented in any class that inherits from this base class:

### fit

Fits the feature transformer to the provided data.

```python
def fit(self, data, *args, **kwargs) -> Any
```

**Arguments**:

- `data` (object): The source data to fit the feature transformer on. This should be something like a local Python object, such as a `pandas.DataFrame`.

**Returns**:

- `transformer` (Any): Returns a fitted feature transformer.


### transform

Transforms data using the feature transformer.

```python
def transform(self, data, *args, **kwargs) -> Any
```

**Arguments**:

- `data` (object): The data to transform with the fitted feature transformer. This could be something like a local Python object, such as a `pandas.DataFrame`.

**Returns**:

- `data_out` (Any): Returns a data object, such as a `pandas` DataFrame, which has been transformed by the feature transformer.

### fit_transform

Fits the transformer to the provided data, and then transforms that data using the fitted feature transformer.

```python
def fit_transform(self, data, *args, **kwargs) -> Any
```

**Arguments**:

- `data` (object): The data to fit the feature transformer on and then transform. This could be something like a local Python object, such as a `pandas.DataFrame`.

**Returns**:

- `data_out` (Any): Returns a data object, such as a `pandas` DataFrame, which has been transformed by the feature transformer.
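
As a rough illustration of the interface above, the following sketch implements the three methods for a simple standard-scaling transformer. Everything here is an assumption made for the example: `BaseFeatureTransformerSketch` is a stand-in rather than the real base class, and a plain dictionary of column name to list of floats stands in for a `pandas.DataFrame` so the block has no third-party dependencies.

```python
from statistics import mean, pstdev
from typing import Any


class BaseFeatureTransformerSketch:
    """Stand-in for the base class; it only mirrors the three interface methods."""

    def fit(self, data, *args, **kwargs) -> Any:
        raise NotImplementedError

    def transform(self, data, *args, **kwargs) -> Any:
        raise NotImplementedError

    def fit_transform(self, data, *args, **kwargs) -> Any:
        raise NotImplementedError


class StandardScalingTransformer(BaseFeatureTransformerSketch):
    """Hypothetical transformer scaling each column to zero mean and unit variance."""

    def fit(self, data, *args, **kwargs) -> Any:
        # Learn per-column mean and standard deviation.
        # A zero deviation (constant column) is replaced by 1.0 to avoid dividing by zero.
        self.stats = {
            col: (mean(values), pstdev(values) or 1.0)
            for col, values in data.items()
        }
        return self  # the fitted transformer

    def transform(self, data, *args, **kwargs) -> Any:
        # Apply the statistics learned in fit without re-estimating them.
        return {
            col: [(v - self.stats[col][0]) / self.stats[col][1] for v in values]
            for col, values in data.items()
        }

    def fit_transform(self, data, *args, **kwargs) -> Any:
        return self.fit(data).transform(data)


train = {"age": [20.0, 30.0, 40.0]}
scaled = StandardScalingTransformer().fit_transform(train)
```

Keeping `fit` and `transform` separate means the statistics learned on training data can later be applied unchanged to test or prediction data.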


## Default Methods

The following methods are implemented in the base class. You may need to override them as you implement your own feature transformers.

### save

Saves the feature transformer into a resource version control system.

```python
def save(self, experiment, *args, **kwargs) -> Any
```

**Arguments**:

- `experiment` (object): The experiment in which to save the feature transformer. This object should be created by the `metadata_tracker`.

**Returns**:

- Nothing.
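
One way `save` could delegate to the two required components is sketched below. This is not the base class's actual implementation: both stub classes are hypothetical, the `record` method name on the tracker is an assumption, and a plain string stands in for the experiment object.

```python
from typing import Any


class StubVersionControl:
    """Hypothetical stub standing in for a `resource_version_control` component."""

    def __init__(self):
        self.store = {}

    def version_feature_transformer(
        self, experiment, transformer, *args, **kwargs
    ) -> dict[str, Any]:
        self.store[experiment] = transformer
        # A real system would return something like a commit hash.
        return {"key": experiment}


class StubTracker:
    """Hypothetical stub standing in for a `metadata_tracker` component."""

    def __init__(self):
        self.logged = {}

    def record(self, experiment, info):  # method name is an assumption
        self.logged[experiment] = info


class FeatureTransformerSketch:
    def __init__(self, resource_version_control, metadata_tracker):
        self.resource_version_control = resource_version_control
        self.metadata_tracker = metadata_tracker

    def save(self, experiment, *args, **kwargs) -> None:
        # Version the transformer, then log what is needed to retrieve it later.
        vc_info = self.resource_version_control.version_feature_transformer(
            experiment, self
        )
        self.metadata_tracker.record(experiment, vc_info)


rvc, tracker = StubVersionControl(), StubTracker()
transformer = FeatureTransformerSketch(rvc, tracker)
transformer.save("experiment_1")
```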
4 changes: 1 addition & 3 deletions docs/src/base_hyperparameter_tuner.md
@@ -77,15 +77,13 @@ def build_model(self, data, model_version, algo, params, trainer_config={}, *arg
Version controls and saves the model object and any associated artifacts to the `resource_version_control` system and `metadata_tracker`.

```python
def save_model(self, model, experiment, params, algo, *args, **kwargs)
def save_model(self, model, experiment, *args, **kwargs)
```

**Arguments**:

- `model` (object): The model object created during this experiment.
- `experiment` (experiment): The `metadata_tracker` experiment created for this experiment.
- `params` (dict): The training parameters used in the experiment
- `algo` (str): The algorithm used in this experiment.


### _build_training_grid
168 changes: 166 additions & 2 deletions docs/src/base_model_trainer.md
@@ -8,6 +8,7 @@ A `model_trainer` is a component that essentially acts as a wrapper around a lib
`BaseModelTrainer` contains the following default attributes:

- `model`: The trained model object. This should get set in the `fit` function.
- `feature_transformer`: The optional feature transformer used to transform data before it is passed to the model. It can be set on an individual `ModelTrainer` class or at the `pipeline` level; a pipeline-level transformer is the default for every `ModelTrainer` class that does not override it.
- `mlflow_module`: The name of the MLflow submodule that contains the proper `log_model` method for this trainer. This is only needed if you intend to use MLflow as your model repository.
- `params`: The training parameters for the trained model.

@@ -215,10 +216,173 @@ def rebuild_model(self, data, model_version, *args, **kwargs) -> tuple[Any, Any]

**Arguments**:

- `data` (object): dictionary of training/test/valiadation data.
- `data` (object): dictionary of training/test/validation data.
- `model_version` (object): model version object

**Returns**:

- `model`: the trained model
- `exp`: experiment where the model was trained


### transform_and_fit

Transforms data using a feature transformer and then fits the model to the transformed data.

```python
def transform_and_fit(self, data_dict, *args, **kwargs)
```

**Arguments**:

- `data_dict` (object): dictionary of training/test/validation data.

**Returns**:

- `model`: the trained model
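
The flow of `transform_and_fit` might look roughly like the sketch below. It is a simplified illustration, not the base class's code: `MeanCenterer` and `TrainerSketch` are hypothetical, the `X_train`/`y_train`/`X_test`/`y_test` keys are an assumed layout for the data dictionary, and plain lists stand in for dataframes.

```python
class MeanCenterer:
    """Hypothetical feature transformer that subtracts the training mean."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class TrainerSketch:
    """Hypothetical trainer whose 'model' is simply the mean of y_train."""

    def __init__(self, feature_transformer=None):
        self.feature_transformer = feature_transformer
        self.model = None

    def fit(self, data_dict):
        self.seen_X = data_dict["X_train"]  # kept only so the example can be inspected
        self.model = sum(data_dict["y_train"]) / len(data_dict["y_train"])
        return self.model

    def transform_and_fit(self, data_dict, *args, **kwargs):
        if self.feature_transformer is not None:
            # Fit on the training split only, then transform every X_* split.
            self.feature_transformer.fit(
                data_dict["X_train"], data_dict.get("y_train")
            )
            data_dict = {
                key: self.feature_transformer.transform(value)
                if key.startswith("X_")
                else value
                for key, value in data_dict.items()
            }
        return self.fit(data_dict)


data = {"X_train": [1.0, 2.0, 3.0], "y_train": [0, 1, 1], "X_test": [4.0], "y_test": [1]}
trainer = TrainerSketch(MeanCenterer())
model = trainer.transform_and_fit(data)
```

Fitting the transformer on the training split alone keeps information from the test split out of the learned statistics.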


### transform_and_predict

Transforms data using a feature transformer and then creates predictions from the transformed data.

```python
def transform_and_predict(self, data, *args, **kwargs)
```

**Arguments**:

- `data` (object): dictionary of training/test/validation data.

**Returns**:

- `predictions`: the predictions

### transform_and_predict_df

Transforms a single dataframe using a feature transformer and then creates predictions from the transformed dataframe.

```python
def transform_and_predict_df(self, data, *args, **kwargs)
```

**Arguments**:

- `data` (object): dataframe

**Returns**:

- `predictions`: the predictions

### transform_and_predict_proba_df

Transforms a single dataframe using a feature transformer and then creates class probability predictions from the transformed dataframe.

```python
def transform_and_predict_proba_df(self, data, *args, **kwargs)
```

**Arguments**:

- `data` (object): dataframe

**Returns**:

- `predictions`: the predicted class probabilities


### fit_transform_data

Fits the feature transformer to the data and then transforms that data using the fitted transformer.

```python
def fit_transform_data(self, X_data, y_data, *args, **kwargs)
```

**Arguments**:

- `X_data` (object): Feature data to fit & transform
- `y_data` (object): Label data.

**Returns**:

- `transformed_data`: the transformed data


### fit_data

Fits the feature transformer to the data.

```python
def fit_data(self, X_data, y_data, *args, **kwargs)
```

**Arguments**:

- `X_data` (object): Feature data to fit the transformer on.
- `y_data` (object): Label data.

**Returns**:

- `feature_transformer`: the fitted feature transformer


### transform_data

Transforms a single dataframe using a feature transformer.

```python
def transform_data(self, data, *args, **kwargs)
```

**Arguments**:

- `data` (object): dataframe

**Returns**:

- `transformed_data`: the transformed data

### _transform_dict

Transforms a dictionary of train/test/validation data sets.

```python
def _transform_dict(self, data_dict, *args, **kwargs)
```

**Arguments**:

- `data_dict` (dictionary): dictionary of train/test/validation data sets.

**Returns**:

- `transformed_data_dict`: returns the same dictionary, now with transformed data

### _get_transformer

Returns the model's feature transformer.

```python
def _get_transformer(self)
```

**Returns**:

- `self.feature_transformer`: the model's feature transformer

### _set_transformer

Sets the model's feature transformer.

```python
def _set_transformer(self, transformer)
```

**Arguments**:

- `transformer` (object): The feature transformer to set for the model trainer.

**Returns**:

- None
36 changes: 35 additions & 1 deletion docs/src/base_resource_version_control.md
@@ -83,4 +83,38 @@ def get_model(self, experiment, *args, **kwargs) -> Any

**Returns**:

- `model`: The model object from the experiment.


### version_feature_transformer

Versions a feature transformer.

```python
def version_feature_transformer(self, experiment, transformer, *args, **kwargs) -> dict[str, Any]
```

**Arguments**:

- `experiment` (object): The experiment being versioned
- `transformer` (object): The feature transformer to version

**Returns**:

- `dict`: Attributes returned from the resource version control system, such as a commit hash. The returned information should be sufficient to retrieve the object in the future and will very likely be logged in the `metadata_tracker`.

### get_feature_transformer

Returns a feature transformer object from an experiment.

```python
def get_feature_transformer(self, experiment, *args, **kwargs) -> Any
```

**Arguments**:

- `experiment` (object): The experiment to retrieve the feature transformer from

**Returns**:

- `feature_transformer`: The feature transformer object from the experiment.
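
To make the round trip between `version_feature_transformer` and `get_feature_transformer` concrete, here is a minimal in-memory sketch. It is an assumption-laden illustration rather than a real backend: `InMemoryResourceVersionControl` is hypothetical, a string stands in for the experiment object, a plain dictionary stands in for the transformer, and a SHA-256 digest of the pickled object plays the role of a commit hash.

```python
import hashlib
import pickle
from typing import Any


class InMemoryResourceVersionControl:
    """Hypothetical in-memory stand-in; a real class would call an external system."""

    def __init__(self):
        self._blobs = {}          # content hash -> pickled transformer
        self._by_experiment = {}  # experiment id -> content hash

    def version_feature_transformer(
        self, experiment, transformer, *args, **kwargs
    ) -> dict[str, Any]:
        blob = pickle.dumps(transformer)
        digest = hashlib.sha256(blob).hexdigest()
        self._blobs[digest] = blob
        self._by_experiment[experiment] = digest
        # The digest plays the role of a commit hash in this sketch.
        return {"hash": digest}

    def get_feature_transformer(self, experiment, *args, **kwargs) -> Any:
        return pickle.loads(self._blobs[self._by_experiment[experiment]])


rvc = InMemoryResourceVersionControl()
info = rvc.version_feature_transformer("experiment_1", {"scale": 2.0})
restored = rvc.get_feature_transformer("experiment_1")
```

The dictionary returned by `version_feature_transformer` is the kind of information a `metadata_tracker` could log so the transformer can be located again later.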