Merge branch 'master' into odfv-updated
Showing 49 changed files with 1,510 additions and 301 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
# Data Quality Monitoring | ||
|
||
Data Quality Monitoring (DQM) is a Feast module that helps users validate their data against a user-curated set of rules. | ||
Validation can be applied during: | ||
* Historical retrieval (training dataset generation) | ||
* [planned] Writing features into an online store | ||
* [planned] Reading features from an online store | ||
|
||
Its goal is to address several complex data problems, namely: | ||
* Data consistency - new training datasets can be significantly different from previous datasets. This might require a change in model architecture. | ||
* Issues/bugs in the upstream pipeline - bugs in upstream pipelines can cause invalid values to overwrite existing valid values in an online store. | ||
* Training/serving skew - distribution shift could significantly decrease the performance of the model. | ||
|
||
> To monitor data quality, we check that the characteristics of the tested dataset (aka the tested dataset's profile) are "equivalent" to the characteristics of the reference dataset. | ||
> How exactly profile equivalency should be measured is up to the user. | ||
### Overview | ||
|
||
The validation process consists of the following steps: | ||
1. The user prepares a reference dataset (currently, only [saved datasets](../getting-started/concepts/dataset.md) from historical retrieval are supported). | ||
2. The user defines a profiler function that produces a profile for a given dataset (currently, only profilers based on [Great Expectations](https://docs.greatexpectations.io) are allowed). | ||
3. The tested dataset is validated, with the reference dataset and the profiler provided as parameters. | ||
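
The three steps above can be sketched in plain Python. This is a conceptual illustration only, not Feast's implementation: `Profile`, `range_profiler`, and `validate` are hypothetical names, and the "profile" here is just a per-column min/max range rather than a Great Expectations suite.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


@dataclass
class Profile:
    """Hypothetical profile: per-column (min, max) ranges seen in a dataset."""

    ranges: Dict[str, Tuple[float, float]]


def range_profiler(dataset: List[Row]) -> Profile:
    """Step 2: produce a profile from a dataset (here, column min/max)."""
    columns = dataset[0].keys()
    return Profile(
        {
            col: (min(r[col] for r in dataset), max(r[col] for r in dataset))
            for col in columns
        }
    )


def validate(
    tested: List[Row],
    reference: List[Row],
    profiler: Callable[[List[Row]], Profile],
) -> List[str]:
    """Step 3: check the tested dataset against the reference profile."""
    profile = profiler(reference)
    failures = []
    for col, (low, high) in profile.ranges.items():
        for row in tested:
            if not low <= row[col] <= high:
                failures.append(f"{col}={row[col]} outside [{low}, {high}]")
    return failures


# Step 1: a reference dataset (a stand-in for a saved dataset).
reference = [{"conv_rate": 0.1}, {"conv_rate": 0.9}]

ok = validate([{"conv_rate": 0.5}], reference, range_profiler)
bad = validate([{"conv_rate": 1.5}], reference, range_profiler)
```

Here `ok` is empty because the tested value falls inside the reference range, while `bad` contains one failure message. How strict the equivalence check should be remains the user's choice, which is why the profiler is passed in as a parameter.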
|
||
### Preparations | ||
Feast with Great Expectations support can be installed via: | ||
```shell | ||
pip install 'feast[ge]' | ||
``` | ||
|
||
### Dataset profile | ||
Currently, Feast supports only [Great Expectations'](https://greatexpectations.io/) [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite) | ||
as a dataset's profile. Hence, the user needs to define a function (profiler) that receives a dataset and returns an [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite). | ||
|
||
Great Expectations supports automatic profiling as well as manually specifying expectations: | ||
```python | ||
from great_expectations.dataset import Dataset | ||
from great_expectations.core.expectation_suite import ExpectationSuite | ||
|
||
from feast.dqm.profilers.ge_profiler import ge_profiler | ||
|
||
@ge_profiler | ||
def automatic_profiler(dataset: Dataset) -> ExpectationSuite: | ||
    from great_expectations.profile.user_configurable_profiler import UserConfigurableProfiler | ||
|
||
    return UserConfigurableProfiler( | ||
        profile_dataset=dataset, | ||
        ignored_columns=['conv_rate'], | ||
        value_set_threshold='few' | ||
    ).build_suite() | ||
``` | ||
However, in our experience, the capabilities of the automatic profiler are quite limited, so we recommend crafting your own expectations: | ||
```python | ||
@ge_profiler | ||
def manual_profiler(dataset: Dataset) -> ExpectationSuite: | ||
    dataset.expect_column_max_to_be_between("column", 1, 2) | ||
    return dataset.get_expectation_suite() | ||
``` | ||
|
||
|
||
|
||
### Validating Training Dataset | ||
During retrieval of historical features, `validation_reference` can be passed as a parameter to the `.to_df(validation_reference=...)` or `.to_arrow(validation_reference=...)` methods of RetrievalJob. | ||
If the parameter is provided, Feast will run validation once the dataset is materialized. If validation succeeds, the materialized dataset is returned. | ||
Otherwise, a `feast.dqm.errors.ValidationFailed` exception is raised. It contains the details of all expectations that didn't pass. | ||
|
||
```python | ||
from feast import FeatureStore | ||
|
||
fs = FeatureStore(".") | ||
|
||
job = fs.get_historical_features(...) | ||
job.to_df( | ||
    validation_reference=fs | ||
        .get_saved_dataset("my_reference_dataset") | ||
        .as_reference(profiler=manual_profiler) | ||
) | ||
``` |
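
Callers may prefer to catch the validation failure rather than let it propagate. The sketch below shows one possible structure. So that it runs without a Feast repository, it uses a stand-in `ValidationFailed` class and a hypothetical `materialize` callable. In real code the exception would be imported from `feast.dqm.errors`, and the callable would wrap `job.to_df(validation_reference=...)`. The `details` attribute is an assumption made for illustration, not Feast's API.

```python
from typing import Callable, List, Optional


class ValidationFailed(Exception):
    """Stand-in for feast.dqm.errors.ValidationFailed (illustrative only)."""

    def __init__(self, details: List[str]):
        super().__init__("; ".join(details))
        self.details = details  # hypothetical attribute, not Feast's API


def load_validated(materialize: Callable[[], list]) -> Optional[list]:
    """Return the materialized dataset, or None if validation fails."""
    try:
        return materialize()
    except ValidationFailed as exc:
        # Report every failed expectation before giving up on this dataset.
        for detail in exc.details:
            print(f"validation failed: {detail}")
        return None


def good_job() -> list:
    # Simulates a retrieval whose validation passes.
    return [{"conv_rate": 0.5}]


def bad_job() -> list:
    # Simulates a retrieval whose validation fails.
    raise ValidationFailed(["expect_column_max_to_be_between: observed 7"])
```

Returning `None` on failure is only one option; a training pipeline might instead re-raise after logging so that a bad dataset never reaches model training.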
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Feature servers | ||
|
||
Feast users can choose to retrieve features from a feature server, as opposed to through the Python SDK. | ||
|
||
{% page-ref page="local-feature-server.md" %} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,4 +8,3 @@ offline_store: {} | |
flags: | ||
alpha_features: true | ||
on_demand_transforms: true | ||
python_feature_server: true |