Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Contrib azure provider with synapse/mssql offline store and Azure registry store #3072

Merged
merged 51 commits into from
Aug 19, 2022

Conversation

kevjumba
Copy link
Collaborator

@kevjumba kevjumba commented Aug 11, 2022

What this PR does / why we need it:
Moving https://github.com/Azure/feast-azure feast_azure_provider package into feast as contrib. Does not move the kubernetes cluster configuration.

Which issue(s) this PR fixes:

Fixes #

@kevjumba kevjumba changed the title Azure provider feat: Contrib azure provider with synapse/mssql offline store and Azure registry store Aug 11, 2022
@codecov-commenter
Copy link

codecov-commenter commented Aug 11, 2022

Codecov Report

Merging #3072 (516ff76) into master (29f2895) will increase coverage by 8.16%.
The diff coverage is 44.50%.

@@            Coverage Diff             @@
##           master    #3072      +/-   ##
==========================================
+ Coverage   67.86%   76.03%   +8.16%     
==========================================
  Files         176      214      +38     
  Lines       15507    17897    +2390     
==========================================
+ Hits        10524    13608    +3084     
+ Misses       4983     4289     -694     
Flag Coverage Δ
integrationtests 66.96% <42.94%> (-0.91%) ⬇️
unittests 57.87% <40.31%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/feast/data_source.py 81.68% <ø> (+23.44%) ⬆️
sdk/python/feast/feature_store.py 85.60% <ø> (+3.66%) ⬆️
sdk/python/feast/infra/provider.py 77.94% <ø> (ø)
sdk/python/feast/infra/registry/registry.py 80.83% <ø> (+5.20%) ⬆️
...sts/integration/registration/test_universal_cli.py 100.00% <ø> (ø)
setup.py 0.00% <0.00%> (ø)
sdk/python/feast/type_map.py 57.62% <10.52%> (-2.75%) ⬇️
sdk/python/feast/repo_config.py 87.54% <25.00%> (+4.40%) ⬆️
sdk/python/feast/inference.py 85.39% <33.33%> (-0.66%) ⬇️
...ffline_stores/contrib/mssql_offline_store/mssql.py 38.42% <38.42%> (ø)
... and 119 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@kevjumba kevjumba force-pushed the azure_provider branch 2 times, most recently from dba6b1f to 7fdc2ae Compare August 12, 2022 19:42
Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed, but we could theoretically setup CI by using https://docs.microsoft.com/en-us/azure/azure-sql-edge/disconnected-deployment and using test-containers.

## Functionality Matrix

The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the Spark offline store.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not spark?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

docs/roadmap.md Outdated
@@ -10,7 +10,7 @@ The list below contains the functionality that contributors are planning to deve
* [x] [Redshift source](https://docs.feast.dev/reference/data-sources/redshift)
* [x] [BigQuery source](https://docs.feast.dev/reference/data-sources/bigquery)
* [x] [Parquet file source](https://docs.feast.dev/reference/data-sources/file)
* [x] [Synapse source (community plugin)](https://github.com/Azure/feast-azure)
* [x] [Synapse source (contrib plugin)](https://docs.feast.dev/reference/data-sources/mssql)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be Azure Synapse + Azure SQL?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

make_tzaware,
)

# Make sure spark warning doesn't raise more than once.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not spark

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, we can kill this provider probably or use passthroughprovider

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@@ -297,6 +297,7 @@ def from_proto(data_source: DataSourceProto) -> Any:
raise ValueError("Could not identify the source type being added.")

if data_source_type == DataSourceProto.SourceType.CUSTOM_SOURCE:
data_source.data_source_class_type = "feast.infra.offline_stores.contrib.mssql_offline_store.mssqlserver_source.MsSqlServerSource"
cls = get_data_source_class_from_type(data_source.data_source_class_type)
Copy link

@Lejboelle Lejboelle Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard coding the class type here, I believe it should be declared in the to_proto() function of the MsSqlServerSource class. This is done in case of the other offline contrib stores and also ensures that there are no conflicts with any other future custom sources.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@kevjumba kevjumba force-pushed the azure_provider branch 2 times, most recently from 2eaebdd to 8d92603 Compare August 17, 2022 18:57
super().__init__(project_name)
self.tables_created: List[str] = []
self.container = SqlServerContainer(user=MSSQL_USER, password=MSSQL_PASSWORD)
# self.container = fixture_request.getfixturevalue("mssql_container")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove all the commented line here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

pa_type_as_str = str(pa_type).lower()
if pa_type_as_str.startswith("timestamp"):
if "tz=" in pa_type_as_str:
return "DATETIME2"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why this is upper case and not lower case like everything else?


## Description

MsSql data sources are Microsoft sql table sources.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: SQL

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't look fixed to me? (Microsoft sql table sources)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sorry, I realized I hadn't pushed my changes last night.


## Description

The MsSql offline store provides support for reading [MsSQL Sources](../data-sources/mssql.md).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call out synapse or sql server?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

The objective of this tutorial is to build a model that predicts if a driver will complete a trip based on a number of features ingested into Feast. During this tutorial you will:

1. Deploy the infrastructure for a feature store (using an ARM template)
1. Register features into a central feature registry hosted on Blob Storage
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these numbers seem wrong

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@adchia adchia self-assigned this Aug 17, 2022
return f"{self.project_name}_{destination_name}"

def teardown(self):
self.container.stop()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you always use the fixture as I suggest above, then this shouldn't be needed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

entity_df[entity_df_event_timestamp_col], utc=True
).fillna(pandas.Timestamp.now())

# TODO: figure out how to deal with entity dataframes that are strings
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you throw an exception here? e.g. some not implemented error so users know they can't pass a SQL string for MsSQL to generate the entity dataframe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


## Description

MsSql data sources are Microsoft sql table sources.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't look fixed to me? (Microsoft sql table sources)

## 6. Running Feast Azure Tutorials locally without Azure workspace

* If you are on a free tier instance, you will not be able to deploy the azure deployment because the azure workspace requires VCPUs and the free trial subscription does not have a quota.
* The workaround is to remove the `Microsoft.MachineLearningServices/workspaces/computes` resource from `fs_snapse_azure_deploy.json` and setting up the environment locally.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo? assuming should be synapse

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the catch! fixed

@kevjumba kevjumba force-pushed the azure_provider branch 2 times, most recently from 35b123d to 452379f Compare August 18, 2022 17:24
@kevjumba kevjumba requested a review from adchia August 18, 2022 17:26

@pytest.fixture(scope="session")
def mssql_container():
container = SqlServerContainer(user=MSSQL_USER, password=MSSQL_PASSWORD)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass in the azure-sql-edge image as mentioned above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't realize you actually wanted to use this image. Updated.

project,
test_repo_config,
repo_dir_name: Path,
fixture_request: Optional[pytest.FixtureRequest],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't need to pass a fixture_request actually. the test environment already has a datasourcecreator

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@kevjumba kevjumba force-pushed the azure_provider branch 2 times, most recently from de40738 to f998d8b Compare August 19, 2022 02:57
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Danny Chiao <danny@tecton.ai>
Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, kevjumba

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Danny Chiao <danny@tecton.ai>
@feast-ci-bot feast-ci-bot removed the lgtm label Aug 19, 2022
Signed-off-by: Danny Chiao <danny@tecton.ai>
Signed-off-by: Danny Chiao <danny@tecton.ai>
Signed-off-by: Danny Chiao <danny@tecton.ai>
Signed-off-by: Danny Chiao <danny@tecton.ai>
@adchia
Copy link
Collaborator

adchia commented Aug 19, 2022

/lgtm

@feast-ci-bot feast-ci-bot merged commit 9f7e557 into feast-dev:master Aug 19, 2022
kevjumba pushed a commit that referenced this pull request Aug 25, 2022
# [0.24.0](v0.23.0...v0.24.0) (2022-08-25)

### Bug Fixes

* Check if on_demand_feature_views is an empty list rather than None for snowflake provider ([#3046](#3046)) ([9b05e65](9b05e65))
* FeatureStore.apply applies BatchFeatureView correctly ([#3098](#3098)) ([41be511](41be511))
* Fix Feast Java inconsistency with int64 serialization vs python ([#3031](#3031)) ([4bba787](4bba787))
* Fix feature service inference logic ([#3089](#3089)) ([4310ed7](4310ed7))
* Fix field mapping logic during feature inference ([#3067](#3067)) ([cdfa761](cdfa761))
* Fix incorrect on demand feature view diffing and improve Java tests ([#3074](#3074)) ([0702310](0702310))
* Fix Java helm charts to work with refactored logic. Fix FTS image ([#3105](#3105)) ([2b493e0](2b493e0))
* Fix on demand feature view output in feast plan + Web UI crash ([#3057](#3057)) ([bfae6ac](bfae6ac))
* Fix release workflow to release 0.24.0 ([#3138](#3138)) ([a69aaae](a69aaae))
* Fix Spark offline store type conversion to arrow ([#3071](#3071)) ([b26566d](b26566d))
* Fixing Web UI, which fails for the SQL registry ([#3028](#3028)) ([64603b6](64603b6))
* Force Snowflake Session to Timezone UTC ([#3083](#3083)) ([9f221e6](9f221e6))
* Make infer dummy entity join key idempotent ([#3115](#3115)) ([1f5b1e0](1f5b1e0))
* More explicit error messages ([#2708](#2708)) ([e4d7afd](e4d7afd))
* Parse inline data sources ([#3036](#3036)) ([c7ba370](c7ba370))
* Prevent overwriting existing file during `persist` ([#3088](#3088)) ([69af21f](69af21f))
* Register BatchFeatureView in feature repos correctly ([#3092](#3092)) ([b8e39ea](b8e39ea))
* Return an empty infra object from sql registry when it doesn't exist ([#3022](#3022)) ([8ba87d1](8ba87d1))
* Teardown tables for Snowflake Materialization testing ([#3106](#3106)) ([0a0c974](0a0c974))
* UI error when saved dataset is present in registry. ([#3124](#3124)) ([83cf753](83cf753))
* Update sql.py ([#3096](#3096)) ([2646a86](2646a86))
* Updated snowflake template ([#3130](#3130)) ([f0594e1](f0594e1))

### Features

* Add authentication option for snowflake connector ([#3039](#3039)) ([74c75f1](74c75f1))
* Add Cassandra/AstraDB online store contribution ([#2873](#2873)) ([feb6cb8](feb6cb8))
* Add Snowflake materialization engine ([#2948](#2948)) ([f3b522b](f3b522b))
* Adding saved dataset capabilities for Postgres  ([#3070](#3070)) ([d3253c3](d3253c3))
* Allow passing repo config path via flag ([#3077](#3077)) ([0d2d951](0d2d951))
* Contrib azure provider with synapse/mssql offline store and Azure registry store ([#3072](#3072)) ([9f7e557](9f7e557))
* Custom Docker image for Bytewax batch materialization ([#3099](#3099)) ([cdd1b07](cdd1b07))
* Feast AWS Athena offline store (again) ([#3044](#3044)) ([989ce08](989ce08))
* Implement spark offline store `offline_write_batch` method ([#3076](#3076)) ([5b0cc87](5b0cc87))
* Initial Bytewax materialization engine ([#2974](#2974)) ([55c61f9](55c61f9))
* Refactor feature server helm charts to allow passing feature_store.yaml in environment variables ([#3113](#3113)) ([85ee789](85ee789))
@bgalvao
Copy link

bgalvao commented Sep 8, 2022

Hi, I am wondering if it is possible to use this for a self-hosted MSSQL Server database as an Offline Store?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants