feat: Add to_remote_storage functionality to SparkOfflineStore #3175

Merged
merged 1 commit into from
Sep 7, 2022

@niklasvm niklasvm commented Sep 3, 2022

What this PR does / why we need it:

Add a to_remote_storage method to SparkRetrievalJob that writes retrieval results to remote storage. Both a local file-based option and an S3-based option are implemented.

This is facilitated by two new config parameters for the SparkOfflineStore:
staging_location: should start with either file:// or s3:// to specify the URI type
region: AWS region, if applicable
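
For illustration only, here is a minimal sketch of how a staging_location value could be validated and split by URI scheme. It is not the code in this PR: the helper name resolve_staging_location and the StagingTarget type are hypothetical, and it uses only the Python standard library rather than Spark or Feast.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Schemes named in the PR description; anything else is rejected here.
SUPPORTED_SCHEMES = ("file", "s3")


@dataclass
class StagingTarget:
    """Hypothetical container for a parsed staging location."""

    scheme: str
    bucket: Optional[str]  # only set for s3:// locations
    path: str
    region: Optional[str]  # only meaningful for s3:// locations


def resolve_staging_location(
    staging_location: str, region: Optional[str] = None
) -> StagingTarget:
    """Hypothetical helper: validate the URI scheme and split out its parts."""
    parsed = urlparse(staging_location)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(
            f"staging_location must start with file:// or s3://, got: {staging_location!r}"
        )
    if parsed.scheme == "file":
        # Local paths have no bucket, and the region does not apply.
        return StagingTarget("file", None, parsed.path, None)
    # For s3://bucket/prefix, the bucket is the netloc and the prefix is the path.
    return StagingTarget("s3", parsed.netloc, parsed.path, region)
```

Rejecting unknown schemes up front is an assumption of this sketch; it means a misconfigured staging_location would fail before any data is written.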

Spark Universal tests pass. This is untested with an S3-based staging_location.

This PR is required in preparation for implementing a SparkBatchMaterializationEngine in a later PR.

Which issue(s) this PR fixes: None

First step towards solving #3167

@niklasvm niklasvm changed the title Add spark remote storage feat: add to_remote_storage functionality to SparkOfflineStore Sep 3, 2022
@niklasvm niklasvm changed the title feat: add to_remote_storage functionality to SparkOfflineStore feat: add to_remote_storage functionality to SparkOfflineStore Sep 3, 2022
codecov-commenter commented Sep 3, 2022

Codecov Report

Base: 67.03% // Head: 76.10% // Increases project coverage by +9.06% 🎉

Coverage data is based on head (a4b9fa7) compared to base (b4ef834).
Patch coverage: 27.27% of modified lines in pull request are covered.

❗ Current head a4b9fa7 differs from pull request most recent head cf8b646. Consider uploading reports for the commit cf8b646 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3175      +/-   ##
==========================================
+ Coverage   67.03%   76.10%   +9.06%     
==========================================
  Files         175      211      +36     
  Lines       15941    17925    +1984     
==========================================
+ Hits        10686    13641    +2955     
+ Misses       5255     4284     -971     
| Flag | Coverage Δ |
| --- | --- |
| integrationtests | 66.86% <ø> (-0.17%) ⬇️ |
| unittests | 58.30% <27.27%> (?) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| ...s/contrib/spark_offline_store/tests/data_source.py | 39.28% <0.00%> (ø) |
| ...ffline_stores/contrib/spark_offline_store/spark.py | 32.57% <29.03%> (ø) |
| ...on/feast/infra/materialization/snowflake_engine.py | 92.13% <0.00%> (-0.46%) ⬇️ |
| ...tores/contrib/trino_offline_store/trino_queries.py | 15.05% <0.00%> (ø) |
| ...ib/trino_offline_store/test_config/manual_tests.py | 33.33% <0.00%> (ø) |
| ...s/contrib/mssql_offline_store/tests/data_source.py | 46.93% <0.00%> (ø) |
| ...thon/feast/infra/online_stores/contrib/postgres.py | 32.69% <0.00%> (ø) |
| ...offline_stores/contrib/mssql_repo_configuration.py | 100.00% <0.00%> (ø) |
| ...s/contrib/trino_offline_store/connectors/upload.py | 8.97% <0.00%> (ø) |
| ... and 102 more | |


Signed-off-by: niklasvm <niklasvm@gmail.com>
@niklasvm niklasvm changed the title feat: add to_remote_storage functionality to SparkOfflineStore feat: Add to_remote_storage functionality to SparkOfflineStore Sep 3, 2022
@niklasvm niklasvm marked this pull request as ready for review September 3, 2022 11:26
@niklasvm niklasvm force-pushed the add_spark_remote_storage branch 2 times, most recently from cf8b646 to 844bb83 Compare September 4, 2022 13:18
@kevjumba kevjumba self-requested a review September 7, 2022 19:38
@kevjumba kevjumba self-assigned this Sep 7, 2022

@kevjumba kevjumba left a comment

/lgtm

@feast-ci-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kevjumba, niklasvm


@feast-ci-bot feast-ci-bot merged commit 2107ce2 into feast-dev:master Sep 7, 2022
@niklasvm niklasvm deleted the add_spark_remote_storage branch September 8, 2022 06:16
felixwang9817 pushed a commit that referenced this pull request Sep 20, 2022
# [0.25.0](v0.24.0...v0.25.0) (2022-09-20)

### Bug Fixes

* Broken Feature Service Link ([#3227](#3227)) ([e117082](e117082))
* Feature-server image is missing mysql dependency for mysql registry ([#3223](#3223)) ([ae37b20](ae37b20))
* Fix handling of TTL in Go server ([#3232](#3232)) ([f020630](f020630))
* Fix materialization when running on Spark cluster. ([#3166](#3166)) ([175fd25](175fd25))
* Fix push API to respect feature view's already inferred entity types ([#3172](#3172)) ([7c50ab5](7c50ab5))
* Fix release workflow ([#3144](#3144)) ([20a9dd9](20a9dd9))
* Fix Shopify timestamp bug and add warnings to help with debugging entity registration ([#3191](#3191)) ([de75971](de75971))
* Handle complex Spark data types in SparkSource ([#3154](#3154)) ([5ddb83b](5ddb83b))
* Local staging location provision ([#3195](#3195)) ([cdf0faf](cdf0faf))
* Remove bad snowflake offline store method ([#3204](#3204)) ([dfdd0ca](dfdd0ca))
* Remove opening file object when validating S3 parquet source ([#3217](#3217)) ([a906018](a906018))
* Snowflake config file search error ([#3193](#3193)) ([189afb9](189afb9))
* Update Snowflake Online docs ([#3206](#3206)) ([7bc1dff](7bc1dff))

### Features

* Add `to_remote_storage` functionality to `SparkOfflineStore` ([#3175](#3175)) ([2107ce2](2107ce2))
* Add ability to give boto extra args for registry config ([#3219](#3219)) ([fbc6a2c](fbc6a2c))
* Add health endpoint to py server ([#3202](#3202)) ([43222f2](43222f2))
* Add snowflake support for date & number with scale ([#3148](#3148)) ([50e8755](50e8755))
* Add tag kwarg to set Snowflake online store table path ([#3176](#3176)) ([39aeea3](39aeea3))
* Add workgroup to athena offline store config ([#3139](#3139)) ([a752211](a752211))
* Implement spark materialization engine ([#3184](#3184)) ([a59c33a](a59c33a))