fix: Remove opening file object when validating S3 parquet source #3217

Merged 2 commits on Sep 16, 2022

@mzwiessele (Contributor) commented Sep 15, 2022

Remove opening a file object when validating an S3 parquet source.
Instead, let PyArrow open the path itself using the filesystem.

What this PR does / why we need it:
Fixes issue #3216, which occurs when running `feast apply` to register a parquet dataset in the feature registry.

Which issue(s) this PR fixes:

Fixes #3216
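
The idea in the description above (give PyArrow the path together with the filesystem, rather than a file object that was opened beforehand) can be sketched as follows. This is a minimal illustration and not the diff from this PR: the helper name `read_parquet_schema` is hypothetical, and it uses `pyarrow.dataset.dataset` as a stand-in for whatever call the Feast validation code actually makes.

```python
from typing import Any, Optional


def read_parquet_schema(path: str, filesystem: Optional[Any] = None):
    """Return the Arrow schema of a parquet file or a directory of parquet files.

    `filesystem` is expected to be a pyarrow.fs.FileSystem (for example an
    S3FileSystem); None means the local filesystem. Hypothetical helper,
    not part of Feast.
    """
    # Imported lazily so this sketch can be loaded without pyarrow installed.
    import pyarrow.dataset as ds

    # Hand PyArrow the path string and the filesystem separately, rather than
    # an already opened file object (e.g. filesystem.open_input_file(path)).
    # PyArrow then opens the file(s) itself, so `path` may also be a directory.
    return ds.dataset(path, format="parquet", filesystem=filesystem).schema
```

For S3, the filesystem could be a `pyarrow.fs.S3FileSystem` (optionally with `endpoint_override=...`), with `path` given as `bucket/key` without the `s3://` scheme.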

@mzwiessele changed the title from "Remove opening the file object" to "fix: Remove opening the file object" Sep 15, 2022
Let pyarrow handle opening the path using the filesystem.

Signed-off-by: Max Z <max.zwiessele@babylonhealth.com>
@mzwiessele changed the title from "fix: Remove opening the file object" to "fix: Remove opening file object when validating S3 parquet source" Sep 15, 2022
@mzwiessele
Contributor Author

/assign @adchia

@codecov-commenter

codecov-commenter commented Sep 15, 2022

Codecov Report

Base: 67.03% // Head: 58.28% // Decreases project coverage by -8.75% ⚠️

Coverage data is based on head (ea1a315) compared to base (7bc1dff).
Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3217      +/-   ##
==========================================
- Coverage   67.03%   58.28%   -8.76%     
==========================================
  Files         175      209      +34     
  Lines       15948    17659    +1711     
==========================================
- Hits        10691    10292     -399     
- Misses       5257     7367    +2110     
| Flag | Coverage Δ |
| --- | --- |
| integrationtests | ? |
| unittests | 58.28% <ø> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ |
| --- | --- |
| ...k/python/feast/infra/offline_stores/file_source.py | 79.09% <ø> (-15.46%) ⬇️ |
| ...sts/integration/registration/test_universal_cli.py | 20.20% <0.00%> (-79.80%) ⬇️ |
| ...ts/integration/offline_store/test_offline_write.py | 26.08% <0.00%> (-73.92%) ⬇️ |
| ...fline_store/test_universal_historical_retrieval.py | 28.75% <0.00%> (-71.25%) ⬇️ |
| ...ests/integration/e2e/test_python_feature_server.py | 29.50% <0.00%> (-70.50%) ⬇️ |
| ...dk/python/tests/integration/e2e/test_validation.py | 27.55% <0.00%> (-69.30%) ⬇️ |
| ...s/integration/registration/test_universal_types.py | 32.25% <0.00%> (-67.75%) ⬇️ |
| sdk/python/feast/infra/online_stores/redis.py | 28.39% <0.00%> (-66.67%) ⬇️ |
| sdk/python/tests/integration/e2e/test_usage_e2e.py | 33.87% <0.00%> (-66.13%) ⬇️ |
| sdk/python/tests/data/data_creator.py | 34.78% <0.00%> (-65.22%) ⬇️ |
| ... and 160 more | |


Signed-off-by: Max Z <max.zwiessele@babylonhealth.com>
@adchia
Collaborator

adchia commented Sep 16, 2022

Ideally there would be a test covering this, but you'd be blocked on creating a file in the Feast integration tests' S3 bucket. Mind opening a separate issue for that? @mzwiessele

@adchia (Collaborator) left a comment

/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, mzwiessele

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot merged commit a906018 into feast-dev:master Sep 16, 2022
felixwang9817 pushed a commit that referenced this pull request Sep 20, 2022
# [0.25.0](v0.24.0...v0.25.0) (2022-09-20)

### Bug Fixes

* Broken Feature Service Link ([#3227](#3227)) ([e117082](e117082))
* Feature-server image is missing mysql dependency for mysql registry ([#3223](#3223)) ([ae37b20](ae37b20))
* Fix handling of TTL in Go server ([#3232](#3232)) ([f020630](f020630))
* Fix materialization when running on Spark cluster. ([#3166](#3166)) ([175fd25](175fd25))
* Fix push API to respect feature view's already inferred entity types ([#3172](#3172)) ([7c50ab5](7c50ab5))
* Fix release workflow ([#3144](#3144)) ([20a9dd9](20a9dd9))
* Fix Shopify timestamp bug and add warnings to help with debugging entity registration ([#3191](#3191)) ([de75971](de75971))
* Handle complex Spark data types in SparkSource ([#3154](#3154)) ([5ddb83b](5ddb83b))
* Local staging location provision ([#3195](#3195)) ([cdf0faf](cdf0faf))
* Remove bad snowflake offline store method ([#3204](#3204)) ([dfdd0ca](dfdd0ca))
* Remove opening file object when validating S3 parquet source ([#3217](#3217)) ([a906018](a906018))
* Snowflake config file search error ([#3193](#3193)) ([189afb9](189afb9))
* Update Snowflake Online docs ([#3206](#3206)) ([7bc1dff](7bc1dff))

### Features

* Add `to_remote_storage` functionality to `SparkOfflineStore` ([#3175](#3175)) ([2107ce2](2107ce2))
* Add ability to give boto extra args for registry config ([#3219](#3219)) ([fbc6a2c](fbc6a2c))
* Add health endpoint to py server ([#3202](#3202)) ([43222f2](43222f2))
* Add snowflake support for date & number with scale ([#3148](#3148)) ([50e8755](50e8755))
* Add tag kwarg to set Snowflake online store table path ([#3176](#3176)) ([39aeea3](39aeea3))
* Add workgroup to athena offline store config ([#3139](#3139)) ([a752211](a752211))
* Implement spark materialization engine ([#3184](#3184)) ([a59c33a](a59c33a))
@mzwiessele deleted the patch-1 branch September 20, 2022 15:21
Development

Successfully merging this pull request may close these issues.

Unable to apply parquet dataset from s3