Storage refactor API and docstring tweaks #569

Merged

zhilingc
Collaborator

What this PR does / why we need it:
Addressing most of @ches's comments here: #529
since that PR has already been merged.

The elephant in the room is unfortunately storage-api modules having Beam dependencies, which serving should not have. The two options currently on the table are:

  1. Distill the modules further into separate write and read submodules
  2. Move ingestion job initialization to Serving
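As a rough illustration of option 1, the read and write contracts could live in separate submodules so that Serving depends only on the read side. This is a sketch under assumptions: `FeatureReader`, `FeatureWriter`, and `InMemoryStore` are hypothetical names that do not appear in this PR, and the real write side would carry the Beam types that this plain-Java example leaves out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical read-side contract: plain Java types only, so Serving could depend on it alone. */
interface FeatureReader {
  Optional<String> read(String entityKey);
}

/** Hypothetical write-side contract: in the real modules this is where Beam types would appear. */
interface FeatureWriter {
  void write(String entityKey, String featureValue);
}

/** In-memory stand-in for a store, used only to exercise the two contracts. */
final class InMemoryStore implements FeatureReader, FeatureWriter {
  private final Map<String, String> rows = new HashMap<>();

  @Override
  public void write(String entityKey, String featureValue) {
    rows.put(entityKey, featureValue);
  }

  @Override
  public Optional<String> read(String entityKey) {
    return Optional.ofNullable(rows.get(entityKey));
  }
}
```

Code that only retrieves features would be handed a `FeatureReader` and never see the write-side types, which is the property the split is meant to provide.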

Does this PR introduce a user-facing change?:

NONE

@zhilingc zhilingc changed the base branch from master to storage-refactor March 25, 2020 08:13
@zhilingc zhilingc requested review from ches and removed request for pradithya and davidheryanto March 25, 2020 08:14
Member

@ches ches left a comment

Thank you for the follow up!

I think the API is improved over the proto objects: it's simpler, and HistoricalRetrievalResult has a clear and justified purpose as a value object. 👍

And the code looks nicer to boot!

@woop was right that "historical retrieval" doesn't quite roll off the tongue—more syllables and characters—but I have to say I like how it reads in code here. I think I would functionally understand this quicker as a newcomer to the code base.

About the elephant in the room question of the PR description, I guess that discussion would be better held on some other thread. It'll help me to get the big picture from the roll-up PR, now that I've seen a couple of its constituents.

* @param error error that occurred
* @return {@link HistoricalRetrievalResult}
*/
public static HistoricalRetrievalResult errorResult(String id, Exception error) {
Member

"Stutters" as they say in Go at the call site, if static imports aren't used:

var result = HistoricalRetrievalResult.errorResult("guid", err)

// vs. maybe

var result = HistoricalRetrievalResult.error("guid", err)

Member

Actually, I wonder how it'd look if you dropped Result from the class name altogether, since it represents a (possibly unstarted) process to produce results but doesn't directly contain the result data.

Collaborator Author

Actually, it always represents a completed retrieval request. I'll update it to your former suggestion.
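For illustration, the renamed factory might look roughly like the sketch below. It is not taken from the PR diff: the `success` factory, the field names, and the `getError()` and `hasError()` accessors are assumptions, while `error(String, Exception)` follows the suggestion above and `getFileUris()` follows the accessor named elsewhere in this thread.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical value object for a completed historical retrieval.
 * Field names are assumptions for illustration, not the Feast source.
 */
final class HistoricalRetrievalResult {
  private final String id;
  private final List<String> fileUris;
  private final Exception error; // null when the retrieval succeeded

  private HistoricalRetrievalResult(String id, List<String> fileUris, Exception error) {
    this.id = id;
    this.fileUris = fileUris;
    this.error = error;
  }

  /** Reads as HistoricalRetrievalResult.error(...) at the call site, without repeating "Result". */
  static HistoricalRetrievalResult error(String id, Exception error) {
    return new HistoricalRetrievalResult(id, Collections.emptyList(), error);
  }

  /** Assumed counterpart for the successful case; copies the list so the object stays immutable. */
  static HistoricalRetrievalResult success(String id, List<String> fileUris) {
    return new HistoricalRetrievalResult(id, List.copyOf(fileUris), null);
  }

  String getId() {
    return id;
  }

  List<String> getFileUris() {
    return fileUris;
  }

  Optional<Exception> getError() {
    return Optional.ofNullable(error);
  }

  boolean hasError() {
    return error != null;
  }
}
```

With a private constructor and two named factories, a call site states which outcome it is building, and the shorter factory name avoids repeating the class name's suffix.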

Member

Okay, in that case it makes sense.

I feel a little reserved, though, about committing to the pattern of out-of-band, file-based result data as API, i.e. getFileUris(). I'm not sure whether large data sets will ever be streamed through RPC responses, but either of these options compromises the possibility for user jobs that do retrieval and process the results to leverage data locality in systems where that could be achieved; see the discussion starting at #482 (comment).

Member

To be clear, I think there's nothing actionable in my prior comment for this PR; it's a bigger-picture discussion for #567.

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ches, zhilingc

@ches
Member

ches commented Mar 27, 2020

The metrics namespacing commit got brought into this PR now, not sure that was intended?

@zhilingc
Collaborator Author

Crap. Forgot I rebased before :( Let me remove it.

@zhilingc
Collaborator Author

/retest

@ches
Member

ches commented Mar 27, 2020

/lgtm

@feast-ci-bot feast-ci-bot merged commit 22fcf8f into feast-dev:storage-refactor Mar 27, 2020
zhilingc pushed a commit that referenced this pull request Mar 29, 2020
* API and docstring tweaks

* Fix javadoc linting errors

* Apply spotless

* Fix javadoc formatting

* Drop result from HistoricalRetrievalResult constructors
zhilingc pushed a commit that referenced this pull request Apr 3, 2020
zhilingc pushed a commit that referenced this pull request Apr 7, 2020
feast-ci-bot pushed a commit that referenced this pull request Apr 7, 2020
…567)

* Add storage interfaces, basic file structure (#529)

* Add storage interfaces, basic file structure

* Apply spotless, add comments

* Move parseResponse and isEmpty to response object

* Make changes to write interface to be more beam-like

* Pass feature specs to the retriever

* Pass feature specs to online retriever

* Add FeatureSetRequest

* Add mistakenly removed TestUtil

* Add mistakenly removed TestUtil

* Add BigQuery storage (#546)

* Add Redis storage implementation (#547)

* Add Redis storage

* Remove staleness check; can be checked at the service level

* Remove staleness related tests

* Add dependencies to top level pom

* Clean up code

* Change serving and ingestion to use storage API (#553)

* Change serving and ingestion to use storage API

* Remove extra exclusion clause

* Storage refactor API and docstring tweaks (#569)

* API and docstring tweaks

* Fix javadoc linting errors

* Apply spotless

* Fix javadoc formatting

* Drop result from HistoricalRetrievalResult constructors

* Change pipeline to use DeadletterSink API (#586)

* Add better code docs to storage refactor (#601)

* Add better code documentation, make GetFeastServingInfo independent of retriever

* Make getStagingLocation method of historical retriever

* Apply spotless

* Clean up dependencies, remove exclusions at serving (#607)

* Clean up OnlineServingService code (#605)

* Clean up OnlineServingService code to be more readable

* Revert Metrics

* Rename storage API packages to nouns