Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature set status JOB_STARTING to denote feature sets waiting for job to get to RUNNING state #714

Merged
merged 2 commits into from
May 18, 2020

Conversation

zhilingc
Copy link
Collaborator

@zhilingc zhilingc commented May 18, 2020

What this PR does / why we need it:
Removal of PENDING status check in #708 causes jobs to be unable to track feature set updates.

This PR adds a new status STATUS_JOB_STARTING to denote that a feature set was updated, but a job has already been initialised, preventing the JobCoordinator from spinning up new jobs in its place.

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:


@zhilingc
Copy link
Collaborator Author

/test test-end-to-end-batch

@khorshuheng
Copy link
Collaborator

/lgtm

@khorshuheng
Copy link
Collaborator

/approve

@@ -147,5 +147,6 @@ message FeatureSetMeta {
enum FeatureSetStatus {
STATUS_INVALID = 0;
STATUS_PENDING = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer NOT_READY and PENDING, so pending means the job is starting. It's confusing to have pending and job_starting.

Copy link
Collaborator Author

@zhilingc zhilingc May 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo NOT_READY could mean anything that's... not READY. in this context PENDING means pending job initialization and JOB_STARTING is explicit in meaning the job is in the initializing state, which is less confusing to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree, this is super confusing to me.

imo NOT_READY could mean anything that's... not READY

Exactly. It's more future proof.

in this context PENDING means pending job initialization and

Then why not call it PENDING_JOB_INITIALIZATION? PENDING is meaningless.

Copy link
Member

@woop woop May 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's a bit of a leaky abstraction to have job statuses on a feature set. Ideally these statuses would be on the job, not the feature set. Although I am not sure if that kind of change is in scope given our timelines.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's future proof, it's nebulous and doesn't convey any meaning to the user (or developer). I'd rather be more explicit with PENDING_JOB_INITIALIZATION in that case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, while jobs are being processed a feature set is also in the REGISTERED BUT NOT READY state, which is why I'm saying NOT_READY is ambiguous here. :/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add on, as long as feature sets remain the main source of information for users, it should have informative (read: actionable) statuses for whoever is querying for them, rather than masking them behind catch-alls for the sake of masking the existence of jobs from users. As a user i'd definitely prefer Feast being debuggable over being magical.

Copy link
Member

@woop woop May 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a user i'd definitely prefer Feast being debuggable over being magical.

Taken to an extreme, should we open up the Feast internals like our database to our users? If not, then you agree that a line should be drawn somewhere.

That line is called abstraction, not "magic".

Being conservative in revealing system internals through your API is the safer approach, unless you want to have frequent breaking changes. We've had enough of those of late. If you'd read what I had said you'd have seen that I don't believe that job statuses should be attached to feature sets, and that this is a leaky abstraction. This whole PR and the whole way these statuses function requires attention and a rework, so leaking more details is definitely an approach I would want to avoid.

Also, the whole debugging argument falls flat when you consider that a mapping from a feature set to a job, where the job has an appropriate status, is sufficient for debugging.

No magic involved, just common sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing hinges on the fact that STATUS_JOB_STARTING is named as such, so I don't really see any problem with it introducing any APIs that could break in the future. Rather, the point of the name is to be distinct from the existing status PENDING, which NOT_READY really isn't.

And sure, users are able to query for the jobs, but the feature set should hold the first hand information as to where to look. I don't think it's common sense to look up the job if the status returned is NOT_READY.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the point of the name is to be distinct from the existing status PENDING, which NOT_READY really isn't.

What I was looking for in NOT_READY was a state that represented the fact that a spec was registered but not started. If NOT_READY is ambiguous for you then we can find another name (NOT_INITIALIZED?).

To be clear, this is a separate problem from the one I listed when I started this thread, which was the overlap between PENDING and JOB_STARTED.

@woop
Copy link
Member

woop commented May 18, 2020

/hold

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khorshuheng, woop, zhilingc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [khorshuheng,woop,zhilingc]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@woop
Copy link
Member

woop commented May 18, 2020

/hold cancel

@zhilingc
Copy link
Collaborator Author

/test test-end-to-end-batch

1 similar comment
@zhilingc
Copy link
Collaborator Author

/test test-end-to-end-batch

@woop
Copy link
Member

woop commented May 18, 2020

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants