Using TasksEqualStable in dispatcher breaks global services #1291

Closed
aaronlehmann opened this issue Aug 2, 2016 · 8 comments

@aaronlehmann
Collaborator

Global service tasks are held at the scheduler until the scheduler determines that the node has the required resources. When it confirms this, it sets the task state (NOT desired state) to "assigned".

Because of recent changes to the dispatcher, this state change is no longer a sufficient reason to send an assignment set update to the agent. So until something else triggers an assignment set update, the task can be stuck in ASSIGNED.
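To illustrate the behavior described above, here is a minimal sketch of a status-blind comparison. The `Task`, `TaskStatus`, and `TaskState` types and both function names are hypothetical stand-ins, not the actual dispatcher code; the sketch only assumes that the dispatcher decides whether to push an update by comparing tasks with the status cleared.

```go
package main

import "reflect"

// TaskState is a hypothetical stand-in for the task state enum.
type TaskState int

const (
	StateNew TaskState = iota
	StateAssigned
	StateRunning
)

// TaskStatus holds the observed state (hypothetical, simplified).
type TaskStatus struct {
	State TaskState
}

// Task is a simplified, hypothetical task record.
type Task struct {
	ID           string
	NodeID       string
	DesiredState TaskState
	Status       TaskStatus
}

// tasksEqualIgnoringStatus reports whether two tasks are equal once the
// observed status has been blanked out on both copies.
func tasksEqualIgnoringStatus(a, b *Task) bool {
	copyA, copyB := *a, *b // shallow copies so the inputs are not modified
	copyA.Status, copyB.Status = TaskStatus{}, TaskStatus{}
	return reflect.DeepEqual(copyA, copyB)
}

// needsAssignmentUpdate is the assumed dispatcher decision: send an update
// to the agent only if something other than the status changed.
func needsAssignmentUpdate(old, updated *Task) bool {
	return !tasksEqualIgnoringStatus(old, updated)
}
```

With a comparison like this, a task whose only change is `Status.State` moving to `StateAssigned` compares as equal to its previous version, so no update is sent on its account.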

This race didn't happen as often before #1287, but after #1287 it happens nearly every time. I can still trigger it occasionally without #1287, though.

cc @dongluochen @aluzzardi

@aaronlehmann aaronlehmann added this to the 1.12.1 milestone Aug 2, 2016
@aaronlehmann
Collaborator Author

For 1.12.1 we could work around this by treating a transition to ASSIGNED as a modification, while still ignoring other status changes.

This seems very hacky, though.
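As a rough sketch of what that workaround could look like (the `Task` type and `isModification` are hypothetical, not the real dispatcher code): stable fields are compared as usual, and the observed state is consulted only to detect a transition into ASSIGNED.

```go
package main

// TaskState is a hypothetical stand-in for the task state enum.
type TaskState int

const (
	StateNew TaskState = iota
	StateAssigned
	StateRunning
)

// Task is a simplified, hypothetical task record.
type Task struct {
	ID           string
	DesiredState TaskState
	State        TaskState // observed state
}

// isModification reports whether the agent should be told about the change.
func isModification(old, updated Task) bool {
	// Any change to the stable fields always counts.
	if old.ID != updated.ID || old.DesiredState != updated.DesiredState {
		return true
	}
	// Special case: a transition into ASSIGNED counts as a modification.
	// Every other observed-state change is ignored.
	return old.State != StateAssigned && updated.State == StateAssigned
}
```

The special case is the hacky part: one observed-state transition is singled out while all others stay invisible to the comparison.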

Perhaps the scheduler should be advancing the desired state of the task, not just its observed state. Global tasks would start out with DesiredState = ASSIGNED and then the scheduler would update it to DesiredState = RUNNING once it's ready.

@dongluochen
Contributor

> Perhaps the scheduler should be advancing the desired state of the task, not just its observed state. Global tasks would start out with DesiredState = ASSIGNED and then the scheduler would update it to DesiredState = RUNNING once it's ready.

I like this idea. What should the initial DesiredState be, ASSIGNED or READY? I haven't found any place where DesiredState is set to ASSIGNED.

@aaronlehmann
Collaborator Author

It should start as ASSIGNED, because we want the scheduler to place it in ASSIGNED once that's possible. And when the scheduler does that, it will update DesiredState to RUNNING.

The downside to this is that previously only the orchestrator controlled desired state. Having the scheduler change it may not be a good idea, because it could fight with the orchestrator. For example, if the orchestrator decides it doesn't want that task anymore, it would set the desired state to SHUTDOWN. It would be really bad if the scheduler could change it to RUNNING. It may be difficult to handle all these corner cases. I'm not sure setting DesiredState outside the orchestrator is a good idea.

@dongluochen
Contributor

dongluochen commented Aug 2, 2016

How about this?

  1. Global orchestrator creates a new task {State: api.TaskStateNew, DesiredState: api.TaskStateAssigned}.
  2. Scheduler sets task State to api.TaskStateAssigned after processing.
  3. Global orchestrator monitors task updates. When it gets a task {State: api.TaskStateAssigned, DesiredState: api.TaskStateAssigned}, it updates DesiredState to api.TaskStateRunning.

@aaronlehmann
Collaborator Author

I like that design, but I think batching the desired state changes might make the code complex. WDYT?

@dongluochen
Contributor

The global orchestrator doesn't do much batching today.

@dongluochen
Contributor

@aaronlehmann Since the change is temporary, should we reopen (and rename) this issue for 1.13? We may still go this way but should make the decision along with other changes.

@aaronlehmann
Collaborator Author

Sure, feel free to file a new issue that references this one.

I think it should be a different issue since it's more about "find a better solution" than "fix this bug".
