This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

[Launcher]: Revise the definition of Framework running state #2135

Merged
merged 1 commit into from
Feb 12, 2019

yqwang-ms (Member) commented on Feb 12, 2019

Resolves issues:
#2022
#2051
#2099

New Definition:

Framework is running <-> Exists running Task

This makes the Launcher APIs reflect the real Framework running state instead of just the raw AM (ApplicationMaster) running state, since we always want to hide the AM concept from the end user on a best-effort basis.
The definition works for both Incremental and Gang Scheduling: even for Gang Scheduling, "Exists running Task" means the Framework has already satisfied Gang Allocation and all of its Tasks have already been launched.

Why not implement in RestServer?
To revise the state, the RestServer List operation would also need to read all TaskStatuses, which is too heavy.

Why implement in LauncherWebServer instead of LauncherService?
It is hard to ensure that the FrameworkStatus is consistent with the TaskStatuses outside the WebServer.
However, this makes the exposed FrameworkState inconsistent with the backend. That is acceptable because the revised states, i.e. APPLICATION_RUNNING and APPLICATION_WAITING, are generally interchangeable even in the backend.
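
A minimal sketch of the revision described above, written in Python for brevity and not taken from the Launcher code. It assumes the WebServer holds a consistent snapshot of the FrameworkState and its Task states, and that the only revision needed is to expose APPLICATION_WAITING when the backend reports APPLICATION_RUNNING but no Task is running. `TASK_RUNNING`, `FrameworkSnapshot`, and `revise_framework_state` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Only these two state names come from the description above.
APPLICATION_WAITING = "APPLICATION_WAITING"
APPLICATION_RUNNING = "APPLICATION_RUNNING"
# Hypothetical Task state name, assumed for this sketch.
TASK_RUNNING = "TASK_RUNNING"


@dataclass(frozen=True)
class FrameworkSnapshot:
    """A consistent view of one Framework, as assumed to exist in the WebServer."""
    framework_state: str
    task_states: Tuple[str, ...]


def exists_running_task(snapshot: FrameworkSnapshot) -> bool:
    """New definition: Framework is running <-> exists running Task."""
    return any(state == TASK_RUNNING for state in snapshot.task_states)


def revise_framework_state(snapshot: FrameworkSnapshot) -> str:
    """Return the state to expose through the API; the backend state is untouched."""
    if (snapshot.framework_state == APPLICATION_RUNNING
            and not exists_running_task(snapshot)):
        # The AM is up, but no Task is running yet (e.g. still gang allocating).
        return APPLICATION_WAITING
    return snapshot.framework_state


if __name__ == "__main__":
    am_only = FrameworkSnapshot(APPLICATION_RUNNING, ("TASK_WAITING",))
    print(revise_framework_state(am_only))
```

Because the revision is computed per request from the snapshot, nothing is written back to the backend, which is why the exposed state can differ from the stored one.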

What about the K8S Launcher?
The K8S Launcher treats the whole Framework spec and status as a single CRD object, so it is easy to keep them consistent even in the backend.
It will add a new state, FrameworkAttemptPreparing, to indicate that there is no running Task (such as during the Gang Allocation phase) even if the FrameworkAttempt object has already been created.
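
As a rough illustration of why a single object makes this easy, the sketch below keeps the attempt state and the Task states in one in-memory object and derives the state in one place. It is not the K8S Launcher implementation: apart from FrameworkAttemptPreparing, every name here (`FrameworkAttemptCreationPending`, `FrameworkAttemptRunning`, `TaskRunning`, `FrameworkObject`, `sync_attempt_state`) is a hypothetical placeholder.

```python
from dataclasses import dataclass, field
from typing import List

# Named in the description above.
FRAMEWORK_ATTEMPT_PREPARING = "FrameworkAttemptPreparing"
# Hypothetical names, assumed only for this sketch.
FRAMEWORK_ATTEMPT_CREATION_PENDING = "FrameworkAttemptCreationPending"
FRAMEWORK_ATTEMPT_RUNNING = "FrameworkAttemptRunning"
TASK_RUNNING = "TaskRunning"


@dataclass
class FrameworkObject:
    """Stand-in for a single object holding both Framework and Task status."""
    attempt_created: bool = False
    task_states: List[str] = field(default_factory=list)
    state: str = FRAMEWORK_ATTEMPT_CREATION_PENDING


def sync_attempt_state(obj: FrameworkObject) -> FrameworkObject:
    """Derive the attempt state from fields of the same object and store it there."""
    if not obj.attempt_created:
        obj.state = FRAMEWORK_ATTEMPT_CREATION_PENDING
    elif TASK_RUNNING in obj.task_states:
        obj.state = FRAMEWORK_ATTEMPT_RUNNING
    else:
        # Attempt exists but no Task is running, e.g. during Gang Allocation.
        obj.state = FRAMEWORK_ATTEMPT_PREPARING
    return obj
```

Since the derived state and its inputs are written together, a reader of the object should not observe a state that disagrees with the Task states, unlike the YARN case where the revision has to happen in the WebServer.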

Long Term Plan: Generalize Scheduling Policy

  1. Abstract Framework and Task States that are independent of the YARN or K8S platform
  2. Inter-TaskRole Dependency (A Ready -> B Start, A Produced -> B Start)
  3. Per-TaskRole Gang Scheduling
  4. Etc

coveralls commented Feb 12, 2019

Coverage Status

Coverage decreased (-0.1%) to 52.794% when pulling c89ebf2 on yqwang/launcher-dev into 2c213c9 on master.
#Closed
