server-master(engine): update readme (#7729)
ref #7728
amyangfei authored Nov 30, 2022
1 parent 3f4a8a7 commit 802c73e
Showing 1 changed file with 11 additions and 29 deletions.
40 changes: 11 additions & 29 deletions engine/servermaster/README.md
@@ -11,40 +11,22 @@ The master module consists of several components:
- handle heartbeat (keep alive)
- user interface
- submit job
- update job
- delete job
- schedule jobs
- query job(s)
- cancel job
- schedule and failover jobs
- ExecutorManager
- Handle Heartbeat
- Notify Resource Manager to update the status info
- Maintain the status of Executors. Each Executor is in one of several statuses.
- **Initing** means the executor is under initialization. It happens when the executor has sent a `Register Executor` request but has not yet sent a Heartbeat.
- **Running** means executor is running and is able to receive more tasks.
- **Busy** means executor is running but cannot receive more tasks.
- **Disconnected** happens when the master sends a request but encounters an I/O failure.
- **Tombstone** means the executor has been disconnected for a while. It is regarded as dead.
- **Check liveness**.
- Executor Manager checks whether each executor's heartbeat has timed out.
- Once an executor goes offline due to `heartbeat timeout`, Executor Manager should notify the job manager to reschedule all the tasks on it.
- Maintain the aliveness of Executors.
- Check liveness
- Executor Manager checks whether each executor's heartbeat has timed out.
- Once an executor goes offline due to `heartbeat timeout`, Executor Manager should notify the job manager to reschedule all the tasks on it.
- Executor Client
- Communicate with an executor, for example via `SubmitTasks`. If it meets a transient fault, it sets the executor as `Disconnected`.
- ResourceManager
- ResourceManager is an interface, implemented by Executor Manager
- The Cost Model is supposed to have two types
- Expected Usage
- Real Usage
- The occupied resources in `Resource Manager` should be `sum(max(expected usage, real usage))`. The real usage will be updated by heartbeats.
- Scheduler
- It serves as a service that designates tasks to different executors. It implements multiple allocation strategies.
- Embeds two independent interfaces
- `ExecutorServiceClient`, which is used to dispatch tasks to an executor.
- `BrokerServiceClient`, which is used to manage resources belonging to an executor.
- JobManager
- Receive SubmitJob Request, Check the type of Job, Create the JobMaster and Build Plan.
- Receive SubmitJob Request, Check the type of Job, Create the JobMaster.
- JobMaster (per job)
- Generate the Tasks for the job
- Schedule and Dispatch Tasks
1. Receive the Dispatch Job Request
2. Send schedule request to scheduler, get `SubJob - Tasks - Executor` tuple.
- If resources are insufficient, retry step 2.
3. Start working thread.
4. Dispatch the tasks.
- If dispatch fails, go back to step 2.
- If dispatch succeeds, update HA Store.
