[Close as Dup] display resource utility per vc/queue metrics #2208
Comments
Is it possible to have metrics showing the current resource fragmentation status (per VC)? We should manage users' expectations properly.
I'm not sure how to show resource fragmentation status. Should we show the node count for each number of free cards? What about memory? Something like how Hadoop shows disk usage? But that is whole-cluster status; I don't think there is a per-VC view of it. Am I right? @mzmssg
Related: #2013
@xudifsd, a high-priority issue is to show how many jobs are in the waiting state in a VC.
@fanyangCS yes, yarn-exporter should get that metric. We will need to display it on some page, together with graphs of the node count for each number of free GPU cards, memory, and CPU, so users can get a sense of why their job is waiting.
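To illustrate why a "node count per number of free GPU cards" view helps explain waiting jobs, here is a minimal sketch. It is not yarn-exporter code: the function names and the `(total_gpus, used_gpus)` input shape are assumptions made for this example only.

```python
from collections import Counter


def free_gpu_histogram(nodes):
    """Count nodes by number of free GPUs.

    `nodes` is an iterable of (total_gpus, used_gpus) pairs; this input
    shape is hypothetical, not a real exporter format.
    """
    return Counter(total - used for total, used in nodes)


def can_fit(histogram, gpus_per_container):
    """True if at least one node has enough free GPUs for one container."""
    return any(
        free >= gpus_per_container and count > 0
        for free, count in histogram.items()
    )


# Four nodes with 4 GPUs each; free GPUs per node are 1, 1, 0, 2.
hist = free_gpu_histogram([(4, 3), (4, 3), (4, 4), (4, 2)])
total_free = sum(free * count for free, count in hist.items())
```

In this example the cluster has 4 free GPUs in total, yet `can_fit(hist, 4)` is false because no single node has 4 free cards. That gap between total free capacity and per-node capacity is the fragmentation the histogram is meant to make visible.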
What's the definition of waiting status? If the AM is launched but there is no job container, is the job regarded as RUNNING or WAITING?
Job Waiting: the job is not complete and none of its containers is running.
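The definition above can be sketched as a predicate plus a per-VC count. The `Job` and `Container` types, the field names, and the state label `"RUNNING"` are hypothetical and chosen only for illustration. Whether the AM container counts as a running container is left open in the thread, so this sketch simply looks at whatever containers it is given.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    state: str  # hypothetical labels, e.g. "RUNNING", "ALLOCATED", "COMPLETED"


@dataclass
class Job:
    name: str
    vc: str
    completed: bool
    containers: List[Container] = field(default_factory=list)


def is_waiting(job: Job) -> bool:
    """Waiting = job not complete and no container in the RUNNING state."""
    return not job.completed and not any(
        c.state == "RUNNING" for c in job.containers
    )


def waiting_jobs_per_vc(jobs: List[Job]) -> Dict[str, int]:
    """Count waiting jobs in each VC; VCs with no waiting jobs are omitted."""
    counts: Dict[str, int] = {}
    for job in jobs:
        if is_waiting(job):
            counts[job.vc] = counts.get(job.vc, 0) + 1
    return counts
```

Under this reading, a job with no containers at all, or with containers that are only allocated, counts as waiting, while a completed job never does.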
As discussed in the meeting, I think the picture above captures what users may find useful. It has three tables and one histogram. Meaning of each table:
All required metrics have been exported by #2289. @scarlett2018, maybe the experience team can take over this job?
@xudifsd - sure, but I don't think the experience team has the capacity to do this in 0.11.0.
Will address this in #2539
What would you like to be added:
A graph/list to show resource utilization per VC/queue
Why is this needed:
Users sometimes find that their job is stuck in the waiting state. This may be due to insufficient resources in the VC or to resource fragmentation across nodes. We need a graph to help them debug the issue.
Without this feature, how does the current module work:
Users check YARN's page manually.
Components that may involve changes: