add script to generate reports for OpenPai cluster #2507
Should we add a document?
The intended usage will be
and
The refresh command should be called periodically, maybe every 10 minutes. I will write a doc on how to set up the cron job and on more advanced usage.
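The periodic refresh mentioned above could be scheduled with a crontab entry. Here is a minimal sketch that builds such an entry. It assumes the script is invoked as `reports.py refresh`; the script path is a placeholder, the exact arguments are not shown in this thread and are left out, and `crontab_entry` is a hypothetical helper, not part of the script.

```python
def crontab_entry(command: str, every_minutes: int = 10) -> str:
    """Build a crontab line that runs `command` on a fixed minute step."""
    if not 1 <= every_minutes <= 59:
        raise ValueError("every_minutes must be between 1 and 59")
    # "*/N" in the minute field fires at every minute divisible by N.
    # For values that divide 60, such as 10, that means every N minutes.
    # The other four fields (hour, day of month, month, day of week)
    # are left as wildcards.
    return f"*/{every_minutes} * * * * {command}"


# Hypothetical invocation: the path and the `refresh` subcommand are
# assumptions, and any required arguments are omitted.
REFRESH_COMMAND = "python /home/core/report/reports.py refresh"

if __name__ == "__main__":
    # Print the line so it can be pasted into `crontab -e`.
    print(crontab_entry(REFRESH_COMMAND))
```

The printed line would be added to the crontab of whichever user should own the reports.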
May rely on #2449.
@xudifsd Just made some inline changes to the first 2 paragraphs: https://github.com/Microsoft/pai/blob/07c464838e7db343045a96dcacdb2d5c10532ec9/docs/tools/how-to-setup-report-script.md. Will review the rest next Monday.
Added some inline changes here to the first 2 paragraphs: https://github.com/Microsoft/pai/pull/2507/files#diff-02f09b898fe248561eae9ff55b7e58d0.
Will review the rest next Monday.
@xudifsd Feedback on the cluster_report_alert csv:
- Can we format the start as a time in the csv? The same goes for duration, which needs to be in d:h:m:s format for usability.
- There are several duplicated rows. What do they mean? Can we merge the duplicates?
- I did not get what a "cleaner-ds-2cfmk" instance is. Which node does it belong to? How can ops act after seeing this info? Can you help me understand more? The same questions apply to all the other alerts whose instance values are not IPs.
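The first two suggestions could be handled at the point where rows are written. This is a minimal sketch, assuming `start` is a Unix timestamp in seconds and `duration` is a number of seconds. The helper names and the sample rows are hypothetical and not taken from the script.

```python
from datetime import datetime, timezone


def format_start(epoch_seconds: float) -> str:
    """Render a Unix timestamp as a UTC date-time string."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S")


def format_duration(total_seconds: int) -> str:
    """Render a non-negative duration in seconds as d:h:m:s."""
    minutes, seconds = divmod(int(total_seconds), 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days}:{hours:02d}:{minutes:02d}:{seconds:02d}"


def merge_duplicates(rows):
    """Drop rows that repeat an earlier row exactly, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


if __name__ == "__main__":
    # Hypothetical rows: (alert name, instance, start, duration in seconds).
    rows = [
        ("NodeDown", "10.0.0.1", 0, 93784),
        ("NodeDown", "10.0.0.1", 0, 93784),
    ]
    for name, instance, start, duration in merge_duplicates(rows):
        print(name, instance, format_start(start), format_duration(duration))
```

Whether exact-match merging is right depends on why the duplicates appear, which is the open question in the second bullet.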
First, log into the node you choose, put the [script](../../src/tools/reports.py) somewhere, for example, I put it in directory `/home/core/report`, edit the crontab using
Currently it's more like a service than a tool. Why not make it an optional service, like alert-manager?
Bin objected to this: if it is a service, we have to maintain its status, whereas if it is a tool, the users are responsible for that.
I see, but it's not that user-friendly, and it might mess up host environments. Is it necessary to tell users how to uninstall this tool?
Yes, maybe we should change it to a service in the future.
generate csv reports for OpenPai cluster.
Documentation is in the header of the script.