feat: Add Template Grafana Dashboard for ModelMesh Metrics #346

Merged
merged 14 commits into from
May 27, 2023

Conversation

rafvasq
Member

@rafvasq rafvasq commented Mar 17, 2023

Motivation

To port a previously built Grafana dashboard that displays various Prometheus metrics for ModelMesh, and to build on it by adding metrics at the deployment/runtime view.

Modifications

  • Added config/dashboard/ directory to host the Grafana dashboard JSON
  • Added servicemonitor.yaml to config/prometheus/
  • Added and modified ModelMeshMetricsDashboard.json to work out-of-the-box
    • Added queries for deployment views to filter by serving runtime
    • Created sections for global-view and deployment-view visualizations
    • Created drop-down variables for filtering by one or many pre-existing runtimes, with an option to filter by a custom serving runtime
  • Updated the monitoring.md doc to include a guide for setting up Prometheus/Grafana for monitoring, mainly using instructions from modelmesh-performance
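
The deployment-view queries listed above filter metrics by serving runtime. As a rough sketch of how such a filter could be expressed (this is not taken from ModelMeshMetricsDashboard.json), the Python below builds a PromQL expression whose label matcher is driven by a Grafana variable. The metric name `modelmesh_api_request_milliseconds_count`, the `pod` label, the `modelmesh-serving-` pod prefix, and the `runtime` variable name are all assumptions made for illustration.

```python
def runtime_rate_query(metric: str, variable: str = "runtime", window: str = "1m") -> str:
    """Build a PromQL rate query filtered by a Grafana dashboard variable.

    The pod-name prefix and the variable name are illustrative assumptions,
    not values confirmed by the dashboard in this PR.
    """
    # Grafana's ${var:regex} format renders a multi-value selection as a
    # regex alternation, e.g. (a|b), so one matcher covers one or many runtimes.
    matcher = f'pod=~"modelmesh-serving-${{{variable}:regex}}.*"'
    return f"sum(rate({metric}{{{matcher}}}[{window}]))"


if __name__ == "__main__":
    # Hypothetical metric name, used only to show the shape of the query.
    print(runtime_rate_query("modelmesh_api_request_milliseconds_count"))
```

A panel would use the returned string as its query expression; Grafana substitutes the variable before the query is sent to Prometheus.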

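Result

  • A functional, ready-to-use dashboard JSON
  • General documentation outlining how to set up monitoring and use the JSON
  • Closes #335
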
@rafvasq rafvasq added the enhancement New feature or request label Mar 17, 2023
@rafvasq rafvasq requested a review from njhill March 17, 2023 18:04
@rafvasq rafvasq removed the request for review from animeshsingh March 17, 2023 18:04
@rafvasq
Member Author

rafvasq commented Mar 17, 2023

Local dashboard example with a mix of Triton (tf/onnx) and MLServer runtime models deployed:

[Three screenshots of the local dashboard, captured 2023-03-17]

@rafvasq rafvasq marked this pull request as ready for review March 17, 2023 18:34
@rafvasq rafvasq changed the title feat: Add Pre-built Grafana Dashboard for ModelMesh Metrics feat: Add Template Grafana Dashboard for ModelMesh Metrics Apr 4, 2023
@rafvasq
Member Author

rafvasq commented Apr 5, 2023

Local deployment showing the runtime dropdown (you can also type in the name of a custom serving runtime to filter for it) and the deployment-view section.
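
A dropdown like this is typically defined as an entry in the `templating.list` section of the dashboard JSON. The sketch below builds one such entry in Python, assuming a Grafana "custom" variable; the variable name, label, and runtime names are hypothetical examples, and the result is not copied from the dashboard in this PR. Typing in a custom runtime name is not modelled here.

```python
import json


def runtime_variable(runtimes: list[str], name: str = "runtime") -> dict:
    """Return a Grafana 'custom' template variable offering the given runtimes."""
    return {
        "name": name,
        "label": "Serving runtime",   # text shown next to the dropdown
        "type": "custom",             # values come from a fixed comma-separated list
        "query": ",".join(runtimes),
        "multi": True,                # allow selecting several runtimes at once
        "includeAll": True,           # add an "All" choice
        "options": [
            {"text": r, "value": r, "selected": False} for r in runtimes
        ],
    }


if __name__ == "__main__":
    # Example runtime names; replace with the runtimes deployed in your cluster.
    variable = runtime_variable(["triton-2.x", "mlserver-0.x"])
    print(json.dumps({"templating": {"list": [variable]}}, indent=2))
```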

[Screenshot of the local ModelMesh dashboard, captured 2023-04-05]

@ckadner ckadner added this to the v0.11.0 milestone Apr 14, 2023
@VedantMahabaleshwarkar

Would it be possible to have the dashboard JSON file in config/dashboard/ instead of config/grafana? It is possible to take a Grafana dashboard JSON and construct the dashboard without using Grafana itself, so in my opinion this would make the directory more generic without affecting functionality.
E.g., OpenShift 4.11+ takes Grafana dashboard JSON files and reconstructs the dashboard internally in the same UI page, without having to use the Grafana service.

@rafvasq
Member Author

rafvasq commented Apr 19, 2023

@VedantMahabaleshwarkar thanks for pointing that out. I agree that config/dashboard is a more general name for the directory holding the JSON. I also think servicemonitor.yaml can be moved to the existing config/prometheus directory.

njhill
njhill previously approved these changes May 26, 2023
Member

@njhill njhill left a comment

Thanks @rafvasq, apologies for taking so long to get to this.

rafvasq added 12 commits May 26, 2023 16:35
Signed-off-by: Rafael Vasquez <raf.vasquez@ibm.com>
@njhill
Member

njhill commented May 27, 2023

@rafvasq I tried rebasing this but it seems to be failing the FVTs for some reason.

Signed-off-by: Rafael Vasquez <raf.vasquez@ibm.com>
@rafvasq
Member Author

rafvasq commented May 27, 2023

> @rafvasq I tried rebasing this but it seems to be failing the FVTs for some reason.

@njhill There was a similar problem in #359, but everything is passing now that the MLServer PR is in.

@kserve-oss-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njhill, rafvasq

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@njhill
Member

njhill commented May 27, 2023

/lgtm

@kserve-oss-bot kserve-oss-bot merged commit 54f51cd into kserve:main May 27, 2023
lgdeloss pushed a commit to lgdeloss/modelmesh-serving that referenced this pull request Jun 5, 2023
#### Motivation
To port a previously built Grafana dashboard that displays various Prometheus metrics for ModelMesh, and to build on it by adding metrics at the deployment/runtime view.

#### Modifications
- Added `config/dashboard/` directory to host the Grafana dashboard JSON
- Added `servicemonitor.yaml` to `config/prometheus/`
- Added and modified `ModelMeshMetricsDashboard.json` to work out-of-the-box
  - Added queries for deployment views to filter by serving runtime
  - Created sections for global-view and deployment-view visualizations
  - Created drop-down variables for filtering by one or many pre-existing runtimes, with an option to filter by a custom serving runtime
- Updated the `monitoring.md` doc to include a guide for setting up Prometheus/Grafana for monitoring, mainly using instructions from [modelmesh-performance](https://github.com/kserve/modelmesh-performance/tree/main/docs/monitoring)

#### Result
- A functional, ready-to-use dashboard JSON
- General documentation outlining how to set up monitoring and use the JSON
- Closes kserve#335

Signed-off-by: Rafael Vasquez <raf.vasquez@ibm.com>
Signed-off-by: Luis Delossantos <luisgd@ibm.com>
lgdeloss pushed a commit to lgdeloss/modelmesh-serving that referenced this pull request Jun 6, 2023
Labels
approved enhancement New feature or request lgtm
Development

Successfully merging this pull request may close these issues.

Pre-built grafana dashboard for modelmesh metrics
5 participants