-
Notifications
You must be signed in to change notification settings - Fork 211
Add model-monitoring stack and update model-mesh manifests to latest #732
Add model-monitoring stack and update model-mesh manifests to latest #732
Conversation
/hold until opendatahub-io/modelmesh-serving#69 is merged in |
/hold Putting a hold on this until after we tag odh-manifests for the v1.4.2 release today |
32cfbbd
to
1653ad4
Compare
@VedantMahabaleshwarkar Since this is adding new functionality, it's going to need some type a smoke test update to verify it's actually working |
095f39a
to
dc1bf51
Compare
@LaVLaS Added tests to the PR as well |
dc1bf51
to
618c0e2
Compare
Is there some circular dependency between those items ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick look.
Some stuff looks a bit alarming to me, see below.
modelmesh-monitoring/base/servicemonitors/modelmesh-federated-metrics.yaml
Show resolved
Hide resolved
@VannTen I wouldn't call it a circular dependency. Ideally the changes to |
@VannTen I wouldn't call it a circular dependency. Ideally the changes to `tests/basictests/modelmesh.sh` come after the new manifests are merged. But since we are adding new functionality, @LaVLaS wanted to include tests along with this so we can verify the PR works as intended.
The sync part (first *) could at least be split don't you think ?
Why not on adding the tests as part of the PR, but could you at least do
separate commits ? That's super hard to read and review as is.
|
618c0e2
to
b94e2fa
Compare
@VannTen separated out the testsuite changes in another commit |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/unhold
@VannTen can you dismiss your old review/ create a new review? |
/retest |
b94e2fa
to
00ac213
Compare
/retest |
00ac213
to
5150693
Compare
/retest |
187ce95
to
e91463f
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: anishasthana, LaVLaS The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
With Openshift Monitoring for User Defined Projects, the bringing-in / federation of cluster-level metrics (pod container restarts , oom, all that stuff from exporter kube-state-metrics) happens automatically at the namespace-level. The only thing not accessible are node-level (node exporter) metrics. Meaning I get such things as kube_pod_container_restarts without an explicit federation servicemonitor. Could you make the federation servicemonitor and prometheus and prometheus operator optional via an overlay? That way, you'd have all the metrics gathering still in there, while making it possible for users who have monitoring for user defined projects enabled to skip the prometheus part and cluster metrics federation part. The monitoring of the metrics from odh-model-controller ServiceMonitor works with Monitoring for User Defined Projects, too, by the way. That is, the section with the custom monitoring implementation for model mesh could be removed from odh-core, as it is achieved with Monitoring for User Defined Projects. |
Please move this discussion an open issue and not a PR that has already merged |
Model-Mesh updates + add model monitoring stack
Description
tests/basictests/modelmesh.sh
odh-model-monitoring
route and verify resultHow Has This Been Tested?
opendatahub
monitoring-namespace
anddeployment-namespace
with the NS you are installing ODH components in (if it is notopendatahub
)tests/makefile
odh-manifests/tests
Note:
Overall test suite fails due to
/root/peak/util: line 23: orig_project: unbound variable
but verify that all modelmesh tests passedMerge criteria: