feat: Add per model metrics #90
Conversation
Force-pushed from b7f7cb9 to 86a9bf2
Hey @VedantMahabaleshwarkar, just wondering if this is replacing #86? If so, maybe that one can be closed.
@rafvasq Yep, this PR will be replacing the old one, I'll close it out.
Force-pushed from fe24408 to 35df353
Thanks @VedantMahabaleshwarkar, and apologies for taking so long to look at this.
It looks like a good start but I think some more work is needed around how modelIds / vmodelIds are handled.
Currently, in the `logRequestMetrics` method, you added both modelId and vmodelId args, but only ever pass in one of those, and inside the method only choose the non-null one. So it may as well just take one arg.
However I think we should aim to always log the model id, and additionally log the vmodel id too whenever it's used.
Requests can be targeted directly at concrete models (modelId) and in this case it's pretty straightforward and the changes you've made should cover it.
But when they are targeted at a vmodelId, we should:
- Include a vmodelId for all the same metrics that we include modelId - this will require new changes to propagate the vmodelId with the request because currently it's "lost" below the top of the stack, once it's been resolved to a modelId.
- Also include the resolved modelId everywhere for all these same metrics. This is already done in all these places lower in the stack, but the top level logRequestMetrics in the ModelMeshApi class will only have the vModelId and so we'll need to make sure the resolved modelId is obtained in this same place and included as a label alongside the vmodelId.
For propagating the vModelId, I would suggest adding it to the Litelinks ThreadContext ... this is a ThreadLocal map which will automatically be transferred between modelmesh pods if the request is forwarded.
I hope the above makes some sense, feel free to ping if not!
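A minimal sketch of the pattern described above, assuming hypothetical names (this is a stand-in for the Litelinks `ThreadContext`, not the actual API): the vModelId rides along with the request in a per-thread context map, so lower layers that only see the resolved modelId can still attach both labels to the metrics they record.

```java
import java.util.HashMap;
import java.util.Map;

public class VModelContextSketch {
    // Stand-in for Litelinks' ThreadContext: a per-thread String->String map
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static final String VMODEL_ID_KEY = "vmodel_id";

    // Top of the stack: record the vmodel target before resolution happens
    static void setVModelId(String vModelId) {
        CONTEXT.get().put(VMODEL_ID_KEY, vModelId);
    }

    static String currentVModelId() {
        // null when the request was targeted directly at a concrete model
        return CONTEXT.get().get(VMODEL_ID_KEY);
    }

    // Lower in the stack: always label with the resolved modelId, and
    // additionally include the vModelId whenever one was used
    static Map<String, String> metricLabels(String resolvedModelId) {
        Map<String, String> labels = new HashMap<>();
        labels.put("modelId", resolvedModelId);
        String vModelId = currentVModelId();
        if (vModelId != null) {
            labels.put("vModelId", vModelId);
        }
        return labels;
    }

    public static void main(String[] args) {
        setVModelId("my-vmodel");
        Map<String, String> labels = metricLabels("my-model-v3");
        System.out.println(labels.get("modelId"));
        System.out.println(labels.get("vModelId"));
    }
}
```

Note that a plain `ThreadLocal` only survives within one process; the point of using the Litelinks `ThreadContext` instead is that it is forwarded across modelmesh pods with the request.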
Force-pushed from fac0f37 to b111f0e
@njhill Implemented all of your suggestions.
Force-pushed from 096c729 to 247910a
/test all
@VedantMahabaleshwarkar: No presubmit jobs available for kserve/modelmesh@main. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 247910a to 4f1130d
Thanks @VedantMahabaleshwarkar, added some more comments.
Force-pushed from e8415b8 to 5d2e8fd
@njhill addressed most of your comments and avoided passing around the […]. Other than that, @danielezonca also had a small suggestion to change the […].
While I can't speak to the use of `FastThreadLocal`, in terms of usability, I deployed the changes myself and was able to filter metrics by model using the new parameter `modelId`. I'd also recommend saving those changes to the `VModelManager` class for a separate PR if it's not directly required for this one.
FYI @ckadner
I have a few more nit-picks. Beyond those, it looks like there are a few more unresolved conversations, in addition to the `FastThreadLocal` topic.
Thanks @VedantMahabaleshwarkar @danielezonca and apologies for the delay, I was out last week and still catching up.
> The only thing we didn't quite get was your suggestion to use `FastThreadLocal` to store the `vModelId`. It seemed redundant given that we are already adding it to the `ThreadContext`?
I was referring to the resolved `modelId` here, not the `vModelId`. And it's just for the purpose of "passing it out" of this method so that it can be included in the metrics recorded in the calling function, in addition to the `vModelId`.
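The "passing it out" idea can be sketched as follows, with illustrative names only (not the actual model-mesh code, and using a plain `ThreadLocal` rather than Netty's `FastThreadLocal`): the method that resolves a vModelId to a concrete modelId stashes the result in a thread-local slot, so the calling frame can read it back for its metric labels without the method's return type changing.

```java
public class ResolvedModelIdSketch {
    // Slot for "passing out" the resolved modelId to the caller
    private static final ThreadLocal<String> RESOLVED_MODEL_ID = new ThreadLocal<>();

    // Callee: resolves the vmodel and stashes the concrete modelId
    // (resolution itself is faked here with a suffix)
    static void invokeVModel(String vModelId) {
        String modelId = vModelId + "-resolved"; // placeholder for real resolution
        RESOLVED_MODEL_ID.set(modelId);
        // ... dispatch the request to the concrete model ...
    }

    // Caller: after the call returns, record metrics with both ids
    static String recordRequestMetrics(String vModelId) {
        invokeVModel(vModelId);
        String modelId = RESOLVED_MODEL_ID.get();
        RESOLVED_MODEL_ID.remove(); // avoid leaking state across pooled threads
        return "modelId=" + modelId + ",vModelId=" + vModelId;
    }

    public static void main(String[] args) {
        System.out.println(recordRequestMetrics("my-vmodel"));
    }
}
```

`FastThreadLocal` would serve the same role with lower lookup overhead on Netty event-loop threads; the control flow is identical.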
Force-pushed from 6e4feaa to 000cce7
Hi Vedant, since you force-pushed the last commit, I cannot find where/whether Nick's suggestion made it into your last code changes. Are the remaining 7 unresolved conversations actually resolved? Also, you will need to update your branch once more :-)
Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
Force-pushed from 000cce7 to 72e1c8e
Use ThreadLocal to store resolved modelId in vModel case
Revert/simplify a few things

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Update per-model metric changes
@njhill pulled in your changes into this PR
Thanks @VedantMahabaleshwarkar for all of your patience with this!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: njhill, VedantMahabaleshwarkar
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
- Add `modelId` parameter to `logTimingMetricDuration` function in `Metrics.java`:
  - `modelmesh_cache_miss_milliseconds`
  - `modelmesh_loadmodel_milliseconds`
  - `modelmesh_unloadmodel_milliseconds`
  - `modelmesh_req_queue_delay_milliseconds`
  - `modelmesh_model_sizing_milliseconds`
  - `modelmesh_age_at_eviction_milliseconds`
- Add `modelId` parameter to `logSizeEventMetric` function in `Metrics.java`:
  - `modelmesh_loading_queue_delay_milliseconds`
  - `modelmesh_loaded_model_size_bytes`
- Add `modelId` and `vModelId` param to `logRequestMetrics` in `Metrics.java`:
  - `modelmesh_invoke_model_milliseconds`
  - `modelmesh_api_request_milliseconds`

Closes red-hat-data-services#60

Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Co-authored-by: Daniele Zonca <dzonca@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Motivation
#90 introduced support for per-model prometheus metrics, but the intention was not to change the default behaviour and instead to have this as something enabled explicitly via configuration. However, it was inadvertently made the default.
Modifications
Change the default behaviour to not include modelId/vModelId prometheus metric labels. This is important because model-mesh was designed primarily for use cases with a very large and changing number of individual models, and those scenarios would result in a much greater number of individual metrics than prometheus can handle.
Result
Original behaviour restored.

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Functionality added in #90

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
PR #90 introduced support for per-model prometheus metrics, with the intention not to change the default behavior but to require the feature to be enabled explicitly via configuration. However, it was inadvertently made the default. This commit restores the original behavior by changing the default configuration to not include modelId/vModelId prometheus metric labels, because model-mesh was designed primarily for use cases with a very large and changing number of individual models, and those scenarios would result in a much greater number of individual metrics than prometheus can handle.

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Motivation
#60
Modifications
- Add `modelId` parameter to `logTimingMetricDuration` function in `Metrics.java`
  - Adds the `modelId` label to the following metrics: `modelmesh_cache_miss_milliseconds`, `modelmesh_loadmodel_milliseconds`, `modelmesh_unloadmodel_milliseconds`, `modelmesh_req_queue_delay_milliseconds`, `modelmesh_model_sizing_milliseconds`, `modelmesh_age_at_eviction_milliseconds`
- Add `modelId` parameter to `logSizeEventMetric` function in `Metrics.java`
  - Adds the `modelId` label to the following metrics: `modelmesh_loading_queue_delay_milliseconds`, `modelmesh_loaded_model_size_bytes`
- Add `modelId` and `vModelId` parameters to `logRequestMetrics` function in `Metrics.java`
  - Adds the labels to the following metrics: `modelmesh_invoke_model_milliseconds`, `modelmesh_api_request_milliseconds`
Result
Metrics listed above have the `modelId` label attached to the emitted metric.
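For illustration, with per-model labels enabled, one of the histograms above would be scraped with the new labels attached. This is a hypothetical fragment in Prometheus exposition format; the model names, count, and any other labels on the real metric are made up:

```
modelmesh_api_request_milliseconds_count{modelId="example-model",vModelId="example-vmodel"} 17
```

Requests targeted directly at a concrete model would carry only the `modelId` label, since no vmodel is involved in that path.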