
feat: Add per model metrics #90

Merged
merged 8 commits into from
Sep 6, 2023

Conversation

VedantMahabaleshwarkar
Contributor

@VedantMahabaleshwarkar VedantMahabaleshwarkar commented Apr 20, 2023

Motivation

#60

Modifications

  • Added modelId parameter to the logTimingMetricDuration function in Metrics.java
    • This adds the modelId label to the following metrics: modelmesh_cache_miss_milliseconds, modelmesh_loadmodel_milliseconds, modelmesh_unloadmodel_milliseconds, modelmesh_req_queue_delay_milliseconds, modelmesh_model_sizing_milliseconds, modelmesh_age_at_eviction_milliseconds
  • Added modelId parameter to the logSizeEventMetric function in Metrics.java
    • This adds the modelId label to the following metrics: modelmesh_loading_queue_delay_milliseconds, modelmesh_loaded_model_size_bytes
  • Added modelId and vModelId parameters to the logRequestMetrics function in Metrics.java
    • This adds the modelId/vModelId label (whichever is applicable) to the following metrics: modelmesh_invoke_model_milliseconds, modelmesh_api_request_milliseconds
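The shape of these changes can be sketched as follows. This is a hypothetical simplification, not the actual Metrics.java code: the real implementation records into Prometheus histograms, while here a plain map keyed by metric name plus label stands in for the metric backend.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: a timing-metric logger that gains a modelId
// parameter, emitted as a label on the recorded metric.
public class TimingMetrics {
    private static final Map<String, Long> RECORDED = new ConcurrentHashMap<>();

    // Before: logTimingMetricDuration(metric, elapsedMillis)
    // After:  the new modelId parameter becomes a label on the metric.
    public static void logTimingMetricDuration(String metric, long elapsedMillis, String modelId) {
        RECORDED.put(metric + "{modelId=\"" + modelId + "\"}", elapsedMillis);
    }

    public static Long get(String metric, String modelId) {
        return RECORDED.get(metric + "{modelId=\"" + modelId + "\"}");
    }

    public static void main(String[] args) {
        logTimingMetricDuration("modelmesh_loadmodel_milliseconds", 120L, "my-model");
        System.out.println(get("modelmesh_loadmodel_milliseconds", "my-model")); // 120
    }
}
```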

Result

The metrics listed above now have the modelId label attached when emitted.

@rafvasq
Member

rafvasq commented May 1, 2023

Hey @VedantMahabaleshwarkar, just wondering if this is replacing #86? If so, maybe that one can be closed.

@VedantMahabaleshwarkar VedantMahabaleshwarkar changed the title Modelmetrics feat: Add per model metrics May 1, 2023
@VedantMahabaleshwarkar
Contributor Author

Hey @VedantMahabaleshwarkar, just wondering if this is replacing #86? If so, maybe that one can be closed.

@rafvasq Yep this PR will be replacing the old one, I'll close it out.

@VedantMahabaleshwarkar VedantMahabaleshwarkar linked an issue May 2, 2023 that may be closed by this pull request
@VedantMahabaleshwarkar VedantMahabaleshwarkar force-pushed the modelmetrics branch 2 times, most recently from fe24408 to 35df353 Compare May 9, 2023 16:55
Member

@njhill njhill left a comment


Thanks @VedantMahabaleshwarkar, and apologies for taking so long to look at this.

It looks like a good start but I think some more work is needed around how modelIds / vmodelIds are handled.

Currently, in the logRequestMetrics method, you added both modelId and vmodelId args, but only ever pass in one of them, and inside the method only choose the non-null one. So it may as well just take one arg.

However I think we should aim to always log the model id, and additionally log the vmodel id too whenever it's used.

Requests can be targeted directly at concrete models (modelId) and in this case it's pretty straightforward and the changes you've made should cover it.

But when they are targeted at a vmodelId, we should:

  • Include a vmodelId for all the same metrics that we include modelId - this will require new changes to propagate the vmodelId with the request because currently it's "lost" below the top of the stack, once it's been resolved to a modelId.
  • Also include the resolved modelId everywhere for all these same metrics. This is already done in all these places lower in the stack, but the top level logRequestMetrics in the ModelMeshApi class will only have the vModelId and so we'll need to make sure the resolved modelId is obtained in this same place and included as a label alongside the vmodelId.

For propagating the vModelId, I would suggest adding it to the Litelinks ThreadContext: this is a ThreadLocal map which will automatically be transferred between modelmesh pods if the request is forwarded.
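The thread-context idea described above can be sketched like this. Note this is not the actual Litelinks ThreadContext API, just an illustration of the pattern: a ThreadLocal map carries the vModelId down the stack so it is still available for metric labels after the request has been resolved to a concrete modelId.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a thread-context map (NOT the Litelinks API).
public class RequestContext {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    public static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    public static String get(String key) {
        return CONTEXT.get().get(key);
    }

    public static void clear() {
        CONTEXT.remove();
    }

    public static void main(String[] args) {
        // Top of the stack: record the vModelId before resolution.
        put("vmodelid", "my-vmodel");
        // Lower in the stack, after the request has been resolved to a
        // concrete modelId, the vModelId can still be read for metrics.
        System.out.println(get("vmodelid")); // my-vmodel
        clear();
    }
}
```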

I hope the above makes some sense, feel free to ping if not!

src/main/java/com/ibm/watson/modelmesh/Metrics.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/Metrics.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMeshApi.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMesh.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMeshApi.java Outdated Show resolved Hide resolved
@VedantMahabaleshwarkar
Contributor Author

@njhill Implemented all of your suggestions.
When a request targets a vModelId, the resolved modelId is also included as a label. As you suggested, I have added the vModelId to the ThreadContext to propagate it before it is resolved to a modelId.
The metrics with these new labels will always carry both labels, using an empty string when a label is not applicable (a null value cannot be passed as a label).
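The null-to-empty-string coalescing described above can be sketched as follows (names are illustrative, not the actual Metrics.java helpers): since the Prometheus client rejects null label values, whichever of the two labels does not apply is emitted as "".

```java
// Illustrative sketch of coalescing absent label values to empty strings.
public class LabelUtil {
    public static String labelOrEmpty(String value) {
        return value == null ? "" : value;
    }

    // Both labels are always present on the metric; one may be "".
    public static String[] requestLabels(String modelId, String vModelId) {
        return new String[] { labelOrEmpty(modelId), labelOrEmpty(vModelId) };
    }

    public static void main(String[] args) {
        String[] labels = requestLabels("my-model", null);
        System.out.println(labels[0] + "," + labels[1]); // my-model,
    }
}
```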

@VedantMahabaleshwarkar
Contributor Author

/test all

@kserve-oss-bot
Collaborator

@VedantMahabaleshwarkar: No presubmit jobs available for kserve/modelmesh@main

In response to this:

/test all


Member

@njhill njhill left a comment


Thanks @VedantMahabaleshwarkar, added some more comments.

src/main/java/com/ibm/watson/modelmesh/Metrics.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/Metrics.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMeshApi.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMesh.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMesh.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/Metrics.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMesh.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMeshApi.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMeshApi.java Outdated Show resolved Hide resolved
@VedantMahabaleshwarkar VedantMahabaleshwarkar force-pushed the modelmetrics branch 2 times, most recently from e8415b8 to 5d2e8fd Compare July 3, 2023 20:05
@VedantMahabaleshwarkar
Contributor Author

@njhill addressed most of your comments and avoided passing around the boolean isVModel. (@danielezonca helped :) )
The only thing we didn't quite get was your suggestion to use FastThreadLocal to store the vModelId. It seemed redundant given that we are already adding it to the ThreadContext? Other than that every suggestion has been implemented.

Separately, @danielezonca also had a small suggestion to change the VModelManager class. Let me know if it should be kept for a separate PR.

Member

@rafvasq rafvasq left a comment


While I can't speak to the use of FastThreadLocal, in terms of usability I deployed the changes myself and was able to filter metrics by model using the new modelId label. I'd also recommend saving those changes to the VModelManager class for a separate PR if they're not directly required for this one.

FYI @ckadner

src/main/java/com/ibm/watson/modelmesh/ModelMesh.java Outdated Show resolved Hide resolved
Member

@ckadner ckadner left a comment


I have a few more nit-picks. It also looks like there are a few more unresolved conversations besides the "FastThreadLocal" topic.

src/main/java/com/ibm/watson/modelmesh/Metrics.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMesh.java Outdated Show resolved Hide resolved
Member

@njhill njhill left a comment


Thanks @VedantMahabaleshwarkar @danielezonca and apologies for the delay, I was out last week and still catching up.

The only thing we didn't quite get was your suggestion to use FastThreadLocal to store the vModelId. It seemed redundant given that we are already adding it to the ThreadContext?

I was referring to the resolved modelId here, not the vModelId. And it's just for the purpose of "passing it out" of this method so that it can be included in the metrics recorded in the calling function in addition to the vModelId.
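The "passing it out" pattern described here can be sketched as below. The PR discussion mentions Netty's FastThreadLocal; a plain ThreadLocal is used in this sketch to keep it self-contained, and all names are hypothetical: the method that resolves a vModelId to a concrete modelId stores the result thread-locally so the caller can include it as a metric label alongside the vModelId.

```java
// Illustrative sketch (hypothetical names): resolution "passes out" the
// concrete modelId via a ThreadLocal so the calling code can label metrics
// with both the vModelId and the resolved modelId.
public class VModelResolver {
    private static final ThreadLocal<String> RESOLVED_MODEL_ID = new ThreadLocal<>();

    // Stand-in for the resolution step lower in the stack.
    public static String resolve(String vModelId) {
        String modelId = vModelId + "-concrete-1"; // pretend registry lookup
        RESOLVED_MODEL_ID.set(modelId);            // pass it out to the caller
        return modelId;
    }

    public static String lastResolvedModelId() {
        return RESOLVED_MODEL_ID.get();
    }

    public static void main(String[] args) {
        resolve("my-vmodel");
        // The caller (e.g. the top-level request-metrics logging) can now
        // read the resolved modelId without changing method signatures.
        System.out.println(lastResolvedModelId()); // my-vmodel-concrete-1
    }
}
```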

@ckadner
Member

ckadner commented Jul 25, 2023

Hi Vedant, since you force-pushed the last commit, I cannot find where/whether Nick's suggestion made it into your last code changes. Are the remaining 7 unresolved conversations actually resolved? Also, you will need to update your branch once more :-)

ScrapCodes and others added 5 commits July 25, 2023 16:28
Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
njhill and others added 3 commits August 20, 2023 08:00
Use ThreadLocal to store resolved modelId in vModel case
Revert/simplify a few things

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Update per-model metric changes
@VedantMahabaleshwarkar
Contributor Author

@njhill pulled your changes into this PR

@ckadner ckadner modified the milestones: v0.11.0, v0.11.1 Aug 29, 2023
Member

@njhill njhill left a comment


Thanks @VedantMahabaleshwarkar for all of your patience with this!

@kserve-oss-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njhill, VedantMahabaleshwarkar


@ckadner ckadner merged commit df5e6a5 into kserve:main Sep 6, 2023
3 checks passed
VedantMahabaleshwarkar added a commit to VedantMahabaleshwarkar/modelmesh that referenced this pull request Oct 11, 2023
- Add `modelId` parameter to `logTimingMetricDuration` function in `Metrics.java`:
  - `modelmesh_cache_miss_milliseconds`
  - `modelmesh_loadmodel_milliseconds`
  - `modelmesh_unloadmodel_milliseconds`
  - `modelmesh_req_queue_delay_milliseconds`
  - `modelmesh_model_sizing_milliseconds`
  - `modelmesh_age_at_eviction_milliseconds`
- Add `modelId` parameter to `logSizeEventMetric` function in `Metrics.java`:
  - `modelmesh_loading_queue_delay_milliseconds`
  - `modelmesh_loaded_model_size_bytes`
- Add `modelId` and `vModelId` param to `logRequestMetrics` in `Metrics.java`:
  - `modelmesh_invoke_model_milliseconds`
  - `modelmesh_api_request_milliseconds`

Closes opendatahub-io#60

Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Co-authored-by: Daniele Zonca <dzonca@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
njhill added a commit that referenced this pull request Nov 16, 2023
Motivation

#90 introduced support for per-model prometheus metrics, but the intention was to leave the default behaviour unchanged and have this enabled explicitly via configuration.

However, it was inadvertently made the default.

Modifications

Change default behaviour to not include modelId/vModelId prometheus metric labels. This is important because model-mesh was designed primarily for use cases where there is a very large and changing number of individual models and those scenarios would result in a much greater number of individual metrics than prometheus can handle.
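The gating described here can be sketched as follows (the flag name is hypothetical, not the actual model-mesh configuration key): when per-model metrics are disabled, the modelId/vModelId labels are simply omitted, keeping metric cardinality bounded for deployments with very many models.

```java
// Illustrative sketch of configuration-gated per-model metric labels.
public class MetricsConfig {
    private final boolean perModelMetricsEnabled; // hypothetical flag

    public MetricsConfig(boolean perModelMetricsEnabled) {
        this.perModelMetricsEnabled = perModelMetricsEnabled;
    }

    // Labels registered on the metric: empty unless explicitly enabled,
    // so the default emits no per-model label dimensions at all.
    public String[] labelNames() {
        return perModelMetricsEnabled
                ? new String[] { "modelId", "vModelId" }
                : new String[0];
    }

    public static void main(String[] args) {
        System.out.println(new MetricsConfig(false).labelNames().length); // 0
        System.out.println(new MetricsConfig(true).labelNames().length);  // 2
    }
}
```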

Result

Original behaviour restored

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
njhill added a commit that referenced this pull request Nov 16, 2023
Functionality added in #90 

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
rafvasq pushed a commit that referenced this pull request Nov 20, 2023
Functionality added in #90 

---------

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
njhill added a commit that referenced this pull request Nov 21, 2023
Motivation

#90 introduced support for per-model prometheus metrics, but the intention was to leave the default behaviour unchanged and have this enabled explicitly via configuration.

However, it was inadvertently made the default.

Modifications

Change default behaviour to not include modelId/vModelId prometheus metric labels. This is important because model-mesh was designed primarily for use cases where there is a very large and changing number of individual models and those scenarios would result in a much greater number of individual metrics than prometheus can handle.

Result

Original behaviour restored

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
ckadner pushed a commit that referenced this pull request Nov 22, 2023
PR #90 introduced support for per-model prometheus metrics with the
intention to not change the default behavior but require this as a feature
to be enabled explicitly via configuration. However, it was inadvertently
made the default.

This commit restores the original behavior by changing the default configuration
to not include modelId/vModelId prometheus metric labels because model-mesh
was designed primarily for use cases where there is a very large and changing
number of individual models and those scenarios would result in a much greater
number of individual metrics than prometheus can handle.

------

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Successfully merging this pull request may close these issues.

Support individual model metrics
7 participants