feat: support for OkHttp thread metrics #1022

ianbotsf · 2024-01-04T23:49:31Z

Issue #

Description of changes

This change adds metrics for the OkHttp engine's thread pool. It should be considered WIP until the following open questions are addressed:

How can we best test this in an automated fashion?
This change only adds support for OkHttp. CRT doesn't seem to have support for interrogating metrics about HTTP client thread counts. Is that good enough for now or should we push for enhanced support in CRT?
The additions to HttpClientMetrics eschew the existing pattern of pushing metrics to intermediate refs and the polling them from the async counter in favor of pulling the source data directly via a passed callback. This seems cleaner to me since it removes unnecessary state but I'm curious whether others think it's a good idea (and whether we'd potentially want to extend it to other async counters as appropriate).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

lauzadis · 2024-01-08T20:19:54Z

...l/http-client/common/src/aws/smithy/kotlin/runtime/http/engine/internal/HttpClientMetrics.kt

+                "smithy.client.http.threads.count",
+                { measurement -> recordThreadState(measurement, callback) },
+                "{thread}",
+                "Current state of HTTP engine threads (active, running)",


How are "running" threads different from "active" ones? I see only "total" and "active" are recorded.

You're right, "active" and "running" are not the correct terms here. I can change those to "total" and "active".

lauzadis · 2024-01-08T20:29:13Z

I think unit tests for the metrics being emitted would be a good start. I'm not sure, is it easy to change the number of active threads in the OkHttp client, or is it more an internal implementation detail in the engine? If the latter, I think it would be sufficient to just check that the metrics are being emitted.
It would be nice to have both engines' metrics aligned but it can probably be solved on the longer term
I think it's a good idea

ianbotsf · 2024-01-08T22:33:15Z

I think unit tests for the metrics being emitted would be a good start. I'm not sure, is it easy to change the number of active threads in the OkHttp client, or is it more an internal implementation detail in the engine? If the latter, I think it would be sufficient to just check that the metrics are being emitted.

To be clear, you mean verify that a metric value exists but not what it is? Because this is multi-threaded behavior and we don't control the implementation of the underlying thread pool, it'll be difficult to achieve a stable expectation for number of total/active threads in a test case.

lauzadis · 2024-01-08T22:36:19Z

I think unit tests for the metrics being emitted would be a good start. I'm not sure, is it easy to change the number of active threads in the OkHttp client, or is it more an internal implementation detail in the engine? If the latter, I think it would be sufficient to just check that the metrics are being emitted.

To be clear, you mean verify that a metric value exists but not what it is? Because this is multi-threaded behavior and we don't control the implementation of the underlying thread pool, it'll be difficult to achieve a stable expectation for number of total/active threads in a test case.

Yes, just testing that a metric value exists if we don't have predictable behavior to test on.

aajtodd

How can we best test this in an automated fashion?
I'm not sure we can in every scenario. I had to do a lot of testing locally with an observability stack setup to actually view the metrics working. Let me know if you discover something better.

This change only adds support for OkHttp. CRT doesn't seem to have support for interrogating metrics about HTTP client thread counts. Is that good enough for now or should we push for enhanced support in CRT?

I'd say do what we can for now (although see my comment about getting this in SRA first before finalizing it).

The additions to HttpClientMetrics eschew the existing pattern of pushing metrics to intermediate refs and the polling them from the async counter in favor of pulling the source data directly via a passed callback. This seems cleaner to me since it removes unnecessary state but I'm curious whether others think it's a good idea (and whether we'd potentially want to extend it to other async counters as appropriate).

The reason it's structured this way is so that engines can update the values however they see fit (whether that be inc/dec or absolute value changes). See my other comment here but I don't think I see a reason to special case this new metric and would likely prefer to keep them consistent.

aajtodd · 2024-01-09T19:17:42Z

...l/http-client/common/src/aws/smithy/kotlin/runtime/http/engine/internal/HttpClientMetrics.kt

@@ -39,6 +44,7 @@ public object HttpClientMetricAttributes {
 public class HttpClientMetrics(
    scope: String,
    public val provider: TelemetryProvider,
+    threadStateCallback: (() -> ThreadState?)? = null,


This seems different for no apparent reason. It's not wrong I guess but feels odd that we would special case it since each engine can adapt however they want to the current way metrics are exposed.

Yes I concede it's very different from what we have elsewhere and feels very specialized right now.

I think one problem is that we don't directly control the thread pool. We don't decide when new threads are created or when they swap from idle to active. All we can do is inspect the state at any given moment to know the current value. That sounds like what an AsyncUpDownCounter is intended to do via a callback.

We could start a background thread in the OkHttp engine to periodically inspect the thread pool and stuff the idle/active counts into HttpClientMetrics, which would then be read when the callback is run. But this means the value will be out of sync by whatever interval the background thread uses. If, for instance, the background thread runs every 10 seconds but the meter polls every 5 seconds, then up to two datapoints will be reported inaccurately.

Is your greater concern that this change exposes the concept of callbacks outside of HttpClientMetrics and inverts the collection model from a push to a pull? Is it the inconsistency of some metrics being exposed outside of HttpClientMetrics as callbacks while others are mutable properties inside? Or is it that this may not be flexible enough to adapt to other engines where thread updates are more deterministic?

aajtodd · 2024-01-09T19:27:32Z

...l/http-client/common/src/aws/smithy/kotlin/runtime/http/engine/internal/HttpClientMetrics.kt

@@ -25,8 +25,13 @@ public object HttpClientMetricAttributes {
    public val AcquiredConnection: Attributes = attributesOf { "state" to "acquired" }
    public val QueuedRequest: Attributes = attributesOf { "state" to "queued" }
    public val InFlightRequest: Attributes = attributesOf { "state" to "in-flight" }
+    public val TotalThread: Attributes = attributesOf { "state" to "total" }


correctness: The state attribute for this should probably follow the usage convention where the sub states total to some limit.
e.g. thread state usage metric smithy.client.http.threads.usage with states of idle | consumed and a separate metric for the limit smithy.client.http.threads.limit.

Also I suppose we should probably get this defined in the SRA before codifying it in an implementation.

We may not know the total thread limit in every case or thread pools may be dynamic and spin down threads which would break the "should" part of the convention but I think that's fine. In the ideal case the convention applies and even when it doesn't it's still how you'd want it graphed.

Sure, I'll change this to idle/active threads.

sonarqubecloud · 2024-01-12T17:27:25Z

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

ianbotsf added 2 commits January 3, 2024 22:45

(wip) support for OkHttp thread count

3d71280

bumping to released aws-crt-kotlin version

ea6d6d1

lauzadis reviewed Jan 8, 2024

View reviewed changes

aajtodd suggested changes Jan 9, 2024

View reviewed changes

Switch to idle/active threads as a dimension

6c8ce8d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support for OkHttp thread metrics #1022

feat: support for OkHttp thread metrics #1022

ianbotsf commented Jan 4, 2024

lauzadis Jan 8, 2024

ianbotsf Jan 8, 2024

lauzadis commented Jan 8, 2024

ianbotsf commented Jan 8, 2024

lauzadis commented Jan 8, 2024

aajtodd left a comment

aajtodd Jan 9, 2024

ianbotsf Jan 10, 2024

aajtodd Jan 9, 2024

ianbotsf Jan 10, 2024

sonarqubecloud bot commented Jan 12, 2024

feat: support for OkHttp thread metrics #1022

Are you sure you want to change the base?

feat: support for OkHttp thread metrics #1022

Conversation

ianbotsf commented Jan 4, 2024

Issue #

Description of changes

lauzadis Jan 8, 2024

Choose a reason for hiding this comment

ianbotsf Jan 8, 2024

Choose a reason for hiding this comment

lauzadis commented Jan 8, 2024

ianbotsf commented Jan 8, 2024

lauzadis commented Jan 8, 2024

aajtodd left a comment

Choose a reason for hiding this comment

aajtodd Jan 9, 2024

Choose a reason for hiding this comment

ianbotsf Jan 10, 2024

Choose a reason for hiding this comment

aajtodd Jan 9, 2024

Choose a reason for hiding this comment

ianbotsf Jan 10, 2024

Choose a reason for hiding this comment

sonarqubecloud bot commented Jan 12, 2024

Quality Gate passed