Telemetry support #1529

florimondmanca · 2020-09-06T15:26:19Z

florimondmanca
Sep 6, 2020
Collaborator

Prompted by discussions on Gitter, and somewhat related to #790.

"Telemetry" refers to the possibility of extracting metrics from a running program that uses HTTPX, in order to get insights on performed requests (what they are, how many of them, to which endpoints, etc).

So…

What do other HTTP clients do?

Requests

Requests doesn't have any telemetry built-in, nor any API specifically targeted at telemetry/tracing. (Hooks don't seem to be enough, since there's only a response hook.)

For this reason it seems most instrumentation projects like OpenTelemetry and Datadog APM take a monkeypatching approach.

For example, OpenTelemetry monkeypatches the high-level API (and nothing else, which means session requests aren't traced):

import requests
import opentelemetry.instrumentation.requests

# You can optionally pass a custom TracerProvider to
RequestInstrumentor.instrument()
opentelemetry.instrumentation.requests.RequestsInstrumentor().instrument()
response = requests.get(url="https://www.example.org/")

And Datadog APM monkeypatches Session.send() (which also has limitations because then custom/subclassed sessions aren't monkeypatched by default):

from ddtrace import patch
patch(requests=True)

import requests
requests.get("https://www.datadoghq.com")

`aiohttp`

aiohttp has a Client Tracing API, which is basically "event hooks on steroids". The Tracing Reference lists 15 different callbacks, most of them related to HTTP networking rather than client functionality.

OpenTelemetry integrates with this feature by exporting a create_trace_config() function, as well as a helper for preparing span names:

import aiohttp
from opentelemetry.instrumentation.aiohttp_client import create_trace_config, url_path_span_name
import yarl

def strip_query_params(url: yarl.URL) -> str:
    return str(url.with_query(None))

open_telemetry_trace_config = create_trace_config(
        # Remove all query params from the URL attribute on the span.
        url_filter=strip_query_params,
        # Use the URL's path as the span name.
        span_name=url_path_span_name
)

async with aiohttp.ClientSession(trace_configs=[open_telemetry_trace_config]) as session:
    ...

Datadog APM doesn't have support for aiohttp yet (but it does support OpenTelemetry in general, for the above setup + the OpenTelemetry Datadog exporter would do the trick).

OpenTelemetry

As mentioned by @agronholm on Gitter, the current trend seems to shift library integration support to OpenTelemetry, a CNCF project for collecting and exposing metrics from applications. The opentelemetry-python client has support for a variety of exporters (Prometheus (an open standard for metric ingestion as well), Datadog, OpenCensus…), as well as support for instrumentating various libraries. (There's even instrumentation for Starlette and ASGI — how interesting! I need to add this to awesome-asgi…)

Thanks to the exporter mechanism, vendors can integrate with OpenTelemetry, removing the need for building and maintaining vendor-specific integrations.

From my perspective, we'll most likely want HTTPX to be supported by OpenTelemetry, since by transitivity this will give us the widest vendor support possible.

(Side note: I checked, and OpenTelemetry seems to support asyncio fine enough via its RuntimeContext API (although it's sync on the surface) - see open-telemetry/opentelemetry-python#395. Doesn't seem like there's any support for trio or curio, though.)

How?

Update 2020-11-23: This is outdated — we'll most likely want to build on top of the Transport API, see #1264 (comment).

A minimal HTTPX-OpenTelemetry integration could maybe (see questions/blockers below) be built on the upcoming Event Hooks functionality (#790, #1215).

Taking inspiration from the aiohttp OpenTelemetry implementation, I ended up with a proof-of-concept implementation in this gist:

https://gist.github.com/florimondmanca/37ffea4dc4aac0bad8f30534783b0b3d

Would be used like so:

from opentelemetry.instrumentation import httpx_client as httpx_opentelemetry

opentelemetry_on_request, opentelemetry_on_response = httpx_opentelemetry.create_hooks()

event_hooks = {
    "request": [opentelemetry_on_request],
    "response": [opentelemetry_on_response]
}

with httpx.Client(event_hooks=event_hooks) as client:
    ...

There are a couple of questions/blockers, though:

We need to persist contextual information from the request hook to the response hook. Right now the gist does this naively by storing an attribute on the request, but this would break down completely if any component in the call stack swaps the request for another (eg authentication or redirects).
Do we need a request_exception hook? The aiohttp integration uses such a hook to store an error status code based on exception information.

Overall, it feels like implementing this on top of a middleware-like API (rather than callback-based event hooks) would allow addressing these issues while providing an elegant implementation:

class OpenTelemetryInterceptor:
    def __init__(self):
        self._tracer = ...

    def __call__(self, request, send):
        with self._tracer.start_as_current_span(...) as span:
            ...  # Store request information on `span`
            try:
                response = send(request)
            except httpx.RequestError:
                ...  # Process exception
                raise
            else:
                ...  # Store response information on `span`.
                return response

Such an "interceptor" API (as described in #790 (comment)) would also make the usage simpler, since we don't need to unpack a 2-tuple of callables and put them in the correct dict entries…

from opentelemetry.instrumentation.httpx import OpenTelemetryInterceptor

with httpx.Client(interceptors=[OpenTelemetryInterceptor()]) as client:
    ...

tomchristie · 2020-09-15T11:03:14Z

tomchristie
Sep 15, 2020
Maintainer

We need to persist contextual information from the request hook to the response hook. Right now the gist does this naively by storing an attribute on the request, but this would break down completely if any component in the call stack swaps the request for another (eg authentication or redirects).

That's okay. response.request is the original request. Auth or redirects create additional request instances that are part of the history.

Do we need a request_exception hook? The aiohttp integration uses such a hook to store an error status code based on exception information.

Possibly, yes.

One other aspect of this is that the transport API does give us the option of middleware like approaches as well as the event hooks. Particularly once we have an httpx.HTTPTransport() for instantiating the transport with httpx configuration, rather than low-level httpcore config. (See #1274)

Eg, we're able to support this sort of thing...

class TelemetryTransport:
    def __init__(self, *args, **kwargs):
        self.transport = httpx.HTTPTransport(**kwargs)

    def request(self, *args, **kwargs):
        # before request
        try:
            return self.transport.request(*args, **kwargs)
        finally:
            # after response

Plenty of details to dig into there, but point being that we both have event hooks, and have a way of jumping directly into the control flow. (We'll be able to innovate with more fancy stuff with transport layers too, such as build out a MiddlewareTransport, that does provide a request/response interface. Particularly once we have tightened up the request/response APIs to make them easier to integrate at the transport level.)

0 replies

florimondmanca · 2020-11-23T09:22:21Z

florimondmanca
Nov 23, 2020
Collaborator Author

Yeah, I'm digging the use of the transport API for middleware here now. :-)

We'd be able to point users at something like this:

import httpx
from opentelemetry.instrumentation.httpx import OpenTelemetryTransport

transport = httpx.HTTPTransport()
transport = OpenTelemetryTransport(transport, ...)

with httpx.Client(transport=transport) as client:
    ...

Pretty and simple.

OpenTelemetryTransport would look like this:

import httpcore


class OpenTelemetryTransport(httpcore.SyncHTTPTransport):
    def __init__(self, transport: httpcore.SyncHTTPTransport):
        self._transport = transport
        self._tracer = ...

    def request(self, *args, **kwargs):
        with self._tracer.start_as_current_span(...) as span:
            ...  # Store request information on `span`
            try:
                response = self._transport.request(*args, **kwargs)
            except httpcore.RequestError:
                ...  # Process exception
                raise
            else:
                ...  # Store response information on `span`.
                return response


class AsyncOpenTelemetryTransport(httpcore.AsyncHTTPTransport):
    def __init__(self, transport: httpcore.AsyncHTTPTransport):
        self._transport = transport
        self._tracer = ...

    async def arequest(self, *args, **kwargs):
        with self._tracer.start_as_current_span(...) as span:
            ...  # Store request information on `span`
            try:
                response = await self._transport.arequest(*args, **kwargs)
            except httpcore.RequestError:
                ...  # Process exception
                raise
            else:
                ...  # Store response information on `span`.
                return response

(I'm also on the fence wrt encouraging exposing dual transports that implement both sync and async APIs, or sticking to two separate transports. In this case there's no silver bullet — the code has to be written twice in any case. Since we expose the switch in httpcore then I'd tend to stick to the sync/async differentiation, like here.)

With context-managed requests (encode/httpcore#206) this would look like this:

class OpenTelemetryTransport(httpcore.SyncHTTPTransport):
    def __init__(self, transport: httpcore.SyncHTTPTransport):
        self._transport = transport
        self._tracer = ...

    def request(self, *args, **kwargs):
        with self._tracer.start_as_current_span(...) as span:
            ...  # Store request information on `span`
            try:
                with self._transport.request(*args, **kwargs) as response:
                    ...  # Store response information on `span`.
                    yield response
            except httpcore.RequestError:
                ...  # Process exception
                raise

That's a very promising way forward because it would be very interoperable. It would make it possible to use the OpenTelemetryTransport more broadly bc it would work with any transport: URLLib3Transport, httpx.MockTransport, or any other custom transport users might have…

(This begs the question of whether this should be marked under opentelemetry.instrumentation.httpx, or rather, opentelemetry.instrumentation.httpcore. Latter is more correct and highlights the idea of interoperability, the former is easier to discover for regular HTTPX users.)

0 replies

florimondmanca · 2020-11-23T09:40:00Z

florimondmanca
Nov 23, 2020
Collaborator Author

Marking this as external since if we go the OpenTelemetry route, then the implementation ought to live in the opentelemetry-python project. Also means a contributor could just hop in and start something there. From my POV we won't have to change anything in HTTPX or HTTPCore to support this, other than figuring out any blockers due to the Transport API.

0 replies

tomchristie · 2020-11-24T11:18:59Z

tomchristie
Nov 24, 2020
Maintainer

Given the landscape as it currently is, I'd suggest opentelemetry.instrumentation.httpx.

Perhaps at some point there might feasibly be a range of clients it'll be compatible with, but it's the most user-obvious thing at the moment.

0 replies

srikanthccv · 2021-01-03T18:41:37Z

srikanthccv
Jan 3, 2021

@florimondmanca @tomchristie There are few people asking for the httpx support on contrib repo and on gitter. I am thinking of working on that and came here for prior discussion. It is really nice to see that this has been brought up and discussed.

Most of the open-telemetry instrumentation libraries do monkey patching to collect the telemetry data, But If the instrumented library has any built in monitoring support then that is extended to enable telemetry. @florimondmanca has already pointed out one example. Another one similar to it is pymongo it has the monitoring tool and opentelemetry-instrumentation-pymongo uses it.

OpenTelemetryTransport looks neat. I have few questions around it.

How to handle in the below usage? It doesn't take any transport and not traced?.
```
>>> import httpx
>>> r = httpx.get('https://www.example.org/')
>>> r
```
Is there a way to configure a global transport once? Or does user have to pass the transport everywhere?
This begs the question of whether this should be marked under opentelemetry.instrumentation.httpx, or rather, opentelemetry.instrumentation.httpcore.
Are there any other http clients that rely on httpcore? If so, we could have it separate and make httpx instrumentation depend on it. For example all the asgi server instrumentation libraries have dependency on asgi instrumentation to collect the telemetry based on the asgi spec.

0 replies

cdvv7788 · 2021-02-25T17:46:53Z

cdvv7788
Feb 25, 2021

Anyone looking at this? Currently considering giving httpx a try, and having this would be a HUGE plus.

0 replies

StephenBrown2 · 2021-02-26T14:44:31Z

StephenBrown2
Feb 26, 2021
Collaborator

@lonewolf3739 The answers for 1. and 2. are the same; use a Client instance:

transport = httpcore.SyncConnectionPool()  # See docs for kwargs, or wait for upcoming `httpx.HTTPTransport()`.
transport = OpenTelemetryTransport(transport)
with httpx.Client(transport=transport) as client:
    r = client.get('https://www.example.org/')

Then you can pass the client around in the context, or like this:

TelemetryClient = httpx.Client(transport=transport)
...
with TelemetryClient as client:
    r = client.get('https://www.example.org/')

Once httpx 0.17.0 gets cut and released, you can use httpx.HTTPTransport() and then the answer to 3 is opentelemetry.instrumentation.httpx, but there could also be an httpcore implementation for wider use than just httpx.

0 replies

florimondmanca · 2021-02-26T15:44:18Z

florimondmanca
Feb 26, 2021
Collaborator Author

Generally the top-level API is meant for very simple cases, or prototyping. I can understand there will be cases when this is all users need in production even, but I'd also insist on the idea that it might be okay to ask users to use a client just to benefit from telemetry. "Use the client" is not an unusual recommendation we make to users anyway for advanced use cases. And it reduces the burden on both opentelemetry and HTTPX authors, so seems like a net positive for all. :-)

0 replies

tomchristie · 2021-05-06T09:57:47Z

tomchristie
May 6, 2021
Maintainer

So, I've got a use-case which is helping illuminate some requirements here.

Working with the HTTPX command line client the --http2 flag currently doesn't make any difference to how the outgoing request is formatted. It "looks" like an HTTP/1.1 request either ways, because at the point at which the "request" hook is called we haven't yet established a connection, and don't know if it'll end up being an HTTP/2 connection or not.

Really there's a whole slew of additional points of information that we'd like to be able to hook into in order to have a CLI that's able to fully reflect the request/response cycle to the user.

Connection info. Eg. "Connecting to www.example.com..." / "...connection established to www.example.com at 123.12.67.89".
HTTP info at the level of h11 and h2 events.

I don't necessarily think we want this to be exposed via our existing event hooks. It's a different, detailed level, that's about debugging inside the network layer. It might be that we can expose it all the way out to the event hooks, but let's not start from that POV. At a basic level this is some functionality that we want to figure out as a Transport API extension.

transport.handle_request(..., extensions={"hooks": ...})

We need to be able to tie sets of callbacks together as each belonging to the same request.

My feeling is that we ought to keep things simple by having the responsibility for sequencing requests be with the caller, rather than having the transport track and increment request IDs. For example, this kind of thing...

def on_callback(message, context):
    ...


transport.handle_request(
    method=b"GET",
    url=(b"https", b"www.example.com", None, b"/"),
    headers=[(b"Host", b"www.example.com")],
    stream=httpx.ByteStream(b""),
    extensions={
        "hooks": {"callback": on_callback, "context": {"request_id": 1}}
    }
)

Design questions:

What do we name the extension? "hooks", "network_callbacks", "debug"?
Doe we use def on_callback(message, context) and keep the message and context separate. Or def on_callback(message) with the context being merged in together with the message at the point of the callback?
We'll need to define the format of message, and what minimum set of messages we want to support initially. I think using the client as a use-case driver is a smart call here.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Telemetry support #1529

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 9 comments

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Telemetry support #1529

florimondmanca Sep 6, 2020 Collaborator

What do other HTTP clients do?

Requests

aiohttp

OpenTelemetry

How?

Replies: 9 comments

tomchristie Sep 15, 2020 Maintainer

florimondmanca Nov 23, 2020 Collaborator Author

florimondmanca Nov 23, 2020 Collaborator Author

tomchristie Nov 24, 2020 Maintainer

srikanthccv Jan 3, 2021

cdvv7788 Feb 25, 2021

StephenBrown2 Feb 26, 2021 Collaborator

florimondmanca Feb 26, 2021 Collaborator Author

tomchristie May 6, 2021 Maintainer

florimondmanca
Sep 6, 2020
Collaborator

`aiohttp`

tomchristie
Sep 15, 2020
Maintainer

florimondmanca
Nov 23, 2020
Collaborator Author

florimondmanca
Nov 23, 2020
Collaborator Author

tomchristie
Nov 24, 2020
Maintainer

srikanthccv
Jan 3, 2021

cdvv7788
Feb 25, 2021

StephenBrown2
Feb 26, 2021
Collaborator

florimondmanca
Feb 26, 2021
Collaborator Author

tomchristie
May 6, 2021
Maintainer