Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add :telemetry_prefix option to Tesla.Middleware.Telemetry #390

Closed
wants to merge 3 commits into from

Conversation

keatz55
Copy link

@keatz55 keatz55 commented Jun 12, 2020

This PR gives Tesla HTTP client users the ability to configure a :telemetry_prefix option at the Tesla.Middleware.Telemetry module or config.ex level. This is useful for developers creating platform-specific sdks or api clients that want Telemetry events to align more closely with their platform or app.

@keatz55 keatz55 changed the title Add :telemetry_prefix option Add :telemetry_prefix option to Tesla.Middleware.Telemetry Jun 12, 2020
You can configure `:telemetry_prefix` globally in your config, but if set the `:telemetry_prefix` option will override:

```
config :tesla, Tesla.Middleware.Telemetry, telemetry_prefix: [:custom, :prefix]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea - it would override this all clients from all dependencies.

Since the use case comes from building API clients as packages I think it would be better to hide tesla from end user and use this instead.

# config/config.exs
config :myclient, telemetry_prefix: [...]

# myclient.ex

defmodule Myclient do
  use Tesla
  @compile_config Application.compile_env(:myclient, :telemetry_prefix, [])
  plug Tesla.Middleware.Telemetry, prefix: Keyword.get(@compile_config, :prefix, [:myclient])
  # ...
end

# or top provide runtime configuration

defmodule Myclient do
  def new do
    config = Application.get_env(:myclient, :telemetry_prefix, [])
    middleware = [
      # ...
      {Tesla.Middleware.Telemetry, prefix: Keyword.get(@compile_config, :prefix, [:myclient])}
    ]

    Tesla.client(middleware, adapter)
  end
end

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I kept going back and forth on globalizing. Will remove and update.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just updated. Let me know if that works.

@@ -18,6 +18,34 @@ if Code.ensure_loaded?(:telemetry) do
end)
```

## Options
- `:telemetry_prefix` - replaces default `[:tesla]` with desired Telemetry event prefix (see below)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this option is passed directly to telemetry middleware I don't think we need to include the telemetry_ prefix in the prefix name (🥁)

I'd go with just :prefix .

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Will update.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just updated. Let me know if that works.

@teamon
Copy link
Member

teamon commented Jun 15, 2020

@keatz55 Thanks for the PR! I've added a few comments on the diff.

@bryannaegele Would you mind taking a look? :)

@teamon teamon linked an issue Jun 15, 2020 that may be closed by this pull request
@aselder
Copy link

aselder commented Jun 15, 2020

❤️

@bryannaegele
Copy link
Contributor

Sure, @teamon !

@keatz55 can you elaborate on what your use case is and what you're looking to accomplish? The telemetry team and Observability WG have established telemetry prefixes in libraries as an anti-pattern since the primary use case of distinguishing different usage contexts of a library can be accomplished through event metadata. We'd love to walk that pattern back, but we're stuck with it in a couple of places. :/

I'm happy to provide suggestions or guidance to help accomplish your goal, but I do not believe this PR should be merged. I would suggest opening an issue instead and find a solution for you that could possibly be documented better. We did have a use case at WM that was somewhat related but is solvable through different middleware.

@aselder
Copy link

aselder commented Jun 15, 2020

@bryannaegele At least in our case, the use case is to be able to track metrics from connections to different services seperately. In our app, we have tesla clients connection to at least three external services and currently the telemetry metrics that come in have no way of distinguishing between which service was being called.

I had roughed out an alternative solution here: #387. Do you think that approach works better?

@bryannaegele
Copy link
Contributor

bryannaegele commented Jun 15, 2020

@aselder yeah! So we had a similar thing internally where they wanted to use something other than the module name in the env to distinguish the target service being called. The solution @moxley came up with was so use a separate middleware that added metadata to the opts and then the metric used that as a label.

Example:

# middleware
defmodule MyApp.Tesla.Middleware.Metadata do
  def call(env, next, opts) do
    env_opts = Keyword.merge(env.opts, opts)
    env = %{env | opts: env_opts}
    Tesla.run(env, next)
  end
end

# calling
middleware = [
  {MyApp.Tesla.Middleware.Metadata, service: "target_service"},
  Tesla.Middleware.Telemetry
]

# metric
counter("tesla_request_total",
  event_name: [:tesla, :request, :stop],
  description: "Tesla http request count",
  tags: [:module, :method, :status],
  tag_values: fn %{env: env} ->
    # Get the service name given to the metadata middleware keyword list
    target_service = env.opts[:service]
    %{
      method: env.method,
      status: env.status,
      target_service: target_service
    }
  end)

So there's certainly a pattern that could possibly be formally established as additional middleware rather than altering the Tesla client code or adding prefixes to telemetry events. The added bonus with going the extra middleware route is that it makes that metadata available for other use cases, such as the logger middleware.

I encouraged @moxley to formalize this pattern into a generic metadata middleware that could go in here. Would anyone be interested in taking up a proposal for that and example use cases with the telemetry and logger middlewares? @teamon any thoughts?

Note that it would probably be better to house this under a metadata key in the opts rather than blending them together.

@aselder
Copy link

aselder commented Jun 15, 2020

@bryannaegele I like that approach. It's close to what I had done, just a much cleaner way of injecting it.

I'd like to propose however that a metadata field be added to the Env, instead of using opts. From the name of the middleware, it's not clear that it will be storing passed information in the opts map. I could see someone accidentally overwriting a critical option with out realizing they were doing so. Thoughts?

@keatz55
Copy link
Author

keatz55 commented Jun 15, 2020

Thanks for the response @bryannaegele,

@keatz55 can you elaborate on what your use case is and what you're looking to accomplish?

We have the same internal case mentioned above by @aselder with the addition of the following cases:

  • we are building an elixir-specific sdk and a :prefix would allow sdk users to group all sdk-specific events together, as opposed to having sdk-specific events with the addition of tesla-specific dependency events.
  • having a configurable :prefix also encourages looser coupling, which would more easily allow users to keep the same telemetry event names independent of the http client library they use.

As a side note, when I submitted the above PR I was using a similar approach to the ecto telemetry implementation where you have a specific repo and telemetry events are based on the repo module name:
https://hexdocs.pm/ecto/Ecto.Repo.html#module-telemetry-events

@teamon
Copy link
Member

teamon commented Jun 15, 2020

I'm not sure if you noticed, but there is also built-in Tesla.Middleware.Opts

Before telemetry was a thing I've built an internal integration for tesla & prometheus. From observability perspective it makes sense to treat prefix as metric type/name and use labels/tags to distinguish source/target. For all the different clients I'm using the same middleware with the same metric name tesla_request_*, add source label taken from app config and target label based on hostname. This works well for internal services as all hostnames are in the same format some-app.tld. Having a single metric allows further manipulation like summing or joining into golden signals.

One of the design principles of tesla was to let users choose their http adapters. With this in mind, if you are creating an API wrapper library you could allow users to use a different adapter. With telemetry as an abstraction/integration layer it make sense to use the same event handler for multiple clients as the metrics represent the same thing, and if one wants to customise the handler for specific client it can be done by looking into tags.

As for the logger - the current one is far form perfect but I have yet to find time to propose a better solution.

tl;dr - I'm not sure, the simpler and more practical the better

@aselder
Copy link

aselder commented Jun 15, 2020

@teamon Nice call... Didn't even realize Tesla.Middleware.Opts was there.

@bryannaegele
Copy link
Contributor

bryannaegele commented Jun 15, 2020

I was using a similar approach to the ecto telemetry implementation where you have a specific repo and telemetry events are based on the repo module name:

Yes, unfortunately that is the anti-pattern I was denoting. We can't remove it, so we're stuck with it in that library, but we highly discourage this practice going forward.

we are building an elixir-specific sdk and a :prefix would allow sdk users to group all sdk-specific events together, as opposed to having sdk-specific events with the addition of tesla-specific dependency events.

In this case, we'd suggest using your own events and wrapping the calls. In a sense, the use of prefixes was a way to "hijack" a library's events. This also corners you and your users into using Tesla since those events would always be a part of your events. The usage of Tesla is an implementation concern of your SDK, so bleeding that out can be problematic.

having a configurable :prefix also encourages looser coupling, which would more easily allow users to keep the same telemetry event names independent of the http client library they use.

In practice this is the inverse since :tesla must be a part of the event name since that is the library. In a trace, we want to know which module/library was called versus a generic name.

An additional issue is that with prefixes our potential solutions for event discovery for tracing won't work. Our solutions for exposing traceable spans via telemetry events is highly dependent on static definition of events and that they be consistent for all uses of the library. Prefixes require the use of macros and still don't solve the problem of distinguishing individual requests from a particular client.

If you'd like to read more about how some of these conclusions were reached, especially around events in libraries, you can check out elixir-plug/plug_cowboy/pull/25, elixir-plug/plug_cowboy/pull/31, and #347

Edit: clarifying point on the plug_cowboy examples which may be confusing - the adapters are all a part of that overarching library, similar to the adapters in Tesla. We can control the shape of the metadata in all of these cases. It should not be construed to apply to having a generic [:http, :request] event that all http clients in the ecosystem conform to.

Edit #2: Additional discussion on prefixes beam-telemetry/telemetry_metrics/issues/55

Edit #3: Agenda item and resolution can be found in our meeting minutes for April 9 https://docs.google.com/document/d/18JVh6ICLyRCJBpRVIXwcR1fb-K4qRX3WHYgtM7mFx2U/edit

@aselder
Copy link

aselder commented Jun 15, 2020

One change I'd like to ask as part of this is to pass the env along for the exception event. If we're relying on opts to carry the meta data to identify the request, we're blind to it in the exception case.

@keatz55
Copy link
Author

keatz55 commented Jun 16, 2020

@bryannaegele Thank you for taking the time for such thorough response and for including sources. It is much appreciated. Under the additional context that you have just given, would it make sense to remove the :tesla prefix? E.g., [:tesla, :request, :stop] -> [:request, :stop]

@bryannaegele
Copy link
Contributor

@keatz55 no problem. We still want tesla in this case because we're emitting an event from that library. If it were left to just [:request, :stop] then a listener on that event would have no idea what that correlates to.

Say we use that pattern in multiple libraries, such as http clients or servers, such as Phoenix. Was it an http request stopping? And was it Phoenix? Or was it an http client? We quickly find ourselves in a precarious situation because in each of these cases, the measurements and metadata being emitted are different causing things like Metrics to no longer work. So we have to distinguish which library it came from, which is the reasoning for libraries to have a root prefix for all of its events.

I apologize for all of the confusion. A lot of people have put out some helpful information but we need to get official guides together.

Two other approaches to this particular issue should be put out there, one of which could be helpful for some use cases presented here.

  1. Allow for user-injected metadata in this middleware, thus negating the need to add the opts middleware. This would be static metadata for all emitted events. I can definitely get behind this!
  2. You can always ship a custom middleware if nothing here fits the bill. This middleware should try to cover most use cases while still adhering to guidelines and patterns the observability team are trying to propagate to create an overall good story.

@keatz55
Copy link
Author

keatz55 commented Jun 16, 2020

@bryannaegele Thanks again. Everything you are saying makes perfect sense. One thing I still don't quite understand is why in the following sample in elixir-plug/plug_cowboy/pull/31 they don't use the :plug_cowboy lib name and instead use :plug_adapter. Under that context, would tesla events be structured [:tesla, :request, :stop] -> [:http_client, :request, :stop]? Am I misunderstanding?

:telemetry.execute(
      [:plug_adapter, :call, :start],
      %{system_time: System.system_time()},
      %{adapter: @adapter, conn: conn, plug: plug}
    )

@bryannaegele
Copy link
Contributor

@keatz55 That's a great callout. I don't know that we put a lot more thought into it beyond it being a plug_adapter and that there is central control of all of those adapters by the same folks so coordination of naming, measurements, and metadata is straightforward across every adapter. It's easy for us to say "this is how it's all structured and here's how we identify the adapter". That isn't necessarily true for everything else out there, such as http clients, although that would be very nice.

The main theme is consistency across the entire ecosystem, so we should provide docs with guidance. These patterns, conventions, and libraries are born out of trial and error, which has been rough for the community, but is necessary to determine the best overall solutions.

How do you feel about the ability to pass static metadata as an option? Do you want to take a crack at that?

@teamon
Copy link
Member

teamon commented Jun 16, 2020

Sidenote, just to highlight the customisation options in tesla:

# 1. Pass static opts
defmodule MyStaticClient do
  use Tesla
  
  plug Tesla.Middleware.Telemetry, static: opts
end

# 2. Pass opts on client creation
defmodule MyDynamicClient do
  def new(opts) do
    middleware = [
      # ...
      {Tesla.Middleware.Telemetry, [dynamic: opts]}
    ]
  end
end

# 3. Wrap with custom middleware and modify on each request
defmodule MyTelemetryMiddleware do
  def call(env, next, opts) do
    {new_env, new_opts} = do_magic_with(env, opts)
    Tesla.Middleware.Telemetry.call(new_env, next, new_opts)
  end
end

@keatz55
Copy link
Author

keatz55 commented Jun 17, 2020

@bryannaegele Sounds good. I'll take a crack at it tonight post-work.

@teamon
Copy link
Member

teamon commented Jul 22, 2020

👋 @keatz55 How's going? :)

@keatz55
Copy link
Author

keatz55 commented Jul 22, 2020

@teamon I apologize. Work has been eating me alive and I completely spaced this. I'll take a crack at it this weekend.

@teamon
Copy link
Member

teamon commented Jul 24, 2020

@bryannaegele Is the convention described in https://keathley.io/blog/telemetry-conventions.html blessed by beam-telemetry community? :)

@bryannaegele
Copy link
Contributor

@teamon generally, yes. I was supposed to review it prior to publication but he pretty much nailed it.

  1. Prefixes are an anti-pattern.
  2. Make metadata configurable for your users in some way if possible.
  3. And expose everything to let the user decide what's important.

@teamon
Copy link
Member

teamon commented Sep 10, 2020

Let me know what do you think of #419

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Allow client meta data that can be passed to Telemetry
4 participants