
Make metrics filterable based upon metadata #55

Closed
bryannaegele opened this issue Jan 14, 2020 · 26 comments · Fixed by #61
Labels
discussion, enhancement (New feature or request)

Comments

@bryannaegele
Contributor

bryannaegele commented Jan 14, 2020

The "prefix or not to prefix" issue seems to boil down to the inability to allow users to conditionally handle an event based upon a predicate other than the event name. The proposal would be to add a filter predicate option which would be used by reporters to conditionally handle the event.

This should allow for simpler APIs for library authors going forward and help some tracing vendors with their needs by discouraging the use of required prefixes or auto-generated prefixes. User-supplied prefixes should still be accepted for power users with tight performance requirements where auto-instrumentation by vendors is less of a concern.

Example

I am read/write splitting to my DB, which requires separate Ecto Repos. The latency requirements for my writes are much different from those for my reads, resulting in different bucket sizes for each.

Currently, I must utilize event prefixes provided by Ecto to distinguish between these repos.

distribution("ecto.writer.query.duration.ms",
  buckets: [5, 10, 20, 50, 100, 200, 500, 1000, 1500, 2000, 5000, 10000],
  event_name: [:writer, :repo, :query],
  measurement: :query_time,
  unit: {:native, :millisecond},
  description: "Ecto query duration - Time spent executing the query",
  tags: [:repo]
),
distribution("ecto.reader.query.duration.ms",
  buckets: [5, 10, 15, 25, 35, 50, 100, 200],
  event_name: [:reader, :repo, :query],
  measurement: :query_time,
  unit: {:native, :millisecond},
  description: "Ecto query duration - Time spent executing the query",
  tags: [:repo]
)

If I could instead conditionally handle an event based upon the metadata, the need for a prefix could be removed in most use cases.

distribution("ecto.writer.query.duration.ms",
  buckets: [5, 10, 20, 50, 100, 200, 500, 1000, 1500, 2000, 5000, 10000],
  event_name: [:repo, :query],
  measurement: :query_time,
  unit: {:native, :millisecond},
  filter: fn meta ->
    match?(%{repo: MyApp.Repo}, meta)
  end,
  description: "Ecto query duration - Time spent executing the query",
  tags: [:repo]
),
distribution("ecto.reader.query.duration.ms",
  buckets: [5, 10, 15, 25, 35, 50, 100, 200],
  event_name: [:repo, :query],
  measurement: :query_time,
  unit: {:native, :millisecond},
  filter: &match?(%{repo: MyApp.ReadOnlyRepo}, &1),
  description: "Ecto query duration - Time spent executing the query",
  tags: [:repo]
)

Reporters should honor the predicate in their handlers when the filter is a function. To maintain the highest performance, reporters could add a second event handler which processes metrics defined with a predicate to remove an additional function call, e.g. handle_event/4 and handle_filtered_event/4.
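A rough sketch of what that two-handler split could look like in a reporter (illustrative names; this assumes the proposed :filter option becomes a field on each metric struct, which is not settled yet):

defmodule MyReporter do
  # Split the metrics for one event into a fast path (no predicate)
  # and a filtered path, and attach a dedicated handler for each.
  def attach(event, metrics) do
    {filtered, plain} = Enum.split_with(metrics, & &1.filter)

    :telemetry.attach({__MODULE__, event, :plain}, event, &__MODULE__.handle_event/4, plain)
    :telemetry.attach({__MODULE__, event, :filtered}, event, &__MODULE__.handle_filtered_event/4, filtered)
  end

  # Fast path: record every metric without any predicate check.
  def handle_event(_event, measurements, metadata, metrics) do
    Enum.each(metrics, &record(&1, measurements, metadata))
  end

  # Filtered path: evaluate each metric's predicate before recording.
  def handle_filtered_event(_event, measurements, metadata, metrics) do
    for metric <- metrics, metric.filter.(metadata), do: record(metric, measurements, metadata)
  end

  # Reporter-specific recording would go here.
  defp record(_metric, _measurements, _metadata), do: :ok
end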

/cc @binaryseed @josevalim @arkgil @GregMefford @tsloughter @hauleth

@bryannaegele added the enhancement (New feature or request) and discussion labels on Jan 14, 2020
@arkgil
Contributor

arkgil commented Jan 14, 2020

I like this idea. It gives enough flexibility to support both the same-prefix and different-prefix cases. It would also be very useful for reporting only a subset of events as metrics.


To maintain the highest performance, reporters could add a second event handler which processes metrics defined with a predicate to remove an additional function call, e.g. handle_event/4 and handle_filtered_event/4.

@bryannaegele I'm not sure I understand this part - can you explain?

@hauleth

hauleth commented Jan 14, 2020

Simple question: is it possible to update a metric's filter after creation? Something like that might be feasible in some cases. If so, then maybe filter should be a list? Maybe we should look at how logger did it (see the sketch below):

  • filters is a list of {filter_fun(), Opts :: term()}
  • each handler has a default_action, either log or stop, which applies if none of the filters decides otherwise
  • filter_fun() takes 2 arguments - the event and Opts
  • filter_fun() can return:
    • stop - to prevent the event from being logged
    • ignore - this filter doesn't do anything and further filters should decide what to do
    • the event itself (possibly modified) - the event that will be sent to the handler
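For comparison, a minimal sketch of OTP's logger filter shape, called from Elixir via Erlang's :logger API (the filter id and handler id here are illustrative):

# A filter fun receives the log event and its opts, and returns
# :stop (drop), :ignore (let later filters decide), or the event.
drop_debug = fn
  %{level: :debug}, _opts -> :stop
  _event, _opts -> :ignore
end

# Filters are registered per handler as {fun, opts}; :ignore falls
# through to the handler's filter_default (:log or :stop).
:logger.add_handler_filter(:default, :drop_debug, {drop_debug, []})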

@bryannaegele
Contributor Author

To maintain the highest performance, reporters could add a second event handler which processes metrics defined with a predicate to remove an additional function call, e.g. handle_event/4 and handle_filtered_event/4.

@bryannaegele I'm not sure I understand this part - can you explain?

Sure. I had two concerns about that.

With one handler, the handler would need to check whether there is a predicate (or provide a default) and then execute it on every call. That isn't a huge hit, but it's not zero. More importantly, user-supplied predicates will almost always be anonymous functions, which do have a measurable performance hit. This pattern would avoid doing those checks on every call by segregating known filterable calls to another handler.

I suppose you could provide a default handler in your reporter that is a function capture rather than an anonymous function.

@bryannaegele
Contributor Author

bryannaegele commented Jan 14, 2020

Is that possible to update metric filter after creation?

I suppose you could if you used an MFA and your function was somehow dynamic. That doesn't sound like something you'd ever do in practice. What would be the use case?

As to filtering similarly to logger: we're only concerned with the question "should I record this event with this definition?", which is a simple predicate rather than something that modifies the event. Modification of metadata is already available via tag_values/1 (see the sketch below).

That does raise a good question, though, of whether to apply tag_values/1 before filter/1. 🤔
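For reference, a minimal sketch of tag_values/1 reshaping event metadata before tags are extracted (the metric name and the derived :repo_name tag are illustrative):

summary("ecto.query.duration",
  event_name: [:repo, :query],
  measurement: :total_time,
  unit: {:native, :millisecond},
  # tag_values reshapes the event metadata before tags are pulled out;
  # :repo_name is a derived key that doesn't exist in the raw metadata
  tag_values: &Map.put(&1, :repo_name, inspect(&1.repo)),
  tags: [:repo_name]
)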

@arkgil
Contributor

arkgil commented Jan 14, 2020

@bryannaegele thanks for clarification 👍

@binaryseed

Is it implied here that libraries move away from configurable event names? It seemed like in other discussions we're already past the point where we can change that...

@hauleth

hauleth commented Jan 15, 2020

That does raise a good question though of whether or not to apply tag_values/1 before filter/1. 🤔

Squash the two into one, like logger does. That would solve the ordering problem.

@bryannaegele
Contributor Author

Is it implied here that libraries move away from configurable event name? It seemed like in other discussions we're already past the point where we can change that...

We could definitely discourage requiring a custom prefix. That would cover the major libraries you're probably concerned with. I could possibly be swayed, but I still think power users should be able to prefix in complex cases for maximum performance.

@arkgil
Contributor

arkgil commented Jan 16, 2020

@bryannaegele is there a particular reason why filtering should be based on metadata only? I imagine measurements could be useful as well, e.g. for counting database queries which take above x milliseconds (although one could argue that that's the goal of distribution/2 😄 ).

@bryannaegele
Contributor Author

@arkgil I couldn't think of a situation where you'd do that with metrics since, as you mentioned, that's more a function of your end tooling.

It is a handy technique, however. I use it for slow-query logging in our services (you probably don't need it if you have tracing):

def handle_event([_repo, :repo, :query], measures, meta, _config) do
  threshold = Config.query_time_warn_threshold()
  total_time_ms = extract_time(measures, :total_time)

  if total_time_ms > threshold do
    query_time_ms = extract_time(measures, :query_time)
    decode_time_ms = extract_time(measures, :decode_time)
    queue_time_ms = extract_time(measures, :queue_time)
    idle_time_ms = extract_time(measures, :idle_time)

    query_data = %{
      title: "Slow Query",
      query: meta.query,
      params: Enum.map(meta.params, &inspect/1),
      total_time: "#{total_time_ms}ms",
      query_time: "#{query_time_ms}ms",
      decode_time: "#{decode_time_ms}ms",
      queue_time: "#{queue_time_ms}ms",
      idle_time: "#{idle_time_ms}ms",
      data_source: meta.source
    }

    # Log the slow query as structured JSON.
    query_data
    |> Jason.encode!()
    |> Logger.warn()
  end

  :ok
end

@arkgil
Contributor

arkgil commented Jan 21, 2020

Right, that makes sense. I'd be for implementing this in Telemetry.Metrics - it looks like a useful and generic (as in, usable across reporters) feature 👍

@arkgil
Contributor

arkgil commented Jan 21, 2020

That does raise a good question though of whether or not to apply tag_values/1 before filter/1. 🤔

I think we should filter first. The filter applies to events, while tag_values produces tags, which are part of the metric derived from that event.

@hkrutzer

hkrutzer commented Feb 5, 2020

Ecto doesn't always have a value for all its metrics, so currently I have to set 0 as a default, which obviously skews results - this would be a great addition. Is this waiting on a PR?

@arkgil
Contributor

arkgil commented Feb 6, 2020

@bryannaegele what @hkrutzer says would be a valid use case for filtering based on measurements.

@hkrutzer I think we need to flesh out the details before moving into implementation, but we're certainly looking for contributors here 👍

@bryannaegele
Contributor Author

@arkgil yes, definitely. This is a good use case which I have run into with Ecto, so I think it's good to support filtering on both.

My updated proposal is for the filter predicate to accept 2 arguments - measurements and metadata.

distribution("ecto.writer.query.duration.ms",
  buckets: [5, 10, 20, 50, 100, 200, 500, 1000, 1500, 2000, 5000, 10000],
  event_name: [:repo, :query],
  measurement: :query_time,
  unit: {:native, :millisecond},
  filter: fn _measurements, meta ->
    match?(%{repo: MyApp.Repo}, meta)
  end,
  description: "Ecto query duration - Time spent executing the query",
  tags: [:repo]
)

Thoughts?

@arkgil
Contributor

arkgil commented Feb 11, 2020

@bryannaegele looks great now 👌

@bryannaegele
Contributor Author

bryannaegele commented Feb 13, 2020

I was working on this last night and realized the described use case is supposed to be handled by reporters already, but I failed to do so in the Prometheus one.

Under https://github.com/beam-telemetry/telemetry_metrics/blob/master/docs/writing_reporters.md#reacting-to-events

Measurements are optional, so we must skip reporting that particular measurement if it is not available

If we have this as a standard for reporters and still think this feature is a good idea (I do), then this is just a bug in the Prometheus reporter. A minimal sketch of that rule is below.
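Sketch of the skip-if-missing guideline from writing_reporters.md (extract_measurement/2 and record/2 are hypothetical reporter helpers, not Telemetry.Metrics functions):

def handle_event(_event, measurements, _metadata, metrics) do
  for metric <- metrics do
    case extract_measurement(metric, measurements) do
      # Measurement not present for this event -> skip it, don't report 0.
      nil -> :ok
      value -> record(metric, value)
    end
  end

  :ok
end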

I don't see any harm in passing measurements to the filter, but I no longer see a need to.

The only "harm" I would see is not having a need for it while making filter functions more verbose where a simple capture would suffice, such as filter: &match?(%{read_only: true}, &1)

@arkgil
Contributor

arkgil commented Feb 13, 2020

@bryannaegele isn't the problem in the described use case that the values are not missing but set to 0, which skews the aggregations?

The only "harm" I would see is not having a need for it and making filter functions more verbose where a simple capture would suffice

I agree. Maybe this requires more thinking.

@bryannaegele
Contributor Author

They are set to nil.

@josevalim
Contributor

I agree that we can close this for now, as the use case is addressed. We purposely emit nil instead of zero, exactly to avoid skews.

@bryannaegele
Contributor Author

@josevalim the metadata filtering case is still relevant and removes the reliance on event prefixes. The remaining question was whether there's any reason to include measurements as an argument when we can't think of a use case.

@josevalim
Contributor

We can always ship with only the meta and, in the future, if we need the measurements, we can check the function arity and dispatch accordingly with both meta and measurements.
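A sketch of that arity-based dispatch (illustrative, not the shipped implementation):

# Support both filter shapes by checking the predicate's arity.
keep? =
  case metric.filter do
    nil -> true
    f when is_function(f, 1) -> f.(metadata)
    f when is_function(f, 2) -> f.(measurements, metadata)
  end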

@hkrutzer

Measurements are optional, so we must skip reporting that particular measurement if it is not available

So this code should work?

Telemetry.Metrics.summary("somerepo.query.idle_time_ms",
  event_name: "somerepo.query",
  measurement: fn x -> Map.get(x, :idle_time) end,
  tags: [:environment],
  unit: {:native, :millisecond}
)

This results in

[error] Handler {TelemetryMetricsStatsd.EventHandler, #PID<0.810.0>, [:somerepo, :query]} has failed and has been detached. Class=:error
Reason=:badarith
Stacktrace=[{:erlang, :*, [nil, 1.0e-6], []}, {Telemetry.Metrics, :"-maybe_convert_measurement/2-fun-0-", 3, [file: 'lib/telemetry_metrics.ex', line: 639]}

Am I doing it wrong?

@josevalim
Contributor

josevalim commented Feb 15, 2020

Hi @hkrutzer. Nope, that was a bug, thanks for the report. It has been fixed in master.

@hkrutzer

Thank you!

@bryannaegele
Contributor Author

@hkrutzer This is fixed in the Prometheus library but I see your example is from StatsD. If this is happening in the StatsD reporter, that will need to be fixed in there, as well.
