New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

feat: add new metadata to Datadog traces #1763

Closed

macaptain wants to merge 1 commit into master from add-metadata-to-dd-traces

Contributor

macaptain commented Nov 6, 2020

Overview

Because API responses are always 200, even if there's an error message in the body of the response, observability is impacted and it's hard to understand the health of services. This PR adds extra fields to the spans of tags passing through the Watcher and Watcher Info Phoenix based APIs, which flags responses with errors in the trace in Datadog.

While Datadog generates metrics from traces, unfortunately it isn't possible to group the traces by error_type in the metrics count (see https://docs.datadoghq.com/tracing/guide/metrics_namespace/#errors), so we will need to add more instrumentation, counting errors hit with custom metrics. This'll be a different PR.

Changes

What you see in Datadog traces after this change:

The service name (default 'web' for Phoenix applications) is the nameof the OMG application.
Traces are tagged with application version number
For API responses which error, the trace is flagged red as an error, the error code is marked on the trace and the error message is found in the tags.

This is implemented by using customize_metadata on the SpandexPhoenix tracer: https://hexdocs.pm/spandex_phoenix/SpandexPhoenix.html, parsing the JSON response body with Jason (hopefully a relatively cheap operation), and adding the fields to the default metadata.

Testing

DD_API_KEY=<some-api-key> docker-compose -f docker-compose-watcher.yml -f docker-compose.dev.yml up

Make API calls to the endpoints, and check the local-development environment APM in Datadog. Look for services "watcher" and "watcher_info", and check the traces in the DataDog GUI.

P.S.

This is my first Elixir PR. Please let me know how my code can be more idiomatic!


          feat: add new metadata to Datadog traces

685f1b0

Because API responses are always 200, even if there's an error message
in the body of the response, it's hard to get metrics from Datadog about
 the health of services.

What you see in Datadog traces:
* The service name (default 'web' for Phoenix applications) is the name
of the OMG application.
* Traces are tagged with application version number
* For API responses which error, the trace is flagged red as an error,
the error code is marked on the trace and the error message is found
in the tags.

macaptain requested review from InoMurko, 0x234, mikebryant and arthurk

November 6, 2020 13:33

InoMurko suggested changes

View reviewed changes

apps/omg_watcher_rpc/lib/tracer.ex

Comment on lines +80 to +85

+                def add_trace_metadata(conn) do
+                  case Jason.decode(conn.resp_body) do
+                    {:ok, json_resp_body} -> add_metadata_from_response_body(conn, json_resp_body)
+                    {:error, _} -> SpandexPhoenix.default_metadata(conn)
+                  end
+                end

Contributor

InoMurko Nov 9, 2020

it seems to me that we could find a way that would avoid this serialization and deserialization.

if you look at OMG.WatcherRPC.Web.api_response/3 this gets called when success == true
if you look at OMG.WatcherRPC.Web.Controller.Fallback.call/2 this gets called when success == false

and you could attach metadata at this point in time with assigns https://elixircasts.io/exploring-phoenix-assigns

try it, let me know :)))

Contributor Author

macaptain Nov 10, 2020

I agree that it would be great to avoid decoding the response body back to JSON. I'll take a look at using Phoenix assigns instead. Thanks for the suggestion!

apps/omg_watcher_rpc/lib/tracer.ex

+                type(:atom)
+                """
+                defp add_metadata_from_response_body(conn, json_resp_body) do
+                  service = String.to_atom(json_resp_body["service_name"])

Contributor

InoMurko Nov 9, 2020

this seems somewhat dangerous that could peraps blow up the VM unintentionally. try to choose the atom, for example:

case (json_resp_body["service_name"] do
"watcher"->:watcher
"watcher_info"-> :watcher_info
end

the reason being, atoms are not garbage collected and turning strings to atoms is something we should avoid.

apps/omg_watcher_rpc/lib/tracer.ex

Comment on lines +48 to +50

+                  error = if !json_resp_body["success"], do: Keyword.new([{:error, true}])
+                  error_type = if !!error and json_resp_body["data"], do: json_resp_body["data"]["code"]
+                  error_msg = if !!error and json_resp_body["data"], do: json_resp_body["data"]["description"]

Contributor

InoMurko Nov 9, 2020

if clauses are to be used only with functions with questionmarks. in elixir, we prefer patter matching with case clauses

apps/omg_watcher_rpc/lib/tracer.ex

Comment on lines +56 to +57

		_ = if error_type, do: tags ++ {String.to_atom("error.type"), error_type}
		_ = if error_msg, do: tags ++ {String.to_atom("error.msg"), error_msg}

Contributor

InoMurko Nov 9, 2020

as said, String.to_atom is to be avoided whenever possible.

apps/omg_watcher_rpc/lib/tracer.ex

Comment on lines +65 to +69

+                  if error,
+                    do:
+                      trace_data
+                      |> Keyword.put(:error, error),
+                    else: trace_data

Contributor

InoMurko Nov 9, 2020

this is a short if notation for a multiline statement... anyhow, use case clauses as they're prefered.

apps/omg_watcher_rpc/lib/tracer.ex

+                Handles failure to decode the response body gracefully.
+                """
+                def add_trace_metadata(conn) do

Contributor

InoMurko Nov 9, 2020

public functions on top, private at the bottom

apps/omg_watcher_rpc/test/omg_watcher_rpc/tracer_test.exs

Comment on lines +30 to +31

		Phoenix.ConnTest.build_conn(:get, "/alerts.get")
		\|> Plug.Conn.resp(200, resp_body)

Contributor

InoMurko Nov 9, 2020

piping should be used for multiple statements piped.
yes:

:get
|> Phoenix.ConnTest.build_conn("/alerts.get")
|> Plug.Conn.resp(200, resp_body)

Contributor Author

macaptain commented Nov 11, 2020

Thanks for the review @InoMurko.

I'll close this PR and open a new one. There's too much to fix here in one pass. I'll re-use customize_metadata, but take a different approach with assigns for adding metadata to the conn.

macaptain closed this

macaptain deleted the add-metadata-to-dd-traces branch

November 11, 2020 14:11

macaptain mentioned this pull request

feat: add new metadata to Datadog traces (attempt 2) #1770

Merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet