Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add new metadata to Datadog traces #1763

Closed
wants to merge 1 commit into from

Conversation

macaptain
Copy link
Contributor

Overview

Because API responses are always 200, even if there's an error message in the body of the response, observability is impacted and it's hard to understand the health of services. This PR adds extra fields to the spans of tags passing through the Watcher and Watcher Info Phoenix based APIs, which flags responses with errors in the trace in Datadog.

While Datadog generates metrics from traces, unfortunately it isn't possible to group the traces by error_type in the metrics count (see https://docs.datadoghq.com/tracing/guide/metrics_namespace/#errors), so we will need to add more instrumentation, counting errors hit with custom metrics. This'll be a different PR.

Changes

What you see in Datadog traces after this change:

  • The service name (default 'web' for Phoenix applications) is the nameof the OMG application.
  • Traces are tagged with application version number
  • For API responses which error, the trace is flagged red as an error, the error code is marked on the trace and the error message is found in the tags.

image

This is implemented by using customize_metadata on the SpandexPhoenix tracer: https://hexdocs.pm/spandex_phoenix/SpandexPhoenix.html, parsing the JSON response body with Jason (hopefully a relatively cheap operation), and adding the fields to the default metadata.

Testing

DD_API_KEY=<some-api-key> docker-compose -f docker-compose-watcher.yml -f docker-compose.dev.yml up

Make API calls to the endpoints, and check the local-development environment APM in Datadog. Look for services "watcher" and "watcher_info", and check the traces in the DataDog GUI.

P.S.

This is my first Elixir PR. Please let me know how my code can be more idiomatic!

Because API responses are always 200, even if there's an error message
in the body of the response, it's hard to get metrics from Datadog about
 the health of services.

What you see in Datadog traces:
* The service name (default 'web' for Phoenix applications) is the name
of the OMG application.
* Traces are tagged with application version number
* For API responses which error, the trace is flagged red as an error,
the error code is marked on the trace and the error message is found
in the tags.
Comment on lines +80 to +85
def add_trace_metadata(conn) do
case Jason.decode(conn.resp_body) do
{:ok, json_resp_body} -> add_metadata_from_response_body(conn, json_resp_body)
{:error, _} -> SpandexPhoenix.default_metadata(conn)
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to me that we could find a way that would avoid this serialization and deserialization.

if you look at OMG.WatcherRPC.Web.api_response/3 this gets called when success == true
if you look at OMG.WatcherRPC.Web.Controller.Fallback.call/2 this gets called when success == false

and you could attach metadata at this point in time with assigns https://elixircasts.io/exploring-phoenix-assigns

try it, let me know :)))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be great to avoid decoding the response body back to JSON. I'll take a look at using Phoenix assigns instead. Thanks for the suggestion!

type(:atom)
"""
defp add_metadata_from_response_body(conn, json_resp_body) do
service = String.to_atom(json_resp_body["service_name"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems somewhat dangerous that could peraps blow up the VM unintentionally. try to choose the atom, for example:

case (json_resp_body["service_name"] do
"watcher"->:watcher
"watcher_info"-> :watcher_info
end

the reason being, atoms are not garbage collected and turning strings to atoms is something we should avoid.

Comment on lines +48 to +50
error = if !json_resp_body["success"], do: Keyword.new([{:error, true}])
error_type = if !!error and json_resp_body["data"], do: json_resp_body["data"]["code"]
error_msg = if !!error and json_resp_body["data"], do: json_resp_body["data"]["description"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if clauses are to be used only with functions with questionmarks. in elixir, we prefer patter matching with case clauses

Comment on lines +56 to +57
_ = if error_type, do: tags ++ {String.to_atom("error.type"), error_type}
_ = if error_msg, do: tags ++ {String.to_atom("error.msg"), error_msg}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as said, String.to_atom is to be avoided whenever possible.

Comment on lines +65 to +69
if error,
do:
trace_data
|> Keyword.put(:error, error),
else: trace_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a short if notation for a multiline statement... anyhow, use case clauses as they're prefered.

Handles failure to decode the response body gracefully.
"""

def add_trace_metadata(conn) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public functions on top, private at the bottom

Comment on lines +30 to +31
Phoenix.ConnTest.build_conn(:get, "/alerts.get")
|> Plug.Conn.resp(200, resp_body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

piping should be used for multiple statements piped.
yes:

:get
|> Phoenix.ConnTest.build_conn("/alerts.get")
|> Plug.Conn.resp(200, resp_body)

@macaptain
Copy link
Contributor Author

Thanks for the review @InoMurko.

I'll close this PR and open a new one. There's too much to fix here in one pass. I'll re-use customize_metadata, but take a different approach with assigns for adding metadata to the conn.

@macaptain macaptain closed this Nov 11, 2020
@macaptain macaptain deleted the add-metadata-to-dd-traces branch November 11, 2020 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants