Bug: tracking issues with Datadog tracing #309

Closed
rexnp opened this issue Jan 26, 2021 · 1 comment

rexnp commented Jan 26, 2021

Summary
We want to share several recently discovered issues with the Datadog tracing integration in App Mesh, along with details on each.

  1. Configuring the Datadog agent as a daemonset:
  • Context:
    The Datadog agent that receives trace data from Envoy can be deployed in three ways: as a cluster agent (one per cluster), as a sidecar container, or as a daemonset (in K8s). To point Envoy at the Datadog agent, users need to specify the agent's address and port (see https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy.html, section "Datadog tracing variables").
  • Problem:
    When the agent is deployed as a daemonset, its address differs on every node because it is bound to the node-local IP address. App Mesh currently assumes that the address is either a service endpoint or localhost (for sidecars) and offers no way to configure the node-local IP address dynamically, which blocks the daemonset use case.
  • Next Steps:
    We have added support for setting the Datadog agent address to status.hostIP, which covers the daemonset use case (see "Allow configuring tracing address as status.hostIP using the downward API", aws-app-mesh-controller-for-k8s#425). This will be included in the upcoming 1.3.0 App Mesh controller release.
  2. Missing URL information when inspecting traces in the Datadog UI
  • Problem:
    Users have reported that the HTTP URL is displayed as "?" in the Datadog UI when a trace is expanded.
  • Next Steps:
    The Datadog team has identified the root cause in their tracing library used by the Datadog plugin in Envoy. We are coordinating with them to track their fix and will release a new Envoy image with that fix.
  3. Missing egress traces for select clusters (applies to all tracing, not just Datadog)
  • Context:
    For Envoy to generate traces, a request must pass through Envoy's HTTP filter rather than the TCP filter, since traces cannot be inspected at the TCP level. In App Mesh, whether this happens depends on the cluster type configured for the egress destination, which is currently one of the following:
    • Explicitly defined as a Virtual Node's backend: If Virtual Node A has a backend Virtual Node B, we can get traces between Envoy A and Envoy B if Virtual Node B uses a non-TCP listener.
    • Implicit AWS cluster: one is created automatically for each Envoy and is currently modeled as a TCP cluster.
    • Catch-all cluster: if the mesh has an ALLOW_ALL egress filter, a catch-all TCP cluster is generated.
  • Problem:
    As a result, egress traffic that relies on the ALLOW_ALL egress filter, as well as calls to AWS services, currently generates no tracing data. Traces between the application and Envoy, and Envoy-to-Envoy traces over HTTP, are still generated.
  • Next Steps:
    We have opened an issue to allow traces to be generated when calling other AWS services: Feature Request: Way to enable tracing on the default *.amazonaws.com cluster #308. As a general workaround, users can model the egress destination as a Virtual Node. Please let us know if you have use cases where an alternative solution would be preferred.
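
For the daemonset case in item 1, the following is a rough sketch of what the Envoy container's environment could look like once the agent address is taken from status.hostIP through the Kubernetes downward API. It is an illustration only, not the controller's actual implementation: the helper name `datadog_tracing_env` is hypothetical, the variable names are assumed to match the Datadog tracing variables in the App Mesh user guide linked above, and 8126 is assumed as the agent's trace port.

```python
import json
from typing import Any, Dict, List


def datadog_tracing_env(port: int = 8126) -> List[Dict[str, Any]]:
    """Build env entries for an Envoy sidecar that reports to a node-local agent.

    Hypothetical helper. The variable names are assumed from the App Mesh
    Envoy documentation; verify them against the Envoy image you run.
    """
    return [
        {"name": "ENABLE_ENVOY_DATADOG_TRACING", "value": "1"},
        {
            "name": "DATADOG_TRACER_ADDRESS",
            # Downward API: Kubernetes resolves this per pod to the IP of the
            # node the pod is scheduled on, which is where the daemonset
            # agent listens.
            "valueFrom": {"fieldRef": {"fieldPath": "status.hostIP"}},
        },
        {"name": "DATADOG_TRACER_PORT", "value": str(port)},
    ]


if __name__ == "__main__":
    # Print the entries in a form that can be pasted into a container spec.
    print(json.dumps(datadog_tracing_env(), indent=2))
```

Because the address is resolved when the pod is scheduled, each Envoy reaches the agent on its own node without a per-node static configuration.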
@rexnp rexnp added the Bug Something isn't working label Jan 26, 2021
@rexnp rexnp changed the title from "Bug: issues with Datadog tracing" to "Bug: tracking issues with Datadog tracing" Jan 26, 2021
@LancerRainier LancerRainier self-assigned this Feb 17, 2021
@lavignes

This was resolved in the 1.16.1.1 release.
