
Add exporter to Datadog #572

Merged (33 commits) on May 13, 2020

Conversation

@majorgreys (Contributor)

Add an exporter to Datadog. This implementation makes use of ddtrace to handle the creation of Datadog traces and writing them to the Datadog agent.
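
For context, a minimal usage sketch of how such an exporter is typically wired into the SDK; the module path follows the ext/opentelemetry-ext-datadog package added here, but the class names, agent URL, and setup calls are assumptions for illustration rather than code taken from this PR's diff:

# Hypothetical wiring; DatadogSpanExporter, DatadogExportSpanProcessor and
# the agent URL are illustrative assumptions, not copied from the diff.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.ext.datadog import (
    DatadogExportSpanProcessor,
    DatadogSpanExporter,
)

trace.set_tracer_provider(TracerProvider())
exporter = DatadogSpanExporter(agent_url="http://localhost:8126")
trace.get_tracer_provider().add_span_processor(
    DatadogExportSpanProcessor(exporter)
)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("example-span"):
    print("This span is queued for export to the local Datadog agent.")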

@majorgreys force-pushed the majorgreys/exporter-datadog branch 2 times, most recently from e3a3619 to e44f29b on April 14, 2020 19:38
@majorgreys force-pushed the majorgreys/exporter-datadog branch from e44f29b to f56b985 on April 14, 2020 19:52
@majorgreys force-pushed the majorgreys/exporter-datadog branch from 4405774 to 7664e6d on April 15, 2020 21:16
@c24t added the "needs reviewers" label on April 16, 2020
@majorgreys marked this pull request as ready for review on April 16, 2020 22:51
@majorgreys requested a review from a team on April 16, 2020 22:51
@mauriciovasquezbernal (Member) left a comment


A few comments about small details here and there. I tried it and I was able to see the spans in Datadog.

Are the events missing?

tox.ini (resolved)
ext/opentelemetry-ext-datadog/setup.cfg (resolved)
docs/examples/datadog_exporter/README.rst (outdated, resolved)
docs/examples/datadog_exporter/README.rst (outdated, resolved)

.. code-block:: sh

pip install opentelemetry-api


nit: we can roll these up into a single pip install.

@majorgreys (Contributor, Author) replied:


Not at the moment


assert len(argv) == 2

with tracer.start_as_current_span("client"):


Do we want to have this example live within a while True loop with a slight delay? Might better demonstrate practical usage in the product.
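
A sketch of what that suggestion could look like in the example client (the tracer setup is assumed to happen as in the README, and the server URL here is an assumption):

import time

import requests
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Loop with a slight delay so traces keep flowing to the agent, which is
# closer to how a long-running client behaves in practice.
while True:
    with tracer.start_as_current_span("client"):
        # Hypothetical endpoint; the example's real URL may differ.
        requests.get("http://localhost:8082/server_request")
    time.sleep(1)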



if __name__ == "__main__":
app.run(port=8082)


tiny nit: might be nice to have the port easily configurable via environment variable in case there's a conflict.
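
A sketch of that suggestion, assuming the example's Flask app and an environment variable named PORT (the variable name is an assumption):

import os

from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    # Fall back to 8082 when PORT is not set, so the default example still
    # works while a conflicting port can be overridden from the environment.
    app.run(port=int(os.environ.get("PORT", "8082")))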

self._agent_writer = None

@property
def agent_writer(self):


I think this should be a private method. We keep it internal within ddtrace.

else span.attributes["http.method"]
)

return span.attributes.get("component", span.name)


Not familiar with the otel spec, please excuse the naivety, but will component have a low-ish cardinality?

def shutdown(self):
if self.agent_writer.started:
self.agent_writer.stop()
self.agent_writer.join(self.agent_writer.exit_timeout)


Are the otel folks cool with having a blocking operation like this? It could block for up to a second potentially. Might be worth adding a comment to state that it's required to ensure that the traces are flushed before shutdown. 🙂
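
For example, the shutdown method above could carry a comment along these lines (same logic as the quoted snippet, only documentation added):

def shutdown(self):
    # Stopping the ddtrace agent writer blocks until queued traces are
    # flushed, bounded by exit_timeout. The wait is intentional: it ensures
    # spans are not dropped when the application shuts down.
    if self.agent_writer.started:
        self.agent_writer.stop()
        self.agent_writer.join(self.agent_writer.exit_timeout)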



def _get_span_type(span):
"""Map span to Datadog span type"""


Are other span_types possible? Hopefully we're able to do http, web, worker, cache as well.
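
A hypothetical sketch of such a mapping, keyed off the instrumentation name carried in the component attribute; the table and helper below are illustrative, not this PR's actual implementation:

# Illustrative mapping from instrumentation names to Datadog span types;
# the keys and values are assumptions, not the exporter's real table.
_INSTRUMENTATION_SPAN_TYPES = {
    "opentelemetry.ext.flask": "web",
    "opentelemetry.ext.http_requests": "http",
    "opentelemetry.ext.dbapi": "sql",
    "opentelemetry.ext.redis": "cache",
}


def _get_span_type(span):
    """Map span to a Datadog span type, if the instrumentation is known."""
    instrumentation = span.attributes.get("component")
    return _INSTRUMENTATION_SPAN_TYPES.get(instrumentation)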

@c24t changed the title from "Add exporter to Datadog" to "[WIP] Add exporter to Datadog" on Apr 23, 2020
@majorgreys changed the title from "[WIP] Add exporter to Datadog" to "Add exporter to Datadog" on Apr 30, 2020
@mauriciovasquezbernal (Member) left a comment


I looked at the span processor piece and left some comments. It's looking pretty good.

while self.check_traces_queue:
trace_id = self.check_traces_queue.pop()
if trace_id is self._FLUSH_TOKEN:
notify_flush = True


I'm wondering if this should break the loop here to do the force flushing ASAP.

@majorgreys (Contributor, Author) replied:


I'm following the approach in BatchExportSpanProcessor, where force flushing still exports all the spans that had been queued up. Similarly, in the Datadog exporter we want to export all traces that are ready to be exported. Breaking the loop when we get the flush token would mean dropping traces that are ready for export. But I might have misunderstood how the batch exporter works with these condition-variable locks.
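
A minimal sketch of that drain-then-notify pattern, assuming the queue and sentinel shown above; the method name, per-trace export helper, and flush condition are hypothetical:

def _drain_and_maybe_flush(self):
    """Export every ready trace, then signal a pending force_flush()."""
    notify_flush = False
    while self.check_traces_queue:
        trace_id = self.check_traces_queue.pop()
        if trace_id is self._FLUSH_TOKEN:
            # Record that a flush was requested, but keep draining so traces
            # already queued are still exported rather than dropped.
            notify_flush = True
            continue
        self._export_single_trace(trace_id)  # hypothetical helper

    if notify_flush:
        # Wake the thread waiting in force_flush().
        with self.flush_condition:
            self.flush_condition.notify()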

Comment on lines +155 to +157
# check whether trace is exportable again in case that new
# spans were started since we last concluded trace was
# exportable


Is this case possible? Can't we consider that when all started spans are finished the trace is done and no more spans will come from the same trace?

@majorgreys (Contributor, Author) replied:


I was thinking here of the async case where parent spans will exit before all child spans have done so.


In that case the number of started spans would be greater than the number of ended spans (there are some child spans that are not ended yet).

My point is, is the following case possible in a real scenario?

parent = tracer.start_span("parent")
with tracer.use_span(parent, end_on_exit=True):
  with tracer.start_as_current_span("foo"):
    print("Hello world from OpenTelemetry Python!")

# Is the trace done here?

with tracer.start_as_current_span("child", parent=parent):
  print("here we go again")

Is it possible to start a span using a parent that is ended?

@majorgreys (Contributor, Author) replied:


Is it possible to start a span using a parent that is ended?

Looks to be the case. The exporter output of the above are these three spans:

# "foo"
Span(name="foo", context=SpanContext(trace_id=0xc979ccb3a89fa1edacef2e9bd1e6255b, span_id=0x1589d32ad80407e0, trace_state={}, is_remote=False), kind=SpanKind.INTERNAL, parent=Span(name="parent", context=SpanContext(trace_id=0xc979ccb3a89fa1edacef2e9bd1e6255b, span_id=0x4943984635edaebb, trace_state={}, is_remote=False)), start_time=2020-05-08T19:47:44.261087Z, end_time=2020-05-08T19:47:44.261138Z)
# "parent"
Span(name="parent", context=SpanContext(trace_id=0xc979ccb3a89fa1edacef2e9bd1e6255b, span_id=0x4943984635edaebb, trace_state={}, is_remote=False), kind=SpanKind.INTERNAL, parent=None, start_time=2020-05-08T19:47:44.260045Z, end_time=2020-05-08T19:47:44.261158Z)
# child
Span(name="child", context=SpanContext(trace_id=0xc979ccb3a89fa1edacef2e9bd1e6255b, span_id=0x05c69b112b2910f8, trace_state={}, is_remote=False), kind=SpanKind.INTERNAL, parent=Span(name="parent", context=SpanContext(trace_id=0xc979ccb3a89fa1edacef2e9bd1e6255b, span_id=0x4943984635edaebb, trace_state={}, is_remote=False)), start_time=2020-05-08T19:47:46.524331Z, end_time=2020-05-08T19:47:46.524393Z)

Comment on lines +106 to +107
if not self._flushing:
with self.condition:


The BatchSpanProcessor has a logic to avoid waiting in the condition if there are too many spans to be sent. Do you think a similar approach here could help?

@majorgreys (Contributor, Author) replied:


The Datadog exporter does not have a notion of "max batch size" yet. There is a limit on the number of spans to be stored for a trace (4096), but that is handled as spans are started. At export time we export individual traces. Under the hood, writing to the Datadog agent is handled by the ddtrace library, which maintains its own queue of traces and has logic around payload sizes.
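
For reference, the BatchSpanProcessor logic mentioned above is roughly the following (a paraphrase of the SDK's worker loop, not the exact source):

# Paraphrased worker-loop shape: only wait on the condition while the queue
# is small and no flush is pending, so a large backlog is exported without
# sitting out the full schedule delay.
while not self.done:
    if (
        len(self.queue) < self.max_export_batch_size
        and not self._flushing
    ):
        with self.condition:
            self.condition.wait(self.schedule_delay_millis / 1e3)
    self._export_batch()  # hypothetical helper that drains one batch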

@c24t (Member) left a comment


Changes LGTM, merging this for the 0.7b.0 release.

@toumorokoshi (Member) left a comment


Accepted so we can start verifying interaction with the Datadog client with users. Thanks!

@codeboten (Contributor) left a comment


I have a concern around merging this PR that I would like addressed. The spec specifically states the following here:

Other vendor-specific exporters (exporters that implement vendor protocols) should not be included in language libraries and should be placed elsewhere (the exact approach for storing and maintaining vendor-specific exporters will be defined in the future).

We asked that the Azure exporter be moved out specifically because of this, and it came up in the Stackdriver PR as well. I'd like my comment addressed before moving to approve.

@codeboten (Contributor)

Just wanted to follow up, here's the link to the Azure exporter repo that was moved as per #193. I think long term we'll want all exporters to live in the contrib repo, so as long as we're OK with merging this in until we move all exporters into the contrib repo, then I'm fine to approve. @c24t @toumorokoshi thoughts?

@c24t (Member) commented May 13, 2020

so as long as we're ok with merging this in until we move all exporters into the contrib, then I'm fine to approve.

That sounds good to me. As we talked about a bit in the SIG call, it's much easier to develop exporters in the main repo since we don't have the same tests and tooling available in other repos. I think it's fine in general to develop in this repo and then move packages out once they're stable, like we did for the azure exporter.

@codeboten (Contributor) left a comment


Created #691 to track moving the exporter code out of this repo.

@codeboten merged commit f1a7feb into open-telemetry:master May 13, 2020
@majorgreys deleted the majorgreys/exporter-datadog branch May 13, 2020 22:14
owais pushed a commit to owais/opentelemetry-python that referenced this pull request May 22, 2020
Add an exporter to Datadog. This implementation makes use of ddtrace to handle the creation of Datadog traces and writing them to the Datadog agent.

Co-Authored-By: Mauricio Vásquez <mauricio@kinvolk.io>