Implement IdsGenerator interface for TracerProvider and include default RandomIdsGenerator #1153

Merged
merged 2 commits into from
Oct 1, 2020

Conversation

Contributor

@NathanielRN NathanielRN commented Sep 24, 2020

Description

This PR moves in the direction of supporting custom forms of trace and span ID generation. The opentelemetry-java SDK and opentelemetry-js SDK do this as well.

Fixes #1152

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

I was able to set up a small script that printed the traces to the console as expected:

#!/usr/bin/env python3

# OTel Tracing
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.trace import SpanKind
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleExportSpanProcessor


if __name__ == '__main__':
    # Console Exporter Setup
    trace.set_tracer_provider(TracerProvider())

    trace.get_tracer_provider().add_span_processor(
        SimpleExportSpanProcessor(ConsoleSpanExporter())
    )

    # Final Tracer Setup
    tracer = trace.get_tracer(__name__)

    with tracer.start_span('my_first_span', kind=SpanKind.SERVER):
        with tracer.start_span('my_second_span', parent=trace.get_current_span(), kind=SpanKind.SERVER):
            print('Hello, world!')

to get output:

Hello, world!
{
    "name": "my_second_span",
    "context": {
        "trace_id": "0xdb6a810342a9a20651eb96829f0fa3f4",
        "span_id": "0xa986b94247d4bc48",
        "trace_state": "{}"
    },
    "kind": "SpanKind.SERVER",
    "parent_id": null,
    "start_time": "2020-09-24T04:05:04.528721Z",
    "end_time": "2020-09-24T04:05:04.528738Z",
    "status": {
        "canonical_code": "OK"
    },
    "attributes": {},
    "events": [],
    "links": [],
    "resource": {
        "telemetry.sdk.language": "python",
        "telemetry.sdk.name": "opentelemetry",
        "telemetry.sdk.version": "0.14.dev0"
    }
}
{
    "name": "my_first_span",
    "context": {
        "trace_id": "0x85354f7a3a0f9e091d54b8d1560d32c0",
        "span_id": "0x1a39976cb621bcbb",
        "trace_state": "{}"
    },
    "kind": "SpanKind.SERVER",
    "parent_id": null,
    "start_time": "2020-09-24T04:05:04.528678Z",
    "end_time": "2020-09-24T04:05:04.529366Z",
    "status": {
        "canonical_code": "OK"
    },
    "attributes": {},
    "events": [],
    "links": [],
    "resource": {
        "telemetry.sdk.language": "python",
        "telemetry.sdk.name": "opentelemetry",
        "telemetry.sdk.version": "0.14.dev0"
    }
}

However, I did not think it was necessary to include tests for the RandomIdsGenerator.py file since it is just doing the same thing the SDK was always doing before and the method implementations are just 2 lines overall.
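
For reference, a test for such a generator can stay very small. The sketch below is an illustration under assumptions, not code from this PR: it defines a stand-in `RandomIdsGenerator` built on `random.getrandbits` (assumed to match what the SDK did before) and checks only that the IDs fit in 64 and 128 bits and that the random source is actually consulted.

```python
import random
import unittest
from unittest import mock


class RandomIdsGenerator:
    """Stand-in for the SDK class; assumed to wrap random.getrandbits."""

    def generate_span_id(self) -> int:
        return random.getrandbits(64)

    def generate_trace_id(self) -> int:
        return random.getrandbits(128)


class TestRandomIdsGenerator(unittest.TestCase):
    def test_span_id_fits_in_64_bits(self):
        span_id = RandomIdsGenerator().generate_span_id()
        self.assertTrue(0 <= span_id < 2 ** 64)

    def test_trace_id_fits_in_128_bits(self):
        trace_id = RandomIdsGenerator().generate_trace_id()
        self.assertTrue(0 <= trace_id < 2 ** 128)

    def test_delegates_to_getrandbits(self):
        # Patch the random source so the result is deterministic.
        with mock.patch("random.getrandbits", return_value=0x1234) as patched:
            self.assertEqual(RandomIdsGenerator().generate_span_id(), 0x1234)
        patched.assert_called_once_with(64)
```

Saved to a file, this can be run with `python -m unittest`.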

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • [ ] Unit tests have been added: no unit tests included for this change
  • Documentation has been updated

linux-foundation-easycla bot commented Sep 24, 2020

CLA Check
The committers are authorized under a signed CLA.

@anuraaga anuraaga left a comment

Thanks! Just suggestions on docs


from opentelemetry import trace as trace_api

class RandomIdsGenerator(trace_api.IdsGenerator):

Needs docs

class IdsGenerator(abc.ABC):
    @abc.abstractmethod
    def generate_span_id(self) -> int:
        """Get a new random span ID.

Don't think random is a feature of this interface, that's the implementation

Contributor Author

I see what you're saying, makes sense, will make the change!

Contributor Author

I elected to say "Get a new span ID." instead of "Get a new unique span ID." because I don't want to assume how random implementations will be.
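
To make the interface/implementation distinction concrete, here is a minimal sketch. The method names `generate_span_id` and `generate_trace_id` appear in this PR's diffs; the docstrings and the `random.getrandbits` bodies are assumptions for illustration and may differ from the merged code.

```python
import abc
import random


class IdsGenerator(abc.ABC):
    """Interface only: says nothing about how IDs are produced."""

    @abc.abstractmethod
    def generate_span_id(self) -> int:
        """Get a new span ID (a 64-bit integer)."""

    @abc.abstractmethod
    def generate_trace_id(self) -> int:
        """Get a new trace ID (a 128-bit integer)."""


class RandomIdsGenerator(IdsGenerator):
    """Default implementation: randomness lives here, not in the interface."""

    def generate_span_id(self) -> int:
        return random.getrandbits(64)

    def generate_trace_id(self) -> int:
        return random.getrandbits(128)
```

Because the base class is abstract, instantiating `IdsGenerator` directly raises `TypeError`, so only concrete generators can be handed to a provider.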


from opentelemetry import trace as trace_api

class RandomIdsGenerator(trace_api.IdsGenerator):
Contributor

Nit: Can we have this class in the same file, ids_generator.py?

Contributor Author

Definitely get what you're saying, but maybe we would want to keep it like this for these reasons:

  1. Right now we have this:
.
├── opentelemetry-api
│   └── src
│       └── opentelemetry
│           └── trace
│               └── ids_generator.py
└── opentelemetry-sdk
    └── src
        └── opentelemetry
            └── sdk
                └── trace
                    └── random_ids_generator.py

I was thinking that it would be good to have random_ids_generator.py separate because subsequent generators could be found in the same directory level as the default "random" one. That way, all the generators would be in the same place:

.
├── opentelemetry-api
│   └── src
│       └── opentelemetry
│           └── trace
│               └── ids_generator.py
└── opentelemetry-sdk
    └── src
        └── opentelemetry
            └── sdk
                └── trace
                    ├── aws_xray_ids_generator.py
                    ├── bar_ids_generator.py
                    ├── foo_ids_generator.py
                    └── random_ids_generator.py

But as a concession, Java actually looks like this right now:

.
├── sdk
│   └── tracing
│       └── src
│           └── main
│               └── java
│                   └── io
│                       └── opentelemetry
│                           └── sdk
│                               └── trace
│                                   ├── IdsGenerator.java
│                                   └── RandomIdsGenerator.java
└── sdk_extensions
    └── aws_v1_support
        └── src
            └── main
                └── java
                    └── io
                        └── opentelemetry
                            └── sdk
                                └── extensions
                                    └── trace
                                        └── aws
                                            └── AwsXRayIdsGenerator.java
  2. This actually follows what we already have with span.py: while span.py has the interface in the API, the actual implementation is in the opentelemetry.sdk.trace package, as below:
.
├── opentelemetry-api
│   └── src
│       └── opentelemetry
│           └── trace
│               └── span.py
└── opentelemetry-sdk
    └── src
        └── opentelemetry
            └── sdk
                └── trace
                    └── __init__.py

I like the separation of "interface" in API and "implementation" in SDK that this models!

Let me know what you think?

Contributor

It was my understanding that we would not have vendor-specific generators in any of our packages. Ideally, if users wanted to use aws_xray_ids_generator, they would create their own and extend from ids_generator. Notice how in Java, the AwsXRayIdsGenerator is in the AWS-specific extension. This also forces users to take a dependency on the sdk package if they want to use the random_ids_generator.

Also, the design pattern of span.py is a special case. I believe it was done because it was just getting too large. If you look at the rest of the codebase, pretty much all classes that "relate" to each other in terms of functionality are in the same file.
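
As an illustration of the "create their own" route, a user-side generator might look like the sketch below. Everything here is hypothetical: `IdsGenerator` is a local stand-in for the API interface, and `TimestampPrefixedIdsGenerator` is an invented name. The trace ID layout (a 32-bit epoch timestamp followed by 96 random bits) mirrors the AWS X-Ray trace ID format, but this is not the code of any AWS extension.

```python
import abc
import random
import time


class IdsGenerator(abc.ABC):
    """Local stand-in for the API interface (assumption)."""

    @abc.abstractmethod
    def generate_span_id(self) -> int:
        ...

    @abc.abstractmethod
    def generate_trace_id(self) -> int:
        ...


class TimestampPrefixedIdsGenerator(IdsGenerator):
    """Hypothetical user-defined generator living outside the core packages."""

    def generate_span_id(self) -> int:
        return random.getrandbits(64)

    def generate_trace_id(self) -> int:
        # High 32 bits: seconds since the epoch; low 96 bits: random.
        timestamp = int(time.time()) & 0xFFFFFFFF
        return (timestamp << 96) | random.getrandbits(96)
```

The generator depends only on the interface, so it could ship in a vendor package without the core packages knowing about it.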

Contributor Author

Sounds good, I think Java did it well too, and I see what you mean about span.py, so I'll move RandomIdsGenerator into the ids_generator.py file like you mentioned.

Contributor

@lzchen lzchen left a comment

This is great! Can you add an entry in the CHANGELOG?

@NathanielRN NathanielRN force-pushed the provider-ids-generator branch 2 times, most recently from 30f5eb0 to 4a643e0 Compare September 24, 2020 16:12
@codeboten
Contributor

@NathanielRN NathanielRN force-pushed the provider-ids-generator branch from 4a643e0 to f18b705 Compare September 24, 2020 21:27
@@ -8,6 +8,7 @@ Submodules

 trace.status
 trace.span
+trace.ids_generator
Contributor Author

@NathanielRN NathanielRN Sep 24, 2020

Adding it to the docs here for visibility, and also because span.py is already listed: both files are at the same directory level and both define an interface.

docs/sdk/trace.rst Outdated Show resolved Hide resolved
@NathanielRN NathanielRN force-pushed the provider-ids-generator branch 12 times, most recently from 66e1fa0 to 4acc788 Compare September 25, 2020 02:56
@NathanielRN NathanielRN marked this pull request as ready for review September 25, 2020 02:58
@NathanielRN NathanielRN requested a review from a team September 25, 2020 02:58
@NathanielRN
Contributor Author

Just realized I need to follow up and check whether "the randomness of the id is important for the sampler". Will do so tomorrow!

@NathanielRN NathanielRN force-pushed the provider-ids-generator branch from 4acc788 to 33879d4 Compare September 25, 2020 17:49
@anuraaga

@lzchen Thanks for the comments. W3C does not define a strong requirement for randomness. Randomly generated IDs are preferred with a SHOULD, not a MUST, requirement:

https://github.com/w3c/trace-context/blob/c0de4e0527f55018a51af1b5f66000b375ab5fc3/spec/60-trace-id-format.md#randomness-of-trace-id

This does not seem to have been PR'd yet, but it has already been agreed that there won't be randomness requirements:

w3c/trace-context#412

/cc @dyladan So I think we don't have to worry about W3C non-compliance here. Hope this helps :)
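
For context on what any generator has to satisfy, the sketch below formats integer IDs into a W3C `traceparent` header value. It relies only on the fixed field widths of the Trace Context format (32 hex digits for the trace ID, 16 for the parent/span ID) and on the rule that all-zero IDs are invalid. `format_traceparent` is a hypothetical helper, not an OpenTelemetry function.

```python
def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Render IDs as a version-00 W3C traceparent value (hypothetical helper)."""
    if not 0 < trace_id < 2 ** 128:
        raise ValueError("trace_id must be a non-zero 128-bit integer")
    if not 0 < span_id < 2 ** 64:
        raise ValueError("span_id must be a non-zero 64-bit integer")
    flags = 0x01 if sampled else 0x00
    return "00-{:032x}-{:016x}-{:02x}".format(trace_id, span_id, flags)


# IDs taken from the console output in the PR description.
header = format_traceparent(
    0xDB6A810342A9A20651EB96829F0FA3F4, 0xA986B94247D4BC48
)
print(header)  # 00-db6a810342a9a20651eb96829f0fa3f4-a986b94247d4bc48-01
```

Nothing in this format depends on how the integers were produced, which is consistent with randomness being a recommendation and not a requirement.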

@NathanielRN
Contributor Author

@lzchen Thanks for your followup! Definitely a good point to bring up, I'll read up on W3C and understand it better for the future. Regardless, thanks for your help here!

@NathanielRN NathanielRN force-pushed the provider-ids-generator branch from 0febe93 to e6a9701 Compare September 29, 2020 16:43
@@ -733,7 +715,7 @@ def start_span( # pylint: disable=too-many-locals

         if parent_context is None or not parent_context.is_valid:
             parent = parent_context = None
-            trace_id = generate_trace_id()
+            trace_id = self.source.ids_generator.generate_trace_id()
Contributor

@owais owais Sep 29, 2020

Since this isn't part of the tracer provider API as far as I can tell from this PR, we can't guarantee that source will have an ids_generator attribute.

Contributor Author

That's a good point, but this is just being consistent with the assumption later in this file for the sampler, resource, and _active_span_processor variables.

Even so, I think we can guarantee the source will have an ids_generator attribute because source is typed to be of the same type as TracerProvider which will always have an ids_generator because of the change in this PR.

I think that the only danger that could come about would be if you were to implement your own Tracer class that expects an ids_generator, in which case you would also be in danger of assuming there were a sampler or resource as well because they're also not in the API.

Contributor

> Even so, I think we can guarantee the source will have an ids_generator attribute because source is typed to be of the same type as TracerProvider which will always have an ids_generator because of the change in this PR.

I don't think this guarantees anything really. It's only a type hint and makes mypy or other checkers complain if they are run. There are no runtime or build/package-time guarantees.

> I think that the only danger that could come about would be if you were to implement your own Tracer class that expects an ids_generator, in which case you would also be in danger of assuming there were a sampler or resource as well because they're also not in the API.

Another instance would be if someone implements a custom TracerProvider to add some functionality but wants to return an instance of the stock Tracer. I think both cases should be supported without surprises. Looks like SDK Tracer has taken an implicit dependency on the SDK TracerProvider which shouldn't be the case. Users and vendors should be able to implement just one or the other without having to satisfy undocumented APIs.
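
The failure mode described here can be reproduced without the SDK. In the sketch below, `SdkTracerProvider`, `StockTracer`, and `CustomProvider` are hypothetical stand-ins. The point is only that the annotation on `source` is never checked at runtime, so the missing attribute surfaces as an `AttributeError` when an ID is first requested.

```python
import random


class RandomIdsGenerator:
    def generate_trace_id(self) -> int:
        return random.getrandbits(128)


class SdkTracerProvider:
    """Stand-in for the SDK provider, which always sets ids_generator."""

    def __init__(self) -> None:
        self.ids_generator = RandomIdsGenerator()


class StockTracer:
    """Stand-in for the SDK tracer; note the implicit dependency on source."""

    def __init__(self, source: SdkTracerProvider) -> None:
        # The annotation is a hint only; Python does not enforce it.
        self.source = source

    def new_trace_id(self) -> int:
        return self.source.ids_generator.generate_trace_id()


class CustomProvider:
    """A user-written provider that knows nothing about ids_generator."""

    def get_tracer(self) -> StockTracer:
        return StockTracer(self)  # accepted at runtime despite the type hint


tracer = CustomProvider().get_tracer()
try:
    tracer.new_trace_id()
except AttributeError as exc:
    print("fails only when an ID is requested:", exc)
```

A static checker such as mypy would flag `StockTracer(self)`, but only if it is actually run on the user's code.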

I think we should either update the TracerProvider API to specify all these properties or go all in on DI and modify Tracer so that it requires all these dependencies to be explicitly specified by a tracer provider implementation. Meaning IDGenerator would be an initialization argument to the Tracer class. TracerProvider can pass it down explicitly. That way people can implement custom Tracer or custom TracerProvider without surprises.
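
A sketch of the dependency-injection option, under the assumption that the tracer receives its collaborators as constructor arguments. The class names echo the SDK, but the signatures are illustrative only and are not what this PR implements. `SequentialIdsGenerator` is a hypothetical generator included to show a custom implementation being passed down.

```python
import itertools
import random


class RandomIdsGenerator:
    def generate_span_id(self) -> int:
        return random.getrandbits(64)

    def generate_trace_id(self) -> int:
        return random.getrandbits(128)


class Tracer:
    """Depends on an explicit ids_generator, not on a provider object."""

    def __init__(self, ids_generator) -> None:
        self._ids_generator = ids_generator

    def new_ids(self):
        return (
            self._ids_generator.generate_trace_id(),
            self._ids_generator.generate_span_id(),
        )


class TracerProvider:
    """Owns the configuration and passes it down explicitly."""

    def __init__(self, ids_generator=None) -> None:
        if ids_generator is None:
            ids_generator = RandomIdsGenerator()
        self._ids_generator = ids_generator

    def get_tracer(self) -> Tracer:
        return Tracer(self._ids_generator)


class SequentialIdsGenerator:
    """Hypothetical deterministic generator, handy in tests."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def generate_span_id(self) -> int:
        return next(self._counter)

    def generate_trace_id(self) -> int:
        return next(self._counter)


tracer = TracerProvider(SequentialIdsGenerator()).get_tracer()
print(tracer.new_ids())  # (1, 2)
```

With this shape, a custom provider can return the stock `Tracer` as long as it supplies a generator, and a custom tracer never needs to know which provider created it.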

Since there are other instances of such implicit dependencies, maybe we can do this in another PR, but if all maintainers agree on a solution now, perhaps we can start with IDGenerator and follow up with the rest.

Contributor

> Looks like SDK Tracer has taken an implicit dependency on the SDK TracerProvider which shouldn't be the case.

Yeah we should probably remove source from both Meter and Tracer SDK implementations and then pass in some sort of "config" object upon creation. Tracer and TracerProvider shouldn't be coupled. But yeah, might be for another PR. @owais can you create an issue for this?

Contributor Author

I see what you're saying. Yes, it makes more sense for Tracer and TracerProvider not to be requirements of each other.

I would also prefer to have that as a separate PR, and minimize the new things this PR is trying to do. The PR is pretty straightforward the way it is right now, and that kind of refactor could be lost in the IDs Generator specific log messages.

I wouldn't mind helping with doing that separate PR either if we can decide on what we want to do!

Contributor

Sounds good to me. Tracking here: #1181

@NathanielRN NathanielRN force-pushed the provider-ids-generator branch 3 times, most recently from 52497ab to 38a82c7 Compare September 30, 2020 21:19
Contributor

@codeboten codeboten left a comment

Just one question; otherwise the change looks good.

@NathanielRN NathanielRN force-pushed the provider-ids-generator branch 2 times, most recently from 27e0278 to e8f4c2c Compare October 1, 2020 16:36
@NathanielRN NathanielRN force-pushed the provider-ids-generator branch 2 times, most recently from db3437b to 43c64ec Compare October 1, 2020 17:36
@NathanielRN NathanielRN force-pushed the provider-ids-generator branch from 43c64ec to 56e9ac5 Compare October 1, 2020 18:21
@NathanielRN
Contributor Author

@codeboten Changes have been made!

Contributor

@codeboten codeboten left a comment

LGTM thanks!

@codeboten codeboten merged commit c8df54b into open-telemetry:master Oct 1, 2020
@NathanielRN NathanielRN deleted the provider-ids-generator branch October 1, 2020 21:31
@lzchen lzchen mentioned this pull request Oct 6, 2020
8 tasks
srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this pull request Nov 1, 2020
Co-authored-by: Daniel Dyla <dyladan@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Allow for Custom Trace and Span IDs Generation - IdsGenerator for TracerProvider
6 participants