Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation for Trace ID #1992

Merged
merged 10 commits into from
Aug 2, 2021
Merged

Add validation for Trace ID #1992

merged 10 commits into from
Aug 2, 2021

Conversation

ymotongpoo
Copy link
Contributor

Description

This adds the validator in the constructor of SpanContext so that we can detect the invalid trace ID as early as possible.

Fixes #1991

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • TestSpanContext

I added the section in TestSpanContext in opentelemetry-api/tests/trace/test_span_context.py to validate Trace ID.

Does This PR Require a Contrib Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@ymotongpoo ymotongpoo requested review from a team, ocelotl and srikanthccv and removed request for a team July 27, 2021 04:11
@owais
Copy link
Contributor

owais commented Jul 27, 2021

How does this work with pluggable ID generators? Will we be locking users out of using ID generators that do not strictly follow Otel spec? //cc @NathanielRN

@ymotongpoo
Copy link
Contributor Author

When we just focus on the current implementations and other Python pluggable ID generators, this may introduce some breaking changes. But as I mentioned in #1991, some other languages such as Go and Java already introduced strict validation for the Trace ID format and the current Python implementation is causing incompatibility with other languages in that sense.

Comment on lines +444 to +445
trace_id != INVALID_TRACE_ID
and span_id != INVALID_SPAN_ID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add move these two checks into _validate_trace_id() so it's all together?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, we should just put the code of _validate_trace_id here instead of having a single-use function. Keep in mind that the only thing that _validate_trace_id does is trace_id < 2 ** 128.

"""
if not trace_id:
return False
if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work and be a bit faster?

# constant somewhere
_MAX_TRACE_ID = (1 << 128) - 1
Suggested change
if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH:
if trace_id > _MAX_TRACE_ID:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also prefer to make sure the trace id value is lesser than a certain value. Also, instead of (1 << 128) - 1 we can do 2 ** 128 - 1 which is subjectively easier to understand.

@NathanielRN
Copy link
Contributor

@owais Thanks for the ping! Based on the PR right now, I think this should be fine, since it's only being strict about the length of the ID. It isn't giving restrictions about the bytes that actually make up the ID.

For example, in the AwsXRayIdGenerator we do this:

@staticmethod
def generate_trace_id() -> int:
    trace_time = int(time.time())
    trace_identifier = random.getrandbits(96)
    return (trace_time << 96) + trace_identifier

We remove some randomness from the ID in order to use some bits for the time stamp, but as far as OTel Python is concerned, this trace ID just as valid as an all random trace ID.

Then on the AWS backend, we can parse out this "OTel" ID as a "AWS ID" having expected the user used AwsXRayIdGenerator.

I think pluggable ID generators should follow the restrictions on ID length (so that we can count on this in the rest of OTel), but how systems encode/decode those bits should not be restricted so that they can add information they want.

Comment on lines 406 to 411
if not trace_id:
return False
if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH:
return False

return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building off of what @aabmass said. I'm okay with constant/no constant since it's only used once but having a _MAX_TRACE_ID_LENGTH is probably easier to read!

Suggested change
if not trace_id:
return False
if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH:
return False
return True
return trace_id and trace_id < (1 << 128) - 1 and trace_id != INVALID_TRACE_ID

@owais
Copy link
Contributor

owais commented Jul 27, 2021

As far as I could tell, Go only validates the ID during propagated context injection/extraction and I'm sure Python does it too already. Do we need this additional check?

Copy link
Contributor

@ocelotl ocelotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid single use functions and single use constants. Remember that the validation of trace_id is just making sure it is not greater than a certain specific value.

opentelemetry-api/src/opentelemetry/trace/span.py Outdated Show resolved Hide resolved
Comment on lines +444 to +445
trace_id != INVALID_TRACE_ID
and span_id != INVALID_SPAN_ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, we should just put the code of _validate_trace_id here instead of having a single-use function. Keep in mind that the only thing that _validate_trace_id does is trace_id < 2 ** 128.

@ymotongpoo
Copy link
Contributor Author

Thank you for reviewing. Now I changed the code accordingly.

@lzchen
Copy link
Contributor

lzchen commented Jul 30, 2021

@owais

As far as I could tell, Go only validates the ID during propagated context injection/extraction and I'm sure Python does it too already. Do we need this additional check?

I believe we only convert it to a base 16 int but not actually check the length.

@aabmass
Copy link
Member

aabmass commented Jul 30, 2021

ID generators should follow the restrictions on ID length

As far as I could tell, Go only validates the ID during propagated context injection/extraction and I'm sure Python does it too already. Do we need this additional check?

I think this should be a requirement as it is in the OTel spec, OTLP proto semantics, and the W3C trace context. Whether or not it was wise to lock down the spec, idk.

@ymotongpoo pointed out that Go API uses a fixed 16 byte array here. @owais did you understand something different from the Go code?

@aabmass
Copy link
Member

aabmass commented Jul 30, 2021

I believe we only convert it to a base 16 int but not actually check the length.

@lzchen we check it on extract() in the regex only, not inject:

"^[ \t]*([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"

is_valid = (
trace_id != INVALID_TRACE_ID
and span_id != INVALID_SPAN_ID
and trace_id < 2 ** 128 - 1
Copy link
Member

@aabmass aabmass Jul 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ocelotl nit regarding the single use constant, I think it's worth having the constant as it's easier to understand what the magic number means and for the speedup of not calculating the value every time in this hot code path.

(I was curious so checked and CPython is not smart enough to optimize this into a constant on its own (funny enough, it does do it for the bit shifting approach)):

In [2]: def f(trace_id):
   ...:     return trace_id < 2 ** 128 - 1
   ...:

In [3]: dis(f)
  2           0 LOAD_FAST                0 (trace_id)
              2 LOAD_CONST               1 (2)
              4 LOAD_CONST               2 (128)
              6 BINARY_POWER
              8 LOAD_CONST               3 (1)
             10 BINARY_SUBTRACT
             12 COMPARE_OP               0 (<)
             14 RETURN_VALUE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's a good point. In that case, the constant should be added as a private attribute of the class to keep it as close as where it is being used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we have specs somewhere, but my 2 cents is that I think (1 << 128) - 1 is easier to read.

I can go either way on the constant though!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I put the private const variable for readabiliy. As per bit shift vs multiplication, I leave the decision.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Aug 2, 2021

CLA Signed

The committers are authorized under a signed CLA.

@ymotongpoo
Copy link
Contributor Author

I'm not sure what to do for EasyCLA. Because the merge from main was introduced here twice (75b5f4a and 9b086d8) after my last commit, I rebased and pushed b41b2d2. I don't know what to do to resolve this.

@ocelotl
Copy link
Contributor

ocelotl commented Aug 2, 2021

I'm not sure what to do for EasyCLA. Because the merge from main was introduced here twice (75b5f4a and 9b086d8) after my last commit, I rebased and pushed b41b2d2. I don't know what to do to resolve this.

Please try again, @ymotongpoo

@ymotongpoo
Copy link
Contributor Author

@ocelotl Thanks! Now EasyCLA is fine.

@lzchen lzchen merged commit 65670cf into open-telemetry:main Aug 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The implementation of Trace ID is incompatible with the spec
6 participants