Add validation for Trace ID #1992
How does this work with pluggable ID generators? Will we be locking users out of ID generators that do not strictly follow the OTel spec? //cc @NathanielRN
If we focus only on the current implementations and other pluggable Python ID generators, this may introduce some breaking changes. But as I mentioned in #1991, other languages such as Go and Java already enforce strict validation of the trace ID format, and the current Python implementation is incompatible with them in that respect.
    trace_id != INVALID_TRACE_ID
    and span_id != INVALID_SPAN_ID
Should we move these two checks into _validate_trace_id() so they are all in one place?
Instead, we should put the code of _validate_trace_id inline here rather than keeping a single-use function. Keep in mind that all _validate_trace_id does is check trace_id < 2 ** 128.

    """
    if not trace_id:
        return False
    if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH:
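The reviewer's point, that the hex-length check amounts to a single comparison against 2 ** 128, can be checked directly. This is a minimal Python sketch, assuming _TRACE_ID_HEX_LENGTH is 32 (its value is not shown in the diff); hex_length_check and bound_check are illustrative names, not functions from the PR. It also shows that neither form rejects negative integers on its own.

```python
# Assumption: 32 hex digits, i.e. 128 bits (value not shown in the diff).
_TRACE_ID_HEX_LENGTH = 32


def hex_length_check(trace_id: int) -> bool:
    """The length-based check from the diff, without the zero check."""
    return len(format(trace_id, "032x")) == _TRACE_ID_HEX_LENGTH


def bound_check(trace_id: int) -> bool:
    """The comparison suggested in review: the value fits in 128 bits."""
    return trace_id < 2 ** 128


# For non-negative IDs the two checks agree, including at the boundary.
for candidate in (1, 2 ** 64, 2 ** 128 - 1, 2 ** 128, 2 ** 130):
    assert hex_length_check(candidate) == bound_check(candidate)

# The "032x" width counts the sign, so -1 formats to exactly 32 characters
# and passes the length check; the bound check accepts it as well.
assert hex_length_check(-1) and bound_check(-1)
```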
Would this work and be a bit faster?

    # constant somewhere
    _MAX_TRACE_ID = (1 << 128) - 1

Suggested change:
    - if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH:
    + if trace_id > _MAX_TRACE_ID:
Yes, I also prefer checking that the trace ID is less than a certain value. Also, instead of (1 << 128) - 1 we can write 2 ** 128 - 1, which is (subjectively) easier to understand.
@owais Thanks for the ping! Based on the PR as it stands, I think this should be fine, since it only enforces the length of the ID. It doesn't restrict the bits that actually make up the ID. For example, in the

    import random
    import time

    @staticmethod
    def generate_trace_id() -> int:
        # Bits above 96: Unix time in seconds; low 96 bits: random.
        trace_time = int(time.time())
        trace_identifier = random.getrandbits(96)
        return (trace_time << 96) + trace_identifier

we remove some randomness from the ID in order to use some bits for the timestamp, but as far as OTel Python is concerned, this trace ID is just as valid as a fully random one. Then on the AWS backend, we can parse this "OTel" ID as an "AWS ID", having expected the user used

I think pluggable ID generators should follow the restrictions on ID length (so that we can count on them in the rest of OTel), but how systems encode and decode those bits should not be restricted, so that they can add whatever information they want.
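To illustrate the point, the generator can be lifted into a standalone Python function (generate_time_prefixed_trace_id and split_trace_id are hypothetical names) and checked against the 128-bit limit. A backend can recover the timestamp by shifting, while length validation only cares that the total fits. One caveat: this layout stays within 128 bits only while the Unix timestamp fits in 32 bits, i.e. until early 2106.

```python
import random
import time
from typing import Tuple

_MAX_TRACE_ID = 2 ** 128 - 1


def generate_time_prefixed_trace_id() -> int:
    """Hypothetical standalone version of the generator quoted above."""
    trace_time = int(time.time())  # fits in 32 bits until early 2106
    trace_identifier = random.getrandbits(96)
    return (trace_time << 96) + trace_identifier


def split_trace_id(trace_id: int) -> Tuple[int, int]:
    """Recover (timestamp, random part), as a backend might."""
    return trace_id >> 96, trace_id & ((1 << 96) - 1)


trace_id = generate_time_prefixed_trace_id()
# Length validation constrains only the total width, not the bit layout.
assert 0 < trace_id <= _MAX_TRACE_ID
```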
    if not trace_id:
        return False
    if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH:
        return False

    return True
Building off what @aabmass said: I'm fine with or without a constant, since it's only used once, but having a _MAX_TRACE_ID_LENGTH is probably easier to read!
Suggested change:
    - if not trace_id:
    -     return False
    - if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH:
    -     return False
    - return True
    + return trace_id and trace_id < (1 << 128) - 1 and trace_id != INVALID_TRACE_ID
As far as I can tell, Go only validates the ID when injecting or extracting propagated context, and I'm sure Python does that already too. Do we need this additional check?
Please avoid single-use functions and single-use constants. Remember that validating trace_id just means making sure it is not greater than a specific value.
Thank you for reviewing. I've changed the code accordingly.
I believe we only convert it to a base-16 int without actually checking the length.
I think this should be a requirement, as it is in the OTel spec, the OTLP proto semantics, and W3C Trace Context. Whether or not it was wise to lock down the spec, I don't know. @ymotongpoo pointed out that the Go API uses a fixed 16-byte array here. @owais, did you understand something different from the Go code?
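For comparison, Go's TraceID is a [16]byte, so a value wider than 128 bits cannot be represented at all. A rough Python analogue (an illustration, not an OTel Python API; both function names are hypothetical) is to insist on a fixed 16-byte big-endian encoding, which int.to_bytes enforces by raising OverflowError:

```python
def trace_id_to_bytes(trace_id: int) -> bytes:
    """Encode as exactly 16 big-endian bytes, like a Go [16]byte TraceID.

    int.to_bytes raises OverflowError if the value needs more than 16 bytes,
    or if it is negative (signed defaults to False).
    """
    return trace_id.to_bytes(16, "big")


def fits_in_16_bytes(trace_id: int) -> bool:
    """True if the ID has a valid fixed-width 16-byte representation."""
    try:
        trace_id_to_bytes(trace_id)
    except OverflowError:
        return False
    return True
```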
@lzchen we check it only in the regex on extract(), not on inject: opentelemetry-python/opentelemetry-api/src/opentelemetry/trace/propagation/tracecontext.py, line 31 (at commit 13f09db).
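The inject-side gap can be seen in isolation: the 032x format spec sets a minimum width, not a maximum, so an oversized ID is serialized as 33 or more hex digits and produces a malformed traceparent that only a strict extractor would reject. A simplified Python sketch; build_traceparent, parse_traceparent and the regex are hypothetical and not the propagator's actual code:

```python
import re

# Assumption: simplified traceparent shape (version 00, 32-hex trace ID,
# 16-hex span ID, 2-hex flags); the real propagator's regex may differ.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def build_traceparent(trace_id: int, span_id: int, flags: int = 1) -> str:
    # "032x" pads short values but never truncates long ones.
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(header: str):
    """Return (trace_id, span_id, flags), or None if the header is malformed."""
    match = _TRACEPARENT_RE.fullmatch(header)
    if match is None:
        return None
    trace_hex, span_hex, flags_hex = match.groups()
    return int(trace_hex, 16), int(span_hex, 16), int(flags_hex, 16)
```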
    is_valid = (
        trace_id != INVALID_TRACE_ID
        and span_id != INVALID_SPAN_ID
        and trace_id < 2 ** 128 - 1
@ocelotl nit regarding the single-use constant: I think it's worth having, since it makes clear what the magic number means and avoids recomputing the value on every call in this hot code path.
(I was curious, so I checked: CPython is not smart enough to fold this into a constant on its own, although, funnily enough, it does so for the bit-shifting approach):
    In [2]: def f(trace_id):
       ...:     return trace_id < 2 ** 128 - 1
       ...:

    In [3]: dis(f)
      2           0 LOAD_FAST                0 (trace_id)
                  2 LOAD_CONST               1 (2)
                  4 LOAD_CONST               2 (128)
                  6 BINARY_POWER
                  8 LOAD_CONST               3 (1)
                 10 BINARY_SUBTRACT
                 12 COMPARE_OP               0 (<)
                 14 RETURN_VALUE
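Whether CPython folds such an expression may differ between CPython versions, so rather than relying on one disassembly, the following Python sketch checks a function's co_consts directly and times both variants. The function names are illustrative, and the comparison keeps the strict < from the session above.

```python
import dis
import timeit

_MAX_TRACE_ID = 2 ** 128 - 1  # evaluated once, at import time


def check_inline(trace_id: int) -> bool:
    return trace_id < 2 ** 128 - 1


def check_constant(trace_id: int) -> bool:
    return trace_id < _MAX_TRACE_ID  # global lookup, no arithmetic per call


def folds_max(func) -> bool:
    """True if the compiler stored 2**128 - 1 as a ready-made constant."""
    return (2 ** 128 - 1) in func.__code__.co_consts


if __name__ == "__main__":
    dis.dis(check_inline)
    print("inline expression folded:", folds_max(check_inline))
    for func in (check_inline, check_constant):
        seconds = timeit.timeit(lambda: func(12345), number=200_000)
        print(f"{func.__name__}: {seconds:.3f}s")
```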
OK, that's a good point. In that case, the constant should be a private attribute of the class, to keep it as close as possible to where it is used.
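A minimal Python sketch of that layout, with the limit stored as a private class attribute next to the constructor check that uses it. SpanContextSketch is a hypothetical stand-in, not the real SpanContext. It uses <= so the all-ones ID (2 ** 128 - 1) counts as valid, unlike the strict < in the is_valid diff above.

```python
class SpanContextSketch:
    """Hypothetical, trimmed-down stand-in for SpanContext (not the real API)."""

    # Private class attributes: computed once, kept next to their only user.
    _TRACE_ID_MAX_VALUE = 2 ** 128 - 1
    _INVALID_TRACE_ID = 0
    _INVALID_SPAN_ID = 0

    def __init__(self, trace_id: int, span_id: int) -> None:
        self.trace_id = trace_id
        self.span_id = span_id
        # Validate in the constructor so bad IDs are flagged as early as possible.
        self.is_valid = (
            self._INVALID_TRACE_ID < trace_id <= self._TRACE_ID_MAX_VALUE
            and span_id != self._INVALID_SPAN_ID
        )
```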
Not sure if we have guidelines for this somewhere, but my two cents: I think (1 << 128) - 1 is easier to read.
I can go either way on the constant, though!
OK, I added the private constant for readability. As for bit shift vs. exponentiation, I'll leave that decision to the reviewers.
Please try again, @ymotongpoo
@ocelotl Thanks! EasyCLA passes now.
Description
This adds validation to the SpanContext constructor so that an invalid trace ID is detected as early as possible.
Fixes #1991
How Has This Been Tested?
I added a section to TestSpanContext in opentelemetry-api/tests/trace/test_span_context.py to validate the trace ID.
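A test along these lines might look like the following Python unittest sketch. The class and helper names are hypothetical, and the helper stands in for SpanContext(...).is_valid; it is not the test that was added in the PR.

```python
import unittest

_TRACE_ID_MAX_VALUE = 2 ** 128 - 1


def span_context_is_valid(trace_id: int, span_id: int) -> bool:
    """Hypothetical stand-in for SpanContext(trace_id, span_id, ...).is_valid."""
    return 0 < trace_id <= _TRACE_ID_MAX_VALUE and span_id != 0


class TestTraceIdValidation(unittest.TestCase):
    def test_max_trace_id_is_valid(self):
        self.assertTrue(span_context_is_valid(_TRACE_ID_MAX_VALUE, 1))

    def test_oversized_trace_id_is_invalid(self):
        self.assertFalse(span_context_is_valid(_TRACE_ID_MAX_VALUE + 1, 1))

    def test_zero_trace_id_is_invalid(self):
        self.assertFalse(span_context_is_valid(0, 1))

    def test_zero_span_id_is_invalid(self):
        self.assertFalse(span_context_is_valid(1, 0))
```

It can be run with python -m unittest against whichever module it is saved in.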