Cold start detection and marking #52
Forgot to update tests for the coldStart tag.
Branch updated from 1ab8e47 to 1b168e7.
@wu-sheng WDYT of the cold start concept and mechanism?
In the cold start case, is there any chance to separate the booting process from the request handling process? That separation is typical in most cases I am familiar with. Or do you mean that the cold start begins after a request is received? I am not familiar with that kind of case.
The
Of course you have to select the parent times which have cold vs. non-cold children. The /child and /child are subtracted from /parent because they MAY have a statistically significant difference in execution time. In any case, this is just something we do on our side because someone requested it; I thought it might be useful info for your side as well.
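The comparison above can be sketched in Python. This is a minimal illustration, assuming each trace has already been reduced to a hypothetical `ParentSample` record (parent duration, child duration, and whether the child span carried `coldStart=true`); the record and function names are invented for the example and are not part of the agent or of SkyWalking's storage model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ParentSample:
    """Hypothetical flattened view of one /parent span and the /child span it called."""
    parent_ms: float   # duration of the /parent span
    child_ms: float    # duration of the /child span itself
    child_cold: bool   # True if the child span carried coldStart=true


def parent_overhead_by_child_state(samples):
    """Mean of (parent - child) time, split by whether the child was cold-started.

    Subtracting the child's own execution time removes any difference in how
    long the child itself ran, so a larger value in the "cold" bucket points
    at the parent-side cost of waiting for the child instance to spin up.
    """
    buckets = {"warm": [], "cold": []}
    for s in samples:
        key = "cold" if s.child_cold else "warm"
        buckets[key].append(s.parent_ms - s.child_ms)
    # Skip empty buckets: statistics.mean raises on an empty list.
    return {key: mean(values) for key, values in buckets.items() if values}
```

With two warm samples (120 - 20 and 130 - 30 ms) and one cold sample (2100 - 25 ms), this returns 100.0 for "warm" and 2075.0 for "cold".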
Branch updated from f0dfe41 to 314f195.
Closed accidentally...
I am good with a new endpoint with
That part is optional; it is for visual inspection as well as for separating statistics for warm vs. cold endpoints (it may also help with some DB analysis queries). What is always added is the
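When the optional '<cold>' endpoint suffix is enabled, warm and cold statistics for the same endpoint can be separated by name alone. A minimal sketch, assuming endpoint durations were exported as hypothetical `(endpoint_name, duration_ms)` pairs and that '<cold>' is appended directly to the endpoint name with no separator; `split_endpoint` and `warm_cold_stats` are illustrative helpers, not agent or OAP functions.

```python
from collections import defaultdict
from statistics import mean

COLD_SUFFIX = "<cold>"  # suffix added to cold-started endpoint names


def split_endpoint(name):
    """Return (base endpoint name, is_cold) for a possibly suffixed name."""
    if name.endswith(COLD_SUFFIX):
        return name[: -len(COLD_SUFFIX)], True
    return name, False


def warm_cold_stats(rows):
    """Group durations by (base endpoint, "warm"/"cold") and return the mean of each group."""
    groups = defaultdict(list)
    for name, duration_ms in rows:
        base, cold = split_endpoint(name)
        groups[(base, "cold" if cold else "warm")].append(duration_ms)
    return {key: mean(values) for key, values in groups.items()}
```

For rows `("/parent", 100.0)`, `("/parent", 120.0)` and `("/parent<cold>", 2000.0)`, this yields a warm mean of 110.0 and a cold mean of 2000.0 for `/parent`.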
This is more for @kezhenxu94, I think? Or, @kezhenxu94, if you want me to document this, tell me where.
LGTM
I don't think I've seen this mechanism in the other agents, but I was asked to implement it. A cold start is when an instance is spun up to serve a request; the response time of that endpoint then includes the spin-up time of the instance, which is not ideal. This PR detects this condition and marks the spans to which it applies.
Cold start detection works as follows: the first span to run within 1 second of SkyWalking initialization is considered a cold start. This span gets the tag 'coldStart' set to 'true'. It also optionally gets the text '<cold>' appended to its endpoint name if the env var SW_COLD_ENDPOINT is set to 'true' or the config option 'coldEndpoint' is enabled; the default is off (normal endpoint names). A dummy span that runs first because its endpoint is ignored still counts as a valid first run and consumes the cold start. Note: the 'coldStart' tag is not added at all if no cold start is detected, so most spans have nothing extra added.
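The rules above can be sketched as a small detector object. This is a hypothetical illustration, not the agent's actual code: `ColdStartDetector`, `on_span_start` and the injectable `clock` are names invented here, and the `cold_endpoint` argument stands in for the 'coldEndpoint' config option. Every span, including an ignored dummy span, is assumed to call `on_span_start` once when it starts.

```python
import os
import time

COLD_START_WINDOW_S = 1.0  # "within 1 second of SkyWalking init"
COLD_SUFFIX = "<cold>"


class ColdStartDetector:
    """Hypothetical sketch of the cold-start rules; not the agent's real API."""

    def __init__(self, cold_endpoint=None, clock=time.monotonic):
        self._clock = clock
        self._init_time = clock()  # stands in for "SkyWalking init"
        self._pending = True       # only the very first span can be cold
        if cold_endpoint is None:
            # Fall back to the SW_COLD_ENDPOINT env var (default: off).
            cold_endpoint = os.environ.get("SW_COLD_ENDPOINT", "").lower() == "true"
        self.cold_endpoint = cold_endpoint

    def on_span_start(self, endpoint, tags):
        """Return the (possibly suffixed) endpoint name; may add the coldStart tag.

        Called for every span, including ignored dummy spans, so a dummy
        span that runs first consumes the cold start.
        """
        if not self._pending:       # hot path: a single flag check
            return endpoint
        self._pending = False       # first span seen; nothing after it is cold
        if self._clock() - self._init_time > COLD_START_WINDOW_S:
            return endpoint         # first span came too late: no cold start
        tags["coldStart"] = "true"  # only added when a cold start is detected
        if self.cold_endpoint:
            endpoint += COLD_SUFFIX
        return endpoint
```

After the first span, each call is a single boolean check. The sketch is not synchronized, so in a multi-threaded process two concurrent first spans could both pass the check; a real implementation might need a lock or an atomic swap there.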
The metrics of the endpoint that was cold-started do not change (much), so this tag is not for the endpoint itself but for analyzing upstream spans, which will be significantly slower due to the cold start. The visual part via 'coldEndpoint' is more for quick visual analysis in the trace view.
The target environment for this was Azure Functions, which we are working on now, but the mechanism is general, so it should apply in any environment where instances are spun up on demand (e.g. Kubernetes). Performance-wise it is just a flag check, so the impact should be negligible.