Cold start detection and marking #52

tom-pytel · 2021-05-17T12:38:10Z

I don't think I've seen this mechanism in the other agents but I was asked to implement this. A cold start is when an instance is spun up to serve a request, for this reason the response time of the endpoint includes the spinup time of the instance, which is not ideal. This PR detects this condition and marks spans to which it applies.

Cold start detection is as follows: First span to run within 1 second of skywalking init is considered a cold start. This span gets the tag 'coldStart' set to 'true'. This span also optionally gets the text '<cold>' appended to the endpoint name if env var SW_COLD_ENDPOINT is set to 'true' or config option 'coldEndpoint', the default is off (normal endpoint names). A dummy span running first due to ignore is considered a valid first run and eats the cold start. Note: 'coldStart' tag is not added at all if cold start is not detected, which means most spans will not have anything extra added.

The metrics of the endpoint that was cold-started do not change (much) so this tag is not for the endpoint itself but for analysis of spans upstream which will be significantly slower due to the cold start. The visual part via 'coldEndpoint' is more for quick visual analysis in the trace view.

The target environment for this was Azure Functions which we are working on now but the mechanism is general so it should apply in any environment where instances are spun up on demand (Kubernetes). Performance-wise it is just a flag check so impact should be nothing.

tom-pytel · 2021-05-17T13:13:00Z

Forgot to update tests for coldStart tag.

README.md

kezhenxu94 · 2021-05-19T03:49:11Z

@wu-sheng WDYT of the cold start concept and mechanism?

wu-sheng · 2021-05-19T04:14:26Z

In the code start case, is there any chance to separate the booting process and request handling process? These are typical differences in most cases I am familiar with. Or do you mean, this code starting begins after receiving a request? I don't know this kind of case.

tom-pytel · 2021-05-19T13:00:58Z

In the code start case, is there any chance to separate the booting process and request handling process? These are typical differences in most cases I am familiar with. Or do you mean, this code starting begins after receiving a request? I don't know this kind of case.

The coldStart tag doesn't really give information about the endpoint where it applies in isolation but rather explains the slower than normal execution of an upstream parent span which waits on this child. For one cold started child alone there is no way to determine the length of the bootup since the span starts once the boot is complete and in theory will execute more or less the same as if it was not a cold start (except for any unfilled caches and things like that). If you have both cold and non-cold versions of an endpoint with their parent then you can estimate the time of bootup alone as shown below:

              0   1   2   3   4

/parent       +-------+
/child            +---+

/parent       +---------------+
/child<cold>              +---+

child boot time = (/parent - /child<cold>) - (/parent - /child) = (4 - 1) - (2 - 1) = 2

Of course you have to select the parent times which have cold vs. non-cold children. The /child and /child are subtracted from /parent because they MAY have statistically significant difference in execution time..

In any case this is just something we do on our side because someone requested it, thought it might be useful info for your side as well.

tom-pytel · 2021-05-19T13:34:07Z

Closed accidentally...

wu-sheng · 2021-05-19T14:08:17Z

I am good with a new endpoint with <cold> suffix.
But this PR reminds me, I can't find the plugin dev doc about APIs explanation in NodeJS agent. Could you submit an issue to track this, and try to submit a PR to fix?

tom-pytel · 2021-05-19T14:12:07Z

I am good with a new endpoint with <cold> suffix.

That part is optional for visual inspection as well as separating statistics for warm vs. cold endpoints (or maybe help with some DB analysis queries as well). What is always added is the coldStart tag.

But this PR reminds me, I can't find the plugin dev doc about APIs explanation in NodeJS agent. Could you submit an issue to track this, and try to submit a PR to fix?

This is more for @kezhenxu94 I think? Or if you want me to document this then tell me where @kezhenxu94.

kezhenxu94

LGTM

kezhenxu94 requested review from kezhenxu94 and wu-sheng May 17, 2021 13:04

tom-pytel force-pushed the master branch 2 times, most recently from 1ab8e47 to 1b168e7 Compare May 17, 2021 17:04

kezhenxu94 reviewed May 19, 2021

View reviewed changes

README.md Outdated Show resolved Hide resolved

tom-pytel force-pushed the master branch 3 times, most recently from f0dfe41 to 314f195 Compare May 19, 2021 13:13

tom-pytel closed this May 19, 2021

tom-pytel force-pushed the master branch from 314f195 to cc14cbe Compare May 19, 2021 13:30

tom-pytel reopened this May 19, 2021

cold start detection and marking

ea22eae

tom-pytel force-pushed the master branch from c163fcd to ea22eae Compare May 19, 2021 13:35

kezhenxu94 approved these changes May 20, 2021

View reviewed changes

kezhenxu94 merged commit 8e72aad into apache:master May 20, 2021

kezhenxu94 mentioned this pull request May 23, 2021

Add ioredis plugin #53

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cold start detection and marking #52

Cold start detection and marking #52

tom-pytel commented May 17, 2021 •

edited

Loading

tom-pytel commented May 17, 2021

kezhenxu94 commented May 19, 2021

wu-sheng commented May 19, 2021

tom-pytel commented May 19, 2021 •

edited

Loading

tom-pytel commented May 19, 2021

wu-sheng commented May 19, 2021

tom-pytel commented May 19, 2021 •

edited

Loading

kezhenxu94 left a comment

Cold start detection and marking #52

Cold start detection and marking #52

Conversation

tom-pytel commented May 17, 2021 • edited Loading

tom-pytel commented May 17, 2021

kezhenxu94 commented May 19, 2021

wu-sheng commented May 19, 2021

tom-pytel commented May 19, 2021 • edited Loading

tom-pytel commented May 19, 2021

wu-sheng commented May 19, 2021

tom-pytel commented May 19, 2021 • edited Loading

kezhenxu94 left a comment

Choose a reason for hiding this comment

tom-pytel commented May 17, 2021 •

edited

Loading

tom-pytel commented May 19, 2021 •

edited

Loading

tom-pytel commented May 19, 2021 •

edited

Loading