Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(opentelemetry-node): add cloud/container resource detectors #214

Merged
merged 19 commits into from
Jun 11, 2024

Conversation

david-luna
Copy link
Member

Closes #128

This PR adds the container and the cloud resource detectors (alibaba, aws, azure & gcp). By default all are included unless the env var OTEL_NODE_RESOURCE_DETECTORS is set. I that case the distro includes according the value set in the environment. Some highlights:

  • resolution of which detectors is moved to a separate module instead like it was done for instrumentations
  • env var affects to all detectors. So if uses sets a list of detectors our distro detector may be excluded.
  • looking at @opentelemetry/auto-instrumentations-node there are a couple of detectors not related to cloud/containers left to add

Question
This detectors resolution gives more priority to setup via code rather than the environment setup via environment vars. I wonder if that's the behaviour we want 🤔

Vanilla SDK seems to have this behaviour too (link) and its @opentelemetry/auto-instrumentations-node the one looking up to env var OTEL_NODE_RESOURCE_DETECTORS

@david-luna david-luna requested a review from trentm June 6, 2024 10:47
@david-luna
Copy link
Member Author

david-luna commented Jun 6, 2024

Tests are failing since some detectors generate spans. An example is HTTP spans being generated by GCP detector

node -r @elastic/opentelemetry-node ./test/fixtures/use-http-server.js

------ trace 350a53 (2 spans) ------
       span 1edd33 "GET" (10.9ms, STATUS_CODE_ERROR, SPAN_KIND_CLIENT, GET http://127.0.0.1:56695/ -> 404)
  +6ms `- span 059043 "GET" (1.9ms, SPAN_KIND_SERVER, GET http://127.0.0.1:56695/ -> 404)
------ trace 68c04e (1 span) ------
       span 190981 "GET" (6.3ms, STATUS_CODE_ERROR, SPAN_KIND_CLIENT, GET http://metadata.google.internal./computeMetadata/v1/instance)
------ trace f32a41 (2 spans) ------
       span 8af26e "POST" (3.3ms, SPAN_KIND_CLIENT, POST http://127.0.0.1:56695/echo -> 200)
  +2ms `- span 2df652 "POST" (0.8ms, SPAN_KIND_SERVER, POST http://127.0.0.1:56695/echo -> 200)
------ trace 3230c3 (1 span) ------
       span b94337 "GET" (3004.0ms, STATUS_CODE_ERROR, SPAN_KIND_CLIENT, GET http://169.254.169.254/computeMetadata/v1/instance)

ba8681a adds a proposal to filter out these spans when testing. To be discussed

@david-luna
Copy link
Member Author

david-luna commented Jun 6, 2024

As discussed in today's meeting with @trentm

  • distro resource detector is added always
  • check auto-instrumentations if GCP spans are also created and create an issue for that
  • filter out the GCP spans from the sortedSpans getter in tests
  • change those warnings from AWS & container detectors in upstream

@trentm
Copy link
Member

trentm commented Jun 7, 2024

@david-luna This is still draft. I wasn't sure if this was ready for review yet.

@david-luna
Copy link
Member Author

@david-luna This is still draft. I wasn't sure if this was ready for review yet.

There was a linting issue happening in CI but not in local so it wasn't ready yet. Last commit contains a fix for gen:types npm script.

@david-luna david-luna marked this pull request as ready for review June 10, 2024 08:39
@david-luna
Copy link
Member Author

Note on the lint failure.

npm ls --all

...
 ├─┬ @opentelemetry/resource-detector-gcp@0.29.9
  │ ├── @opentelemetry/api@1.8.0 deduped
  │ ├── @opentelemetry/core@1.24.1 deduped
  │ ├── @opentelemetry/resources@1.24.1 deduped
  │ ├── @opentelemetry/semantic-conventions@1.24.1 deduped
  │ └─┬ gcp-metadata@6.1.0
  │   ├─┬ gaxios@6.6.0
  │   │ ├── extend@3.0.2
  │   │ ├─┬ https-proxy-agent@7.0.4
  │   │ │ ├─┬ agent-base@7.1.1
  │   │ │ │ └── debug@4.3.4 deduped
  │   │ │ └── debug@4.3.4 deduped
  │   │ ├── is-stream@2.0.1 deduped
  │   │ ├─┬ node-fetch@2.7.0
  │   │ │ ├── UNMET OPTIONAL DEPENDENCY encoding@^0.1.0
  │   │ │ └─┬ whatwg-url@5.0.0
  │   │ │   ├── tr46@0.0.3.       <----- THIS VERSION OF `tr46` HAS NOT LICENSE FILE
  │   │ │   └── webidl-conversions@3.0.1
  │   │ └── uuid@9.0.1
  │   └─┬ json-bigint@1.0.0
  │     └── bignumber.js@9.1.2
 ...

Linting if failing because of a missing license file, was added in https://github.com/jsdom/tr46/tree/v1.0.0. The package.json file has the field "license" set correctly but I'm not sure if we should add a fallback to that field if the file is missing.

Any thoughts @trentm ?

@trentm
Copy link
Member

trentm commented Jun 10, 2024

The package.json file has the field "license" set correctly but I'm not sure if we should add a fallback to that field if the file is missing.

I think it is fine to add a fallback in the lint script for this. The package.json "license" field says MIT. The very next commit on the tr46 repo adds the license file that is then included in subsequent published versions.

I think it would be fine to try a PR on node-fetch to update their whatwg-url dep, but I don't think that needs to block us here.

@david-luna
Copy link
Member Author

@trentm I just realized ./script/gen-notice.sh already has a fallback mechanism for packages that do not have a license file. I've added a new entry for this fallback in 6888419

Comment on lines 104 to 114
const detectorsFromEnv =
process.env['OTEL_NODE_RESOURCE_DETECTORS'] || 'all';

if (detectorsFromEnv === 'none') {
return [];
}

const detectorKeys =
detectorsFromEnv === 'all'
? Object.keys(defaultDetectors)
: detectorsFromEnv.split(',');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is slightly difference in auto-instr-node: https://github.com/open-telemetry/opentelemetry-js-contrib/blob/dadf30828e7008df7b85c8bd7eaacb90770aac5c/metapackages/auto-instrumentations-node/src/utils.ts#L235-L244

E.g. OTEL_NODE_RESOURCE_DETECTORS=host,os,none with a "none" in there the behaviour differs. Also if 'all' is one of the values in there.

Personally, I think it would be nice to .trim() each value so OTEL_NODE_RESOURCE_DETECTORS=host, os works. auto-instr-node does trim in its handling of process.env.OTEL_NODE_ENABLED_INSTRUMENTATIONS, FWIW.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've applied your suggestions. Thanks :)

packages/opentelemetry-node/lib/detectors.js Outdated Show resolved Hide resolved
@@ -76,6 +76,7 @@ const testFixtures = [
// ------ trace b8467d (2 spans) ------
// span bc8a2c "POST" (3.8ms, SPAN_KIND_CLIENT, POST http://127.0.0.1:64972/echo -> 200)
// +2ms `- span 4e7adf "POST" (1.1ms, SPAN_KIND_SERVER, POST http://127.0.0.1:64972/echo -> 200)
// const spans = col.sortedSpans.slice(RESOURCE_DETECTOR_SPAN_COUNT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugging code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, I've missed that

packages/opentelemetry-node/test/testutils.js Outdated Show resolved Hide resolved
Co-authored-by: Trent Mick <trent.mick@elastic.co>
@david-luna david-luna requested a review from trentm June 11, 2024 09:21
@david-luna
Copy link
Member Author

david-luna commented Jun 11, 2024

@trentm I've added some tests since we can inspect the warnings when passing bogus values. With this I've realized the SDK was logging some errors that shouldn't be logged.

I'm hesitant if wee need to add more tests for other scenarios like the behaviour when the value all or none is present in the list. WDYT?

// hostDetectorSync is not currently in the OTel default, but may be added
hostDetectorSync,
],
resourceDetectors: resolveDetectors(opts.resourceDetectors),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR. I wonder if eventually we may want to export something similar to getInstrumentations() that enables a user to user our SDK and get the default set of detectors plus one or more custom detectors. Or perhaps this is something that is more generally solved with some concept of SDK plugins or extensions as briefly discussed on the OTel JS SIG.

Copy link
Member Author

@david-luna david-luna Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say if the number of detectors grow enough we will have a case for that API to be exported. We can discuss it and create an issue if we agree it's necessary

@trentm
Copy link
Member

trentm commented Jun 11, 2024

I'm hesitant if wee need to add more tests for other scenarios like the behaviour when the value all or none is present in the list. WDYT?

I think we are fine without the tests. Your call if you'd like to add more.

@david-luna david-luna merged commit f89bf32 into main Jun 11, 2024
12 checks passed
@david-luna david-luna deleted the dluna/128-cloud-resource-detectors branch June 11, 2024 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add cloud/container resource detectors by default
2 participants