Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/k8sattribute] support attribute k8s.deployment.uid #14003

Conversation

fatsheep9146
Copy link
Contributor

Signed-off-by: Ziqi Zhao zhaoziqi9146@gmail.com

fix #12936

@fatsheep9146 fatsheep9146 force-pushed the k8sattributeprocessor-support-deployment-uid branch from 5955b90 to fbe2f33 Compare September 11, 2022 04:11
@fatsheep9146 fatsheep9146 marked this pull request as ready for review September 11, 2022 04:13
@fatsheep9146 fatsheep9146 requested a review from a team September 11, 2022 04:13
@fatsheep9146
Copy link
Contributor Author

@dmitryax The pr is ready for review.

@djaglowski djaglowski added the processor/k8sattributes k8s Attributes processor label Sep 19, 2022
@fatsheep9146
Copy link
Contributor Author

ping @dmitryax, could you help review this?

Copy link
Contributor

@povilasv povilasv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe instead of going this route, we should discover ReplicaSets and get rid of Deployment regex?

The idea behind this, is that regex can match things that are not managed by Deployment, and we would still set Deployment name incorrectly.

So I think we should instead discover ReplicaSets and parse the Pod's Owner References to find the ReplicaSet. Example Pod Owner references:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app.kubernetes.io/instance: opentelemetry-collector
    app.kubernetes.io/name: opentelemetry-collector
    component: standalone-collector
    pod-template-hash: 6c45f8d6f6
  name: opentelemetry-collector-6c45f8d6f6-s4jq4
  ownerReferences:                                                  
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true 
    kind: ReplicaSet
    name: opentelemetry-collector-6c45f8d6f6
    uid: ee75293d-14ec-42a0-9548-a768d9e07c48

And from ReplicaSet's Owner Reference we can get Deployment name, and id:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
 labels:
   app.kubernetes.io/instance: opentelemetry-collector
   app.kubernetes.io/name: opentelemetry-collector
   component: standalone-collector
   pod-template-hash: 6c45f8d6f6
 name: opentelemetry-collector-6c45f8d6f6
 namespace: sys-mon
 ownerReferences:
 - apiVersion: apps/v1
   blockOwnerDeletion: true
   controller: true
   kind: Deployment
   name: opentelemetry-collector
   uid: f875e269-0e76-4735-a02d-a254ed056111

@fatsheep9146
Copy link
Contributor Author

I think maybe instead of going this route, we should discover ReplicaSets and get rid of Deployment regex?

The idea behind this, is that regex can match things that are not managed by Deployment, and we would still set Deployment name incorrectly.

So I think we should instead discover ReplicaSets and parse the Pod's Owner References to find the ReplicaSet. Example Pod Owner references:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app.kubernetes.io/instance: opentelemetry-collector
    app.kubernetes.io/name: opentelemetry-collector
    component: standalone-collector
    pod-template-hash: 6c45f8d6f6
  name: opentelemetry-collector-6c45f8d6f6-s4jq4
  ownerReferences:                                                  
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true 
    kind: ReplicaSet
    name: opentelemetry-collector-6c45f8d6f6
    uid: ee75293d-14ec-42a0-9548-a768d9e07c48

And from ReplicaSet's Owner Reference we can get Deployment name, and id:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
 labels:
   app.kubernetes.io/instance: opentelemetry-collector
   app.kubernetes.io/name: opentelemetry-collector
   component: standalone-collector
   pod-template-hash: 6c45f8d6f6
 name: opentelemetry-collector-6c45f8d6f6
 namespace: sys-mon
 ownerReferences:
 - apiVersion: apps/v1
   blockOwnerDeletion: true
   controller: true
   kind: Deployment
   name: opentelemetry-collector
   uid: f875e269-0e76-4735-a02d-a254ed056111

I think this make senses, and it is the most accurate way to implement this, I'll change like this.
@povilasv @dmitryax

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 11, 2022
@fatsheep9146
Copy link
Contributor Author

I will start fixing this

@github-actions github-actions bot removed the Stale label Oct 12, 2022
@fatsheep9146 fatsheep9146 force-pushed the k8sattributeprocessor-support-deployment-uid branch from fbe2f33 to 83a3e0d Compare October 12, 2022 05:59
@fatsheep9146
Copy link
Contributor Author

@povilasv I already changed to replicaset, could you help review about this again?

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me, just a few comments. Nice work with tracking the relationship between Pods and Deployments using ReplicaSets, I think that's definitely an improvement.

processor/k8sattributesprocessor/internal/kube/client.go Outdated Show resolved Hide resolved
processor/k8sattributesprocessor/internal/kube/client.go Outdated Show resolved Hide resolved
processor/k8sattributesprocessor/internal/kube/kube.go Outdated Show resolved Hide resolved
@fatsheep9146
Copy link
Contributor Author

could you help review this again? @evan-bradley

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me, just a few nitpicks mainly around spelling.

@fatsheep9146 fatsheep9146 force-pushed the k8sattributeprocessor-support-deployment-uid branch from e11da04 to bf5afeb Compare October 18, 2022 22:56
@fatsheep9146
Copy link
Contributor Author

could you help review this again? @evan-bradley

@fatsheep9146
Copy link
Contributor Author

fatsheep9146 commented Oct 19, 2022

could you help review this again? @evan-bradley @povilasv @dmitryax

Copy link
Contributor

@povilasv povilasv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than that LGTM, but I am not a maintainer :)

processor/k8sattributesprocessor/internal/kube/client.go Outdated Show resolved Hide resolved
@fatsheep9146
Copy link
Contributor Author

Other than that LGTM, but I am not a maintainer :)

Any reviews are all welcome, thank you. =D

@fatsheep9146 fatsheep9146 force-pushed the k8sattributeprocessor-support-deployment-uid branch from 44e5ab9 to 88308e1 Compare October 24, 2022 11:08
@fatsheep9146
Copy link
Contributor Author

Could you help review this again? @evan-bradley

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Great find with the controller flag in the owner references.

@fatsheep9146
Copy link
Contributor Author

fatsheep9146 commented Oct 25, 2022

Do you have time to review this pr ? @dmitryax

@dmitryax
Copy link
Member

Sorry for the delay. Given that we got some significant changes to this processor that introduces bugs, I'm a bit skeptical about rushing to merge this one.

We discussed with @povilasv plans about introducing e2e tests for k8s related functionality of the collector. Can we delay this PR until the tests are introduced? Let me know if you want to help with that.

@dmitryax
Copy link
Member

dmitryax commented Oct 25, 2022

@fatsheep9146 I filed an issue to start with #15651

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Nov 10, 2022
@fatsheep9146 fatsheep9146 added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Nov 10, 2022
@swiatekm
Copy link
Contributor

@dmitryax @fatsheep9146 looks like we now have E2E tests for the k8sattributes processor, so can we revisit this PR?

Having skimmed the code, I have two concerns about the current change:

  • the Replicaset informer is enabled unconditionally, so users who don't need it will pay the cost regardless
  • the WatchClient is growing a bit large - we might want to consider moving the responsibility of managing the owner data into a separate component

I don't think these are obstacles to merging this though.

@fatsheep9146
Copy link
Contributor Author

@dmitryax @fatsheep9146 looks like we now have E2E tests for the k8sattributes processor, so can we revisit this PR?

Having skimmed the code, I have two concerns about the current change:

  • the Replicaset informer is enabled unconditionally, so users who don't need it will pay the cost regardless
  • the WatchClient is growing a bit large - we might want to consider moving the responsibility of managing the owner data into a separate component

I don't think these are obstacles to merging this though.

Yeah, you are right, I'will push this pr to be merged in these few days. @swiatekm-sumo

@dmitryax
Copy link
Member

the Replicaset informer is enabled unconditionally, so users who don't need it will pay the cost regardless

I'd say at least this one should be addressed in this PR. We should avoid putting extra pressure on k8s API and compute resources for the functionality that isn't being used

@fatsheep9146 fatsheep9146 force-pushed the k8sattributeprocessor-support-deployment-uid branch from e817a19 to 07504e7 Compare April 12, 2023 05:38
@runforesight
Copy link

runforesight bot commented Apr 12, 2023

Foresight Summary

    
Major Impacts

build-and-test-windows duration(5 seconds) has decreased 30 minutes 41 seconds compared to main branch avg(30 minutes 46 seconds).
View More Details

⭕  build-and-test-windows workflow has finished in 5 seconds (30 minutes 41 seconds less than main branch avg.) and finished at 12th Apr, 2023.


Job Failed Steps Tests
windows-unittest-matrix -     🔗  N/A See Details
windows-unittest -     🔗  N/A See Details

✅  check-links workflow has finished in 42 seconds and finished at 12th Apr, 2023.


Job Failed Steps Tests
changed files -     🔗  N/A See Details
check-links -     🔗  N/A See Details

✅  telemetrygen workflow has finished in 1 minute 2 seconds and finished at 12th Apr, 2023.


Job Failed Steps Tests
build-dev -     🔗  N/A See Details
publish-latest -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details

✅  prometheus-compliance-tests workflow has finished in 3 minutes 32 seconds (2 minutes 54 seconds less than main branch avg.) and finished at 12th Apr, 2023.


Job Failed Steps Tests
prometheus-compliance-tests -     🔗  N/A See Details

✅  load-tests workflow has finished in 6 minutes 49 seconds (3 minutes 40 seconds less than main branch avg.) and finished at 12th Apr, 2023.


Job Failed Steps Tests
setup-environment -     🔗  N/A See Details
loadtest (TestIdleMode) -     🔗  N/A See Details
loadtest (TestBallastMemory|TestLog10kDPS) -     🔗  N/A See Details
loadtest (TestMetric10kDPS|TestMetricsFromFile) -     🔗  N/A See Details
loadtest (TestMetricResourceProcessor|TestTrace10kSPS) -     🔗  N/A See Details
loadtest (TestTraceNoBackend10kSPS|TestTrace1kSPSWithAttrs) -     🔗  N/A See Details
loadtest (TestTraceBallast1kSPSWithAttrs|TestTraceBallast1kSPSAddAttrs) -     🔗  N/A See Details
loadtest (TestTraceAttributesProcessor) -     🔗  N/A See Details

✅  changelog workflow has finished in 7 minutes 36 seconds (⚠️ 5 minutes 20 seconds more than main branch avg.) and finished at 12th Apr, 2023.


Job Failed Steps Tests
changelog -     🔗  N/A See Details

❌  e2e-tests workflow has finished in 10 minutes 38 seconds (3 minutes 27 seconds less than main branch avg.) and finished at 12th Apr, 2023. 1 job failed.


Job Failed Steps Tests
kubernetes-test (v1.26.0) run e2e tests     🔗  N/A See Details
kubernetes-test (v1.25.3) -     🔗  N/A See Details
kubernetes-test (v1.24.7) -     🔗  N/A See Details
kubernetes-test (v1.23.13) -     🔗  N/A See Details

❌  build-and-test workflow has finished in 36 minutes 20 seconds (10 minutes 28 seconds less than main branch avg.) and finished at 12th Apr, 2023. 2 jobs failed.


Job Failed Steps Tests
setup-environment -     🔗  N/A See Details
govulncheck -     🔗  N/A See Details
check-collector-module-version Check Collector Module Version     🔗  N/A See Details
check-codeowners -     🔗  N/A See Details
lint-matrix (receiver-0) -     🔗  N/A See Details
lint-matrix (receiver-1) -     🔗  N/A See Details
lint-matrix (processor) -     🔗  N/A See Details
lint-matrix (exporter) -     🔗  N/A See Details
lint-matrix (extension) -     🔗  N/A See Details
lint-matrix (connector) -     🔗  N/A See Details
lint-matrix (internal) -     🔗  N/A See Details
lint-matrix (other) -     🔗  N/A See Details
checks Check gendependabot     🔗  N/A See Details
unittest-matrix (1.20, receiver-0) -     🔗  N/A See Details
unittest-matrix (1.20, receiver-1) -     🔗  N/A See Details
unittest-matrix (1.20, processor) -     🔗  N/A See Details
unittest-matrix (1.20, exporter) -     🔗  N/A See Details
unittest-matrix (1.20, extension) -     🔗  N/A See Details
unittest-matrix (1.20, connector) -     🔗  N/A See Details
unittest-matrix (1.20, internal) -     🔗  N/A See Details
unittest-matrix (1.20, other) -     🔗  N/A See Details
unittest-matrix (1.19, receiver-0) -     🔗  N/A See Details
unittest-matrix (1.19, receiver-1) -     🔗  N/A See Details
unittest-matrix (1.19, processor) -     🔗  N/A See Details
unittest-matrix (1.19, exporter) -     🔗  N/A See Details
unittest-matrix (1.19, extension) -     🔗  N/A See Details
unittest-matrix (1.19, connector) -     🔗  N/A See Details
unittest-matrix (1.19, internal) -     🔗  N/A See Details
unittest-matrix (1.19, other) -     🔗  N/A See Details
correctness-metrics -     🔗  N/A See Details
build-examples -     🔗  N/A See Details
integration-tests -     🔗  N/A See Details
correctness-traces -     🔗  N/A See Details
unittest (1.20) -     🔗  N/A See Details
unittest (1.19) -     🔗  N/A See Details
lint -     🔗  N/A See Details
cross-compile (darwin, amd64) -     🔗  N/A See Details
cross-compile (darwin, arm64) -     🔗  N/A See Details
cross-compile (linux, 386) -     🔗  N/A See Details
cross-compile (linux, amd64) -     🔗  N/A See Details
cross-compile (linux, arm) -     🔗  N/A See Details
cross-compile (linux, arm64) -     🔗  N/A See Details
cross-compile (linux, ppc64le) -     🔗  N/A See Details
cross-compile (windows, 386) -     🔗  N/A See Details
cross-compile (windows, amd64) -     🔗  N/A See Details
build-package (deb) -     🔗  N/A See Details
build-package (rpm) -     🔗  N/A See Details
windows-msi -     🔗  N/A See Details
publish-check -     🔗  N/A See Details
publish-dev -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details
rotate-milestone -     🔗  N/A See Details

🔎 See details on Foresight

*You can configure Foresight comments in your organization settings page.

@fatsheep9146
Copy link
Contributor Author

fatsheep9146 commented Apr 12, 2023

the Replicaset informer is enabled unconditionally, so users who don't need it will pay the cost regardless

I'd say at least this one should be addressed in this PR. We should avoid putting extra pressure on k8s API and compute resources for the functionality that isn't being used

I already solved this one.

About another improvement, I think I can submit another pr, since this one already has many modifications.

@dmitryax @swiatekm-sumo

Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fatsheep9146
Copy link
Contributor Author

@swiatekm-sumo Thx! The failed test are still needed to be resolved, I'm already doing this.

@fatsheep9146
Copy link
Contributor Author

@dmitryax The pr is ready to be reviewed.
The failed test is not related to this pr.

@fatsheep9146 fatsheep9146 force-pushed the k8sattributeprocessor-support-deployment-uid branch from d64f9d7 to edd9da1 Compare April 23, 2023 04:45
@fatsheep9146 fatsheep9146 force-pushed the k8sattributeprocessor-support-deployment-uid branch from 848b710 to b915c18 Compare May 15, 2023 14:21
Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit, otherwise LGTM

processor/k8sattributesprocessor/internal/kube/kube.go Outdated Show resolved Hide resolved
@fatsheep9146 fatsheep9146 force-pushed the k8sattributeprocessor-support-deployment-uid branch from b915c18 to 8fd3dfb Compare May 16, 2023 01:47
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
@fatsheep9146 fatsheep9146 force-pushed the k8sattributeprocessor-support-deployment-uid branch from 2bbd9b5 to a6539c6 Compare May 16, 2023 04:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
never stale Issues marked with this label will be never staled and automatically removed processor/k8sattributes k8s Attributes processor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[processor/k8sattributes] k8s.deployment.name attribute can be set to a wrong value
7 participants