Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/signalfx] Failing goleak test #30864

Closed
mx-psi opened this issue Jan 30, 2024 · 3 comments · Fixed by #30887 or #31772
Closed

[exporter/signalfx] Failing goleak test #30864

mx-psi opened this issue Jan 30, 2024 · 3 comments · Fixed by #30887 or #31772
Labels
bug Something isn't working ci-cd CI, CD, testing, build issues exporter/signalfx priority:p1 High

Comments

@mx-psi
Copy link
Member

mx-psi commented Jan 30, 2024

Component(s)

exporter/signalfx

What happened?

Description

The signalfx exporter is failing goleak tests with

goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 9 in state chan send, with github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/apm/requests.(*ReqSender).Send on top of the stack:
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/apm/requests.(*ReqSender).Send(0xc00007c660, 0xc0001cca00)
	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/apm/requests/sender.go:52 +0x12c
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/apm/correlations.(*Client).makeRequest(0xc000411e10, 0xc000036690)
	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/apm/correlations/client.go:334 +0x14a8
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/apm/correlations.(*Client).processChan(0xc000411e10)
	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/apm/correlations/client.go:355 +0x39a
created by github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/apm/correlations.(*Client).Start in goroutine 7
	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/apm/correlations/client.go:387 +0xa6
]

job log

Collector version

816671e

Additional context

Started happening after #30457 cc @crobert-1

@mx-psi mx-psi added bug Something isn't working needs triage New item requiring triage priority:p1 High ci-cd CI, CD, testing, build issues exporter/signalfx and removed needs triage New item requiring triage labels Jan 30, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Copy link
Contributor

Pinging code owners for exporter/signalfx: @dmitryax @crobert-1. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Copy link
Member

I'm investigating but we can skip the check for now.

cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Feb 1, 2024
mx-psi pushed a commit that referenced this issue Mar 1, 2024
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
`goleak` was detecting leaking goroutines in tests, this attempts to
resolve. I found what appeared to be a couple races but can't reproduce
locally so I'll run CI a few times to ensure this works as expected.

Changes in PR:
1. Add correlation client Shutdown function that blocks on the
waitgroup. This is the main fix of this PR that should fix the leaking
goroutines.
2. Re-organize the shutdown process of the apm client correlation test
suite to properly synchronize the shutting down process.
3. Fix typo
4. Add goleak checks to exporter/signalfx/internal/correlation` and
`exporter/signalfx/internal/apm/correlations`

**Link to tracking Issue:** <Issue number if applicable>
Resolves #30864
#30438

**Testing:** <Describe what testing was performed and which tests were
added.>
All existing and added tests should be passing. Since this has only
failed in CI I'm going to try to run it a few times before marking as
ready for review.
XinRanZhAWS pushed a commit to XinRanZhAWS/opentelemetry-collector-contrib that referenced this issue Mar 13, 2024
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
`goleak` was detecting leaking goroutines in tests, this attempts to
resolve. I found what appeared to be a couple races but can't reproduce
locally so I'll run CI a few times to ensure this works as expected.

Changes in PR:
1. Add correlation client Shutdown function that blocks on the
waitgroup. This is the main fix of this PR that should fix the leaking
goroutines.
2. Re-organize the shutdown process of the apm client correlation test
suite to properly synchronize the shutting down process.
3. Fix typo
4. Add goleak checks to exporter/signalfx/internal/correlation` and
`exporter/signalfx/internal/apm/correlations`

**Link to tracking Issue:** <Issue number if applicable>
Resolves open-telemetry#30864
open-telemetry#30438

**Testing:** <Describe what testing was performed and which tests were
added.>
All existing and added tests should be passing. Since this has only
failed in CI I'm going to try to run it a few times before marking as
ready for review.
mx-psi pushed a commit that referenced this issue Mar 25, 2024
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
Changes in PR:

1. Add correlation client Shutdown function that blocks on the
waitgroup. This is the main fix of this PR that should fix the leaking
goroutines.
2. Re-organize the shutdown process of the apm client correlation test
suite to properly synchronize the shutting down process.
3. Fix typo
4. Only block request sender until context is cancelled. The request
processor is shutdown when the context is cancelled, so this would
result in `Shutdown` waiting forever, since the request would never be
processed.
5. Enable goleak in some more packages.

**Note**: This is contains the exact same contents as
#30887,
but change number 4 is new, and should resolve the test issue the
original PR was causing.

**Link to tracking Issue:**
Resolves
#30864

#30438

**Testing:** <Describe what testing was performed and which tests were
added.>
All existing tests are passing, as well as added goleak checks. I'm
going to run this a number of times to try to help ensure it's not flaky
anymore.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ci-cd CI, CD, testing, build issues exporter/signalfx priority:p1 High
Projects
None yet
2 participants