Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sumologicextension): End the heartbeatloop immediately on shutdown using the parent context #32931

Closed

Conversation

rnishtala-sumo
Copy link
Contributor

Description:

  • Cancel the heartbeatloop immediately on shutdown using the parent context. The reason for a flaky test could be because the collector is started and shutdown twice in a test. When this happens the last heartbeat from the previous execution could leak in as the first request from the next collection execution

  • Restart the test server to ignore any possible heartbeats from a previous heartbeatloop that was shutdown

Link to tracking Issue: #32785

Testing: Unit tests and manual testing of heartbeats

Comment on lines +568 to 570
// When the context is done ...
<-ctx.Done()
// ... cancel the ongoing heartbeat request.
cancel()
Copy link
Member

@crobert-1 crobert-1 May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this solution the way it's currently written. The goroutine won't get past <-ctx.Done() until cancel has already been called. What's added value of calling it twice (once here, and once in the above defer?

What's the purpose of the goroutine here?

Copy link
Contributor Author

@rnishtala-sumo rnishtala-sumo May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to remove defer cancel(). Essentially what I'm trying to do here is pass the parent context into the heartbeat loop so that when the component stops the heartbeats stop immediately.

if se.registrationInfo.CollectorCredentialID == "" || se.registrationInfo.CollectorCredentialKey == "" {
se.logger.Error("Collector not registered, cannot send heartbeat")
return
}

ctx, cancel := context.WithCancel(context.Background())
ctx, cancel := context.WithCancel(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For long running background processes started during Start, the spec requires using context.Background() instead of the passed in context, as explained here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for sharing this. I wanted the long running operation (heartbeat) to be cancelled immediately on calling shutdown (hence the parent context). The reason for this is because I think there's some delay before the heartbeats are cancelled on component shutdown, that's causing this flaky test issue

Copy link
Contributor Author

@rnishtala-sumo rnishtala-sumo May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that perhaps this isn't the best way to do this. Maybe we're better off removing the test until the root cause is confirmed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR- #32937 removes the flaky test.

@rnishtala-sumo rnishtala-sumo marked this pull request as draft May 7, 2024 23:39
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 22, 2024
Copy link
Contributor

github-actions bot commented Jun 6, 2024

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants