Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(reaper): refactor to allow retries and fix races #2728

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

stevenh
Copy link
Collaborator

@stevenh stevenh commented Aug 9, 2024

Refactor how the reaper is created to allow for proper retries of temporary errors such as container not found issues during startup or shutdown races.

This eliminates the use of sync.Once which wasn't solving the problem at hand and replaces it with a singleton spawner with locked access.

Wrap reaper errors so we can determine the cause of failures more easily.

Fix race condition in port wait when loading from container by always waiting for the port first.

Remove unnecessary use of buffering and invalid retry logic in reaper connection handling which could never recover correctly from a partial read.

Move the reaper creation just before connections are established in compose to ensure its still running when the Connect calls are made.

Previously the reaper was requested in NewDockerComposeWith which means it could have already shutdown before connections are made during the later sections of Up if the startup took over 1 minute.

This was causing consistent failures for:
TestDockerComposeAPIWithWaitLogStrategy

Ensure that resource labels are correct so that resources aren't reaped when the reaper is disabled by excluding session id when reaper is disabled.

Error when creating a reaper when the config says it's disabled so that we avoid hard to debug issues because a reaper is running when it shouldn't be.

@stevenh stevenh requested a review from a team as a code owner August 9, 2024 15:48
Copy link

netlify bot commented Aug 9, 2024

Deploy Preview for testcontainers-go ready!

Name Link
🔨 Latest commit cf852e8
🔍 Latest deploy log https://app.netlify.com/sites/testcontainers-go/deploys/66ed33b7438196000842869d
😎 Deploy Preview https://deploy-preview-2728--testcontainers-go.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@stevenh stevenh marked this pull request as draft August 9, 2024 15:55
@stevenh
Copy link
Collaborator Author

stevenh commented Aug 9, 2024

Depends on #2738

Copy link
Collaborator

@mdelapenya mdelapenya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a commit on top to satisfy the usage of several missing functions (see comments).

Other than than, I merged the branch with current main and resolved conflicts.

docker.go Show resolved Hide resolved
docker.go Show resolved Hide resolved
reaper.go Show resolved Hide resolved
reaper.go Show resolved Hide resolved
@stevenh
Copy link
Collaborator Author

stevenh commented Sep 8, 2024

@mdelapenya reminder this is dependent on #2728 which is why it has missing components still.

I'll rebase once #2738 is merged.

@stevenh
Copy link
Collaborator Author

stevenh commented Sep 12, 2024

Rebased and picking up again, hopefully wont require many fixes. Will convert from draft when ready.

@stevenh stevenh force-pushed the fix/reaper-retries-race branch 3 times, most recently from bf961f0 to 7649e0c Compare September 13, 2024 08:34
@stevenh stevenh marked this pull request as ready for review September 13, 2024 13:12
Copy link
Collaborator

@mdelapenya mdelapenya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added a few super minor comments, other than that, my only concern is about the implications of these changes with the reuse mode we are proposing in #2768

I'm fine merging this, although I'd like to discuss with you about reuse more in depth, as it could apply to the reaper code as well.

reaper.go Outdated Show resolved Hide resolved
reaper.go Show resolved Hide resolved
reaper_test.go Show resolved Hide resolved
@stevenh stevenh marked this pull request as draft September 13, 2024 23:16
@mdelapenya
Copy link
Collaborator

@stevenh is there anything I can do to move forward with this PR?

@stevenh stevenh marked this pull request as ready for review September 19, 2024 13:24
@stevenh stevenh marked this pull request as draft September 19, 2024 14:17
@stevenh
Copy link
Collaborator Author

stevenh commented Sep 19, 2024

Blocked on #2786

Refactor how the reaper is created to allow for proper retries of
temporary errors such as container not found issues during startup or
shutdown races.

This eliminates the use of sync.Once which wasn't solving the problem at
hand and replaces it with a singleton spawner with locked access.

Wrap reaper errors so we can determine the cause of failures more
easily.

Fix race condition in port wait when loading from container by always
waiting for the port first.

Remove unnecessary use of buffering and invalid retry logic in reaper
connection handling which could never recover correctly from a partial
read.

Move the reaper creation just before connections are established in
compose to ensure its still running when the Connect calls are made.

Previously the reaper was requested in NewDockerComposeWith which
means it could have already shutdown before connections are made
during the later sections of Up if the startup took over 1 minute.

This was causing consistent failures for:
TestDockerComposeAPIWithWaitLogStrategy

Ensure that resource labels are correct so that resources aren't reaped
when the reaper is disabled by excluding session id when reaper is
disabled.

Error when creating a reaper when the config says it's disabled so that
we avoid hard to debug issues because a reaper is running when it
shouldn't be.
func GenericLabels() map[string]string {
return core.DefaultLabels(core.SessionID())
}

// AddGenericLabels adds the generic labels to target.
func AddGenericLabels(target map[string]string) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leverage the core.MergeCustomLabels(dst, src) function addedin #2775

}

// AddDefaultLabels adds the default labels for sessionID to target.
func AddDefaultLabels(sessionID string, target map[string]string) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: #2775 added a very similar method to merge dst and src labels

Comment on lines +30 to +31
if !config.Read().RyukDisabled {
labels[LabelSessionID] = sessionID
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd not remove the session ID, as this label could be used in the backend to connect test sessions with remote VM executions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants