Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: lots of random failures #2664

Closed
wants to merge 58 commits into from

Conversation

stevenh
Copy link
Collaborator

@stevenh stevenh commented Jul 20, 2024

This is a combination PR of a large amount of fixes.

These are all in one branch for now so we can see if there are any more tests which don't succeed first time.

DO NOT MERGE!

stevenh added 4 commits July 19, 2024 19:48
Ensure build errors are returned to the caller when PrintBuildLog is not
set to true.

DisplayJSONMessagesStream and JSONMessage.Display  performs processing of
the stream to catch command failures, so needs to always be called.
Fix port forwarding race condition by waiting for the exposed port which
was causing random test failures.

Make errors and parameters more consumable while there.
Add a test to cover the retry on system error test.

Update test to use empty context to prevent knock on failures being
logged confusing the output and check the error returned.
Fix racy creation of free ports in port forwarding tests which were
causing random failures.
Copy link

netlify bot commented Jul 20, 2024

Deploy Preview for testcontainers-go ready!

Name Link
🔨 Latest commit 129f047
🔍 Latest deploy log https://app.netlify.com/sites/testcontainers-go/deploys/66bbc9e5c45df60008d17770
😎 Deploy Preview https://deploy-preview-2664--testcontainers-go.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

stevenh added 16 commits July 20, 2024 22:31
Fix test for image label setting which could fail if the volume already
existed.
Expose the POP3 port and wait for all ports which fixes the race
condition in the tests.
Add the ability to skip the internal host port check when performing as
HostPortStrategy check. This is useful when a container does bind to the
internal port until additional conditions are met.
Use ForExposedPortOnly to wait for ports to be bound externally only
before trying to get the mapped port. This fixes a race condition in the
setup of the container which was causing random test failures.
Fix race in container setup caused by copying the files into the
container manually.

While here update tests to use require so they are easier to follow.
Fix race condition on port availability by switching to port instead of
log check.
Wait for exposed ports instead of log to address race on start up.
Fix the port race on start up by explicitly waiting for the port to be
exposed. Once this is successful run the wait, this prevents the wait
always being run due to way hooks are processed.
Fix being able to stop and restart a container by moving the stop
processing from PreTerminates to PostStops hooks.

Fix log consumer being duplicated when restarting.

Fix race condition in log consumer test, cleaning up to flow to validate
correctly.
Fix the data race on access to health in TestWaitForHealthSucceeds.
Check and return errors in the process of determining authentication
configs so that unexpected failures don't occur latter in the process
of build an image or creating a container.

BuildOptions will now return an empty result on error instead of an
incomplete one, to ensure that consumers don't use partial data.

Fix builds with different config or override environments failing when
the authentication configuration changes, which was introduced by #2646.

Report errors from GetRegistryCredentials calls to avoid unexpected
failures latter on in the authentication process.

Split out the functionality to read a Dockerfile from an io.Reader into
ExtractImagesFromReader, as required when processing from a tar archive.

Deprecated function ContainerRequest.GetAuthConfigs will now panic if
an error occurs, so that callers understand that an failure occured.

BREAKING CHANGE Add support for determining the required authentication
in when building an image from a ContextArchive, this requires ContextArchive
support io.Seeker.
Disable test retries so we can see if there are any remaining tests that
don't pass first time.
Fix mongodb replica test failures due to client using the internal
container addresses when direct connection is not requested.

Fix invalid use of log.Fatal instead of tt.Fatal which was causing
failures to no be output correctly due to os.Exit.
Use local IP instead of localhost to access registry container
to address failures under WSL where the docker daemon can't
access localhost.

Refactor tests to use helpers and require reducing code and
simplifying the flow.

Add a default image which can be used be consumers.

Add HostAddress which returns host : port compatible with
WSL.

Add SetDockerAuthConfig and DockerAuthConfig methods that can
be used to easily configure authentication via DOCKER_AUTH_CONFIG
environment variable.

Fix container clean up which was missed in a number of case.
Fix docker auth tests to run on WSL by using the local IP instead
of localhost, which can be unreliable.

While there refactor the tests to use a helper instead of hand coding
each environment setup.

Remove duplicate logging about terminating container as default hooks
already provide this.
Add a missing error check in the mysql example for connection setup.
@stevenh stevenh force-pushed the fix/random-test-failures branch 2 times, most recently from f654a84 to e314320 Compare July 20, 2024 22:25
stevenh added 7 commits July 21, 2024 16:37
Improve the lifecycle errors with wrapping so that its easier to
determine the cause of an error.
Fix port race on start up of valkey containers by waiting for the port
instead of the log entry.
Remove the obsolete version field from all docker compose files to stop
it warning.
Silence the warning about being unable to check internal port by
switching to ForExposedPortOnly.
Disable build log so its easier to see where failures occur.
Remove test volumes so that we don't leave orphaned resources after
tests have run.

Remove really unused code as there is no need for it.

Add missing error checks on pipe setup.
Improve the documentation for container Terminate and IsRunning methods.
Switch registry containers to wait for http response to ensure the
container is ready to serve requests, which was causing random test
failures.

Also remove unnecessary context creation.
@stevenh stevenh force-pushed the fix/random-test-failures branch 2 times, most recently from adfc528 to a24f745 Compare August 1, 2024 08:21
@kiview
Copy link
Member

kiview commented Aug 1, 2024

Alright, I just want to make sure that we are aligned on an approach to merge small and atomic changes, so your effort is not invested in vain.

It might be good to sync with @mdelapenya, since we also have a bigger v1 branch in the works with substantial changes, so I we might run into more merge problems down the road, if we don't time things appropriately.

@stevenh stevenh force-pushed the fix/random-test-failures branch from a24f745 to 00d6b27 Compare August 1, 2024 12:46
@stevenh
Copy link
Collaborator Author

stevenh commented Aug 1, 2024

Alright, I just want to make sure that we are aligned on an approach to merge small and atomic changes, so your effort is not invested in vain.

It might be good to sync with @mdelapenya, since we also have a bigger v1 branch in the works with substantial changes, so I we might run into more merge problems down the road, if we don't time things appropriately.

Thanks for the heads up, I already kicked off a conversation in slack, which we'll pick up when he's available.

Would love to be part of the v1 discussion as there's a few things I've seen that could really benefit from some breaking changes.

stevenh added 2 commits August 1, 2024 15:05
Wait for the HTTP port to be available to prevent random failures when
the container isn't fully started and returns 503 errors.
Wait for the admin interface to response to HTTP to avoid failures in
configuring the instance when its not fully ready.
@stevenh stevenh force-pushed the fix/random-test-failures branch from 00d6b27 to ff35f85 Compare August 1, 2024 14:10
@stevenh
Copy link
Collaborator Author

stevenh commented Aug 1, 2024

Every test passing first time!!!

@webstradev
Copy link

webstradev commented Aug 3, 2024

I can say, some of the fixes here will make the use of testcontainers a LOT better for us, we use it a lot in our CI and run into a lot of failures a day causing pretty big delays for us. My finger is numb from cliking retry in gitlab.

I've tried a few of our smaller pipelines with this version and I'm not seeing any errors yet.

@mdelapenya
Copy link
Member

WOW, thanks for this! I wonder if we could take small chunks of it in several PRs so that 1) the review process take less time, 2) the review is simpler and 3) the merge with v1 is simpler too.

Do you think we can coordinate and work in separating concerns? I'm more than happy to help in that.

Add output of the docker config if we hit rate limiting to aid in
debugging why this is happening on github actions.
@stevenh stevenh force-pushed the fix/random-test-failures branch from f9b3996 to fc7eaab Compare August 13, 2024 20:57
Updating comment to see if we can unstick github action which is sat
pending:
test (1.x, ubuntu-latest) / ./ubuntu-latest/1.x Expected — Waiting for status to be reported
@mdelapenya
Copy link
Member

@stevenh do you think we can close this one? We addressed it in small bits

@stevenh
Copy link
Collaborator Author

stevenh commented Oct 18, 2024

It's just waiting on the reaper PR

@mdelapenya
Copy link
Member

Given the reaper PR was already merged, I think we can close this one.

Please reopen it if you consider there is more work to do here 🙏

@mdelapenya mdelapenya closed this Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants