
[Bug]: Version 0.32.0 seems to ignore WithStartupTimeout #2633

Closed
KenjiTakahashi opened this issue Jul 9, 2024 · 11 comments
Labels
bug An issue with the library

Comments

@KenjiTakahashi

Testcontainers version

0.32.0

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host arch

x86

Go version

1.22

Docker version

.

Docker info

.

What happened?

After updating to 0.32.0, we started getting errors like this:

can not create container: failed to start container: all exposed ports, [127.0.0.1:35533:35533/tcp], were not mapped in 5s: port 127.0.0.1:35533:35533/tcp is not mapped yet

Our setup code is like this:

req := testcontainers.ContainerRequest{
    Image:        "docker.elastic.co/elasticsearch/elasticsearch:8.14.2",
    ExposedPorts: []string{fmt.Sprintf("127.0.0.1:%[1]d:%[1]d/tcp", port)},
    WaitingFor: wait.ForHTTP("/_cluster/health").
        WithStartupTimeout(6 * time.Minute).
        WithPort(internalPort).
        WithResponseMatcher(func(body io.Reader) bool {
            response, err := io.ReadAll(body)
            if err != nil {
                return false
            }
            return strings.Contains(string(response), "green")
        }),
    AutoRemove: true,
}

container, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
    ContainerRequest: req,
    Started:          true,
})
if err != nil {
    return nil, nil, fmt.Errorf("can not create container: %w", err)
}

We've tried changing the value of WithStartupTimeout, but the error always says 5s.

Version 0.31.0 still works as expected.

Relevant log output

No response

Additional information

No response

@KenjiTakahashi added the bug label Jul 9, 2024
@mdelapenya
Member

It's weird because the wait package has not been modified in this release, and looking at the commits I cannot see any that could affect that part: v0.31.0...v0.32.0

Can you verify if this issue is present with the elasticsearch module too?

I'm at GopherCon, but I will double-check this later.

Thanks for the report!

@andsleonardo

I'm facing the same error after updating to v0.32.0.

Perhaps it's coming from these changes made to lifecycle.go? There's a hardcoded max retry interval of 5 seconds there.
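
Roughly, the behaviour would be something like this (a simplified sketch of the pattern, not the actual lifecycle.go code):

// Simplified sketch: the port-mapping readiness check retries against its own
// fixed 5s deadline, regardless of the WaitingFor strategy on the request.
func waitForPortsMapped(ctx context.Context, isMapped func() bool) error {
    deadline := time.After(5 * time.Second) // hardcoded, so WithStartupTimeout has no effect here
    tick := time.NewTicker(100 * time.Millisecond)
    defer tick.Stop()
    for {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-deadline:
            return errors.New("all exposed ports were not mapped in 5s")
        case <-tick.C:
            if isMapped() {
                return nil
            }
        }
    }
}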

@EsterfanoLopes

Yep, having the same issue here

@mdelapenya
Member

perhaps it's coming from these changes made to lifecycle.go

Indeed, I think you're right: that block is not honoring the wait strategies at all. I forgot about the possibility of other wait strategies with custom timeouts, so that fixed 5s value seems incorrect. Let's use this issue to collect ideas for the fix; I can do a patch release right after the fix lands.

@andsleonardo

andsleonardo commented Jul 11, 2024

@mdelapenya would it make sense to inject the container request, or its wait strategy, into that defaultReadinessHook and use the timeout/deadline details in the backoff/retry? 🤔
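
Something along these lines, perhaps (just a sketch; the hasStartupTimeout interface is my assumption about how a strategy's timeout could be read, not the library's actual API):

// hasStartupTimeout is an assumed accessor for illustration only: strategies
// built with WithStartupTimeout carry a timeout, but the real way to read it
// inside the library may differ.
type hasStartupTimeout interface {
    Timeout() *time.Duration
}

// readinessDeadline sketches the idea above: derive the port-mapping deadline
// from the request's wait strategy when it defines one, and only fall back to
// the fixed default (currently 5s) otherwise, including when WaitingFor is nil.
func readinessDeadline(waitingFor wait.Strategy, fallback time.Duration) time.Duration {
    if wt, ok := waitingFor.(hasStartupTimeout); ok && wt.Timeout() != nil {
        return *wt.Timeout()
    }
    return fallback
}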

@stevenh
Collaborator

stevenh commented Aug 8, 2024

Should it be just the default if dockerContainer.WaitingFor is nil?

This was referenced Aug 9, 2024
@mdelapenya
Member

Hi @KenjiTakahashi, thanks for opening this issue. I'm trying to reproduce it and, before getting to the bug itself, I have a few questions:

  • you are using a fixed port for Elasticsearch's well-known port (9200). Is there a reason not to use the random ports that testcontainers-go offers?
  • in the wait strategy, why check against an internal port? What is its value in your code? Also, I don't think there is a need to wait 6 minutes for cluster health.
  • using AutoRemove is deprecated. I'd rather rely on Ryuk to prune the running containers.

When I remove all of them in a local test, Elasticsearch fails with the following error:

{"@timestamp":"2024-08-21T09:06:54.732Z", "log.level":"ERROR", "message":"node validation exception\n[1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch. For more information see [https://www.elastic.co/guide/en/elasticsearch/reference/8.14/bootstrap-checks.html]\nbootstrap check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]; for more information see [https://www.elastic.co/guide/en/elasticsearch/reference/8.14/_maximum_map_count_check.html]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"c8499dc4c8f9","elasticsearch.cluster.name":"docker-cluster"}
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log
{"@timestamp":"2024-08-21T09:06:54.737Z", "log.level": "INFO", "message":"stopping ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"c8499dc4c8f9","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2024-08-21T09:06:54.763Z", "log.level": "INFO", "message":"stopped", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"c8499dc4c8f9","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2024-08-21T09:06:54.763Z", "log.level": "INFO", "message":"closing ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"c8499dc4c8f9","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2024-08-21T09:06:54.777Z", "log.level": "INFO", "message":"closed", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"c8499dc4c8f9","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2024-08-21T09:06:54.780Z", "log.level": "INFO", "message":"Native controller process has stopped - no new native processes can be started", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"ml-cpp-log-tail-thread","log.logger":"org.elasticsearch.xpack.ml.process.NativeController","elasticsearch.node.name":"c8499dc4c8f9","elasticsearch.cluster.name":"docker-cluster"}

ERROR: Elasticsearch died while starting up, with exit code 78

I verified that the container configuration is incorrect; please compare it with the current Elasticsearch module we have: https://github.com/testcontainers/testcontainers-go/blob/main/modules/elasticsearch/elasticsearch.go#L43-L67

Is there a reason not to use it? It has already been tuned by the Elastic folks. The code would look like this:

container, err := elasticsearch.Run(ctx, "docker.elastic.co/elasticsearch/elasticsearch:8.14.2")
if err != nil {
	t.Fatalf("Could not start container: %s", err)
}
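
If the custom health check and longer timeout are still needed, the module accepts the generic customizers, so something like this should also work (a sketch; it assumes testcontainers.WithWaitStrategy is available in the release you are using):

container, err := elasticsearch.Run(ctx,
	"docker.elastic.co/elasticsearch/elasticsearch:8.14.2",
	testcontainers.WithWaitStrategy(
		wait.ForHTTP("/_cluster/health").
			WithPort("9200/tcp").
			WithStartupTimeout(6*time.Minute),
	),
)
if err != nil {
	t.Fatalf("Could not start container: %s", err)
}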

Let's address that first, and then we can move on to the startup timeout.

@KenjiTakahashi
Author

Well, our internal CI is kind of slow and is effectively a "Docker-in-Docker-in-Docker". Things like Ryuk tend not to play well in such a setup (I'd argue it rarely plays well in general, but that's unrelated), and the timeouts have to be long to ensure success (maybe not 6m long, but 🤷).

We do not listen on 9200; we listen on whatever the internalPort variable points to, which is some port the kernel told us was free to use. It is set up this way for unrelated reasons.

I have left out some of the env config for brevity, but here is the actual one:

Env: map[string]string{
    "discovery.type":                  "single-node",
    "http.publish_host":               "_local_",
    "http.port":                       strconv.Itoa(port),
    "xpack.security.enabled":          "false",
    "xpack.security.http.ssl.enabled": "false",
    "ES_JAVA_OPTS":                    "-Xmx256m",
}

As for the error you're getting, did you get a chance to read it? It complains that the vm.max_map_count value in your kernel is too low; it has nothing to do with the container configuration.

Sorry, but none of this sounds related to the issue at hand: that WithStartupTimeout stopped doing what it used to do.

@mdelapenya
Member

@KenjiTakahashi I'm sorry if my comment caused confusion. I was trying to prune the errors I found locally while debugging the repro code snippet in this issue, which is what I usually do with all the PRs and issues we receive, so it could be that I missed the error here. I'd appreciate a complete repro snippet so I can reproduce it with more precision 🙏

Going back to the issue, #2691 and #2718 could be related to a potential fix, so I'd suggest bumping to the latest release, v0.33.0, if possible. The suggestions in #2633 (comment) could also help; I wonder if you would have time to contribute them.

I'd also like to understand the reasons and use case for using a fixed port for Elasticsearch's well-known port (9200/tcp?), as Testcontainers advocates against it. I'm even more curious about the use case where the internal port is obtained as a free port from the OS beforehand and then used as part of the wait strategy for cluster health. The current Elasticsearch module already does this for you.
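
For reference, the dynamic-port pattern we advocate looks roughly like your snippet above, minus the fixed host port (a sketch; the Elasticsearch-specific env is left out and the container keeps its default 9200 port):

req := testcontainers.ContainerRequest{
    Image:        "docker.elastic.co/elasticsearch/elasticsearch:8.14.2",
    ExposedPorts: []string{"9200/tcp"}, // container port only; the host port is assigned randomly
    WaitingFor: wait.ForHTTP("/_cluster/health").
        WithPort("9200/tcp").
        WithStartupTimeout(6 * time.Minute),
}

container, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
    ContainerRequest: req,
    Started:          true,
})
if err != nil {
    return nil, nil, fmt.Errorf("can not create container: %w", err)
}

host, err := container.Host(ctx)
if err != nil {
    return nil, nil, err
}
mapped, err := container.MappedPort(ctx, "9200/tcp")
if err != nil {
    return nil, nil, err
}
esURL := fmt.Sprintf("http://%s:%s", host, mapped.Port())
// connect via esURL instead of a pre-chosen fixed port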

@zregvart
Contributor

We had this error reported when updating to 0.32.0; updating to 0.33.0 fixed the issue for us.

@stevenh
Collaborator

stevenh commented Oct 4, 2024

I'm going to close this based on the previous post. If there is still an issue, please reply and we can re-open.

@stevenh closed this as completed Oct 4, 2024