Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection refused right after container start #71

Open
seregamorph opened this issue Nov 25, 2021 · 15 comments
Open

Connection refused right after container start #71

seregamorph opened this issue Nov 25, 2021 · 15 comments
Labels
enhancement New feature or request
Milestone

Comments

@seregamorph
Copy link

seregamorph commented Nov 25, 2021

There is a problem with migration from Docker Desktop to Colima docker environment.
When the container is just started, it's mapped port is not available immediately.
I'm not quite sure about all what happens underneath (is it an issue of Lima/Colima/jna/docker-java), but let me give my assumption, the demo is provided as well.

We use testcontainers, a popular framework that bootstraps docker containers for test purposes, in our case it's Postgres. We use image postgres:11.6-alpine.
When the container starts, it allows to retrieve mapped port and right after that the connection is set up.
With Docker desktop it works just fine, but with colima the retry-with-interval workaround is required. Default logic was not expecting this, it was just working fine for years.

My assumption is that the logic that retrieves mapped port is not blocked by startup await, while in Docker desktop there is a strong guarantee about it.

I've created a demo project that shows the issue.
If you start with Docker Desktop, you will see log:

...testcontainers initialization without issues...
...
jdbc:postgresql://localhost:55001/test?loggerLevel=OFF
>>>> Successfully first time

If you start the same test with Colima, you will see log:

...testcontainers initialization with many retry failures (bootstraps service Ryuk container first)
...
jdbc:postgresql://localhost:49154/test?loggerLevel=OFF
>>>> First exception: org.postgresql.util.PSQLException: Connection to localhost:49154 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
...sleep here...
>>>> Successfully second time

The mapped port is obtained in internal logic in this line:

String jdbcUrl = container.getJdbcUrl();

Worth to mention, that on Apple M1 this failure is more explicit (probably because it's faster), with Intel I see only testcontainers Ryuk failures (most probably they added this retry policy to cover the described issue), but still there is an obvious problem.

Let me know if I can provide more details or more convenient example (not sure that Java example is good enough for you).

@seregamorph
Copy link
Author

The full log for Colima execution:

[main] INFO org.testcontainers.dockerclient.DockerClientProviderStrategy - Loaded org.testcontainers.dockerclient.UnixSocketClientProviderStrategy from ~/.testcontainers.properties, will try it first
[main] INFO org.testcontainers.dockerclient.DockerClientProviderStrategy - Found Docker environment with local Unix socket (unix:///var/run/docker.sock)
[main] INFO org.testcontainers.DockerClientFactory - Docker host IP address is localhost
[main] INFO org.testcontainers.DockerClientFactory - Connected to docker: 
  Server Version: 20.10.7
  API Version: 1.41
  Operating System: Ubuntu 21.04
  Total Memory: 1966 MB
[main] INFO org.testcontainers.utility.ImageNameSubstitutor - Image name substitution will be performed by: DefaultImageNameSubstitutor (composite of 'ConfigurationFileImageNameSubstitutor' and 'PrefixingImageNameSubstitutor')
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
	at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
	at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
	at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
	at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
	at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
	at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
	at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
	at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
	at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
	at java.base/java.lang.Thread.run(Thread.java:829)
[main] INFO org.testcontainers.DockerClientFactory - Ryuk started - will monitor and terminate Testcontainers containers on JVM exit
[main] INFO org.testcontainers.DockerClientFactory - Checking the system...
[main] INFO org.testcontainers.DockerClientFactory - ✔︎ Docker server version should be at least 1.6.0
[main] INFO org.testcontainers.DockerClientFactory - ✔︎ Docker environment should have more than 2GB free disk space
[main] INFO 🐳 [postgres:11.6-alpine] - Creating container for image: postgres:11.6-alpine
[main] INFO 🐳 [postgres:11.6-alpine] - Starting container with ID: 879500d74a3efe810293edf00c890507d7a7818964d040adcab280bc64b41855
[main] INFO 🐳 [postgres:11.6-alpine] - Container postgres:11.6-alpine is starting: 879500d74a3efe810293edf00c890507d7a7818964d040adcab280bc64b41855
[main] INFO 🐳 [postgres:11.6-alpine] - Container postgres:11.6-alpine started in PT1.299692S
jdbc:postgresql://localhost:49154/test?loggerLevel=OFF



>>>> First exception: org.postgresql.util.PSQLException: Connection to localhost:49154 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
>>>> Successfully second time

Process finished with exit code 0

@jandubois
Copy link

When the container is just started, it's mapped port is not available immediately.

This is expected, and due to the way port forwarding in Lima works: the guest agent running inside the VM checks every 5 seconds for open ports. It sends any changes since the previous check to the host agent (running on the host), which will in turn create a tunnel for each new port, and tear down the tunnel for each port that has been closed.

So you should expect to wait on average about 2½s before the port becomes available on the host, and potentially up to 5s (or more if the machine is busy).

To avoid the delay you would need to use an additional vmnet port, via vde_vmnet. I don't know if Colima exposes support for this, but Lima can manage the additional daemons automatically. It does require root to set up initially.

@abiosoft
Copy link
Owner

To avoid the delay you would need to use an additional vmnet port, via vde_vmnet. I don't know if Colima exposes support for this, but Lima can manage the additional daemons automatically. It does require root to set up initially.

This is something to consider for future releases

@abiosoft abiosoft added the enhancement New feature or request label Nov 30, 2021
@abiosoft
Copy link
Owner

The plan is to assign reachable IP addresses to the VM so they can be accessed directly in scenarios like this.
It is currently doable with lima override and network config but it is not user friendly.

JeanHuguesdeRaigniac added a commit to JeanHuguesdeRaigniac/persistent that referenced this issue Jan 14, 2022
@a-inacio
Copy link

a-inacio commented Feb 23, 2022

On my case (MacOS), due to dropping docker desktop I realised that my /var/run/docker.sock was still targeting the old file (this would then be making other stuff to fail).

Solved it by removing the old symlink and then sudo ln -s /Users/$USER/.colima/docker.sock /var/run/docker.sock.

@abiosoft
Copy link
Owner

Kindly install the current development version with brew install --HEAD colima.
You can retrieve the IP address via colima ls, the VM should always be reachable via the IP address.

@cristianrgreco
Copy link

I am running into this on a Linux machine, where although the port is bound, I'm unable to connect right away.

@divStar
Copy link

divStar commented May 8, 2023

I'm afraid I am also experiencing this issue - with a postgres image (11-alpine). It seems flaky though and I am not sure where to even address this, but the issues only started occurring after installing colima (instead of Docker Desktop).

Ryuk starts and the database seems to start, too, but then sometimes it'll go through with the tests and sometimes I receive:

2023-05-09 00:16:58.113  INFO 25373 --- [ream--357132296] z.t.d.p.p.DockerPostgresDatabaseProvider : STDERR: 2023-05-08 22:16:57.925 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2023-05-09 00:16:58.113  INFO 25373 --- [ream--357132296] z.t.d.p.p.DockerPostgresDatabaseProvider : STDERR: 2023-05-08 22:16:57.925 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-05-09 00:16:58.113  INFO 25373 --- [ream--357132296] z.t.d.p.p.DockerPostgresDatabaseProvider : STDERR: 2023-05-08 22:16:57.935 UTC [51] LOG:  database system was shut down at 2023-05-08 22:16:57 UTC
2023-05-09 00:16:58.113  INFO 25373 --- [ream--357132296] z.t.d.p.p.DockerPostgresDatabaseProvider : STDERR: 2023-05-08 22:16:57.939 UTC [1] LOG:  database system is ready to accept connections
2023-05-09 00:16:58.157  INFO 25373 --- [           main] o.hibernate.jpa.internal.util.LogHelper  : HHH000204: Processing PersistenceUnitInfo [name: default]
2023-05-09 00:16:58.193  INFO 25373 --- [           main] org.hibernate.Version                    : HHH000412: Hibernate ORM core version 5.4.32.Final
2023-05-09 00:16:58.247  INFO 25373 --- [           main] o.hibernate.annotations.common.Version   : HCANN000001: Hibernate Commons Annotations {5.1.2.Final}
2023-05-09 00:16:58.295  WARN 25373 --- [           main] o.h.e.j.e.i.JdbcEnvironmentInitiator     : HHH000342: Could not obtain connection to query metadata
java.net.ConnectException: Connection refused
	at org.postgresql.core.PGStream.createSocket(PGStream.java:241) [6 skipped]
	at org.postgresql.core.PGStream.<init>(PGStream.java:98)
	at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:109)
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:235)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223)
	at org.postgresql.Driver.makeConnection(Driver.java:402)
	at org.postgresql.Driver.connect(Driver.java:261)
	at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681)
	at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:229)
	... 27 common frames omitted
Wrapped by: org.postgresql.util.PSQLException: Connection to localhost:49160 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

I am not sure this is a colima issue and I tried it with various configurations. Sometimes deleting the image and pulling it anew helps, sometimes it does not.

Does anyone here perhaps have an idea?

@seregamorph
Copy link
Author

seregamorph commented May 9, 2023

@divStar you can add a retry get connection logic before using the DataSource wrapping the postgres container. That's my code:

/**
 * The colima docker environment has a bit another behavior when bootstrapping container. It is possible to fail to make a connection right after
 * PostgreSQLContainer.getMappedPort(). Most probably it's an issue of Lima/Colima that should be addressed, but right now the workaround suggested: retry few times with interval.
 * <p>
 * https://github.com/abiosoft/colima/issues/71
 */
public final class DockerDataSourceUtils {

    private static final int MAX_RETRY_CONNECTION = 4;

    public static void checkGetConnectionWithRetry(DataSource dataSource) throws IllegalStateException {
        try (Connection connection = dataSource.getConnection()) {
            // no sleep by default
            return;
        } catch (SQLException originalException) {
            for (int i = 1; i <= MAX_RETRY_CONNECTION; i++) {
                try {
                    Thread.sleep(500L * i);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted", e);
                }
                try (Connection connection = dataSource.getConnection()) {
                    return;
                } catch (SQLException skip) {
                    continue;
                }
            }
            throw new IllegalStateException(originalException);
        }
    }

@abiosoft
Copy link
Owner

abiosoft commented May 9, 2023

It is due to the way port forwarding works. There is a chance of up to 3s delay before a port is forwarded from the container to your localhost.

An option for a scenario like this would be to enable IP address for the VM and you can simply use the VM IP to access it. However, it is currently erratic on macOS ventura and would not be a recommended alternative.

@Zordid
Copy link

Zordid commented Mar 8, 2024

Kindly install the current development version with brew install --HEAD colima. You can retrieve the IP address via colima ls, the VM should always be reachable via the IP address.

I am currently also experiencing the issue that colima is running fine, I can start containers using docker-compose (A Kafka cluster) - but from the outside, I am unable to reach the forwarded ports. It's just connection refused. Then I stumbled on this comment and wanted to see my colima VM address, but there is none:

❯ colima ls
PROFILE    STATUS     ARCH      CPUS    MEMORY    DISK     RUNTIME    ADDRESS
default    Running    x86_64    4       4GiB      20GiB    docker

What does that mean?

My real problem I try to get solved though is this:

❯ docker port b4cf984adcf0
2181/tcp -> 0.0.0.0:22181
2181/tcp -> [::]:22181
❯ nc -zv localhost 22181
nc: connectx to localhost port 22181 (tcp) failed: Connection refused
nc: connectx to localhost port 22181 (tcp) failed: Connection refused

Any help is appreciated!

@rfay
Copy link
Contributor

rfay commented Mar 8, 2024

We see this intermittently in about half of DDEV's automated test runs on Colima. It's a persistent problem.

@jithinlal
Copy link

why is this closed?

@KholkinDmitrii
Copy link

The problem is still here in .NET with TestContainers. Half of the starts fail.

@abiosoft
Copy link
Owner

There needs to be a proper monitoring of Docker containers and dynamic port forwarding handled by Colima itself.
Rather than falling back to Lima's which can have a delay of up to 3 seconds.

@abiosoft abiosoft reopened this Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

10 participants