Proper way to handle closed connections? #467

Closed
jared-mackey opened this issue Jun 3, 2021 · 14 comments

Labels
mint Issues related to mint adapter

Comments

@jared-mackey

jared-mackey commented Jun 3, 2021

Hi, first thanks for the awesome library. Love this project and all the work put into it to make it great. Thank you so much. (Ditto for Mint and Finch 😁)

OK, onto the meat of the issue. When using the Finch adapter we are seeing Mint socket closed errors, and I was wondering about the best way to handle these. The issue has two parts though:

  1. There was previously a bug in Mint that didn't handle closed connections after a successful request (reportedly fixed in 1.2.1).
  2. There are still cases where we get occasional closed errors from Mint. I'm guessing they really are closed and the request was not successful, but I haven't personally validated that, as I haven't tried to reproduce against a server I can introspect.

Regardless of the cause, would a retry middleware be the best approach for handling these in general? Or would it be better to handle this in Tesla, Finch or Mint? I would likely lean towards a retry middleware for a quick fix, but is that the ideal solution long term?

I would be happy to help dive into the appropriate place to resolve this if it is determined it should be handled in one of these libs.

Thanks!

/cc @keathley -- might be relevant to you as well for Finch. 😁

@keathley
Contributor

keathley commented Jun 3, 2021

@jared-mackey A couple of questions:

  • A quick check: do you have the latest versions of Mint and Finch installed?
  • Do these connection closed errors occur on a regular cadence? Or are they infrequent?
  • Are you using http/1.1 connection pools?
  • Do you have a maximum idle time set on your connection pool? If this is not set to a value lower than any idle timeout the server uses, it can lead to race conditions and connection closed errors (see the sketch below).
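
For illustration, here's a minimal sketch of that last point (the pool name and the 4_000 figure are made up, and depending on your Finch version the option is :conn_max_idle_time or the older :max_idle_time): keep the client-side idle limit below the server's keep-alive timeout so the pool recycles a connection before the server can close it out from under you.

  # In application.ex. If the server drops idle keep-alive connections
  # after roughly 5 seconds, closing idle client connections after 4
  # seconds avoids checking out a socket the server already closed.
  children = [
    {Finch,
     name: MyFinch,
     pools: %{
       default: [conn_max_idle_time: 4_000]
     }}
  ]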

@teamon I could be wrong but it seems like this probably isn't related to Tesla so if you'd rather close this discussion then we can transfer it to the finch repo.

@jared-mackey
Author

@keathley

  • We have hit this in several scenarios. The one I am on currently is a bit dated (1.2.0 for Mint), so I'll update and do some more testing. However, I know others hit this on 1.2.1, and I can't verify the answers to your other questions for them.
  • These are pretty frequent. It seems like a decent percentage of requests (>10%).
  • Just the default Finch configuration.
  • We do not.

I'll update the versions for the one I am able to verify on and report back.

@jared-mackey
Author

@keathley I have updated to the latest versions from Hex for Mint, Finch, Tesla and NimblePool, and the issue has persisted. I can't seem to reproduce it locally, but the gist of what we are doing is this: inside a Phoenix request we start a task that makes an API call over to the Have I Been Pwned API. We do not await the task inside the Phoenix request. (We are not linking the task, so it should carry on in the background.)
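
For reference, the shape of that call is roughly the following (module and function names are placeholders rather than our actual code, and it assumes a Task.Supervisor named MyApp.TaskSupervisor in the supervision tree):

  # Fire-and-forget inside the Phoenix request: start_child/2 does not
  # link the task to the caller, so it keeps running after the request
  # process has already replied.
  Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
    MyApp.HIBP.check_breached(password_hash_prefix)
  end)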

@jared-mackey
Author

After seeing some trace data on these, it is clear the request is not being sent, since the trace duration is sub-millisecond. So I don't think it is the same as the previously reported Mint issue. To work around this, I am going to add a retry middleware to our Tesla pipeline.

teamon added the mint label Jul 26, 2021
@teamon
Member

teamon commented Dec 17, 2021

Closing this one, do let me know if there is anything actionable on tesla side.

teamon closed this as completed Dec 17, 2021
@atomkirk

I'm seeing a case where socket closed is returned but the request IS sent, so my retry middleware is causing the request to happen multiple times.
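
In other words, {:error, "socket closed"} doesn't tell you whether the request actually reached the server, so a blanket retry can duplicate non-idempotent calls. One way to limit the blast radius (just a sketch, module names and URL are made up) is to keep the Retry middleware on a client that only does idempotent GETs and leave it off the client used for writes:

  defmodule MyApp.ReadClient do
    # Only used for GET lookups, so a duplicated request is harmless.
    use Tesla

    plug Tesla.Middleware.BaseUrl, "https://api.example.com"
    plug Tesla.Middleware.Retry, delay: 10, max_retries: 2

    def fetch(path), do: get(path)
  end

  defmodule MyApp.WriteClient do
    # No Retry plug here: retrying after "socket closed" could repeat a POST.
    use Tesla

    plug Tesla.Middleware.BaseUrl, "https://api.example.com"

    def create(path, body), do: post(path, body)
  end

Newer Tesla releases also allow should_retry to inspect the request itself, which makes a per-method check possible; check the Tesla.Middleware.Retry docs for the version you're on.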

@Billzabob

We are also seeing these "socket closed" errors. I can't reproduce them locally very reliably but they happen frequently in production for us. Interestingly, they seem to happen more often on clients that have relatively low traffic. It's enough of an issue that we will probably have to move away from Finch until it gets fixed.

@jared-mackey
Author

@Billzabob I was able to work around this via a retry middleware. It’s not ideal, but did solve the problem for me.

  plug Tesla.Middleware.Retry,
    delay: 10,
    max_retries: 2,
    should_retry: &should_retry?/1

and

  defp should_retry?({:ok, %{status: status}}) when status in 400..599, do: true
  defp should_retry?({:ok, _}), do: false
  defp should_retry?({:error, _}), do: true

Customize to your specific needs, of course. 😁
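
Putting those two snippets together, a self-contained version of the client looks roughly like this (module name, base URL and the Finch pool name are placeholders for the real ones):

  defmodule MyApp.HIBPClient do
    use Tesla

    adapter Tesla.Adapter.Finch, name: MyApp.Finch

    plug Tesla.Middleware.BaseUrl, "https://haveibeenpwned.com/api/v3"
    plug Tesla.Middleware.Retry,
      delay: 10,
      max_retries: 2,
      should_retry: &should_retry?/1

    # Retry 4xx/5xx responses as well as transport-level errors such as
    # {:error, "socket closed"} coming out of the Finch adapter.
    defp should_retry?({:ok, %{status: status}}) when status in 400..599, do: true
    defp should_retry?({:ok, _}), do: false
    defp should_retry?({:error, _}), do: true
  end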

@Billzabob

We'll give that a try. Any chance that setting the pool_max_idle_time to something like a minute or two could fix this issue? Just guessing since it seems to happen mostly when there hasn't been activity in a while for the client.

@salex89

salex89 commented Mar 22, 2024

> We'll give that a try. Any chance that setting the pool_max_idle_time to something like a minute or two could fix this issue? Just guessing since it seems to happen mostly when there hasn't been activity in a while for the client.

@Billzabob Hi mate, just checking in, did you find a solution for this issue? We're seeing pretty much the same thing as you are, also only on production servers. I was thinking about what the root cause might be; can you tell me what operating system you are using, Erlang/Elixir version, etc.? We're on Alma Linux, Elixir 1.15.7, Erlang 26.1.2.

What I concluded is that it's the Tesla Finch adapter doing this kind of funky translation of %Mint.TransportError{} to text, returning {:error, "socket closed"}. But the root cause still eludes me; it looks like connections get closed while they're in the pool and Finch is not aware of that.
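
If it helps anyone digging further: that string is simply Mint's own message for the :closed reason, so the adapter seems to be stringifying the exception rather than inventing its own text (a quick IEx check, not a claim about the adapter's internals):

  iex> Exception.message(%Mint.TransportError{reason: :closed})
  "socket closed"

So matching {:error, "socket closed"} (or just {:error, _}) in a retry predicate covers this case.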

@Billzabob

@salex89
This was the change we made and we haven't seen the issue since:

  # In our application.ex

  pool_opts = %{default: [pool_max_idle_time: 60_000]}

  children = [
    # ...
    {Finch, name: PlaidClient, pools: pool_opts},
    # ...
  ]

It still feels like there's an underlying bug but this workaround works pretty well.

@salex89

salex89 commented Mar 22, 2024

> @salex89 This was the change we made and we haven't seen the issue since: […]
>
> It still feels like there's an underlying bug but this workaround works pretty well.

@Billzabob Interesting, thanks. We'll try that. Did you, perhaps, try configurations like conn_max_idle_time or max_idle_time (their descriptions are identical)? The pool_max_idle_time basically purges the whole pool, if I'm not mistaken; these might clean up individual "faulty" sockets instead.
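
To make the distinction concrete, here's how the two options would sit side by side (values are illustrative; check the Finch docs for the version you're running):

  pool_opts = %{
    default: [
      # close an individual connection that has been idle this long
      # (checked when the connection is handed out of the pool)
      conn_max_idle_time: 30_000,
      # terminate the whole pool after it has been idle this long
      pool_max_idle_time: 60_000
    ]
  }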

@atomkirk

Those pool_opts seem to be working great for me. Great find! Agreed, it seems like an underlying bug.

@djbberko

I am also running into these errors: polling an endpoint every X seconds and then, randomly, socket: closed. These pool_opts (and various other opts) are not getting rid of the issue; they might be reducing it, but it's hard to tell. I have also implemented a basic retry mechanism, which drastically reduces the failure rate, but some errors still get through.

Do we have any idea what the root cause of the Mint error is?
