Proper way to handle closed connections? #467

Closed
jared-mackey opened this issue Jun 3, 2021 · 14 comments

Labels
mint Issues related to mint adapter

Comments

@jared-mackey

jared-mackey commented Jun 3, 2021

Hi, first thanks for the awesome library. Love this project and all the work put into it to make it great. Thank you so much. (Ditto for Mint and Finch 😁)

OK, onto the meat of the issue. When using the Finch adapter we are seeing Mint socket closed errors, and I was wondering about the best way to handle these. The issue has two parts though:

  1. There was previously a bug in Mint that didn't handle closed connections after a successful request (reportedly fixed in 1.2.1).
  2. There are still cases where we get occasional closed errors from Mint. I'm guessing they really are closed and the request was not successful, but I haven't personally validated that, as I haven't tried to reproduce against a server I can introspect.

Regardless of the cause, would a retry middleware be the best approach for handling these in general? Or would it be better to handle this in Tesla, Finch or Mint? I would likely lean towards a retry middleware for a quick fix, but is that the ideal solution long term?

I would be happy to help dive into the appropriate place to resolve this if it is determined it should be handled in one of these libs.

Thanks!

/cc @keathley -- might be relevant to you as well for Finch. 😁

@keathley
Contributor

keathley commented Jun 3, 2021

@jared-mackey A couple of questions:

  • A quick check: do you have the latest versions of Mint and Finch installed?
  • Do these connection closed errors occur on a regular cadence? Or are they infrequent?
  • Are you using http/1.1 connection pools?
  • Do you have a maximum idle time set on your connection pool? If this is not set to a value lower than any idle timeout the server uses, it can lead to race conditions and connection closed errors (see the sketch below).
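
For illustration, here's a minimal sketch of that last point (the pool name and the 4_000 figure are made up, and depending on your Finch version the option is :conn_max_idle_time or the older :max_idle_time): keep the client-side idle limit below the server's keep-alive timeout so the pool recycles a connection before the server can close it out from under you.

  # In application.ex. If the server drops idle keep-alive connections
  # after roughly 5 seconds, closing idle client connections after 4
  # seconds avoids checking out a socket the server already closed.
  children = [
    {Finch,
     name: MyFinch,
     pools: %{
       default: [conn_max_idle_time: 4_000]
     }}
  ]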

@teamon I could be wrong but it seems like this probably isn't related to Tesla so if you'd rather close this discussion then we can transfer it to the finch repo.

@jared-mackey
Author

@keathley

  • We have hit this in several scenarios. The one I am on currently is a bit dated (1.2.0 for Mint), so I'll update and do some more testing. However, I know others hit this on 1.2.1, and I can't verify the answers to your other questions for them.
  • These are pretty frequent. It seems like a decent percentage of requests (>10%).
  • Just the default Finch configuration.
  • We do not.

I'll update the versions for the one I am able to verify on and report back.

@jared-mackey
Author

@keathley I have updated to the latest versions from Hex for Mint, Finch, Tesla and NimblePool, and the issue has persisted. I can't seem to reproduce it locally, but the gist of what we are doing is this: inside a Phoenix request we start a task that makes an API call over to the Have I Been Pwned API. We do not await the task inside the Phoenix request. (We are not linking the task, so it should carry on in the background.)
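
For reference, the shape of that call is roughly the following (module and function names are placeholders rather than our actual code, and it assumes a Task.Supervisor named MyApp.TaskSupervisor in the supervision tree):

  # Fire-and-forget inside the Phoenix request: start_child/2 does not
  # link the task to the caller, so it keeps running after the request
  # process has already replied.
  Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
    MyApp.HIBP.check_breached(password_hash_prefix)
  end)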

@jared-mackey
Author

After seeing some trace data on these, it is clear the request is not being sent, since the trace duration is sub-millisecond. So I don't think it is the same as the previously reported Mint issue. To work around this, I am going to add a retry middleware to our Tesla pipeline.

teamon added the mint label Jul 26, 2021
@teamon
Member

teamon commented Dec 17, 2021

Closing this one, do let me know if there is anything actionable on tesla side.

teamon closed this as completed Dec 17, 2021
@atomkirk

I'm seeing a case where socket closed is returned but the request IS sent, so my retry middleware is causing the request to happen multiple times.
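
In other words, {:error, "socket closed"} doesn't tell you whether the request actually reached the server, so a blanket retry can duplicate non-idempotent calls. One way to limit the blast radius (just a sketch, module names and URL are made up) is to keep the Retry middleware on a client that only does idempotent GETs and leave it off the client used for writes:

  defmodule MyApp.ReadClient do
    # Only used for GET lookups, so a duplicated request is harmless.
    use Tesla

    plug Tesla.Middleware.BaseUrl, "https://api.example.com"
    plug Tesla.Middleware.Retry, delay: 10, max_retries: 2

    def fetch(path), do: get(path)
  end

  defmodule MyApp.WriteClient do
    # No Retry plug here: retrying after "socket closed" could repeat a POST.
    use Tesla

    plug Tesla.Middleware.BaseUrl, "https://api.example.com"

    def create(path, body), do: post(path, body)
  end

Newer Tesla releases also allow should_retry to inspect the request itself, which makes a per-method check possible; check the Tesla.Middleware.Retry docs for the version you're on.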

@Billzabob

We are also seeing these "socket closed" errors. I can't reproduce them locally very reliably but they happen frequently in production for us. Interestingly, they seem to happen more often on clients that have relatively low traffic. It's enough of an issue that we will probably have to move away from Finch until it gets fixed.

@jared-mackey
Author

@Billzabob I was able to work around this via a retry middleware. It’s not ideal, but did solve the problem for me.

  plug Tesla.Middleware.Retry,
    delay: 10,
    max_retries: 2,
    should_retry: &should_retry?/1

and

  defp should_retry?({:ok, %{status: status}}) when status in 400..599, do: true
  defp should_retry?({:ok, _}), do: false
  defp should_retry?({:error, _}), do: true

Customize to your specific needs, of course. 😁
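
Putting those two snippets together, a self-contained version of the client looks roughly like this (module name, base URL and the Finch pool name are placeholders for the real ones):

  defmodule MyApp.HIBPClient do
    use Tesla

    adapter Tesla.Adapter.Finch, name: MyApp.Finch

    plug Tesla.Middleware.BaseUrl, "https://haveibeenpwned.com/api/v3"
    plug Tesla.Middleware.Retry,
      delay: 10,
      max_retries: 2,
      should_retry: &should_retry?/1

    # Retry 4xx/5xx responses as well as transport-level errors such as
    # {:error, "socket closed"} coming out of the Finch adapter.
    defp should_retry?({:ok, %{status: status}}) when status in 400..599, do: true
    defp should_retry?({:ok, _}), do: false
    defp should_retry?({:error, _}), do: true
  end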

@Billzabob

We'll give that a try. Any chance that setting the pool_max_idle_time to something like a minute or two could fix this issue? Just guessing since it seems to happen mostly when there hasn't been activity in a while for the client.

@salex89

salex89 commented Mar 22, 2024

> We'll give that a try. Any chance that setting the pool_max_idle_time to something like a minute or two could fix this issue? Just guessing since it seems to happen mostly when there hasn't been activity in a while for the client.

@Billzabob Hi mate, just checking in, did you find a solution for this issue? We're seeing pretty much the same thing as you are, also only on production servers. I was thinking about what the root cause might be; can you tell me what operating system you are using, Erlang/Elixir version, etc.? We're on Alma Linux, Elixir 1.15.7, Erlang 26.1.2.

What I concluded is that it's the Tesla Finch adapter doing this kind of funky translation of %Mint.TransportError{} to text, returning {:error, "socket closed"}. But the root cause still eludes me; it looks like connections get closed while they're in the pool and Finch is not aware of that.
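
If it helps anyone digging further: that string is simply Mint's own message for the :closed reason, so the adapter seems to be stringifying the exception rather than inventing its own text (a quick IEx check, not a claim about the adapter's internals):

  iex> Exception.message(%Mint.TransportError{reason: :closed})
  "socket closed"

So matching {:error, "socket closed"} (or just {:error, _}) in a retry predicate covers this case.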

@Billzabob

@salex89
This was the change we made and we haven't seen the issue since:

  # In our application.ex

  pool_opts = %{default: [pool_max_idle_time: 60_000]}

  children = [
    # ...
    {Finch, name: PlaidClient, pools: pool_opts},
    # ...
  ]

It still feels like there's an underlying bug but this workaround works pretty well.

@salex89

salex89 commented Mar 22, 2024

> @salex89 This was the change we made and we haven't seen the issue since: […]
>
> It still feels like there's an underlying bug but this workaround works pretty well.

@Billzabob Interesting, thanks. We'll try that. Did you, perhaps, try configurations like conn_max_idle_time or max_idle_time (their descriptions are identical)? The pool_max_idle_time basically purges the whole pool, if I'm not mistaken; these might clean up individual "faulty" sockets instead.
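
To make the distinction concrete, here's how the two options would sit side by side (values are illustrative; check the Finch docs for the version you're running):

  pool_opts = %{
    default: [
      # close an individual connection that has been idle this long
      # (checked when the connection is handed out of the pool)
      conn_max_idle_time: 30_000,
      # terminate the whole pool after it has been idle this long
      pool_max_idle_time: 60_000
    ]
  }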

@atomkirk

Those pool_opts seem to be working great for me. Great find! Agreed, it seems like an underlying bug.

@djbberko

I am also running into these errors: polling an endpoint every X seconds and then, randomly, socket: closed. These pool_opts (and various other opts) are not getting rid of the issue; they might be reducing it, but it's hard to tell. I have also implemented a basic retry mechanism, which drastically reduces the failure rate, but some errors still get through.

Do we have any idea what the root cause of the Mint error is?
