Sometimes Net::OpenTimeout / EOFError #29

Open
ledermann opened this issue Dec 31, 2023 · 10 comments

@ledermann

Since yesterday (2023-12-30 20:49:00 UTC+1) I have been getting this in all my apps using RorVsWild:

ERROR -- : /usr/local/lib/ruby/3.3.0/net/protocol.rb:43:in `ssl_socket_connect': Net::OpenTimeout from /usr/local/lib/ruby/3.3.0/net/http.rb:1674:in `connect'
from /usr/local/lib/ruby/3.3.0/net/http.rb:2388:in `begin_transport'
from /usr/local/lib/ruby/3.3.0/net/http.rb:2332:in `transport_request'
from /usr/local/lib/ruby/3.3.0/net/http.rb:2306:in `request'
from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/plugin/net_http.rb:24:in `block in request_with_rorvswild'
from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/agent.rb:65:in `measure_section'
from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/plugin/net_http.rb:23:in `request_with_rorvswild'
from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/plugin/net_http.rb:17:in `request'
from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/client.rb:67:in `transmit'
from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/client.rb:40:in `post'
from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/queue.rb:71:in `flush'
from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/queue.rb:62:in `flush_indefinetely'
from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/queue.rb:76:in `block in start_thread'

Since I don't know if you have a status page, I'll report it here.

Happy new year ;-)

@alexisbernard
Member

alexisbernard commented Jan 1, 2024

I have no idea. During this period we received the same volume of requests as usual, and our load average and response times were normal. I checked the open source app and it received at least one request every minute (https://www.rorvswild.com/applications/136101/requests?range=2023-12-30T19:20:00Z..2023-12-30T20:20:00Z). The throughput was even a little higher than the daily average.

The timeout is set to 3 seconds. Maybe it's too short when there are perturbations on the network.

Thanks, happy new year!

ledermann added a commit to solectrus/solectrus that referenced this issue Jan 2, 2024
ledermann added a commit to templatus/templatus-hotwire that referenced this issue Jan 2, 2024
ledermann added a commit to templatus/templatus-vue that referenced this issue Jan 2, 2024
ledermann added a commit to templatus/templatus-inertia that referenced this issue Jan 2, 2024
@ledermann
Author

@alexisbernard Strange, I continue to see this in the logs of all my applications. They are dockerized apps running on Hetzner Cloud shared VMs located in Nuremberg, Germany. Traffic is low, and I can't see any bottlenecks on my side.

I just increased the timeout to 10 seconds and will see if that solves it.
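
For reference, this is roughly what I changed in each app (the timeout key is my assumption here, based on what I'm passing; adjust if the option is named differently):

# config/rorvswild.yml
production:
  api_key: xxx
  timeout: 10  # seconds; the default of 3 seemed too short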

@ledermann
Author

After the timeout was increased to 10 seconds, the error no longer occurred. Nice, but now I see this EOFError a few times in the logs:

 E, [2024-01-03T08:31:03.278931 #1] ERROR -- : /usr/local/bundle/gems/net-protocol-0.2.2/lib/net/protocol.rb:237:in `rbuf_fill': end of file reached (EOFError) 
 from /usr/local/bundle/gems/net-protocol-0.2.2/lib/net/protocol.rb:199:in `readuntil'
 from /usr/local/bundle/gems/net-protocol-0.2.2/lib/net/protocol.rb:209:in `readline'
 from /usr/local/lib/ruby/3.2.0/net/http/response.rb:158:in `read_status_line'
 from /usr/local/lib/ruby/3.2.0/net/http/response.rb:147:in `read_new'
 from /usr/local/lib/ruby/3.2.0/net/http.rb:1862:in `block in transport_request'
 from /usr/local/lib/ruby/3.2.0/net/http.rb:1853:in `catch'
 from /usr/local/lib/ruby/3.2.0/net/http.rb:1853:in `transport_request'
 from /usr/local/lib/ruby/3.2.0/net/http.rb:1826:in `request'
 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/plugin/net_http.rb:24:in `block in request_with_rorvswild'
 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/agent.rb:65:in `measure_section'
 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/plugin/net_http.rb:23:in `request_with_rorvswild'
 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/plugin/net_http.rb:17:in `request'
 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/client.rb:67:in `transmit'
 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/client.rb:40:in `post'
 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/queue.rb:71:in `flush'
 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/queue.rb:62:in `flush_indefinetely'
 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/queue.rb:76:in `block in start_thread'

ledermann changed the title from "Net::OpenTimeout since 2023-12-30" to "Sometimes Net::OpenTimeout / EOFError" on Jan 5, 2024
@alexisbernard
Member

Sorry for the delay. I suspect the timeout was triggered on the server side, but the latency between our servers is very stable:

$ ping solectrus.ledermann.dev
[...]
--- solectrus.ledermann.dev ping statistics ---
337 packets transmitted, 337 received, 0% packet loss, time 336452ms
rtt min/avg/max/mdev = 13.963/14.045/14.293/0.043 ms

Do you have an idea of how frequently the EOFError occurs?

@ledermann
Author

The EOFError occurs rarely for me, about 1-2 times a day in most (but not all) apps that use RorVsWild. It happens with Ruby 3.2.2 and Ruby 3.3.0 (other versions not tested).

Further observation: In a low-traffic app, the error can occur even when there have been no requests for several minutes. I do not understand why the NetHttp plugin was activated at all at this time.

By the way, the exception does not appear in RorVsWild's exception list, nor in Honeybadger, which I usually have installed in parallel for bug tracking.

@alexisbernard
Member

Thanks for those details.

Further observation: In a low-traffic app, the error can occur even when there have been no requests for several minutes. I do not understand why the NetHttp plugin was activated at all at this time.

Because server metrics are sent each minute.

By the way, the exception does not appear in RorVsWild's exception list, nor in Honeybadger, which I usually have installed in parallel for bug tracking.

Because data are sent in a separate thread, and the error is caught directly (https://github.com/BaseSecrete/rorvswild/blob/master/lib/rorvswild/client.rb#L71).
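
In other words, it follows roughly this pattern (a simplified sketch with illustrative names and endpoint, not the gem's actual code):

require "net/http"
require "json"

# Sketch: the flush loop runs in its own background thread and rescues
# network errors itself, so they never reach exception trackers like Honeybadger.
FLUSH_INTERVAL = 60 # seconds; server metrics are sent every minute

Thread.new do
  loop do
    begin
      # endpoint and payload are illustrative placeholders
      Net::HTTP.post(URI("https://www.rorvswild.com/api/metrics"), {}.to_json, "Content-Type" => "application/json")
    rescue EOFError, Net::OpenTimeout, SystemCallError => error
      warn("[rorvswild] #{error.class}: #{error.message}") # logged locally; the data for this flush is lost
    end
    sleep(FLUSH_INTERVAL)
  end
end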

@ledermann
Author

Because server metrics are sent each minute.

Hm, I don't think this explains the issue. Please take a look at this snippet from the logs with timestamps:

Jan 09 11:25:46 I, [2024-01-09T10:25:46.106549 #1] INFO -- : [3.120.15.28] [cd9199d2-df50-4aad-ab0c-a2dc39a24edb] method=GET path=/ format=*/* controller=VueController action=index status=200 allocations=1333 duration=5.06 view=4.51 db=0.00 user_agent=Honeybadger Uptime Check
Jan 09 11:30:03 E, [2024-01-09T10:30:03.653642 #1] ERROR -- : /usr/local/lib/ruby/3.3.0/net/protocol.rb:237:in `rbuf_fill': end of file reached (EOFError)
Jan 09 11:30:03 from /usr/local/lib/ruby/3.3.0/net/protocol.rb:199:in `readuntil'
Jan 09 11:30:03 from /usr/local/lib/ruby/3.3.0/net/protocol.rb:209:in `readline'
Jan 09 11:30:03 from /usr/local/lib/ruby/3.3.0/net/http/response.rb:158:in `read_status_line'
Jan 09 11:30:03 from /usr/local/lib/ruby/3.3.0/net/http/response.rb:147:in `read_new'
Jan 09 11:30:03 from /usr/local/lib/ruby/3.3.0/net/http.rb:2342:in `block in transport_request'
Jan 09 11:30:03 from /usr/local/lib/ruby/3.3.0/net/http.rb:2333:in `catch'
Jan 09 11:30:03 from /usr/local/lib/ruby/3.3.0/net/http.rb:2333:in `transport_request'
Jan 09 11:30:03 from /usr/local/lib/ruby/3.3.0/net/http.rb:2306:in `request'
Jan 09 11:30:03 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/plugin/net_http.rb:24:in `block in request_with_rorvswild'
Jan 09 11:30:03 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/agent.rb:65:in `measure_section'
Jan 09 11:30:03 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/plugin/net_http.rb:23:in `request_with_rorvswild'
Jan 09 11:30:03 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/plugin/net_http.rb:17:in `request'
Jan 09 11:30:03 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/client.rb:67:in `transmit'
Jan 09 11:30:03 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/client.rb:40:in `post'
Jan 09 11:30:03 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/queue.rb:71:in `flush'
Jan 09 11:30:03 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/queue.rb:62:in `flush_indefinetely'
Jan 09 11:30:03 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/queue.rb:76:in `block in start_thread'

At 11:25 a request comes in. Five minutes (!) later, without any further requests, the EOFError is raised. Do you really think this is caused by the server metrics being sent each minute? As I understand the source, the NetHttp plugin handles requests, not server metrics. This is the line that raises the EOFError:
https://github.com/BaseSecrete/rorvswild/blob/1.6.3/lib/rorvswild/plugin/net_http.rb#L24

@alexisbernard
Member

Jan 09 11:30:03 from /usr/local/bundle/gems/rorvswild-1.6.3/lib/rorvswild/queue.rb:71:in `flush'

https://github.com/BaseSecrete/rorvswild/blob/master/lib/rorvswild/queue.rb#L71

Yes, it's the server metrics :-)

If you disable the NetHttp plugin, you should get the same error and the same stack trace minus the 3 rorvswild/plugin/net_http.rb lines. I have to monkey patch Net::HTTP, so when RorVsWild sends its own data it still goes through the plugin, but nothing is measured because we are not in the context of a request or a job.

# config/rorvswild.yml
production:
  api_key: xxx
  ignore_plugins:
    - NetHttp
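
The monkey patching I mentioned works roughly like this (a simplified sketch with hypothetical names, not the actual plugin code):

require "net/http"

# Sketch: every Net::HTTP#request call goes through the prepended module,
# but timing only happens inside a request/job context. The agent's own
# posts run outside that context, so they pass through unmeasured.
module HttpInstrumentationSketch
  def request(req, body = nil, &block)
    if Thread.current[:inside_request_or_job] # hypothetical context flag
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      begin
        super
      ensure
        elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
        warn("HTTP #{req.method} #{req.path} took #{(elapsed * 1000).round(1)} ms")
      end
    else
      super # e.g. the agent posting its own metrics
    end
  end
end

Net::HTTP.prepend(HttpInstrumentationSketch)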

I should also add an option to disable server metrics.

I increased the open timeout as you did, and I got the EOFError 3 times on January 9th. Compared to the volume, it's an extremely low percentage, but it's still embarrassing to miss data. It's the same stack trace as yours, except that it comes from lib/rorvswild/queue.rb:70 instead of lib/rorvswild/queue.rb:71, which means it's not specifically related to either server metrics or request data. It would be interesting if you could do the same so we can compare timestamps:

$ grep EOFError file.log
Jan  9 04:05:25 [...]
Jan  9 04:36:58 [...]
Jan  9 05:54:25 [...]

@ledermann
Author

Thanks for the clarification; I understand now that the EOFError comes from sending server metrics.

Yes, disabling server metrics could be useful. In my case, I run multiple dockerized applications on the same host, so the server metrics are all the same.

Here is a list of timestamps (in UTC) where an EOFError was triggered yesterday in my applications:

2024-01-09 04:57:05	App 1
2024-01-09 05:38:02	App 1
2024-01-09 06:45:02	App 1
2024-01-09 10:30:03	App 2
2024-01-09 16:15:03	App 2
2024-01-09 19:41:01	App 3

I'm currently using RorVsWild in 11 applications (with timeout=10), all on the same Docker host. Yesterday, the EOFError was triggered in 3 of them.

@alexisbernard
Member

Thanks for the grep. So it does not seem to happen at the same time. I'm currently running out of ideas for this issue. I will come back if I have something new to try.

I'm currently using RorVsWild in 11 applications (with timeout=10), all on the same Docker host. Yesterday, the EOFError was triggered in 3 of them.

Indeed, in your case the same server metrics are sent at least 11 times.
