HAProxy reload spawning new processes #218
I have also had trouble keeping the HAProxy configuration consistent across all 3 master nodes. Sometimes it is because I have to reboot a server (hard disk full); sometimes a port is still allocated by docker-proxy after a service crashed and the new service tries to reuse that port. I have occasionally found haproxy_a.cfg/haproxy_b.cfg differing from one server to another, but I never noticed these multiple haproxy_reload processes. I had to hard-reset all 3 master nodes and relaunch all the applications. If you know how to patch this :)
Hi,
Actually this was a known issue, but it was related to consul-template - we had to downgrade it to ver. 0.10. To avoid SOME of the issues mentioned by @kopax it is very important to detect flapping services in the PaaS. Anyway, I can recommend using fabio (which is also included in this image) instead - it has far fewer moving parts that depend on external projects, and fabio is supported by us.
Fabio looks nice, but it doesn't support TCP proxying. It is still useful for most of the services you can run.
Is there any update on this? Also, is this project still being maintained?
@tsyardley It is maintained, but as you can see this project is mostly an integration of a few other projects. As I suggested, we recommend using fabio instead of HAProxy due to the many single points of failure that can occur. We also use HAProxy in production and have not hit the issue you describe, but we have restricted ourselves to avoid flapping services - which is the main cause of race conditions in HAProxy parallel reloads. I can only suggest using a different load balancer instead, one that we fully support - eBay's fabio.
@tsyardley I was just thinking, can you provide the consul TAGs you use for your application?
@tsyardley can you provide something?
@sielaq apologies for not replying - I have been on Holiday 🌞 I will be able to provide you with the information tomorrow |
Hi @sielaq, In the ps listing given above, the strange thing is haproxy_reload.sh. On a normal system, haproxy goes into the background, releasing haproxy_reload.sh. I've noticed that on our system, where there are leftover haproxy_reload scripts, the child haproxy processes have stopped (state T). This can happen as a race condition when two haproxy processes launch at the same time. During its early processing, haproxy sends a SIGTTOU to the previous PIDs to pause their listeners while it attempts to start its own. Only if that succeeds does it set up its signal handlers (i.e. quite late in its startup) and detach into the background, releasing haproxy_reload.sh. However, when two haproxies start at the same time, a race can occur: if the first has not yet installed handlers for SIGTTOU and SIGTTIN, the default behaviour for these signals is to stop the process, so the process slightly ahead sends a SIGTTOU to the process slightly behind, which then stops. Once stopped, it stays stopped forever unless sent a SIGCONT. See this ps listing with more flags showing the stopped processes marked by state T:
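The default-stop behaviour described above is easy to reproduce outside haproxy. The sketch below (Python, illustrative only - this is not haproxy's actual code) forks a child that, like a freshly started haproxy, has not yet installed any signal handlers, sends it SIGTTOU, and observes that it lands in the stopped state (what ps shows as T) until a SIGCONT arrives:

```python
import os
import signal
import time

pid = os.fork()
if pid == 0:
    # Child: stands in for a haproxy that has not yet installed its
    # SIGTTOU/SIGTTIN handlers, so the default action (stop) applies.
    os.setpgid(0, 0)          # own, non-orphaned process group
    time.sleep(30)
    os._exit(0)

os.setpgid(pid, pid)          # also set from the parent to avoid a race
time.sleep(0.1)

os.kill(pid, signal.SIGTTOU)  # what the "slightly ahead" haproxy sends
_, status = os.waitpid(pid, os.WUNTRACED)
print("stopped:", os.WIFSTOPPED(status))  # True -> ps would show state T

os.kill(pid, signal.SIGCONT)  # the only thing that resumes it
os.kill(pid, signal.SIGTERM)
os.waitpid(pid, 0)
```

The explicit setpgid puts the child in its own non-orphaned process group; stop signals delivered to orphaned process groups are discarded, which would hide the effect.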
We are exploring how to slow down consul-template so that it does not reload too quickly, in the hope of avoiding this race. Any advice on optimum values for the wait parameters to use?
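For reference, consul-template batches re-renders through its -wait flag, which takes a min:max quiescence window. The fragment below is only a sketch - the paths and the 5s:30s values are illustrative, not a tested recommendation:

```shell
# Wait for at least 5s of quiet before re-rendering and running the reload
# command, but never delay a reload by more than 30s (values illustrative).
consul-template \
  -template "haproxy.cfg.ctmpl:haproxy.cfg:./haproxy_reload.sh" \
  -wait 5s:30s
```

A larger min widens the gap between consecutive reloads, which reduces the chance of two haproxy processes starting concurrently.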
Which version of PanteraS do you run? This seems like the issue we already had with consul-template a few months ago. Nevertheless, I will soon provide a more robust reload script with the new release, so that it does not reload when the configuration is broken.
Hi Sielaq, We are currently using v0.2.3 - your suggested change sounds good, thanks!
just to pin
0.3.0 has been released
We are using panteras (0.2.3) and deploying services to marathon. After some time our services stop talking to one another via the haproxy ports - but still can via the marathon ports.
After some digging around we've found that on a node where this occurs there are also multiple haproxy_reload processes. To fix this, those processes have to be killed, and the consul-template instance for haproxy needs to be restarted in supervisor.
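As a stop-gap, the manual fix on an affected node looks something like the following. The supervisor program name is a guess - check `supervisorctl status` on your node for the real one:

```shell
# Kill the leftover haproxy_reload.sh wrappers (and their stopped children),
pkill -f haproxy_reload.sh
# then restart the consul-template instance that manages haproxy so it
# re-renders the config and spawns a fresh haproxy.
supervisorctl restart consul-template-haproxy
```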
Some evidence follows:
Is this a known issue?