the loading to each worker is slightly different for the multi process worker feature #3346
Comments
@chikinchoi Can you set the following environment variable and see if it makes a difference?
$ export SERVERENGINE_USE_SOCKET_REUSEPORT=1
$ fluentd -c your-config.conf
Here is some background note:
This is actually a common issue among server products on Linux. See https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/
The core problem is that Fluentd itself has no load-balancing mechanism; the workers simply accept connections from a shared listening socket. This model works poorly on Linux, because Linux often wakes the busiest worker to handle a new connection. The SERVERENGINE_USE_SOCKET_REUSEPORT option switches to SO_REUSEPORT, which lets the kernel distribute incoming connections across the workers instead. This is experimental and not well documented, but it's worth a try if the above imbalance is hitting your use case.
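As an illustration (this sketch is mine, not from the report), the imbalance shows up with a source that is shared by all workers, such as an in_forward source declared outside any <worker> block, assuming workers is set in the system directive as in this report. Both workers then listen on the same port, and the kernel, not Fluentd, decides which worker accepts each incoming connection; SO_REUSEPORT changes how the kernel makes that choice.
# Illustrative sketch, not the reporter's config:
# a forward source shared by every worker. Which worker handles a given
# connection is decided by the kernel's accept behavior (or by SO_REUSEPORT
# when SERVERENGINE_USE_SOCKET_REUSEPORT=1 is exported), not by Fluentd.
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>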
Hi @fujimotos, Thank you for your quick reply!
@chibicode Right. The uneven worker load is an open issue on Linux. One proposed solution is SO_REUSEPORT, which is what SERVERENGINE_USE_SOCKET_REUSEPORT enables.
In your use case, I think the best point to set the env is the script that launches Fluentd (for example, the container entrypoint). Here is an example:
#!/bin/bash
export SERVERENGINE_USE_SOCKET_REUSEPORT=1
fluentd -c /fluentd/etc/fluent.conf
Hi @fujimotos, thank you for your reply.
Regarding the uneven worker load issue, I read the Fluentd documentation and saw that there is a "worker N-M" directive. May I know what the purpose of "worker N-M" is if the uneven worker load is expected behavior?
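For context, here is an illustrative sketch (not from this thread) of how the <worker N> and <worker N-M> directives are typically used: they pin plugins to specific workers (for example, to keep a plugin that cannot run in multiple processes on a single worker) rather than balancing load between them. The plugin names, paths, and tags below are placeholders.
# Illustrative example of the worker directives; paths and tags are placeholders
<system>
  workers 2
</system>

# in_tail does not support multiple workers, so pin it to worker 0 only
<worker 0>
  <source>
    @type tail
    path /var/log/app.log
    pos_file /var/log/app.log.pos
    tag app
    <parse>
      @type none
    </parse>
  </source>
</worker>

# This block runs in workers 0 through 1, i.e. the "worker N-M" form
<worker 0-1>
  <match app>
    @type stdout
  </match>
</worker>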
@fujimotos,
@chikinchoi I think a small difference is expected. You originally reported that the space usage was 98% for worker0 and 0% for worker1, so worker1 was obviously overworking. On the other hand, the numbers you report now are much closer, so I consider this progress, better than 0% vs 98% usage.
@fujimotos I found worker0 at 71% and worker1 at 0% today. It seems this is still progress, but do you think there is any way to make it better?
@chikinchoi As far as I know, there is no other option that can improve the balancing further. Edit: There is a fix being proposed at the Linux kernel level, so I believe this will eventually get better on the kernel side.
Thanks for the resolution. I have tried it, but after setting "export SERVERENGINE_USE_SOCKET_REUSEPORT=1", the other workers (I am using 6 workers in my configuration) started utilizing CPU only for a very short period, about 2 minutes, and after that everything reverted to how it was before. I am also sending the logs to NewRelic using Fluentd, and for most of the servers/clusters it works fine, but for a few of them the logs lag by 2 hours and sometimes even beyond 48 hours. Surprisingly, the logs for one of the namespaces in my K8s cluster stream live into NewRelic, while for another namespace I am facing this issue. I have tried using the directive as well as the solution provided above, which reduced the latency from hours to roughly 10-15 minutes, but I am still not getting the logs without lag. Any troubleshooting steps would be appreciated.
I'm facing the same problem. Is there any other solution in addition to SERVERENGINE_USE_SOCKET_REUSEPORT?
So, the load is unbalanced even after setting the environment variable?
@jvs87 Thanks!
Thanks. I see...
Yes, I'm a little lost and don't know whether the problem is related to the multi-process workers or, on the other hand, to a bad use of the buffer.
Hi. Do you need any other test?
Describe the bug
The "multi process workers" feature is not working. I have defined 2 workers in the system directive of the fluentd config. However, when I use the Grafana to check the performance of the fluentd, the fluentd_output_status_buffer_available_space_ratio metrics of each worker are slightly different. For example, worker0 is 98% and worker1 is 0%.
To Reproduce
To Reproduce, please use the below fluentd config:
Expected behavior
I expect the fluentd_output_status_buffer_available_space_ratio to be even across workers, since the load should be distributed evenly to each worker as well.
Your Environment