Load being distributed to missing workers too Version 2.7.4.dev14 #2008
Comments
Hi! @mboutet is the person with the most insight into user distribution, but I think he'll need more to go on. Perhaps you can write a failing unit test that shows your issue? Look here for inspiration, and run tox or pytest to execute it: https://github.com/locustio/locust/blob/master/locust/test/test_dispatch.py (My opinion has always been that it is OK for Locust to fail miserably if workers go missing in the middle of a test, because IMO that test is already invalid :)
@radhakrishnaakamat, I also used Locust in k8s at the previous company I worked for. One of the things I did to prevent restarted pods from going missing forever and reappearing as another worker was to run the workers as a StatefulSet instead of a Deployment. Doing so made Locust less flaky because the worker would reconnect with the same ID. Also, there's already logic in the master runner that removes missing workers from the dispatcher, as you can see here: Lines 849 to 856 in 440c612
Unless there's a bug in this logic (which is completely possible), the logic proposed in #2010 might be redundant. Am I missing something?
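To make the expected behaviour concrete, here is a minimal sketch of "distribute only to workers that are not missing". It is a toy model and not Locust's dispatcher: the `Worker` class, its `state` strings, and `distribute_users` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical stand-in for a worker node; not Locust's own class.
    id: str
    state: str = "ready"  # assumed states: "ready", "running", "missing"


def distribute_users(workers, user_count):
    """Split user_count as evenly as possible across workers not marked missing."""
    active = sorted((w for w in workers if w.state != "missing"), key=lambda w: w.id)
    if not active:
        return {}
    base, extra = divmod(user_count, len(active))
    # The first `extra` workers (in id order) take one additional user each.
    return {w.id: base + (1 if i < extra else 0) for i, w in enumerate(active)}


workers = [Worker("worker-0"), Worker("worker-1", state="missing"), Worker("worker-2")]
print(distribute_users(workers, 10))  # {'worker-0': 5, 'worker-2': 5}
```

If the missing worker were included in the split, roughly a third of the 10 users would be assigned to a node that never spawns them, which matches the reduced spawn rate described in this issue.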
Closed this issue as it is linked with #2010.
Did you want to link the issue with the PR? If so, then you should simply add "Fixes #<the_issue_number>" to the PR's description.
Though the worker is removed from the dispatcher, it is still being used in distribution. I did some debugging with print statements (any resource on how to debug a distributed setup locally would help a lot, as I am unable to attach a debugger because of gevent) and found that the redistribution didn't happen. Only after adding this logic did it start working. I am trying to optimize my changes to remove the worker from the runner env too, so that it is also consistent in the UI. Also, I feel that running as a Deployment saves resources (not significantly in this case) over a StatefulSet.
Thank you, I was trying the same.
If you're using PyCharm, you need to enable gevent compatibility in the settings. For debugging the distributed scenario, your best bet is to write integration tests such as the ones in: locust/locust/test/test_runners.py Line 685 in 440c612
I don't think you're fixing the issue in the right place. Try to design integration tests such as the ones in locust/locust/test/test_dispatch.py Line 2220 in 440c612
that will reproduce the issue. That being said, I experienced various issues like you have when running in k8s at my last job. I made a few "improvements" on my fork at master...mboutet:mboutet-master (I did not update the tests so they are failing at the moment). The main changes are:
Obviously, these changes are highly experimental, but I found that they made Locust a lot more stable in a k8s environment with a lot of dynamic events such as restarting pods and auto-scaling workers. Perhaps you could try my fork in your environment?
Thank you. This is really helpful.
I couldn't find information on this environment variable. Is it still supported? Currently, I am facing the issue that occasionally a Locust worker loses its connection to the master and continues executing requests forever as an orphan.
Disconnected workers should probably stop themselves after a while (maybe 60s?) but that is a different issue.
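A rough sketch of what such a self-stop timeout could look like, assuming the worker records the last time it heard from the master. This is not an existing Locust feature or API: `MasterWatchdog`, `heartbeat`, and `should_stop` are hypothetical names, and the 60-second default simply mirrors the figure suggested above.

```python
import time


class MasterWatchdog:
    """Hypothetical helper: tracks how long ago the master was last heard from."""

    def __init__(self, timeout_s=60.0, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen = clock()

    def heartbeat(self):
        # Call whenever any message from the master arrives.
        self._last_seen = self._clock()

    def should_stop(self):
        # True once the master has been silent for at least timeout_s seconds.
        return self._clock() - self._last_seen >= self._timeout_s


# Usage with a fake clock, so the example runs instantly.
now = [0.0]
watchdog = MasterWatchdog(timeout_s=60.0, clock=lambda: now[0])
now[0] = 59.0
print(watchdog.should_stop())  # False
now[0] = 60.0
print(watchdog.should_stop())  # True
```

A worker loop could poll `should_stop()` periodically and shut itself down when it returns true, instead of generating load as an orphan.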
When running Locust in distributed mode in Kubernetes, if a pod restarts, the worker's status changes to missing. I can see that spawning distributes the users equally across all workers, but in this case it distributes to missing workers as well, which leads to a reduced spawning rate.
Kindly help!!
I have made some changes that add native Kubernetes support and have shared the details in the Slack channel. Kindly suggest how we can take this further.